Monitoring a text classification model

Tutorial explaining how to monitor text classification models with NannyML

In this tutorial, we will use NannyML Cloud to monitor a sentiment analysis model: a text classifier that predicts whether a review left on Amazon is Negative, Neutral, or Positive.

The model and dataset

We will use a model trained on a subset of the Multilingual Amazon Reviews Dataset. The trained model is available on NannyML's Hugging Face Hub.

For details of how this model was produced, check out the blog post: Are your NLP models deteriorating post-deployment? Let’s use unlabelled data to find out.

Reference and analysis sets

To monitor the model in production, we need two datasets:

Reference set - contains all features, along with the model's predictions and labels. This set establishes the baseline for every metric we want to monitor. Find the reference set: https://raw.githubusercontent.com/NannyML/sample_datasets/main/amazon_reviews/amazon_reviews_reference.csv
Analysis set - contains all features extracted from production data, along with the model's predictions and, in this case, labels. This is the set on which NannyML monitors the model's performance and data drift, using what it learned from the reference set. Find the analysis set: https://raw.githubusercontent.com/NannyML/sample_datasets/main/amazon_reviews/amazon_reviews_analysis_targets.csv
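Before uploading anything, it can help to confirm that both files load and share the same columns, since the analysis set is expected to mirror the reference set. The sketch below uses only the Python standard library; the helper names load_csv and schema_diff are hypothetical, and decoding the files as UTF-8 is an assumption. It is a local sanity check, not part of NannyML Cloud.

```python
import csv
import io
import urllib.request

# URLs taken from the tutorial text above
REFERENCE_URL = ("https://raw.githubusercontent.com/NannyML/sample_datasets/"
                 "main/amazon_reviews/amazon_reviews_reference.csv")
ANALYSIS_URL = ("https://raw.githubusercontent.com/NannyML/sample_datasets/"
                "main/amazon_reviews/amazon_reviews_analysis_targets.csv")


def load_csv(source):
    """Return (column names, rows as dicts) from a URL string or an open text stream."""
    if isinstance(source, str):
        # Treat strings as URLs and decode the HTTP body as UTF-8 (an assumption)
        with urllib.request.urlopen(source) as response:
            reader = csv.DictReader(io.TextIOWrapper(response, encoding="utf-8", newline=""))
            # Reading fieldnames first consumes the header; list(reader) reads the rest
            return list(reader.fieldnames or []), list(reader)
    reader = csv.DictReader(source)
    return list(reader.fieldnames or []), list(reader)


def schema_diff(reference_columns, analysis_columns):
    """Return (columns only in reference, columns only in analysis), each sorted."""
    ref, ana = set(reference_columns), set(analysis_columns)
    return sorted(ref - ana), sorted(ana - ref)
```

For example, passing the column lists returned by load_csv(REFERENCE_URL) and load_csv(ANALYSIS_URL) to schema_diff should return two empty lists if the files share a schema as described.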

Monitoring with NannyML Cloud

Step 1: Add a new model

On your NannyML Cloud dashboard, click the Add model button to create a new model.

Step 2: Define the problem type and main metric

Each review we classify can be Negative, Neutral, or Positive, so we set the problem type to Multiclass classification.

We will use the F1-score as the main metric and monitor it on a weekly basis.
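NannyML Cloud computes this metric for us, but a small local sketch can make "weekly F1 for a multiclass model" concrete. It assumes macro averaging (the unweighted mean of per-class F1 scores), buckets rows by ISO calendar week (Monday to Sunday) as an approximation of weekly chunks, and assumes ISO 8601 timestamps. The function names macro_f1 and weekly_f1 are hypothetical, and the default column names are the ones chosen in Step 4.

```python
from collections import defaultdict
from datetime import datetime


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every class that appears in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN), equal to 2PR / (P + R) whenever that is defined;
        # the denominator is never zero because every label occurs at least once
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores) if scores else 0.0


def weekly_f1(rows, timestamp_col="timestamp",
              pred_col="predicted_sentiment", target_col="real_sentiment"):
    """Group rows into ISO calendar weeks and compute macro F1 for each week."""
    chunks = defaultdict(lambda: ([], []))
    for row in rows:
        # Assumes ISO 8601 timestamps such as "2023-01-02 10:15:00"
        year, week, _ = datetime.fromisoformat(row[timestamp_col]).isocalendar()
        y_true, y_pred = chunks[(year, week)]
        y_true.append(row[target_col])
        y_pred.append(row[pred_col])
    return {key: macro_f1(t, p) for key, (t, p) in sorted(chunks.items())}
```

Macro averaging gives each sentiment class equal weight, so a drop in performance on a rare class (for example, Neutral) still shows up in the weekly number.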

Step 3: Configure the Reference set

Select "Upload via public link".

Use the following public URL to link the Reference dataset: https://raw.githubusercontent.com/NannyML/sample_datasets/main/amazon_reviews/amazon_reviews_reference.csv

Step 4: Define the reference dataset schema

Select the column timestamp as the Timestamp column
Select the column predicted_sentiment as the Prediction column
Select the column real_sentiment as the Target column
Flag the columns negative_sentiment_pred_proba, neutral_sentiment_pred_proba, positive_sentiment_pred_proba as Prediction Scores.
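The column choices above can be written down as a small mapping and used to spot-check rows before upload. In this sketch, the SCHEMA dictionary and validate_row function are hypothetical and are not a NannyML Cloud format. The check that the three prediction scores sum to 1 is an assumption (it holds if the scores come from a softmax over the three classes); loosen or drop it if your scores are produced differently.

```python
import math

# Hypothetical mapping that mirrors the Step 4 column choices
SCHEMA = {
    "timestamp": "timestamp",
    "prediction": "predicted_sentiment",
    "target": "real_sentiment",
    "prediction_scores": [
        "negative_sentiment_pred_proba",
        "neutral_sentiment_pred_proba",
        "positive_sentiment_pred_proba",
    ],
}


def validate_row(row, schema=SCHEMA, tol=1e-3):
    """Return a list of problems found in one CSV row (an empty list means OK)."""
    required = [schema["timestamp"], schema["prediction"], schema["target"],
                *schema["prediction_scores"]]
    problems = [f"missing value for {col}" for col in required if row.get(col) in (None, "")]
    if problems:
        return problems
    scores = [float(row[col]) for col in schema["prediction_scores"]]
    if any(not 0.0 <= s <= 1.0 for s in scores):
        problems.append("prediction score outside [0, 1]")
    # Assumption: scores come from a softmax, so they should sum to about 1
    if not math.isclose(sum(scores), 1.0, abs_tol=tol):
        problems.append(f"prediction scores sum to {sum(scores):.4f}, not 1")
    return problems
```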