Monitoring a text classification model

Tutorial explaining how to monitor text classification models with NannyML

In this tutorial, we will use NannyML Cloud to monitor a sentiment analysis model: a text classifier that predicts the sentiment (Negative, Neutral, or Positive) of a review left on Amazon.

The model and dataset

We will use a model trained on a subset of the Multilingual Amazon Reviews Dataset. The trained model is available on NannyML's Hugging Face Hub page.

For details on how this model was produced, see the blog post: Are your NLP models deteriorating post-deployment? Let’s use unlabelled data to find out.

Reference and analysis sets

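To evaluate the model in production, we have two sets:

  1. Reference set: data on which the model's behavior is known and acceptable, including the ground-truth labels. NannyML uses it to establish a baseline for performance and drift.

  2. Analysis set: the production data we want to monitor. NannyML analyzes it against the baseline learned from the reference set, so its targets may be missing.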

Monitoring with NannyML Cloud

Step 1: Add a new model

Click the Add model button on your NannyML Cloud dashboard to create a new model.

Step 2: Define the problem type and main metric

Each review we classify can be Negative, Neutral, or Positive, so we set the problem type to Multiclass classification.
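Because the model outputs one score per class, its predicted label is typically the class with the highest score. The sketch below illustrates that relationship in plain Python. The lowercase label strings ("negative", "neutral", "positive") are hypothetical, since the tutorial does not state how predicted_sentiment encodes the classes.

```python
from typing import Mapping

# Hypothetical label strings mapped to the prediction score columns from Step 4.
SCORE_COLUMNS = {
    "negative": "negative_sentiment_pred_proba",
    "neutral": "neutral_sentiment_pred_proba",
    "positive": "positive_sentiment_pred_proba",
}


def predicted_label(row: Mapping[str, float]) -> str:
    """Return the class whose prediction score is highest (argmax over the three scores)."""
    return max(SCORE_COLUMNS, key=lambda label: row[SCORE_COLUMNS[label]])


row = {
    "negative_sentiment_pred_proba": 0.10,
    "neutral_sentiment_pred_proba": 0.25,
    "positive_sentiment_pred_proba": 0.65,
}
print(predicted_label(row))  # positive
```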

We will monitor the model's F1-score on a weekly basis.
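To make the weekly F1-score concrete, here is a minimal sketch that groups labelled records by ISO week and computes a multiclass F1-score for each week. It assumes macro averaging (the unweighted mean of the per-class F1-scores) and hypothetical lowercase labels; NannyML Cloud computes this metric for you, and its exact averaging and chunking may differ.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical label strings; the real dataset may encode classes differently.
CLASSES = ("negative", "neutral", "positive")


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of one-vs-rest F1-scores over CLASSES."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in CLASSES:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); a class absent from both lists scores 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def weekly_f1(
    records: Iterable[Tuple[datetime, str, str]],
) -> Dict[Tuple[int, int], float]:
    """Group (timestamp, target, prediction) records by ISO (year, week) and score each week."""
    weeks: Dict[Tuple[int, int], Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for ts, target, prediction in records:
        year, week, _ = ts.isocalendar()
        y_true, y_pred = weeks[(year, week)]
        y_true.append(target)
        y_pred.append(prediction)
    return {wk: macro_f1(t, p) for wk, (t, p) in sorted(weeks.items())}


records = [
    (datetime(2024, 1, 1), "positive", "positive"),
    (datetime(2024, 1, 2), "negative", "negative"),
    (datetime(2024, 1, 3), "neutral", "neutral"),
    (datetime(2024, 1, 8), "positive", "negative"),
    (datetime(2024, 1, 9), "negative", "negative"),
    (datetime(2024, 1, 10), "neutral", "neutral"),
]
for week, score in weekly_f1(records).items():
    print(week, round(score, 3))
```

In this sketch a class that never appears in a given week contributes an F1-score of 0, which is one reason very small weekly chunks can produce noisy values.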

Step 3: Configure the Reference set

Select "Upload via public link".

Use the following public URL to link the Reference dataset:

Step 4: Define the reference dataset schema

  1. Select the column timestamp as the Timestamp column

  2. Select the column predicted_sentiment as the Prediction column

  3. Select the column real_sentiment as the Target column

  4. Flag the columns negative_sentiment_pred_proba, neutral_sentiment_pred_proba, positive_sentiment_pred_proba as Prediction Scores.
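Before uploading, it can help to confirm that the reference file matches the schema above. This sketch checks each row (for example, as produced by csv.DictReader) for the required columns and, assuming the three prediction scores are softmax probabilities, that they sum to 1. The function name check_reference_schema is hypothetical and not part of NannyML.

```python
import csv
import io
import math
from typing import Iterable, List, Mapping

# Column names taken from the schema configured in Step 4.
TIMESTAMP = "timestamp"
PREDICTION = "predicted_sentiment"
TARGET = "real_sentiment"
SCORES = (
    "negative_sentiment_pred_proba",
    "neutral_sentiment_pred_proba",
    "positive_sentiment_pred_proba",
)


def check_reference_schema(
    rows: Iterable[Mapping[str, str]], require_target: bool = True, tol: float = 1e-6
) -> List[str]:
    """Return a list of problems; an empty list means every row looks consistent."""
    required = [TIMESTAMP, PREDICTION, *SCORES]
    if require_target:
        required.append(TARGET)
    problems = []
    for i, row in enumerate(rows):
        # Treat absent keys and empty CSV cells as missing values.
        missing = [c for c in required if not row.get(c)]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue
        # Assumption: the scores are softmax probabilities, so they should sum to 1.
        total = sum(float(row[c]) for c in SCORES)
        if not math.isclose(total, 1.0, abs_tol=tol):
            problems.append(f"row {i}: scores sum to {total:.4f}")
    return problems


sample = """timestamp,predicted_sentiment,real_sentiment,negative_sentiment_pred_proba,neutral_sentiment_pred_proba,positive_sentiment_pred_proba
2024-01-01 10:00:00,positive,positive,0.1,0.2,0.7
2024-01-01 11:00:00,negative,,0.8,0.1,0.2
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(check_reference_schema(rows))
print(check_reference_schema(rows, require_target=False))
```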

Step 5: Configure the Analysis set

Select "Upload via public link".

Use the following public URL to link the Analysis dataset:
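A similar sanity check can be run on the analysis set. The hypothetical compare_sets function below verifies that the analysis rows contain every reference column except the target real_sentiment (production data may be unlabelled) and, as an assumption about a typical setup, that the analysis period starts after the reference period ends. It inspects only the first row's keys, which is sufficient for rows read with csv.DictReader because they all share the same header.

```python
from datetime import datetime
from typing import List, Mapping, Sequence

TIMESTAMP = "timestamp"
TARGET = "real_sentiment"


def compare_sets(
    reference_rows: Sequence[Mapping[str, str]],
    analysis_rows: Sequence[Mapping[str, str]],
) -> List[str]:
    """Return problems found when checking the analysis set against the reference set."""
    problems = []
    # The analysis set may omit the target, but should carry every other reference column.
    missing = sorted(set(reference_rows[0]) - set(analysis_rows[0]) - {TARGET})
    if missing:
        problems.append(f"analysis set is missing columns: {missing}")
    # Assumption: the analysis period begins after the reference period ends.
    ref_end = max(datetime.fromisoformat(r[TIMESTAMP]) for r in reference_rows)
    ana_start = min(datetime.fromisoformat(r[TIMESTAMP]) for r in analysis_rows)
    if ana_start <= ref_end:
        problems.append(f"analysis starts at {ana_start}, not after reference ends at {ref_end}")
    return problems


reference = [
    {"timestamp": "2024-01-01 10:00:00", "predicted_sentiment": "positive", "real_sentiment": "positive"},
]
analysis = [
    {"timestamp": "2024-02-01 09:30:00", "predicted_sentiment": "negative"},
]
print(compare_sets(reference, analysis))
```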

Step 6: Start monitoring