Evaluating a binary classification model

Showcasing how to perform model evaluation.


NannyML's model evaluation module assesses whether a deployed model's performance meets expectations using as little data as possible, given a required statistical confidence. For more comprehensive model monitoring over time, NannyML's model monitoring module should be used.

US Census MA Employment dataset

We will be using the US Census MA employment dataset for our tutorial. It is also used in the NannyML OSS library quickstart.

The dataset is available through the NannyML OSS library. We will "repackage" it using the following small snippet so that we can showcase how to use Probabilistic Model Evaluation.

import nannyml as nml

reference, evaluation, target = nml.load_us_census_ma_employment_data()
evaluation = evaluation.merge(target, how='inner', on=['id'])

# Only targets and predicted probabilities are needed for model evaluation
columns = ['employed', 'predicted_probability']

# We split our data into batches to simulate them becoming available at different times.
reference[columns].to_csv('ref1.csv', index=False)
evaluation[columns].iloc[:8_000].to_csv('evl1.csv', index=False)
evaluation[columns].iloc[8_000:16_000].to_csv('evl2.csv', index=False)
evaluation[columns].iloc[16_000:24_000].to_csv('evl3.csv', index=False)
evaluation[['predicted_probability']].iloc[24_000:32_000].to_parquet('evl4.pq', index=False)
evaluation[['predicted_probability']].iloc[32_000:40_000].to_parquet('evl5.pq', index=False)

This code sample gives us one reference dataset and five evaluation data batches. The first three evaluation batches contain predicted probabilities and targets, while the last two contain only predicted probabilities. This simulates the case where we have a new model in production and don't yet have targets for all the predictions we have made.
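If you want to double-check the batches before uploading them, a quick pandas sanity check (optional, not part of the wizard; file names taken from the snippet above) could look like this:

import pandas as pd

# Optional sanity check of the batch files created above.
ref = pd.read_csv('ref1.csv')
evl1 = pd.read_csv('evl1.csv')       # targets and predicted probabilities
evl4 = pd.read_parquet('evl4.pq')    # predicted probabilities only

print(ref.columns.tolist())    # ['employed', 'predicted_probability']
print(evl1.shape)              # (8000, 2)
print(evl4.columns.tolist())   # ['predicted_probability']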

Adding a model to NannyML Cloud

When viewing the Evaluation Hub page, the Add new model button, located at the top right corner of the UI, initiates the wizard that guides you through adding a new model.

The first screen of the wizard shows some basic information needed to add a model.

On the next screen, you specify some important information about the model you are adding:

  • The machine learning problem type of your model. Currently, only binary classification is supported.

  • Your evaluation hypothesis. The available options are:

    • Model performance is no worse than reference performance

    • Model performance is within a specified range

  • The classification threshold of your model.

  • The name of your model.

On the next screen, you define the metrics you want to validate.

The ROPE and the required 95% HDI precision for each metric can be specified manually or inferred from the model's behavior on the reference data.
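To make "required 95% HDI precision" concrete: the HDI is the narrowest interval holding 95% of the posterior probability, and its width is the precision that must be reached before a decision is made against the ROPE. The sketch below computes a 95% HDI for a Beta posterior of an accuracy-like metric; it is a conceptual illustration with made-up counts, not NannyML's internal implementation.

import numpy as np
from scipy import stats

def hdi_95(posterior, grid=np.linspace(0, 0.05, 501)):
    """Narrowest interval containing 95% of the posterior probability mass."""
    lowers = posterior.ppf(grid)
    uppers = posterior.ppf(grid + 0.95)
    widths = uppers - lowers
    narrowest = np.argmin(widths)
    return lowers[narrowest], uppers[narrowest]

# Hypothetical posterior for an accuracy-like metric:
# 940 correct out of 1000 labelled observations, flat Beta(1, 1) prior.
posterior = stats.beta(1 + 940, 1 + 60)
lo, hi = hdi_95(posterior)
print(f"95% HDI: [{lo:.3f}, {hi:.3f}], width (precision): {hi - lo:.3f}")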

On the next screen, you are asked to specify how you will provide your reference data.

There are four options for adding new data: a public link, a local file upload, AWS S3, or Azure Blob storage.

We recommend using parquet files when uploading data through the user interface.

NannyML Cloud supports both parquet and CSV files, but CSV files don't store data type information, so incorrect data types may be inferred. If you later add more data to the model using the SDK or in parquet format, a data type conflict may occur.
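The snippet below illustrates the difference with hypothetical columns: a round trip through CSV drops the column's type, while parquet preserves it. It is only an illustration of the type-inference issue, not a required step.

import pandas as pd

# Hypothetical columns, just to illustrate the type round-trip.
df = pd.DataFrame({
    'predicted_probability': [0.92, 0.13, 0.78],
    'timestamp': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
})

df.to_parquet('types.pq', index=False)
df.to_csv('types.csv', index=False)

print(pd.read_parquet('types.pq').dtypes['timestamp'])  # datetime64 -- type preserved
print(pd.read_csv('types.csv').dtypes['timestamp'])     # object -- read back as plain text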

After providing your data, you need to specify what it contains, as seen below.

On the next screen, you are asked to specify how you will provide your evaluation data.

This step is optional. If evaluation data is not currently available, it can be provided later.

After providing the data, we are again presented with a screen to specify its contents. Note that column names and data types cannot differ between reference and evaluation data.

Finally, we reach the review screen, where we can check all the parameters we have specified for the new model:

The application then goes to the model summary screen. If a run has not started automatically, we can initiate it manually.

Adding more data

Quite often, we don't have all the data needed to evaluate a model immediately. When more data becomes available later, we can add it from the model settings screen. To do this, look for the Add more rows button in the Data section.

On this screen, we can also review the settings we chose during the model creation wizard and make any necessary changes.

After we add more data, through a process similar to the one used in the add model wizard, we are again presented with a confirmation screen.

Viewing Results

After we have added all the available data and NannyML Cloud has finished processing it, we can view the results of the model evaluation in the Performance tab. We can see an example below:

We see that the F1 metric has been selected. There are two plots that show us how the model performs.

  • On the left, we see the evolution of the 95% HDI of the evaluated model as we add more observations. The limits of the ROPE area are shown as horizontal red dashed lines. The HDI is colored differently according to whether we have reached the required precision to make a decision and what that decision is.

  • On the right side, we see the reference posterior and the latest evaluation posterior for F1. Again, the ROPE area limits are shown as vertical red dashed lines. A conceptual sketch of such a plot is given below.
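For readers who want to reproduce a plot like the right-hand one offline, the sketch below draws two hypothetical Beta posteriors together with vertical ROPE limits. All numbers are made up for illustration, the Beta shape is only a stand-in for a metric posterior, and none of this comes from NannyML Cloud itself.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(0.6, 1.0, 500)
reference_posterior = stats.beta(380, 45)    # made-up reference F1 posterior
evaluation_posterior = stats.beta(150, 22)   # made-up latest evaluation posterior
rope_limits = (0.85, 0.92)                   # made-up ROPE limits

plt.plot(x, reference_posterior.pdf(x), label='reference posterior')
plt.plot(x, evaluation_posterior.pdf(x), label='evaluation posterior')
for limit in rope_limits:
    plt.axvline(limit, color='red', linestyle='--')
plt.xlabel('F1')
plt.ylabel('density')
plt.legend()
plt.show()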
