Probabilistic Adaptive Performance Estimation (PAPE)


Intuition

Classification model predictions usually come with an associated uncertainty. For example, a binary classification model typically returns two outputs for each prediction - a predicted class (binary) and a class probability estimate (sometimes referred to as score). The score provides information about the confidence of the prediction. A rule of thumb is that the closer the score is to its lower or upper limit (usually 0 and 1), the higher the probability that the classifier’s prediction is correct. When this score is an actual probability, it can be directly used to estimate the probability of making an error. For instance, imagine a high-performing model which, for a large set of observations, returns a prediction of 1 (positive class) with a probability of 0.9. It means that the model is correct for approximately 90% of these observations, while for the other 10%, the model is wrong.
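
As a minimal illustration of this intuition (the scores and the 0.5 threshold below are made-up example values, not NannyML output): if the scores are well-calibrated probabilities, the expected fraction of correct predictions follows directly from them.

```python
import numpy as np

# Well-calibrated scores: P(y = 1) for each observation (illustrative values).
scores = np.array([0.9, 0.9, 0.9, 0.2, 0.05])
preds = (scores >= 0.5).astype(int)          # predicted class at a 0.5 threshold

# Probability that each individual prediction is correct.
prob_correct = np.where(preds == 1, scores, 1 - scores)
expected_accuracy = prob_correct.mean()      # expected fraction of correct predictions
print(expected_accuracy)                     # 0.89 for these scores
```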

PAPE can use the uncertainty information encoded in a model's outputs on a reference dataset to estimate the confusion matrix elements for the model on a newer dataset, called the analysis dataset. The resulting confusion matrix elements can then be transformed into our chosen performance metric, completing the estimation process. This is done using only the model's outputs on the analysis dataset.
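
Here is a minimal sketch of that estimation step (illustrative values, not NannyML's implementation): given calibrated probabilities and the model's predicted labels, each confusion matrix element is an expectation over the unknown targets, and any metric built from those elements follows.

```python
import numpy as np

calibrated = np.array([0.92, 0.80, 0.35, 0.10, 0.60])  # calibrated P(y = 1)
y_pred = (calibrated >= 0.5).astype(int)                # predictions (0.5 threshold, for illustration)

# Expected confusion matrix elements, without access to the true labels.
exp_tp = np.sum(calibrated[y_pred == 1])      # predicted 1 and truly 1
exp_fp = np.sum(1 - calibrated[y_pred == 1])  # predicted 1 but truly 0
exp_fn = np.sum(calibrated[y_pred == 0])      # predicted 0 but truly 1
exp_tn = np.sum(1 - calibrated[y_pred == 0])  # predicted 0 and truly 0

estimated_recall = exp_tp / (exp_tp + exp_fn)
estimated_precision = exp_tp / (exp_tp + exp_fp)
```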

NannyML has previously created the CBPE algorithm for performance estimation. Further research showed that covariate shift can have a material impact on the quality of calibration. PAPE addresses this by calibrating predicted probabilities according to the data distribution of the analysis data. This is done by calculating the ratio of probability density functions between the reference and the analysis datasets. This ratio is used to perform weighted calibration on the reference data, which is what makes the calibration result accurately reflect the uncertainty in the analysis data.
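
To see why a classifier that separates reference rows from analysis rows yields this density ratio (a standard density-ratio-estimation argument, sketched here for intuition rather than quoted from NannyML's implementation): by Bayes' rule, a well-calibrated classifier trained on $n_{ref}$ reference rows (label 0) and $n_{anl}$ analysis rows (label 1) outputs

$$\hat{p}(x) = \mathrm{P}(\text{analysis} \mid x) = \frac{n_{anl}\, p_{anl}(x)}{n_{ref}\, p_{ref}(x) + n_{anl}\, p_{anl}(x)},$$

which rearranges to the importance-weight formula used in the algorithm below:

$$\frac{p_{anl}(x)}{p_{ref}(x)} = \frac{n_{ref}}{n_{anl}} \cdot \frac{\hat{p}(x)}{1 - \hat{p}(x)}.$$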

Implementation Details

The PAPE Algorithm

Let's first go through the steps of the PAPE algorithm.

  1. Preprocess available data to create a training dataset for the density ratio estimation model.

    1. Assign label 0 to reference data and label 1 to analysis data.

    2. Concatenate reference and analysis data. Note that we are only using columns labelled as model inputs.

    3. Concatenate labels.

  2. Train the density ratio estimation model. Use it to estimate predicted probabilities, $\hat{p}$, for reference data.

  3. Translate the density ratio estimation model's predicted probabilities for reference data into density ratios, also called importance weights, using the formula:

$$\hat{w}_{ref} = \frac{n_{ref}}{n_{anl}} \cdot \frac{\hat{p}}{1-\hat{p}}$$
  4. Fit a weighted calibrator $c$ on reference data, using the calculated weights $\hat{w}_{ref}$.

  5. Get the monitored model's predicted scores on analysis data, $f(x_{anl})$, and perform weighted calibration on them, $c(f(x_{anl}))$.

  6. Use the uncertainty encoded in the calibrated scores to estimate performance. This is done in the same way as in the CBPE algorithm; the only difference is that we now use the weighted-calibration scores to estimate the expected performance.
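
Below is a rough end-to-end sketch of these six steps using scikit-learn. The DataFrame and column names (reference_df, analysis_df, feature_cols, y_pred_proba, y_true), the choice of a gradient-boosting classifier for density ratio estimation, and isotonic regression as the weighted calibrator are assumptions made for illustration; they are not NannyML's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression

def estimate_accuracy_pape(reference_df, analysis_df, feature_cols):
    # Step 1: label reference rows 0 and analysis rows 1, concatenate model inputs.
    X = np.vstack([reference_df[feature_cols].to_numpy(),
                   analysis_df[feature_cols].to_numpy()])
    z = np.concatenate([np.zeros(len(reference_df)), np.ones(len(analysis_df))])

    # Step 2: train the density ratio estimation (DRE) model and score reference rows.
    dre = GradientBoostingClassifier().fit(X, z)
    p_hat = dre.predict_proba(reference_df[feature_cols].to_numpy())[:, 1]

    # Step 3: turn DRE probabilities into importance weights for reference rows.
    n_ref, n_anl = len(reference_df), len(analysis_df)
    p_hat = np.clip(p_hat, 1e-6, 1 - 1e-6)          # avoid division by zero
    w_ref = (n_ref / n_anl) * p_hat / (1 - p_hat)

    # Step 4: fit a weighted calibrator on the monitored model's reference scores.
    calibrator = IsotonicRegression(out_of_bounds="clip")
    calibrator.fit(reference_df["y_pred_proba"].to_numpy(),
                   reference_df["y_true"].to_numpy(),
                   sample_weight=w_ref)

    # Step 5: calibrate the monitored model's scores on the analysis data.
    calibrated_anl = calibrator.predict(analysis_df["y_pred_proba"].to_numpy())

    # Step 6: estimate expected performance from the calibrated scores (CBPE-style),
    # here accuracy for predictions made at a 0.5 threshold.
    y_pred_anl = (analysis_df["y_pred_proba"].to_numpy() >= 0.5).astype(int)
    prob_correct = np.where(y_pred_anl == 1, calibrated_anl, 1 - calibrated_anl)
    return prob_correct.mean()
```

In this sketch the estimated metric is accuracy at a 0.5 threshold; the same calibrated scores can feed any metric derived from expected confusion matrix elements, as described above.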

Assumptions and Limitations

PAPE rests on some assumptions in order to give accurate estimates:

There is no Concept Drift

There should be no change in the relationship between the model targets and the model inputs. To express it mathematically, $\mathrm{P}(Y|X)$ should remain unchanged. This is a strong limitation, and PAPE will give inaccurate results if this assumption is violated.

The data available is large enough.

We need enough data to accurately train the density ratio estimation model and to properly calibrate the monitored model's scores.

There is no covariate shift to previously unseen regions.

PAPE will likely fail if there is covariate shift to previously unseen regions of the model input space. Mathematically, the support of the analysis data needs to be a subset of the support of the reference data. If it is not, density ratio estimation is theoretically undefined. Practically, if a region of the analysis data is not represented in the reference data, we cannot account for that shift with a weighted calculation on reference data.
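
A rough heuristic for spotting this situation (purely illustrative, not a NannyML feature) is to flag analysis rows whose numeric feature values fall outside the range observed in the reference data:

```python
import pandas as pd

def outside_reference_range(reference_df: pd.DataFrame,
                            analysis_df: pd.DataFrame,
                            feature_cols: list) -> pd.Series:
    """Flag analysis rows with any numeric feature outside the reference min/max range."""
    lo = reference_df[feature_cols].min()
    hi = reference_df[feature_cols].max()
    outside = analysis_df[feature_cols].lt(lo) | analysis_df[feature_cols].gt(hi)
    return outside.any(axis=1)

# Share of analysis rows in regions the reference data never covered:
# outside_reference_range(reference_df, analysis_df, feature_cols).mean()
```

Note that this only catches shifts beyond the numeric range of each feature; shifts into sparsely covered regions inside that range require a more careful check.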
