Getting Probability Distribution of a Performance Metric with targets

This page describes how NannyML estimates the probability distribution of a performance metric when targets are available.


As described in the Introduction, Probabilistic Model Evaluation uses performance metric probability distributions estimated with a Bayesian approach. When the experiment data has targets, the task is relatively straightforward. The implementation details depend on the performance metric. Here, we show how it is done for selected metrics.

Accuracy Score

To calculate the sample-level accuracy score, we can assign 1 to each correct prediction (when the binary prediction is equal to the label) and 0 to each incorrect one. Then, we calculate the mean of these to get the accuracy point value. In the Bayesian approach, this leads to a binomial likelihood function. Using the binomial distribution convention, a correct prediction can be considered a success, while an incorrect one is a failure. In such a setting, the probability parameter of the binomial distribution becomes the accuracy. Let's denote the population-level accuracy as $acc$, the number of successful (correct) predictions as $s$, and the total number of observations as $n$. The likelihood function can then be written as:

$$
L(s, n \mid acc = x) = {n \choose s} x^{s} (1-x)^{n-s}
$$
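
As a quick illustration (this is not NannyML's internal code, and the counts $s = 8$, $n = 10$ are made up for the example), the likelihood above can be evaluated on a grid of candidate accuracy values with scipy:

```python
import numpy as np
from scipy.stats import binom

# Hypothetical example: 8 correct predictions (successes) out of 10 observations.
s, n = 8, 10

# Evaluate the binomial likelihood L(s, n | acc = x) on a grid of candidate
# accuracy values.
x = np.linspace(0, 1, 101)
likelihood = binom.pmf(s, n, x)

# The likelihood peaks at the sample accuracy s / n = 0.8.
print(x[np.argmax(likelihood)])
```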

The likelihood function assigns the likelihood of observing the data $s, n$ given that the population-level accuracy is equal to $x$. We are interested in the opposite: the distribution of the population-level accuracy given the data we observed. We can use Bayes' theorem to get it:

$$
P(acc = x \mid s, n) = \frac{L(s, n \mid acc = x)\, P(acc = x)}{P(s, n)}
$$

The $P(acc = x \mid s, n)$ term is the posterior probability distribution of accuracy given the data. We already know the likelihood function. $P(acc = x)$ is the prior - the belief we have about the accuracy before observing the data. $P(s, n)$ is the normalizing constant which ensures that the posterior is a proper probability distribution - that is, it integrates to 1.

Currently, NannyML uses a default uniform prior that assigns equal probability to each possible accuracy value in the range 0-1. Such a prior can be expressed as a Beta distribution with parameters $\alpha_{prior} = 1, \beta_{prior} = 1$. Since the Beta distribution happens to be a conjugate prior for the binomial likelihood function, we get a closed-form analytical solution for the posterior, which is another Beta distribution with parameters $\alpha_{posterior} = \alpha_{prior} + s$ and $\beta_{posterior} = \beta_{prior} + n - s$. That also gives an intuitive interpretation of the prior Beta parameters - they can be treated as pseudo-counts (or simply come from the posterior distribution of previous experiments, if available).

Figure 1 shows how the posterior probability distribution of accuracy updates as more data is observed. The true population-level accuracy is 0.5, as the data simulates a model that randomly assigns positive predictions to randomly generated positive targets, both with probability 0.5.

Figure 1. Updates of accuracy posterior.
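
The behaviour shown in Figure 1 is straightforward to reproduce with the closed-form update above. The sketch below is only an illustration of the conjugate update, not the product's implementation; it simulates random predictions and targets (true accuracy 0.5) and reports how the posterior narrows as $n$ grows:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(42)

# Uniform prior on accuracy: Beta(1, 1).
alpha_prior, beta_prior = 1, 1

# Simulate the setup behind Figure 1: predictions and targets are independent
# coin flips, so the true population-level accuracy is 0.5.
for n in (10, 100, 1000):
    y_pred = rng.integers(0, 2, size=n)
    y_true = rng.integers(0, 2, size=n)
    s = int((y_pred == y_true).sum())  # number of correct predictions

    # Conjugate update: posterior is Beta(alpha_prior + s, beta_prior + n - s).
    posterior = beta(alpha_prior + s, beta_prior + n - s)
    lo, hi = posterior.ppf([0.025, 0.975])
    print(f"n={n:5d}  posterior mean={posterior.mean():.3f}  "
          f"95% interval=({lo:.3f}, {hi:.3f})")
```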

Precision and Recall

Precision and recall posteriors are estimated similarly to the accuracy posterior. For precision, the $s$ parameter of the binomial likelihood function is the number of true positive predictions (as this is the numerator of the precision score), while $n$ is the number of positive predictions (the denominator). For recall, $s$ is the same as for precision, but $n$ is the count of positive targets.
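
Under the same uniform Beta(1, 1) prior, the precision and recall posteriors can be sketched as follows (illustrative only - the predictions and targets are made up):

```python
import numpy as np
from scipy.stats import beta

# Hypothetical binary predictions and targets.
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 1])
y_true = np.array([1, 0, 0, 1, 1, 1, 0, 0, 1, 0])

tp = int(((y_pred == 1) & (y_true == 1)).sum())

# Precision: successes are true positives, trials are positive predictions.
n_precision = int((y_pred == 1).sum())
precision_posterior = beta(1 + tp, 1 + n_precision - tp)

# Recall: successes are true positives, trials are positive targets.
n_recall = int((y_true == 1).sum())
recall_posterior = beta(1 + tp, 1 + n_recall - tp)

print(precision_posterior.mean(), recall_posterior.mean())
```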

F1

For F1, we cannot directly use the binomial likelihood function (and the Beta prior/posterior) because it does not fit the success-out-of-$n$-trials model. The sample-level F1 score is calculated with the following formula:

$$
F_1 = \frac{2\,tp}{2\,tp + fp + fn}
$$

The three confusion matrix elements involved ($tp$, $fp$, $fn$) are not independent, so we model all confusion matrix elements at once. The likelihood function becomes a multinomial distribution with a probability vector containing four parameters - one for each confusion matrix element. Again, we apply a uniform prior and use the conjugate distribution for the multinomial likelihood - the Dirichlet distribution. The posterior is another Dirichlet distribution with the following parameters:

$$
\alpha = [TP + \alpha_1,\ FP + \alpha_2,\ TN + \alpha_3,\ FN + \alpha_4]
$$

where $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 1$ are the prior pseudo-counts, and $TP, FP, TN, FN$ are the observed confusion matrix elements. We then repeatedly sample the population-level expected confusion matrix elements from this posterior and calculate F1 as a deterministic variable.
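
A minimal sketch of this sampling step, using made-up confusion matrix counts rather than NannyML's own code, could look like this:

```python
import numpy as np
from scipy.stats import dirichlet

# Hypothetical confusion matrix counts.
TP, FP, TN, FN = 40, 10, 35, 15

# Uniform Dirichlet prior: all pseudo-counts equal to 1. The posterior
# parameters are the prior pseudo-counts plus the observed counts.
alpha_posterior = np.array([TP, FP, TN, FN]) + 1

# Sample population-level confusion matrix proportions and compute F1 for each
# sample; the resulting values approximate the F1 posterior distribution.
samples = dirichlet.rvs(alpha_posterior, size=10_000, random_state=0)
tp, fp, tn, fn = samples.T
f1_samples = 2 * tp / (2 * tp + fp + fn)

print(np.quantile(f1_samples, [0.025, 0.5, 0.975]))
```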

ROC AUC

For ROC AUC, we take advantage of the fact that it is equal to the probability that a randomly chosen positive observation is ranked above a randomly chosen negative one, and we estimate its posterior using the approach described for that probability.
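
As a sanity check of that equivalence (simulated data only - this is not part of the estimation procedure), the ROC AUC computed by scikit-learn matches the fraction of positive/negative pairs that are ranked correctly:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Hypothetical scores: positive observations tend to score higher.
pos_scores = rng.normal(1.0, 1.0, size=500)
neg_scores = rng.normal(0.0, 1.0, size=500)

y_true = np.concatenate([np.ones(500), np.zeros(500)])
y_score = np.concatenate([pos_scores, neg_scores])

# Probability that a randomly chosen positive outranks a randomly chosen
# negative, estimated over all positive/negative pairs (ties count as 1/2).
diff = pos_scores[:, None] - neg_scores[None, :]
p_correct_ranking = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

print(roc_auc_score(y_true, y_score), p_correct_ranking)  # the two values match
```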
