
Getting Probability Distribution of Performance Metric when some observations have labels

This page describes how NannyML estimates the probability distribution of a performance metric when some observations have labels while others don't.


In some cases experiment data can contain both observations with labels and observations without labels. This happens when the model's predictions affect whether we get to observe the label. In credit scoring, for instance, we will never see the label for credit applicants who were rejected. This is an example of a censored confusion matrix: specific elements of the confusion matrix are not available (in the credit scoring case we never observe true and false negatives).
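
As a purely hypothetical illustration of such censoring in credit scoring (the cell values below are made up), the labels are only observed for approved applicants, so the predicted-negative row of the confusion matrix is entirely missing:

```python
import numpy as np
import pandas as pd

# Hypothetical censored confusion matrix for a credit-scoring model:
# labels are only observed for approved (predicted-positive) applicants,
# so both cells in the predicted-reject row are unobservable.
censored_cm = pd.DataFrame(
    {"actual repaid": [830, np.nan], "actual defaulted": [170, np.nan]},
    index=["predicted approve", "predicted reject"],
)
print(censored_cm)
```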

For such cases we still want a single probability distribution of the performance metric that accounts for both observations with and without labels. NannyML does that by separately calculating posteriors for these two types of observations and combining them into a single posterior using Bayes' theorem. Denoting by $p(m|labeled, unlabeled)$ the posterior of the performance metric given labeled and unlabeled observations, we write:

$$p(m|labeled, unlabeled) \propto p(labeled|m) \times p(m|unlabeled)$$

The $p(m|unlabeled)$ term is the posterior we get from observations without labels. The $p(labeled|m)$ term is the likelihood of the data with labels given the metric value of interest. We can plug it in directly (for metrics like accuracy this is the binomial likelihood) or use the posterior that we get for the data with labels. This is possible because, for uniform priors, $p(m|labeled) \propto p(labeled|m)$.
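
The sketch below is a minimal numerical illustration of this combination for accuracy, not NannyML's internal implementation. It assumes a grid over metric values, a Beta density standing in for the posterior obtained from unlabeled observations (in practice this comes from the label-free estimation described on the previous pages), and a binomial likelihood for the labeled observations; all counts and parameters are illustrative.

```python
import numpy as np
from scipy import stats

# Grid of candidate accuracy values m.
m_grid = np.linspace(0.001, 0.999, 999)
dm = m_grid[1] - m_grid[0]

# p(m | unlabeled): stand-in Beta density; in practice this is the posterior
# estimated from the observations without labels.
posterior_unlabeled = stats.beta.pdf(m_grid, a=80, b=20)

# p(labeled | m): for accuracy this is the binomial likelihood of observing
# k correct predictions out of n labeled observations.
n_labeled, k_correct = 50, 41
likelihood_labeled = stats.binom.pmf(k_correct, n_labeled, m_grid)

# Combine with Bayes' theorem and normalize to a proper density on the grid.
combined = posterior_unlabeled * likelihood_labeled
combined /= combined.sum() * dm

# Summarize the combined posterior: mean and a 95% credible interval.
mean = (m_grid * combined).sum() * dm
cdf = np.cumsum(combined) * dm
ci_low, ci_high = m_grid[np.searchsorted(cdf, [0.025, 0.975])]
print(f"mean={mean:.3f}, 95% interval=({ci_low:.3f}, {ci_high:.3f})")
```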