Getting probability distribution of the difference of binary downstream metrics

This page describes how NannyML obtains the posterior distribution of the difference of a binary downstream metric between the treatment and control groups.

When evaluating the effect of ML model predictions on downstream metrics in controlled experiments, we are interested in how the treatment (that is, the use of the model) changes the metric of interest. In other words, what is the probability distribution of the difference between the values of the metric for the treatment group and the control group?

If the downstream metric of interest is binary (it can take one of two values, typically 0 and 1), we can model its posterior with a Beta distribution. Assuming a uniform prior, the posterior is $\beta(\alpha = s + 1, \beta = n - s + 1)$, where $s$ is the number of successes and $n$ is the total number of observations. For example, if we are investigating whether a new model makes fewer people churn, a "success" for the purpose of the calculation is a churned customer, so $s$ is the number of people that churned in the group (even though churn is not a success from the business perspective).

NannyML calculates the posteriors for both groups separately, samples from each of them, and calculates the difference between the treatment and control samples. In that way, assuming the groups are independent, we obtain samples from the posterior of the difference, which is exactly what we care about. This posterior can then be evaluated using the HDI+ROPE (with minimum precision) framework.
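To make the procedure concrete, below is a minimal sketch of the sampling approach described above. It is not NannyML's implementation: the group sizes, success counts, number of posterior samples, and the 94% credible interval are all hypothetical example choices.

```python
# A minimal sketch of the approach described above, not NannyML's actual
# implementation. All numbers are hypothetical example values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
n_samples = 100_000

# Hypothetical experiment results: number of churned customers ("successes")
# and total number of observations in each group.
s_control, n_control = 120, 1_000
s_treatment, n_treatment = 95, 1_000

# Posterior of the churn rate in each group, assuming a uniform prior:
# Beta(alpha = s + 1, beta = n - s + 1).
posterior_control = stats.beta(s_control + 1, n_control - s_control + 1)
posterior_treatment = stats.beta(s_treatment + 1, n_treatment - s_treatment + 1)

# Sample each posterior independently and take the difference of the samples.
# The resulting samples approximate the posterior of the treatment effect.
diff_samples = (
    posterior_treatment.rvs(n_samples, random_state=rng)
    - posterior_control.rvs(n_samples, random_state=rng)
)

print(f"Posterior mean difference (treatment - control): {diff_samples.mean():.4f}")
print("94% credible interval:", np.percentile(diff_samples, [3, 97]).round(4))
```

In practice, the resulting samples of the difference would then be assessed against a ROPE, as described on the HDI+ROPE (with minimum precision) page.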
