Getting the probability distribution of the difference of binary downstream metrics

This page describes how NannyML obtains the posterior distribution of a binary downstream metric.

When evaluating the effect of ML model predictions on downstream metrics in controlled experiments, we are interested in how the treatment (that is, the use of the model) changes the metric of interest. In other words: what is the probability distribution of the difference between the values of the metric in the treatment group and in the control group?

If the downstream metric of interest is binary (it can take one of two values, typically 0 and 1), we can model its posterior with a Beta distribution. Assuming a uniform prior, the posterior is $\mathrm{Beta}(\alpha = s + 1,\ \beta = n - s + 1)$, where $s$ is the number of successes and $n$ is the total number of observations. For example, if we are investigating whether a new model makes fewer people churn, then $s$ is the number of people in the group who churned (even though churn is not a success from the business perspective).

NannyML calculates the posteriors for the two groups separately, draws samples from each, and computes the difference between the treatment and control samples. In that way, assuming the two groups are independent, we obtain samples from the posterior of the difference, which is exactly what we care about. This posterior can then be evaluated using the HDI+ROPE (with minimum precision) framework.
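The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not NannyML's implementation: the group sizes and churn counts are made-up numbers, and the number of posterior samples is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical counts, for illustration only. Here a "success" is a churn,
# so n is the group size and s is the number of churned customers.
n_control, s_control = 1_000, 150
n_treatment, s_treatment = 1_000, 120

# With a uniform Beta(1, 1) prior, the posterior for each group is
# Beta(alpha = s + 1, beta = n - s + 1).
n_samples = 100_000
control_post = rng.beta(s_control + 1, n_control - s_control + 1, n_samples)
treatment_post = rng.beta(s_treatment + 1, n_treatment - s_treatment + 1, n_samples)

# Pairing independent draws from the two posteriors gives samples from the
# posterior of the difference (treatment minus control).
diff_samples = treatment_post - control_post

print(diff_samples.mean())  # posterior mean of the difference in churn rate
```

A negative posterior mean here would indicate that the treatment group churns less than the control group; the full set of `diff_samples` is what the HDI+ROPE evaluation operates on.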