Getting the Probability Distribution of a Performance Metric When Only Some Observations Have Labels

This page describes how NannyML estimates the probability distribution of a performance metric when some observations have labels while others don't.

In some cases experiment data contains both observations with labels and observations without labels. This happens, for example, when the model's predictions affect whether we get to observe the label at all. In credit scoring, we never see the label for applicants who were rejected. This is an example of a censored confusion matrix: specific elements of the confusion matrix are not available (in the credit scoring case, we never observe true and false negatives).

For such cases we still want a single probability distribution of the performance metric that accounts for both the observations with labels and those without. NannyML does this by separately calculating posteriors for the two types of observations and combining them into a single posterior using Bayes' theorem. Denoting by $p(m|labeled, unlabeled)$ the posterior of the performance metric given the labeled and unlabeled observations, we write:

$$p(m|labeled, unlabeled) \propto p(labeled|m) \times p(m|unlabeled)$$

The $p(m|unlabeled)$ term is the posterior we get from the observations without labels. The $p(labeled|m)$ term is the likelihood of the labeled data given the metric value of interest. We can plug it in directly (for metrics like accuracy this is the binomial function) or use the posterior that we get from the labeled data. This is possible because, for uniform priors, $p(m|labeled) \propto p(labeled|m)$.
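To make the combination concrete, below is a minimal sketch, not NannyML's actual API, that combines the two terms on a grid of accuracy values. The binomial likelihood for the labeled data follows the description above; the Beta-shaped density standing in for the posterior from the unlabeled data, and all parameter values, are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Grid of candidate values for the metric (here: accuracy).
m_grid = np.linspace(0.001, 0.999, 999)

# p(m | unlabeled): posterior estimated from observations without labels.
# A Beta density is used here only as a placeholder for that posterior.
posterior_unlabeled = stats.beta.pdf(m_grid, a=80, b=20)

# p(labeled | m): likelihood of the labeled data given accuracy m.
# For accuracy this is the binomial probability of observing k correct
# predictions out of n labeled observations.
n_labeled, k_correct = 50, 41
likelihood_labeled = stats.binom.pmf(k_correct, n_labeled, m_grid)

# Bayes' theorem: multiply the two terms and normalize so the combined
# posterior integrates to 1 over the grid.
combined_posterior = likelihood_labeled * posterior_unlabeled
combined_posterior /= np.trapz(combined_posterior, m_grid)
```

Because the prior is uniform, multiplying the labeled-data likelihood by the unlabeled-data posterior and renormalizing is equivalent to multiplying the two posteriors directly.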
