Getting Probability Distribution of a Performance Metric with targets

This page describes how NannyML estimates probability distribution of a performance metric when the targets are available.

As described in the Introduction, Probabilistic Model Evaluation uses performance metric probability distribution estimated with the Bayesian approach. When the experiment data has targets, the task is relatively straightforward. The implementation details depend on the performance metric. Here, we will show how it is done for selected metrics.

Accuracy Score

Figure 1 shows how the posterior probability distribution of accuracy updates with more observed data. The true population-level value of accuracy is 0.5 as the data simulates a model that randomly assigns positive predictions to randomly generated positive targets, all at the probability of 0.5.

Precision and Recall

posteriors are estimated similarly to accuracy. For precision, the s parameter of the binomial likelihood function is the sum of true positive predictions (as this is the numerator of the precision score). At the same time, n becomes the number of positive predictions (since this is the denominator). For recall, s is the same as for the precision, but n is the count of positive targets.


For F1, we cannot directly use the binomial likelihood function (and beta prior/posterior) because it does not fit the success-out-of-n-trials model. Sample F1 is calculated with the following formula:

Three elements of the confusion matrix here (tp, fp, fn) are not independent. In that case, we model all confusion matrix elements at once. The likelihood function becomes a multinomial distribution with a probability vector containing four parameters - one for each confusion matrix element. Again, we apply a uniform prior and use conjugate distribution for multinomial likelihood - the Dirichlet distribution. As a posterior, we get another Dirichlet distribution with the following parameters:


For ROC AUC, we take advantage of its being equal to the and estimate its posterior using the approach described .

Last updated