Reverse Concept Drift (RCD)

How to use the Reverse Concept Drift Algorithm

Reverse Concept Drift (RCD) belongs to the NannyML Cloud family of algorithms and can be accessed through a NannyML Cloud license or as a standalone algorithm.

NannyML Cloud - Azure Marketplace
NannyML Cloud - AWS Instance
RCD Algorithm - AWS Sagemaker Marketplace

How Reverse Concept Drift Works

Concept Drift

We call the relationship between the model inputs and the model targets the concept. This is what a machine learning model learns when we train it. When this relationship changes, we say we have concept drift. Mathematically, we can express a concept as $\mathrm{P}(y|X)$.

Concept drift and covariate shift may occur separately or together. Covariate shift is a change in the distribution of the model inputs, $\mathrm{P}(X)$. The presence of one does not exclude the other. Both affect the performance of a model, and their effects on performance are coupled. We describe that in more detail in our Understanding Data Shift: Impact on Machine Learning Model Performance blog post.
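One way to see why the two effects are coupled is to write the expected value of a metric in terms of both distributions. The notation below is an illustrative assumption rather than part of the RCD definition: $f$ stands for the monitored model and $m$ for a metric that can be computed row by row, such as accuracy.

$$\mathrm{P}(X, y) = \mathrm{P}(y|X) \, \mathrm{P}(X)$$

$$\mathbb{E}[m] = \int \Big( \sum_{y} m\big(f(X), y\big) \, \mathrm{P}(y|X) \Big) \mathrm{P}(X) \, \mathrm{d}X$$

A change in $\mathrm{P}(y|X)$ is weighted by $\mathrm{P}(X)$ inside the integral, so the same concept drift can have a different effect on performance depending on where the inputs are concentrated.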

The Reverse Concept Drift (RCD) algorithm focuses on the impact of concept drift on the model's performance. This keeps the method simpler and makes the results easier to interpret. For the impact of all factors on the model's performance, we need to look no further than the actual realized performance.

Intuition

When we have concept drift, we know there is a new concept in the monitored data compared to what we have in our reference data. We can train a new machine learning model and learn the new concept in order to compare it with the existing one. But how do we make a meaningful comparison?

As mentioned, performance change is a combination of covariate shift and concept drift. We can factor out covariate shift impact, as well as its interaction with concept drift, by focusing on the reference dataset. How can we do that? We use the concept we learnt on the monitored data to make predictions on the reference dataset and treat them as ground truth. This allows us to estimate how the monitored model would perform under the monitored data's concept on the reference data.

Implementation

The impact of concept drift on performance is calculated based on the following steps:

1. Train a LightGBM model on a chunk of the monitored data.
2. Use the learned concept to make predictions on the reference dataset.
3. Estimate model performance on the reference data, treating the monitored concept model's predictions as the ground truth. A key detail here is that we use the predicted scores, not the predicted labels. This gives a more accurate calculation but adds complexity. The calculation uses CBPE (Confidence-based Performance Estimation) in an inverse way, where the fractional results come from the y_true column rather than the y_pred_proba column.
4. Subtract the model's actual performance on the reference data from the estimated performance. The result is the change in the model's performance that is due to concept drift alone. For comparison, the full performance impact under both concept drift and covariate shift is the model's performance on the chunk minus its performance on the reference data. This is why those results are also labeled with the Performance Impact Estimation (PIE) acronym.
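The last two steps can be sketched for accuracy as follows. This is a minimal illustration under stated assumptions, not the NannyML implementation: the function names are hypothetical, the toy numbers are made up, and `concept_proba` stands in for the scores that the concept model from steps 1 and 2 would produce on the reference rows.

```python
from statistics import mean

def expected_accuracy(y_pred, concept_proba):
    """Expected accuracy of fixed predictions when each row's true label is
    treated as 1 with probability concept_proba (the monitored concept)."""
    return mean(p if pred == 1 else 1.0 - p
                for pred, p in zip(y_pred, concept_proba))

def realized_accuracy(y_pred, y_true):
    """Ordinary accuracy of the predictions against the recorded labels."""
    return mean(1.0 if pred == true else 0.0
                for pred, true in zip(y_pred, y_true))

def pie_accuracy(y_pred, y_true, concept_proba):
    """Estimated accuracy under the monitored concept minus the realized
    accuracy, both computed on the reference rows."""
    return expected_accuracy(y_pred, concept_proba) - realized_accuracy(y_pred, y_true)

# Hypothetical reference rows.
y_pred = [1, 1, 0, 0]                   # monitored model's predicted labels
y_true = [1, 1, 0, 1]                   # recorded reference targets
# Assumed to come from the concept model, for example
# LGBMClassifier.predict_proba(X_reference)[:, 1].
concept_proba = [0.9, 0.4, 0.2, 0.7]

print(pie_accuracy(y_pred, y_true, concept_proba))  # approximately -0.15
```

In this toy case the model's realized accuracy on reference is 0.75, while its estimated accuracy under the monitored concept is about 0.6, so the sketch reports a drop of roughly 0.15 attributable to the change in concept.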

However, Reverse Concept Drift (RCD) also offers another approach. It substitutes steps 3 and 4 with the following step:

Calculate the mean of the absolute difference between the model concept and the chunk concept for all reference data points. The resulting number is reported as Magnitude Estimation (ME). Understanding magnitude estimation is easier by looking at the integral it represents:

$$\int_{ref} \big| \mathrm{P}(y|X)_{monitored} - \mathrm{P}(y|X)_{ref} \big| \mathrm{P}(X)_{ref} \mathrm{d}X$$

This is a pure concept drift effect on the model's performance on the reference data. Its values range between 0 and 1, and this is why we call it a magnitude. It is not, however, a (concept drift) distance metric, as it doesn't satisfy all the properties a distance metric needs.
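Averaging over the reference rows is a sample-based approximation of that integral, because the rows are themselves drawn from the reference input distribution. The sketch below assumes both concepts are available as probability scores for the same reference rows. The function name and the numbers are hypothetical.

```python
from statistics import mean

def magnitude_estimation(reference_concept, monitored_concept):
    """Mean absolute difference between two sets of concept scores,
    evaluated row by row on the same reference data points."""
    if len(reference_concept) != len(monitored_concept):
        raise ValueError("both score lists must cover the same reference rows")
    return mean(abs(m - r) for r, m in zip(reference_concept, monitored_concept))

# Hypothetical scores for four reference rows.
reference_concept = [0.9, 0.6, 0.2, 0.3]   # concept the monitored model learned
monitored_concept = [0.9, 0.4, 0.2, 0.7]   # concept learned on the chunk

print(magnitude_estimation(reference_concept, monitored_concept))  # approximately 0.15
```

Because every score lies between 0 and 1, each absolute difference does too, which is why the mean stays within the same range.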

Assumptions and Limitations

RCD rests on some assumptions in order to give accurate results. Those assumptions are also its limitations. Let's see what they are:

1. The data available are large enough.

We need enough data to be able to accurately train a density ratio estimation model and be able to properly multi-calibrate.

RCD will likely fail if there is covariate shift to previously unseen regions in the model input space. Mathematically, we can say that the support of the analysis data needs to be a subset of the support of the reference data. If not, density ratio estimation is theoretically not defined. Practically, if we don't have data from an analysis region in the reference data, we can't account for that shift with a weighted calculation from reference data.
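A rough way to spot this situation before trusting the results is to check how many analysis rows fall outside the feature ranges seen in the reference data. The helper below is a hypothetical diagnostic and not part of RCD. Per-feature ranges are only a necessary condition: rows can pass this check and still sit in a region of the joint input space that the reference data never covers.

```python
def out_of_support_fraction(reference_rows, analysis_rows):
    """Fraction of analysis rows with at least one numeric feature outside
    the [min, max] range observed for that feature in the reference rows."""
    n_features = len(reference_rows[0])
    lows = [min(row[j] for row in reference_rows) for j in range(n_features)]
    highs = [max(row[j] for row in reference_rows) for j in range(n_features)]
    outside = sum(
        any(not (lows[j] <= row[j] <= highs[j]) for j in range(n_features))
        for row in analysis_rows
    )
    return outside / len(analysis_rows)

# Hypothetical rows with two numeric features each.
reference_rows = [(0.1, 10.0), (0.5, 12.0), (0.9, 15.0)]
analysis_rows = [(0.2, 11.0), (0.8, 14.0), (1.4, 13.0), (0.3, 20.0)]

print(out_of_support_fraction(reference_rows, analysis_rows))  # 0.5
```

A large fraction suggests the chunk has moved into regions the reference data cannot represent, in which case the estimates for that chunk deserve extra caution.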

2. Our machine learning algorithm is able to capture the concept accurately.

We are using a LightGBM model to learn the concept. In cases where that algorithm cannot perform well enough, RCD will not provide accurate results.
