Writing Functions for Regression
Writing the functions needed to create a custom regression metric.
As we saw on the Introductory Custom Metric page, the key components of a custom regression metric are the specific Python functions we need to provide for the custom metric to work. Here we will see how to create them.
We will assume the user has access to a Jupyter Notebook running Python with the NannyML open-source library installed.
Sample Dataset
We have created a sample dataset to facilitate developing the code needed for custom regression metrics. The dataset is publicly accessible here. It is a pure covariate shift dataset that consists of:
7 numerical features: ['feature1', 'feature2', 'feature3', 'feature4', 'feature5', 'feature6', 'feature7']
A target column: y_true
A model prediction column: y_pred
A timestamp column: timestamp
An identifier column: identifier
We can inspect the dataset with the following code in a Jupyter cell:
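A minimal sketch of such a cell, assuming the dataset is published as a CSV file (the URL below is a placeholder for the public link mentioned above):

```python
import pandas as pd

# Placeholder URL - replace it with the dataset's public link referenced above.
DATASET_URL = "https://example.com/regression_sample_dataset.csv"

data = pd.read_csv(DATASET_URL)
data.head()
```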
Developing custom regression metric functions
NannyML Cloud requires two functions for the custom metric to be used. The first is the loss function, which calculates the instance-level loss. The second is the aggregate function, which aggregates the instance-level results into a single metric value. This decomposition allows the Direct Loss Estimation (DLE) algorithm to be used for performance estimation: for realized performance the loss is calculated from the targets, while for estimated performance the loss is estimated.
Custom Functions API
The API of these functions is set by NannyML Cloud and is shown as a template on the New Custom Regression Metric screen.
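In outline, the template defines two functions along the following lines. This is only a sketch based on the description below; the exact signatures shown on that screen are the ones to follow.

```python
import numpy as np
import pandas as pd

def loss(y_true: pd.Series, y_pred: pd.Series, chunk_data: pd.DataFrame, **kwargs) -> np.ndarray:
    # Return one loss value per row (the instance-level loss).
    ...

def aggregate(loss: np.ndarray, chunk_data: pd.DataFrame, **kwargs) -> float:
    # Reduce the instance-level losses to a single metric value for a chunk.
    ...
```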
Let's use the pinball metric as our custom regression metric. We start by creating the loss function. The following data are available to us when writing it:

y_true: A pandas.Series object containing the target column.
y_pred: A pandas.Series object containing the model predictions column.
chunk_data: A pandas.DataFrame object containing all columns associated with the model. This allows other columns in the provided data to be used in the calculation of the custom metric.
Custom Pinball metric
We will create a custom metric from the pinball loss. Let's use an alpha value of 0.9. The loss function would be:
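Below is a sketch, assuming the signatures outlined above; the per-instance pinball loss weights under-prediction errors by alpha and over-prediction errors by 1 - alpha.

```python
import numpy as np
import pandas as pd

def loss(y_true: pd.Series, y_pred: pd.Series, chunk_data: pd.DataFrame, **kwargs) -> np.ndarray:
    alpha = 0.9
    diff = y_true.to_numpy() - y_pred.to_numpy()
    # Pinball loss per instance:
    #   alpha * (y_true - y_pred)        when the model under-predicts
    #   (1 - alpha) * (y_pred - y_true)  when the model over-predicts
    return np.where(diff >= 0, alpha * diff, (alpha - 1) * diff)
```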
The aggregate function is simpler:
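Again a sketch under the same assumed signatures; for the pinball metric we simply take the mean of the instance-level losses.

```python
import numpy as np
import pandas as pd

def aggregate(loss: np.ndarray, chunk_data: pd.DataFrame, **kwargs) -> float:
    # The chunk-level pinball metric is the mean of the instance-level losses.
    return float(np.mean(loss))
```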
We can test these functions on the dataset loaded earlier. After running the function definitions in a Jupyter cell, we can call them on the full dataset:
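For example, assuming the dataset was loaded into data as sketched earlier:

```python
# Instance-level losses over the whole dataset, then aggregated into one value.
instance_losses = loss(data['y_true'], data['y_pred'], data)
pinball_value = aggregate(instance_losses, data)
print(pinball_value)
```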
We can double-check the result with sklearn:
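scikit-learn provides mean_pinball_loss, which should return the same value for alpha equal to 0.9:

```python
from sklearn.metrics import mean_pinball_loss

# Same metric computed directly by scikit-learn for comparison.
print(mean_pinball_loss(data['y_true'], data['y_pred'], alpha=0.9))
```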
The results match, as expected, meaning we have correctly specified our custom metric.
Testing a Custom Metric in the Cloud product
We saw how to add a custom metric in the Custom Metrics Introductory page. We can further test our new regression metric by using the dataset in the cloud product. The datasets are publicly available, hence we can use the Public Link option when adding data to a new model.
Reference Dataset Public Link:
Monitored Dataset Public Link:
The process of creating a new model is described in the Monitoring a tabular data model page.
Note that when we are on the Metrics page, we can go to Performance monitoring and directly add a custom metric we have already specified.
After the model has been added to NannyML Cloud and the first run has completed, we can inspect the monitoring results. Of particular interest to us is the comparison between estimated and realized performance for our custom metric.
We see that NannyML can accurately estimate our custom metric across the whole dataset, even in the areas where there is a performance difference. This means that our loss function is compatible with DLE, and we can reliably use both performance calculation and performance estimation for our custom metric.
You may have noticed that for custom metrics there is no sampling error implementation. Therefore you will have to make a qualitative judgement, based on the results, as to whether the estimated and realized performance are a good enough match.
Next Steps
You are now ready to use your new custom metric in production. However, you may want to make your implementation more robust to account for the data you will encounter in production. For example, you can add missing value handling to your implementation.
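For instance, a variant of the loss function sketched earlier could drop rows where either the target or the prediction is missing before computing the pinball loss:

```python
import numpy as np
import pandas as pd

def loss(y_true: pd.Series, y_pred: pd.Series, chunk_data: pd.DataFrame, **kwargs) -> np.ndarray:
    alpha = 0.9
    # Keep only rows where both the target and the prediction are present.
    mask = y_true.notna() & y_pred.notna()
    diff = y_true[mask].to_numpy() - y_pred[mask].to_numpy()
    return np.where(diff >= 0, alpha * diff, (alpha - 1) * diff)
```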