Adding a Custom Metric through NannyML SDK

Adding Custom Metrics programmatically through the NannyML SDK

The SDK's custom metrics functionality is part of the monitoring module and is accessed by instantiating a new nml_sdk.monitoring.CustomMetric(). Before anything else, you will need to set up the NannyML SDK address and your API token. You can see how to do this on the Cloud SDK Getting Started page.

import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

## Then create a new custom metric instance
custom_metric = nml_sdk.monitoring.CustomMetric()

The CustomMetric class allows you to perform four main actions (a quick sketch of each call follows the list):

  • Create a new custom metric

  • List all the existing custom metrics

  • Get the details of a custom metric

  • Delete a custom metric
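
Each of these actions is covered in detail in the sections below. As a quick, hedged sketch (assuming a function named calculate is already defined, as shown in the next section; the arguments are placeholders), the four calls look like this:

## Create a new custom metric from a function named calculate
cm = custom_metric.create(calculation_function=calculate,
                          name='example_metric',
                          description='Example metric',
                          problem_type='BINARY_CLASSIFICATION')

## List the existing custom metrics, optionally filtered
custom_metric.list(problem_type='BINARY_CLASSIFICATION')

## Get the details of a single custom metric by its id
custom_metric.get(cm['id'])

## Delete a custom metric by its id
custom_metric.delete(metric_id=cm['id'])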

Create a custom metric

The custom metric function

To create a new custom metric you need to provide a Python function, or a string containing a Python function, that receives a set of named arguments and returns a value. This function must be named calculate, estimate, aggregate, or loss, depending on the problem type it serves.

For example, the calculate function, as seen below, is used by Binary Classification or Multiclass Classification.

Passing a function

import pandas as pd

def calculate(y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.Series,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # Perform the metric calculation here
    # Return a float value
    return 1

Passing a string

calculate = """
import pandas as pd

def calculate(y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.Series,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # Perform the metric calculation here
    # Return a float value
    return 1
"""

The calculate function is expected to accept the arguments shown above and return a float value. When the function is called, those named arguments are passed in and a float is expected back. Providing a function that does not accept the correct arguments or does not return a float will cause errors during execution.

Since some of the arguments passed to the calculate function are pandas objects, the pandas library needs to be imported.

Creating the custom metric

After defining a function, call custom_metric.create to register it as a custom metric. In the example below the custom metric is created for Binary Classification, passing just the calculate function.

custom_metric.create(calculation_function=calculate,
                     name='example_metric',
                     description='Example of a binary classification custom metric.', 
                     problem_type='BINARY_CLASSIFICATION')

If the provided function is not valid Python code, e.g. it contains a syntax error, an error will be raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nannyml-cloud-sdk/nannyml_cloud_sdk/monitoring/custom_metric.py", line 210, in create
    return execute(_CREATE_CUSTOM_METRIC, {
  File "/nannyml-cloud-sdk/nannyml_cloud_sdk/client.py", line 58, in wrapper
    raise ApiError(ex.errors[0]['message']) from ex
nannyml_cloud_sdk.errors.ApiError: Provided calculate function is not valid Python code
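
When passing the function as a string, one way to catch such syntax errors before calling create is to compile the string locally with Python's built-in compile function (a minimal sketch, not part of the SDK):

## Locally validate a function string before registering it as a custom metric
calculate = """
def calculate(**kwargs) -> float:
    return 1.0
"""

try:
    compile(calculate, "<custom_metric>", "exec")  # raises SyntaxError if the code is invalid
except SyntaxError as exc:
    print(f"Invalid calculate function: {exc}")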

Custom metric problem types

If you need a custom metric for Binary Classification or Multiclass Classification, you need to provide the calculate function and can optionally pass an estimate function as well. The estimate function has the following structure:

estimate_function = """
import pandas as pd

def estimate(
    estimated_target_probabilities: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.Series,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # Perform the metric calculation here
    # Return a float value
    return 1
"""

The import statements need to be repeated in every custom function you define.

For Binary Classification or Multiclass Classification, the custom_metric.create function is called like this:

custom_metric.create(
    name="custom_metric_name", # String containing a unique name for the custom metric
    description="custom metric description", # The custom metric description
    problem_type="BINARY_CLASSIFICATION", # BINARY_CLASSIFICATION or MULTICLASS_CLASSIFICATION
    calculation_function=calculate_function, # Function or string containing valid Python code for the calculate function
    estimation_function=estimate_function, # Optional function or string containing valid Python code for the estimate function
    lower_value_limit=0.0, # Optional float value for the lower limit
    upper_value_limit=1.0, # Optional float value for the upper limit
)

If you need a custom metric for a Regression problem type, instead of calculate and estimate functions you need to provide a loss function and an aggregate function.

The Aggregate function structure:

aggregate_function = """
import numpy as np
import pandas as pd

def aggregate(loss: np.ndarray, chunk_data: pd.DataFrame) -> float:
    pass
"""

The Loss function structure:

loss_function = """
import numpy as np
import pandas as pd

def loss(y_true: pd.Series, y_pred: pd.Series, chunk_data: pd.DataFrame) -> np.ndarray:
    pass
"""

Creating the regression metric then looks like this:

custom_metric.create(
        name="custom_metric_name", # string containing the custom metric unique name
        description="custom metric description", # The custom metric description
        problem_type="REGRESSION", # REGRESSION is the only allowed value for this case
        loss_function=loss_function,
        aggregation_function=aggregate_function,
        lower_value_limit=0.0, # Optional float value for lower limit if it exists
        upper_value_limit=1.0, # Optional float value for upper limit if it exists
    )

Assign custom metric to a model

A newly created custom metric is not assigned to any model. To assign it to a model, first retrieve the model's unique identifier (model_id) and then call monitoring.Model.add_custom_metric, passing the model_id and the metric_id as parameters.

The model_id can be retrieved by calling nml_sdk.monitoring.Model.list(). This function either lists all available models or filters them by name or problem type. It returns a list of dictionaries; the model_id is the value of the id key in each dictionary.

## Searching for models with a problem type equal to binary classification
print(nml_sdk.monitoring.Model.list(problemType='BINARY_CLASSIFICATION')) 
>>> [{'name': 'Model1', 'id': 1, 'problemType': 'BINARY_CLASSIFICATION', 'createdAt': datetime.datetime(2024, 8, 19, 9, 14, 59, 678112, tzinfo=datetime.timezone.utc)}]
## The model_id here is represented by 'id':1

## Creating a custom metric using the previously defined calculate and estimate functions
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="new_custom_metric",
        description="custom metric description",
        problem_type="BINARY_CLASSIFICATION",
        calculation_function=calculate_function,
        estimation_function=estimate_function,
        lower_value_limit=0.0, 
        upper_value_limit=1.0, 
    )

print(cm)
>>> {'name': 'new_custom_metric', 'id': 1, 'problemType': 'BINARY_CLASSIFICATION', 'createdAt': datetime.datetime(2024, 8, 27, 12, 6, 17, 42718, tzinfo=datetime.timezone.utc), 'description': '', 'calculateFn': '', 'estimateFn': ''}

## Add the new custom metric to the existing model
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])

From now on, every time you run your model "Model1", the new custom metric will be calculated alongside the standard metrics.

A custom metric can be assigned to any existing model as long as their problem types match.
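
For instance, a defensive way to pair a metric with a model is to compare their problemType values before assigning, using the listing calls covered in the next section (a minimal sketch; the filters and indices are placeholders):

model = nml_sdk.monitoring.Model.list(problemType='BINARY_CLASSIFICATION')[0]
metric = custom_metric.list(problem_type='BINARY_CLASSIFICATION')[0]

## Only assign the custom metric when the problem types match
if model['problemType'] == metric['problemType']:
    nml_sdk.monitoring.Model.add_custom_metric(model_id=model['id'], metric_id=metric['id'])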

Listing the custom metrics

It is possible to list the existing custom metrics, filtering by name or problem type:

custom_metric.list(
    problem_type='BINARY_CLASSIFICATION'
)

>>> [
        {
            'name': 'custom_metric', 
            'id': 1, 
            'problemType': 'BINARY_CLASSIFICATION', 
            'createdAt': datetime.datetime(2024, 8, 27, 11, 36, 41, 512759, tzinfo=datetime.timezone.utc)
        }, 
        {
            'name': 'custom_metric2', 
            'id': 2, 
            'problemType': 'BINARY_CLASSIFICATION', 
            'createdAt': datetime.datetime(2024, 8, 27, 12, 6, 17, 42718, tzinfo=datetime.timezone.utc)
        }
    ]
    
custom_metric.list(
    name='custom_metric2'
)

>>> [
        {
            'name': 'custom_metric2', 
            'id': 2, 
            'problemType': 'BINARY_CLASSIFICATION', 
            'createdAt': datetime.datetime(2024, 8, 27, 12, 6, 17, 42718, tzinfo=datetime.timezone.utc)
        }
    ]

Listing custom metrics doesn't expose the function implementations. To inspect the code inside a custom metric you need to call the custom_metric.get function.

Getting custom metric details

custom_metric.get(1)
>>> {
        'name': 'custom_metric', 
        'id': 1, 
        'problemType': 'BINARY_CLASSIFICATION', 
        'createdAt': datetime.datetime(2024, 8, 27, 11, 36, 41, 512759, tzinfo=datetime.timezone.utc), 
        'description': '', 
        'calculateFn': '\ndef calculate(**kwargs):\n    return 1\n', 
        'estimateFn': None
}

The function implementations here are returned as raw one-line strings. Line breaks are represented as '\n', and tabs may appear as '\t' if your text editor does not use spaces.
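
To read a stored implementation with its original formatting, you can simply print the returned string, which renders the '\n' escapes as real line breaks (a minimal sketch based on the get output above):

metric = custom_metric.get(1)

## Printing the raw string restores the original line breaks
print(metric['calculateFn'])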

Removing a custom metric from a model

Just as you can assign a custom metric to a model, you can also remove one from a model:

nml_sdk.monitoring.Model.remove_custom_metric(model_id=1, metric_id=1)

After removing a custom metric from a model, the metric will no longer be calculated when the model runs, and its previous results will no longer be shown.

Deleting a custom metric

If a custom metric is no longer needed, you can delete it:

custom_metric.delete(metric_id=1)

SDK custom metrics end-to-end examples

Binary classification

This is an example of a custom F_2 metric. Note that it is possible to use external libraries in the custom code; in this example, fbeta_score from sklearn.metrics is used. For more context on custom metrics for binary classification, you can refer to the tutorial Writing functions for Binary Classification, where the concepts of the calculate and estimate functions are explained in more detail.

import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

import pandas as pd
import numpy as np
from sklearn.metrics import fbeta_score

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
) -> float:
    # labels and class_probability_columns are only needed for multiclass classification
    # and can be ignored for binary classification custom metrics
    return fbeta_score(y_true, y_pred, beta=2)

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
) -> float:
    # labels and class_probability_columns are only needed for multiclass classification
    # and can be ignored for binary classification custom metrics

    estimated_target_probabilities = estimated_target_probabilities.to_numpy().ravel()
    y_pred = y_pred.to_numpy()

    # Create estimated confusion matrix elements
    est_tp = np.sum(np.where(y_pred == 1, estimated_target_probabilities, 0))
    est_fp = np.sum(np.where(y_pred == 1, 1 - estimated_target_probabilities, 0))
    est_fn = np.sum(np.where(y_pred == 0, estimated_target_probabilities, 0))
    est_tn = np.sum(np.where(y_pred == 0, 1 - estimated_target_probabilities, 0))

    beta = 2
    fbeta =  (1 + beta**2) * est_tp / ( (1 + beta**2) * est_tp + est_fp + beta**2 * est_fn)
    fbeta = np.nan_to_num(fbeta)
    return fbeta


## Create an instance of the custom metric module
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="custom_F_2", 
        description="Custom implementation for F_2",
        problem_type="BINARY_CLASSIFICATION",
        calculation_function=calculate,
        estimation_function=estimate,
        lower_value_limit=0.0, 
        upper_value_limit=1.0, 
    )
    
## We will add this custom metric to the existing model, model_id = 1
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])

# Trigger analysis of the new data
nml_sdk.monitoring.Run.trigger(model_id=1)

Multiclass Classification

This is an example of a custom F_2 multiclass classification metric. For more context on custom metrics for multiclass classification, you can refer to the tutorial Writing Functions for Multiclass Classification, where the concepts of the calculate and estimate functions are explained in more detail.

import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

import pandas as pd
import numpy as np
from sklearn.metrics import fbeta_score
from sklearn.preprocessing import label_binarize

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    return fbeta_score(y_true, y_pred, beta=2, average='macro')

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
):
    beta = 2

    def estimate_fb(_y_pred, _y_pred_proba, beta) -> float:
        # Estimates the Fb metric.
        #
        # Parameters
        # ----------
        # y_pred: np.ndarray
        #     Predicted class label of the sample
        # y_pred_proba: np.ndarray
        #     Probability estimates of the sample for each class in the model.
        # beta: float
        #     beta parameter
        #
        # Returns
        # -------
        # metric: float
        #     Estimated Fb score.
        

        est_tp = np.sum(np.where(_y_pred == 1, _y_pred_proba, 0))
        est_fp = np.sum(np.where(_y_pred == 1, 1 - _y_pred_proba, 0))
        est_fn = np.sum(np.where(_y_pred == 0, _y_pred_proba, 0))
        est_tn = np.sum(np.where(_y_pred == 0, 1 - _y_pred_proba, 0))

        fbeta =  (1 + beta**2) * est_tp / ( (1 + beta**2) * est_tp + est_fp + beta**2 * est_fn)
        fbeta = np.nan_to_num(fbeta)
        return fbeta

    estimated_target_probabilities = estimated_target_probabilities.to_numpy()
    y_preds = label_binarize(y_pred, classes=labels)

    ovr_estimates = []
    for idx, _  in enumerate(labels):
        ovr_estimates.append(
            estimate_fb(
                y_preds[:, idx],
                estimated_target_probabilities[:, idx],
                beta=2
            )
        )
    multiclass_metric = np.mean(ovr_estimates)

    return multiclass_metric


## Create an instance of the custom metric module
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="custom_F_2", 
        description="Custom implementation for F_2",
        problem_type="MULTICLASS_CLASSIFICATION",
        calculation_function=calculate,
        estimation_function=estimate,
        lower_value_limit=0.0, 
        upper_value_limit=1.0, 
    )
    
## We will add this custom metric to the existing model, model_id = 1
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])

# Trigger analysis of the new data
nml_sdk.monitoring.Run.trigger(model_id=1)

Regression

To define a Regression custom metric, you need to set up a loss function and an aggregate function. These functions are used both to calculate realized performance and to estimate performance. Please refer to the document Writing Functions for Regression, where the concepts of the loss and aggregate functions are explained in more detail.

import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

import numpy as np
import pandas as pd

def loss(
    y_true: pd.Series,
    y_pred: pd.Series,
    chunk_data: pd.DataFrame,
    **kwargs
) -> np.ndarray:
    y_true = y_true.to_numpy()
    y_pred = y_pred.to_numpy()

    alpha = 0.9
    factor1 = alpha * np.maximum(y_true - y_pred, 0)
    factor2 = (1 - alpha) * np.maximum(y_pred - y_true, 0)
    return factor1 + factor2

def aggregate(
    loss: np.ndarray,
    chunk_data: pd.DataFrame,
    **kwargs
) -> float:
    return loss.mean()

## Create an instance of the custom metric module
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="custom_alpha_loss", 
        description="Custom implementation for Direct Loss Estimation",
        problem_type="REGRESSION",
        loss_function=loss,
        aggregation_function=aggregate,
        lower_value_limit=0.0,
        upper_value_limit=None, 
    )
    
## We will add this custom metric to the existing model, model_id = 1
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])

# Trigger analysis of the new data
nml_sdk.monitoring.Run.trigger(model_id=1)