
# Adding a Custom Metric through NannyML SDK

Custom metrics are managed through the SDK's `monitoring` module: instantiate `nml_sdk.monitoring.CustomMetric()` to get started. Before doing so, you need to set the NannyML Cloud URL and your API token, as described on the [Cloud SDK Getting Started page](/cloud/v0.24.2/nannyml-cloud-sdk/getting-started.md).

```python
import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

## Then create a new custom metric instance
custom_metric = nml_sdk.monitoring.CustomMetric()
```

The `CustomMetric` class supports four main actions:

* Create a new custom metric
* List all the existing custom metrics
* Get the details of a custom metric
* Delete a custom metric

## Create a custom metric

### The custom metric function

To create a new custom metric, provide a Python function, or a string containing a Python function, that receives a set of named arguments and returns a value. The function must be named `calculate`, `estimate`, `aggregate` or `loss`, depending on the problem type it serves.

For example, the `calculate` function shown below is used for Binary Classification and Multiclass Classification.
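The naming rule can be summarised as a small lookup, based on the problem types covered in this guide. This is only an illustrative sketch: `REQUIRED_FUNCTIONS`, `OPTIONAL_FUNCTIONS` and `expected_function_names` are hypothetical names and are not part of the NannyML SDK.

```python
# Function names expected for each problem type, as described in this guide.
# These dictionaries are illustrative helpers, not SDK objects.
REQUIRED_FUNCTIONS = {
    "BINARY_CLASSIFICATION": ("calculate",),
    "MULTICLASS_CLASSIFICATION": ("calculate",),
    "REGRESSION": ("loss", "aggregate"),
}
OPTIONAL_FUNCTIONS = {
    "BINARY_CLASSIFICATION": ("estimate",),
    "MULTICLASS_CLASSIFICATION": ("estimate",),
    "REGRESSION": (),
}


def expected_function_names(problem_type: str) -> tuple[str, ...]:
    """Return the required names followed by the optional ones."""
    return REQUIRED_FUNCTIONS[problem_type] + OPTIONAL_FUNCTIONS[problem_type]


print(expected_function_names("REGRESSION"))
```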

#### Passing a function

```python
import pandas as pd

def calculate(y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.Series,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # Perform the metric calculation here
    # Return a float value
    return 1.0

```

#### Passing a string

```python
calculate = """
import pandas as pd

def calculate(y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.Series,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # Perform the metric calculation here
    # Return a float value
    return 1.0
"""
```

The `calculate` function is called with the named arguments shown above and must return a float. A function that does not accept these arguments, or that does not return a float, will cause errors when the metric is executed.

Some of the arguments passed to the `calculate` function are pandas objects, so the pandas library must be imported.
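One way to catch a signature mismatch before registering a function is to check it locally against the argument names used in the examples above. The sketch below uses only the standard library; `CALCULATE_ARGS` and `accepts_arguments` are hypothetical helpers, not part of the SDK, and the check assumes the arguments are passed by keyword.

```python
import inspect

# Argument names passed to `calculate`, taken from the examples above.
CALCULATE_ARGS = [
    "y_true", "y_pred", "y_pred_proba",
    "chunk_data", "labels", "class_probability_columns",
]


def accepts_arguments(func, names) -> bool:
    """Return True if `func` can be called with every name in `names` as a keyword."""
    try:
        # Signature.bind raises TypeError when the call would not be valid.
        inspect.signature(func).bind(**{name: None for name in names})
    except TypeError:
        return False
    return True


def calculate(y_true, y_pred, **kwargs) -> float:
    # **kwargs absorbs the arguments this metric does not use.
    return 1.0


def calculate_with_wrong_names(y_true, y_hat) -> float:
    # `y_hat` is never passed, and `y_pred` is not accepted.
    return 1.0


calculate_ok = accepts_arguments(calculate, CALCULATE_ARGS)
wrong_names_ok = accepts_arguments(calculate_with_wrong_names, CALCULATE_ARGS)
```

Including `**kwargs` in the signature, as the examples in this guide do, lets a function ignore arguments it does not need.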

### Creating the custom metric

After defining a function, call `custom_metric.create` to register it as a custom metric. The example below creates a Binary Classification metric and passes only the `calculate` function.

```python
custom_metric.create(calculation_function=calculate,
                     name='example_metric',
                     description='Example of a binary classification custom metric.', 
                     problem_type='BINARY_CLASSIFICATION')
```

If the provided function is not valid Python, for example because it contains a syntax error, an error is raised:

```text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nannyml-cloud-sdk/nannyml_cloud_sdk/monitoring/custom_metric.py", line 210, in create
    return execute(_CREATE_CUSTOM_METRIC, {
  File "/nannyml-cloud-sdk/nannyml_cloud_sdk/client.py", line 58, in wrapper
    raise ApiError(ex.errors[0]['message']) from ex
nannyml_cloud_sdk.errors.ApiError: Provided calculate function is not valid Python code
```
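When passing a function as a string, a quick local check can catch syntax errors before the call reaches the API. The following is a minimal sketch using the standard library `ast` module; `defines_function` is a hypothetical helper, and the validation performed by NannyML Cloud may be stricter than this.

```python
import ast


def defines_function(source: str, expected_name: str) -> bool:
    """Return True if `source` parses and defines a top-level function `expected_name`."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == expected_name
        for node in tree.body
    )


valid_source = """
def calculate(**kwargs):
    return 1.0
"""

# Missing colon after the signature, so this is not valid Python.
broken_source = """
def calculate(**kwargs)
    return 1.0
"""

valid_ok = defines_function(valid_source, "calculate")
broken_ok = defines_function(broken_source, "calculate")
```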

#### Custom metric problem types

For Binary Classification or Multiclass Classification metrics, you must provide the `calculate` function and can optionally provide an `estimate` function. The `estimate` function has the following structure:

```python
estimate_function = """
import pandas as pd

def estimate(
    estimated_target_probabilities: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.Series,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # Perform the metric estimation here
    # Return a float value
    return 1.0
"""
```

The import statements need to be repeated in every custom function you define.

For Binary Classification or Multiclass Classification, `custom_metric.create` is called like this:
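The need to repeat imports can be illustrated locally. The sketch below assumes each function string is evaluated on its own, which is an assumption: how NannyML Cloud actually loads the code is not described here. `load_function` is a hypothetical helper, and `math` stands in for any library the function relies on.

```python
# This string brings its own import, so it works in isolation.
calculate_source = """
import math

def calculate(**kwargs):
    return math.sqrt(4.0)
"""

# This string relies on `math` but never imports it.
estimate_source = """
def estimate(**kwargs):
    return math.sqrt(4.0)
"""


def load_function(source: str, name: str):
    """Execute `source` in a fresh namespace and return the function called `name`."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


calculate = load_function(calculate_source, "calculate")
estimate = load_function(estimate_source, "estimate")

calculate_result = calculate()

try:
    estimate()
    estimate_error = None
except NameError as exc:
    # The import from the other string is not visible here.
    estimate_error = str(exc)
```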

```python
custom_metric.create(
    name="custom_metric_name", # string containing the custom metric unique name
    description="custom metric description", # The custom metric description
    problem_type="BINARY_CLASSIFICATION", # BINARY_CLASSIFICATION or MULTICLASS_CLASSIFICATION
    calculation_function=calculate_function, # String containing valid Python code for the calculate function
    estimation_function=estimate_function, # Optional string containing valid Python code for the estimate function
    lower_value_limit=0.0, # Optional float value for lower limit
    upper_value_limit=1.0, # Optional float value for upper limit
)
```

For a Regression custom metric, you provide an `aggregate` function and a `loss` function instead of the `calculate` and `estimate` functions.

The aggregate function structure:

```python
aggregate_function = """
import numpy as np
import pandas as pd

def aggregate(loss: np.ndarray, chunk_data: pd.DataFrame) -> float:
    pass
"""
```

The loss function structure:

```python
loss_function = """
import numpy as np
import pandas as pd

def loss(y_true: pd.Series, y_pred: pd.Series, chunk_data: pd.DataFrame) -> np.ndarray:
    pass
"""
```

The Regression custom metric is then created like this:

```python
custom_metric.create(
    name="custom_metric_name", # string containing the custom metric unique name
    description="custom metric description", # The custom metric description
    problem_type="REGRESSION", # REGRESSION is the only allowed value for this case
    loss_function=loss_function,
    aggregation_function=aggregate_function,
    lower_value_limit=0.0, # Optional float value for lower limit if it exists
    upper_value_limit=1.0, # Optional float value for upper limit if it exists
)
```

## Assign custom metric to a model

A new custom metric, when created, is not assigned to any model. To assign this custom metric to a model, first you will need to retrieve the unique identifier of a model (`model_id`) and then call `monitoring.Model.add_custom_metric` passing the `model_id` and the `metric_id` as parameters.

You can retrieve the `model_id` by calling `nml_sdk.monitoring.Model.list()`. This function either lists all available models or filters them by name or problem type. It returns a list of dictionaries; the `model_id` is the value of the `id` key in each dictionary.

```python
## Search for models whose problem type is Binary Classification
print(nml_sdk.monitoring.Model.list(problemType='BINARY_CLASSIFICATION'))
>>> [{'name': 'Model1', 'id': 1, 'problemType': 'BINARY_CLASSIFICATION', 'createdAt': datetime.datetime(2024, 8, 19, 9, 14, 59, 678112, tzinfo=datetime.timezone.utc)}]
## The model_id here is represented by 'id': 1

## Create a custom metric using the previously defined calculate and estimate functions
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="new_custom_metric",
        description="custom metric description",
        problem_type="BINARY_CLASSIFICATION",
        calculation_function=calculate_function,
        estimation_function=estimate_function,
        lower_value_limit=0.0, 
        upper_value_limit=1.0, 
    )

print(cm)
>>> {'name': 'new_custom_metric', 'id': 1, 'problemType': 'BINARY_CLASSIFICATION', 'createdAt': datetime.datetime(2024, 8, 27, 12, 6, 17, 42718, tzinfo=datetime.timezone.utc), 'description': '', 'calculateFn': '', 'estimateFn': ''}

## Add the new custom metric to the existing model
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])
```

From now on, every time you run your model "Model1", the new custom metric will be calculated alongside the standard metrics.

A custom metric can be assigned to any existing model as long as their problem types match.
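The problem-type requirement can be checked before assigning a metric. The sketch below works on plain dictionaries shaped like the `Model.list()` and `custom_metric.create` outputs shown above; `models_matching` is a hypothetical helper, and "Model2" is an invented entry added only for illustration.

```python
def models_matching(models: list[dict], metric: dict) -> list[dict]:
    """Keep only the models whose problem type equals the metric's problem type."""
    return [m for m in models if m["problemType"] == metric["problemType"]]


# Sample data in the shape returned by the SDK ("Model2" is hypothetical).
models = [
    {"name": "Model1", "id": 1, "problemType": "BINARY_CLASSIFICATION"},
    {"name": "Model2", "id": 2, "problemType": "REGRESSION"},
]
metric = {"name": "new_custom_metric", "id": 1, "problemType": "BINARY_CLASSIFICATION"}

# Only these model ids are valid targets for add_custom_metric.
compatible_ids = [m["id"] for m in models_matching(models, metric)]
```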

## Listing the custom metrics

You can list the existing custom metrics, optionally filtering by name or problem type:

```python
custom_metric.list(
    problem_type='BINARY_CLASSIFICATION'
)

>>> [
        {
            'name': 'custom_metric', 
            'id': 1, 
            'problemType': 'BINARY_CLASSIFICATION', 
            'createdAt': datetime.datetime(2024, 8, 27, 11, 36, 41, 512759, tzinfo=datetime.timezone.utc)
        }, 
        {
            'name': 'custom_metric2', 
            'id': 2, 
            'problemType': 'BINARY_CLASSIFICATION', 
            'createdAt': datetime.datetime(2024, 8, 27, 12, 6, 17, 42718, tzinfo=datetime.timezone.utc)
        }
    ]
    
custom_metric.list(
    name='custom_metric2'
)

>>> [
        {
            'name': 'custom_metric2', 
            'id': 2, 
            'problemType': 'BINARY_CLASSIFICATION', 
            'createdAt': datetime.datetime(2024, 8, 27, 12, 6, 17, 42718, tzinfo=datetime.timezone.utc)
        }
    ]
```

Listing custom metrics doesn't expose the function implementations. To inspect the code of a custom metric, call the `custom_metric.get` function.

## Getting custom metrics details

```python
custom_metric.get(1)
>>> {
        'name': 'custom_metric', 
        'id': 1, 
        'problemType': 'BINARY_CLASSIFICATION', 
        'createdAt': datetime.datetime(2024, 8, 27, 11, 36, 41, 512759, tzinfo=datetime.timezone.utc), 
        'description': '', 
        'calculateFn': '\ndef calculate(**kwargs):\n    return 1\n', 
        'estimateFn': None
}
```

The function implementations are shown here as raw one-line strings. Line breaks appear as `\n`, and tabs appear as `\t` if your text editor does not convert them to spaces.
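To read the stored code with its original layout, print the string instead of displaying the dictionary. The sketch below uses a sample dictionary shaped like the `custom_metric.get` output above; `readable_source` is a hypothetical helper, not part of the SDK.

```python
# Sample data in the shape returned by custom_metric.get(1).
details = {
    "name": "custom_metric",
    "id": 1,
    "calculateFn": "\ndef calculate(**kwargs):\n    return 1\n",
    "estimateFn": None,
}


def readable_source(details: dict, key: str) -> str:
    """Return the stored function source with tabs expanded and outer blank lines removed."""
    source = details.get(key)
    if source is None:
        return f"<no {key} defined>"
    return source.expandtabs(4).strip("\n")


calculate_source = readable_source(details, "calculateFn")
print(calculate_source)
```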

## Removing a custom metric from a model

Just as you can assign a custom metric to a model, you can also remove a custom metric from a model:

```python
nml_sdk.monitoring.Model.remove_custom_metric(model_id=1, metric_id=1)
```

After a custom metric is removed from a model, it is no longer calculated when the model runs, and its previous results are no longer shown.

## Deleting a custom metric

If a custom metric is no longer needed, you can delete it.

```python
custom_metric.delete(metric_id=1)
```

## SDK custom metrics end-to-end examples

### Binary classification

This is an example of a custom F\_2 metric. Note that you can use external libraries in the custom code; this example uses `fbeta_score` from `sklearn.metrics`. For more context on custom metrics for binary classification, refer to the tutorial [Writing functions for Binary Classification](/cloud/v0.24.2/model-monitoring/custom-metrics/creating-custom-metrics/writing-functions-for-binary-classification.md), where the calculate and estimate functions are explained in more detail.

```python
import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

import pandas as pd
import numpy as np
from sklearn.metrics import fbeta_score

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
) -> float:
    # labels and class_probability_columns are only needed for multiclass classification
    # and can be ignored for binary classification custom metrics
    return fbeta_score(y_true, y_pred, beta=2)

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
) -> float:
    # labels and class_probability_columns are only needed for multiclass classification
    # and can be ignored for binary classification custom metrics

    estimated_target_probabilities = estimated_target_probabilities.to_numpy().ravel()
    y_pred = y_pred.to_numpy()

    # Create estimated confusion matrix elements
    est_tp = np.sum(np.where(y_pred == 1, estimated_target_probabilities, 0))
    est_fp = np.sum(np.where(y_pred == 1, 1 - estimated_target_probabilities, 0))
    est_fn = np.sum(np.where(y_pred == 0, estimated_target_probabilities, 0))
    est_tn = np.sum(np.where(y_pred == 0, 1 - estimated_target_probabilities, 0))

    beta = 2
    fbeta =  (1 + beta**2) * est_tp / ( (1 + beta**2) * est_tp + est_fp + beta**2 * est_fn)
    fbeta = np.nan_to_num(fbeta)
    return fbeta


## Create an instance of the custom metric module
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="custom_F_2", 
        description="Custom implementation for F_2",
        problem_type="BINARY_CLASSIFICATION",
        calculation_function=calculate,
        estimation_function=estimate,
        lower_value_limit=0.0, 
        upper_value_limit=1.0, 
    )
    
## We will add this custom metric to the existing model, model_id = 1
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])

# Trigger analysis of the new data
nml_sdk.monitoring.Run.trigger(model_id=1)
```
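The arithmetic inside the `estimate` function above can be followed on a tiny hand-made chunk. This sketch uses plain Python lists instead of numpy arrays, and the four sample values are invented for illustration.

```python
# Four predictions and the estimated probability that the true label is 1.
y_pred = [1, 1, 0, 0]
estimated_target_probabilities = [0.9, 0.6, 0.2, 0.1]
beta = 2

pairs = list(zip(y_pred, estimated_target_probabilities))

# Expected confusion matrix elements, as in the estimate function above.
est_tp = sum(p for pred, p in pairs if pred == 1)      # 0.9 + 0.6
est_fp = sum(1 - p for pred, p in pairs if pred == 1)  # 0.1 + 0.4
est_fn = sum(p for pred, p in pairs if pred == 0)      # 0.2 + 0.1

numerator = (1 + beta**2) * est_tp
denominator = numerator + est_fp + beta**2 * est_fn

# Guard against an empty or degenerate chunk instead of dividing by zero.
fbeta = numerator / denominator if denominator else 0.0

print(round(fbeta, 3))  # 7.5 / 9.2, roughly 0.815
```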

### Multiclass Classification

This is an example of a custom F\_2 metric for Multiclass Classification. For more context on custom metrics for multiclass classification, refer to the tutorial [Writing Function for Multiclass Classification](/cloud/v0.24.2/model-monitoring/custom-metrics/creating-custom-metrics/writing-functions-for-multiclass-classification.md), where the calculate and estimate functions are explained in more detail.

```python
import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

import pandas as pd
import numpy as np
from sklearn.metrics import fbeta_score
from sklearn.preprocessing import label_binarize

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    return fbeta_score(y_true, y_pred, beta=2, average='macro')

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    beta = 2

    def estimate_fb(_y_pred, _y_pred_proba, beta) -> float:
        # Estimates the Fb metric.
        #
        # Parameters
        # ----------
        # y_pred: np.ndarray
        #     Predicted class label of the sample
        # y_pred_proba: np.ndarray
        #     Probability estimates of the sample for each class in the model.
        # beta: float
        #     beta parameter
        #
        # Returns
        # -------
        # metric: float
        #     Estimated Fb score.
        

        est_tp = np.sum(np.where(_y_pred == 1, _y_pred_proba, 0))
        est_fp = np.sum(np.where(_y_pred == 1, 1 - _y_pred_proba, 0))
        est_fn = np.sum(np.where(_y_pred == 0, _y_pred_proba, 0))
        est_tn = np.sum(np.where(_y_pred == 0, 1 - _y_pred_proba, 0))

        fbeta =  (1 + beta**2) * est_tp / ( (1 + beta**2) * est_tp + est_fp + beta**2 * est_fn)
        fbeta = np.nan_to_num(fbeta)
        return fbeta

    estimated_target_probabilities = estimated_target_probabilities.to_numpy()
    y_preds = label_binarize(y_pred, classes=labels)

    ovr_estimates = []
    for idx, _  in enumerate(labels):
        ovr_estimates.append(
            estimate_fb(
                y_preds[:, idx],
                estimated_target_probabilities[:, idx],
                beta=beta
            )
        )
    multiclass_metric = np.mean(ovr_estimates)

    return multiclass_metric


## Create an instance of the custom metric module
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="custom_F_2", 
        description="Custom implementation for F_2",
        problem_type="MULTICLASS_CLASSIFICATION",
        calculation_function=calculate,
        estimation_function=estimate,
        lower_value_limit=0.0, 
        upper_value_limit=1.0, 
    )
    
## We will add this custom metric to the existing model, model_id = 1
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])

# Trigger analysis of the new data
nml_sdk.monitoring.Run.trigger(model_id=1)
```
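The one-vs-rest step in the `estimate` function above relies on turning predicted labels into one indicator column per class. The sketch below is a pure-Python stand-in for what `label_binarize` returns when there are three or more classes; the labels and predictions are invented for illustration.

```python
labels = ["a", "b", "c"]
y_pred = ["a", "c", "b", "a"]

# One row per prediction, one 0/1 column per class, in the order of `labels`.
indicators = [[int(pred == label) for label in labels] for pred in y_pred]

# The column for class "a" marks the rows predicted as "a";
# each such column is treated as a binary problem and the scores are averaged.
class_a_column = [row[labels.index("a")] for row in indicators]

print(indicators)
print(class_a_column)
```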

### Regression

To define a Regression custom metric, you need to set up a loss function and an aggregate function. These functions are used to calculate realized performance and to estimate performance. Please refer to the document [Writing Functions for Regression](/cloud/v0.24.2/model-monitoring/custom-metrics/creating-custom-metrics/writing-functions-for-regression.md), where the loss and aggregate functions are explained in more detail.

```python
import nannyml_cloud_sdk as nml_sdk

## First, authenticate to NannyML cloud
nml_sdk.url = "https://beta.app.nannyml.com"
nml_sdk.api_token = r"api token goes here"

import numpy as np
import pandas as pd

def loss(
    y_true: pd.Series,
    y_pred: pd.Series,
    chunk_data: pd.DataFrame,
    **kwargs
) -> np.ndarray:
    y_true = y_true.to_numpy()
    y_pred = y_pred.to_numpy()

    alpha = 0.9
    factor1 = alpha * np.maximum(y_true - y_pred, 0)
    factor2 = (1 - alpha) * np.maximum(y_pred - y_true, 0)
    return factor1 + factor2

def aggregate(
    loss: np.ndarray,
    chunk_data: pd.DataFrame,
    **kwargs
) -> float:
    return loss.mean()

## Create an instance of the custom metric module
custom_metric = nml_sdk.monitoring.CustomMetric()

cm = custom_metric.create(
        name="custom_alpha_loss", 
        description="Custom implementation for Direct Loss Estimation",
        problem_type="REGRESSION",
        loss_function=loss,
        aggregation_function=aggregate,
        lower_value_limit=0.0,
        upper_value_limit=None, 
    )
    
## We will add this custom metric to the existing model, model_id = 1
nml_sdk.monitoring.Model.add_custom_metric(model_id=1, metric_id=cm['id'])

# Trigger analysis of the new data
nml_sdk.monitoring.Run.trigger(model_id=1)
```
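The asymmetric loss used above can be followed on two hand-made observations. This sketch uses plain Python instead of numpy, and the sample values are invented for illustration.

```python
alpha = 0.9

# One under-prediction and one over-prediction, both off by 2.
y_true = [10.0, 10.0]
y_pred = [8.0, 12.0]

# Per-observation loss, mirroring the loss function above:
# under-predictions are weighted by alpha, over-predictions by (1 - alpha).
losses = [
    alpha * max(t - p, 0) + (1 - alpha) * max(p - t, 0)
    for t, p in zip(y_true, y_pred)
]

# The aggregate function above reduces the per-observation losses to their mean.
metric_value = sum(losses) / len(losses)

print(losses)        # approximately [1.8, 0.2]
print(metric_value)  # approximately 1.0
```

With `alpha = 0.9`, an under-prediction costs nine times as much as an over-prediction of the same size.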