# Handling Missing Values

In previous tutorials, we saw how to write the functions needed for simple custom metrics for [binary classification](/cloud/v0.24.1/model-monitoring/custom-metrics/creating-custom-metrics/writing-functions-for-binary-classification.md), [multiclass classification](/cloud/v0.24.1/model-monitoring/custom-metrics/creating-custom-metrics/writing-functions-for-multiclass-classification.md), and [regression](/cloud/v0.24.1/model-monitoring/custom-metrics/creating-custom-metrics/writing-functions-for-regression.md). Let's see how to improve that code so it can handle missing values in our data. As before, we assume the user has access to a Jupyter Notebook Python environment with the [NannyML open-source](https://github.com/NannyML/nannyml) library installed.

## Handling missing values in binary classification

Let's load the covariate shift dataset we have been using and add some missing values.

```python
import numpy as np
import pandas as pd
import nannyml as nml

# Uncomment the code below if you want to filter out warnings
# import warnings
# warnings.filterwarnings('ignore')

# Uncomment the code below if you want to see logging messages
# import logging
# logging.basicConfig(level=logging.DEBUG)

reference = pd.read_parquet("https://github.com/NannyML/sample_datasets/raw/main/synthetic_pure_covariate_shift_datasets/binary_classification/synthetic_custom_metrics_binary_classification_reference.pq")
monitored = pd.read_parquet("https://github.com/NannyML/sample_datasets/raw/main/synthetic_pure_covariate_shift_datasets/binary_classification/synthetic_custom_metrics_binary_classification_monitored.pq")

# Assign through a single .loc call: chained indexing such as
# reference.y_pred.iloc[...] = np.nan may not modify `reference` in recent pandas versions.
for start in (11_000, 21_000, 31_000):
    reference.loc[reference.index[start:start + 2_000], 'y_pred'] = np.nan
for start in (17_000, 27_000, 37_000):
    reference.loc[reference.index[start:start + 2_000], ['y_true', 'y_pred_proba']] = np.nan
```
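Before changing the metric functions, it can help to confirm how many values are actually missing in each column. The snippet below is a minimal sketch on a small toy DataFrame; the `toy` frame and the `count_missing` helper are illustrative names and not part of NannyML. The same call could be applied to `reference`.

```python
import numpy as np
import pandas as pd

def count_missing(df: pd.DataFrame, columns: list[str]) -> pd.Series:
    """Return the number of missing values in each of the given columns."""
    return df[columns].isna().sum()

# Toy data standing in for the reference dataset (illustrative only)
toy = pd.DataFrame({
    'y_true': [1, 0, np.nan, 1, 0],
    'y_pred': [1, np.nan, 0, 1, np.nan],
    'y_pred_proba': [0.9, 0.2, np.nan, 0.8, 0.1],
})

missing_counts = count_missing(toy, ['y_true', 'y_pred', 'y_pred_proba'])
print(missing_counts)
```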

As a reminder, here are the custom metric functions for the `F_2` metric we [already created](/cloud/v0.24.1/model-monitoring/custom-metrics/creating-custom-metrics/writing-functions-for-binary-classification.md).

```python
import pandas as pd
from sklearn.metrics import fbeta_score

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # labels and class_probability_columns are only needed for multiclass classification
    # and can be ignored for binary classification custom metrics
    return fbeta_score(y_true, y_pred, beta=2)
```

```python
import numpy as np
import pandas as pd

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # labels and class_probability_columns are only needed for multiclass classification
    # and can be ignored for binary classification custom metrics

    estimated_target_probabilities = estimated_target_probabilities.to_numpy().ravel()
    y_pred = y_pred.to_numpy()

    # Create estimated confusion matrix elements
    est_tp = np.sum(np.where(y_pred == 1, estimated_target_probabilities, 0))
    est_fp = np.sum(np.where(y_pred == 1, 1 - estimated_target_probabilities, 0))
    est_fn = np.sum(np.where(y_pred == 0, estimated_target_probabilities, 0))
    est_tn = np.sum(np.where(y_pred == 0, 1 - estimated_target_probabilities, 0))

    beta = 2
    fbeta =  (1 + beta**2) * est_tp / ( (1 + beta**2) * est_tp + est_fp + beta**2 * est_fn)
    fbeta = np.nan_to_num(fbeta)
    return fbeta
```

How to deal with missing values is an open question. The answer is ultimately up to the user and depends on the use case for which the custom metric is being created. Here we show how to remove rows containing missing values before calculating the custom metric. With this change, the custom metric functions become:

```python
import pandas as pd
from sklearn.metrics import fbeta_score

def calculate(
    y_true: pd.Series,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    **kwargs
) -> float:
    data = pd.DataFrame({
        'y_true': y_true,
        'y_pred': y_pred
    })
    data.dropna(axis=0, inplace=True)
    return fbeta_score(data.y_true, data.y_pred, beta=2)
```

```python
import numpy as np
import pandas as pd

def estimate(
    estimated_target_probabilities: pd.DataFrame,
    y_pred: pd.Series,
    y_pred_proba: pd.DataFrame,
    chunk_data: pd.DataFrame,
    labels: list[str],
    class_probability_columns: list[str],
    **kwargs
) -> float:
    # labels and class_probability_columns are only needed for multiclass classification
    # and can be ignored for binary classification custom metrics

    data = pd.DataFrame({
        'estimated_target_probabilities': estimated_target_probabilities.to_numpy().ravel(),
        'y_pred_proba': y_pred_proba.to_numpy().ravel(),
        'y_pred': y_pred,
    })
    data.dropna(axis=0, inplace=True)
    y_pred = data.y_pred.to_numpy()
    estimated_target_probabilities = data.estimated_target_probabilities.to_numpy()

    est_tp = np.sum(np.where(y_pred == 1, estimated_target_probabilities, 0))
    est_fp = np.sum(np.where(y_pred == 1, 1 - estimated_target_probabilities, 0))
    est_fn = np.sum(np.where(y_pred == 0, estimated_target_probabilities, 0))
    est_tn = np.sum(np.where(y_pred == 0, 1 - estimated_target_probabilities, 0))

    beta = 2
    fbeta =  (1 + beta**2) * est_tp / ( (1 + beta**2) * est_tp + est_fp + beta**2 * est_fn)
    fbeta = np.nan_to_num(fbeta)
    return fbeta
```

Looking at the `estimate` function, we can see that even here decisions need to be made, for example which columns to include when dropping rows that contain missing values. Again, this depends on the use case and on the data the function is expected to handle.
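To make the effect of that column choice concrete, here is a minimal sketch on a toy DataFrame (the data is made up for illustration). The `estimate` function above drops a row when any of its three columns is missing, even though the `F_2` formula only uses `estimated_target_probabilities` and `y_pred`. Passing `subset` to `dropna` restricts the check to chosen columns, which can keep more rows:

```python
import numpy as np
import pandas as pd

# Toy chunk (illustrative only): row 1 misses y_pred, row 2 misses y_pred_proba
toy = pd.DataFrame({
    'estimated_target_probabilities': [0.9, 0.3, 0.7, 0.2],
    'y_pred_proba': [0.8, 0.4, np.nan, 0.1],
    'y_pred': [1, np.nan, 1, 0],
})

# Drop a row if ANY of the three columns is missing
strict = toy.dropna(axis=0)

# Drop a row only if a column used in the F_2 formula is missing
relaxed = toy.dropna(axis=0, subset=['estimated_target_probabilities', 'y_pred'])

print(len(strict), len(relaxed))
```

Which variant is preferable is a judgment call: the strict version only uses fully observed rows, while the relaxed version keeps rows whose only gap is in a column the metric does not read.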

We can now test our functions to see if they are robust when they encounter missing values:

```python
class_probability_columns = ['y_pred_proba',]
labels = [0, 1]

calculate(
    reference['y_true'],
    reference['y_pred'],
    reference[class_probability_columns],
    reference,
    labels=labels,
    class_probability_columns=class_probability_columns
)
```

```python
0.8081988545474137
```

```python
estimate(
    reference[['estimated_target_probabilities']],
    reference['y_pred'],
    reference[class_probability_columns],
    reference,
    labels=labels,
    class_probability_columns=class_probability_columns
)
```

```python
0.8081160256970812
```
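Dropping rows is only one possible way to treat missing values. As a hedged sketch of an alternative, a function could refuse to compute the metric when too large a share of a chunk is missing. The `drop_missing_or_none` helper and its `max_missing_fraction` parameter below are hypothetical names introduced for illustration, not part of NannyML, and the data is a toy example:

```python
import numpy as np
import pandas as pd

def drop_missing_or_none(data: pd.DataFrame, max_missing_fraction: float = 0.5):
    """Return the rows without missing values, or None when the chunk is empty
    or more than `max_missing_fraction` of its rows would be dropped."""
    if len(data) == 0:
        return None
    clean = data.dropna(axis=0)
    missing_fraction = 1 - len(clean) / len(data)
    if missing_fraction > max_missing_fraction:
        return None
    return clean

# Toy chunk (illustrative only): 1 of 4 rows has a missing value
toy = pd.DataFrame({
    'y_true': [1, 0, 1, 0],
    'y_pred': [1, np.nan, 1, 0],
})

kept = drop_missing_or_none(toy, max_missing_fraction=0.5)      # 25% missing: rows are kept
rejected = drop_missing_or_none(toy, max_missing_fraction=0.2)  # 25% missing: too many
print(len(kept), rejected)
```

What a metric function should return when the helper yields `None` (for example `np.nan`) is left open here, since it depends on the use case and on how such values should appear in monitoring results.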

## Next Steps

We can now test our new functions by creating a new custom metric either through the [GUI of the web interface](/cloud/v0.24.1/model-monitoring/custom-metrics.md) or by using the [NannyML Cloud SDK](/cloud/v0.24.1/model-monitoring/custom-metrics/adding-a-custom-metric-through-nannyml-sdk.md).

