Failing API calls in MLflow because of Float64 column values

Explicitly log the input schema while creating the model.

Written by nelavelli.durganagajahnavi

Last published at: September 23rd, 2024

Problem

When calling the Databricks Machine Learning API to score a model endpoint, you notice the API calls fail because Float64 values in a column are not recognized.

Exception: Request failed with status 400, {"error_code": "BAD_REQUEST", "message": "Incompatible input types for column {your-column-name}. Can not safely convert float64 to float32."}. 

This issue is not specific to any particular cluster, workflow, or notebook, and persists across different clusters with ML runtime and various workflows and notebooks.

Cause

When the input data schema is not explicitly logged, the schema inferred from the JSON payload (created by json.dumps) does not match the schema the model expects. The API call then fails because MLflow cannot safely convert the Float64 values in the payload to the Float32 values the model expects.
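The failure can be reproduced outside MLflow: under NumPy's default "safe" casting rule, a float64 value cannot be converted to float32 because precision could be lost, which is the same check the scoring endpoint applies. A minimal sketch:

```python
import numpy as np

# float64 -> float32 can lose precision, so NumPy's "safe" casting rule
# (the default for np.can_cast) rejects it.
print(np.can_cast(np.float64, np.float32))  # False: narrowing is not safe

# The widening direction is always safe.
print(np.can_cast(np.float32, np.float64))  # True
```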

Solution

Explicitly log the input schema when creating the model. 

  1. Manually construct the signature object for the input schema using the mlflow.models.ModelSignature and mlflow.types.schema.Schema classes.
  2. Log the model with the explicitly defined input schema.

Example

import mlflow
from mlflow.models import ModelSignature
from mlflow.types.schema import Schema, ColSpec

# Define the input schema: each column is declared as a double (float64)
input_schema = Schema([
    ColSpec("double", "sepal length (cm)"),
    ColSpec("double", "sepal width (cm)"),
    ColSpec("double", "petal length (cm)"),
    ColSpec("double", "petal width (cm)")
])

# Define the output schema: a single long (int64) prediction
output_schema = Schema([ColSpec("long")])

# Create the signature object from the input and output schemas
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Log the model with the explicitly defined signature
mlflow.pyfunc.log_model(artifact_path="model", python_model=<your-model>, signature=signature)
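Once the model is served with this signature, scoring requests succeed as long as the payload's column names and types match the logged schema. A minimal sketch of building such a payload with Databricks Model Serving's dataframe_records input format (the feature values below are hypothetical, and sending the request, endpoint URL, and authentication are omitted):

```python
import json

# Hypothetical feature values for one record; the keys must match the
# column names in the logged input schema exactly.
record = {
    "sepal length (cm)": 5.1,
    "sepal width (cm)": 3.5,
    "petal length (cm)": 1.4,
    "petal width (cm)": 0.2,
}

# Databricks Model Serving accepts the "dataframe_records" input format.
# Because the signature declares the columns as double, the plain JSON
# floats produced by json.dumps deserialize to the expected type.
payload = json.dumps({"dataframe_records": [record]})
print(payload)
```

POST this payload to the endpoint's invocations URL with a JSON content type to score the model.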


For more information, refer to the MLflow Model Signatures and Input Examples Guide.