Problem
When calling the Databricks Machine Learning API to score a model serving endpoint, the API calls fail because Float64 values in a column are not recognized.
Exception: Request failed with status 400, {"error_code": "BAD_REQUEST", "message": "Incompatible input types for column {your-column-name}. Can not safely convert float64 to float32."}.
This issue is not specific to any particular cluster, workflow, or notebook; it persists across different clusters running the ML runtime as well as across workflows and notebooks.
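For context, the error typically surfaces when sending a JSON scoring request to the endpoint. A minimal sketch of such a request follows; the workspace URL, endpoint name, and token are placeholders, and the payload assumes the dataframe_split convention used by Databricks Model Serving.

import json
import requests

# Hypothetical workspace URL, endpoint name, and token for illustration.
url = "https://<workspace-url>/serving-endpoints/<endpoint-name>/invocations"
headers = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# Float values serialized with json.dumps carry no float-width information.
payload = json.dumps({
    "dataframe_split": {
        "columns": ["sepal length (cm)", "sepal width (cm)",
                    "petal length (cm)", "petal width (cm)"],
        "data": [[5.1, 3.5, 1.4, 0.2]]
    }
})

response = requests.post(url, headers=headers, data=payload)
print(response.status_code, response.text)  # 400 BAD_REQUEST when schemas mismatch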
Cause
When the input data schema is not explicitly logged, the schema inferred from the JSON payload (created by json.dumps) may not match the schema the model expects. This mismatch causes the API call to fail, in particular when the server would have to convert Float64 values to Float32.
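As an illustration of the inference gap: pandas and NumPy default to float64, and JSON itself has no notion of float width, so the serving side infers float64 from the parsed payload. A minimal sketch, assuming a simple pandas DataFrame:

import json
import pandas as pd

# pandas defaults numeric columns to float64.
df = pd.DataFrame({"sepal length (cm)": [5.1, 4.9]})
print(df.dtypes)  # float64

# json.dumps drops dtype information; on parse, the server infers float64.
# If the model schema expects float32, the narrowing conversion is rejected
# as unsafe, producing the 400 error shown above.
payload = json.dumps({"dataframe_split": df.to_dict(orient="split")})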
Solution
Explicitly log the input schema when creating the model.
- Manually construct the signature object for the input schema using the mlflow.models.ModelSignature and mlflow.types.schema.Schema classes.
- Log the model with the explicitly defined input schema, as shown in the example below.
Example
import mlflow
from mlflow.models import ModelSignature
from mlflow.types.schema import Schema, ColSpec

# Define the input schema
input_schema = Schema([
    ColSpec("double", "sepal length (cm)"),
    ColSpec("double", "sepal width (cm)"),
    ColSpec("double", "petal length (cm)"),
    ColSpec("double", "petal width (cm)")
])

# Define the output schema
output_schema = Schema([ColSpec("long")])

# Create the signature object
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Log the model with the explicitly defined schema
mlflow.pyfunc.log_model(artifact_path="model", python_model=<your-model>, signature=signature)
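Alternatively, if you would rather not write the schema by hand, mlflow.models.infer_signature can derive it from representative data. A minimal sketch, assuming train_df and predictions are stand-ins for your own training inputs and model outputs:

import mlflow
from mlflow.models import infer_signature

# Derive the signature from representative data; train_df is assumed to be
# a pandas DataFrame with the same columns the endpoint will receive.
signature = infer_signature(train_df, predictions)

# Pass the inferred signature to log_model exactly as in the example above.
# To verify what was actually logged (the run ID is a placeholder):
info = mlflow.models.get_model_info("runs:/<run-id>/model")
print(info.signature)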
For more information, please refer to the MLflow Model Signatures and Input Examples Guide.