Problem
When you attempt to register a machine learning model from Hugging Face that was trained outside of Databricks, registration fails with a Failed to infer Schema error message:
```
MLFlowException: Failed to infer Schema. Expected one of the following types:
- pandas.DataFrame
- pandas.Series
...
File /databricks/python/lib/python3.11/site-packages/mlflow/types/utils.py:374 in infer_schema(data)...
```
Cause
Databricks expects model artifacts to follow a specific structure. When you call `mlflow.<flavor>.log_model`, MLflow arranges the model's artifacts so the model loads correctly later. If you register a model trained outside of Databricks, or fine-tune it with additional data in a Databricks notebook, the artifact structure may not align with what Databricks expects for Hugging Face models, which leads to the Failed to infer Schema error.
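For illustration, a minimal hedged sketch of the mismatch: copying raw model files into a run with a generic artifact call records no MLmodel metadata, flavor, or signature, so a later registration attempt has nothing to infer a schema from. The local directory path here is only an example.

```python
import mlflow

# Problematic pattern: the files land in the run, but MLflow records
# no MLmodel metadata, flavor, or signature for them, so schema
# inference has nothing to work with at registration time.
# (The local directory below is an example path.)
with mlflow.start_run():
    mlflow.log_artifacts("/tmp/hf_model", artifact_path="model")
```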
Solution
This issue arises in Databricks environments when working with machine learning models, particularly those trained outside of Databricks. To ensure the model artifacts end up in the structure Databricks expects, use the full training code from the external environment to retrain the model within Databricks before fine-tuning it.
To resolve this issue, follow these steps (a sketch of steps 1 and 3 follows the list):
- Configure the MLflow tracking server. In the code used to train the model outside of Databricks, set up the MLflow tracking server so the model is registered in Databricks.
- Modify and reorder the artifact folder so that it matches the structure of a Hugging Face model.
- Use MLflow logging. Use `mlflow.<flavor>.log_model` to log the model, which automatically handles the artifact structure.
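As a sketch of steps 1 and 3, assuming MLflow's transformers flavor, an example sentiment-analysis pipeline, and placeholder experiment and model names (replace all of these with your own):

```python
import mlflow
from transformers import pipeline

# Step 1: point MLflow at the Databricks tracking server. Inside a
# Databricks notebook this is already the default; from an external
# environment it assumes Databricks authentication is configured.
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Users/<your-user>/hf-model-registration")  # example path

# An example Hugging Face pipeline standing in for your own model.
pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Step 3: log with the flavor-specific API so MLflow lays out the
# artifacts (MLmodel file, config, weights, tokenizer) itself.
with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="model",
        registered_model_name="hf_text_classifier",  # example name
    )
```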
The typical structure of a Hugging Face model includes the following files (see the sketch after this list):
- `config.json`: Contains the model configuration
- `pytorch_model.bin`: The model weights
- `tokenizer.json` or other tokenizer files: For text processing
- `README.md`: A model card describing the model's purpose and usage
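As a hedged illustration of that layout, calling `save_pretrained()` on a model and tokenizer produces these files. The model name and local path are examples, and newer transformers versions write `model.safetensors` instead of `pytorch_model.bin`.

```python
import os
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # example model
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

save_dir = "/tmp/hf_model"  # example path
model.save_pretrained(save_dir)      # writes config.json and the weights file
tokenizer.save_pretrained(save_dir)  # writes tokenizer.json and related files

print(sorted(os.listdir(save_dir)))
# Expect entries such as config.json, model.safetensors (or
# pytorch_model.bin), tokenizer.json, and tokenizer_config.json.
```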
For more details on model structure and creating custom models compatible with the Hugging Face ecosystem, review the Create a custom architecture documentation.
Best practices
Databricks recommends the following best practices when creating models:
- Ensure proper artifact structure. Before registering the model within a Databricks notebook, verify that the artifact structure aligns with Databricks' expectations.
- Understand model signatures. Familiarize yourself with the `infer_signature` or `ModelSignature` methods to properly define input and output schemas for your models (a short sketch follows this list).
- Review the MLflow Python API documentation.
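A minimal sketch of `infer_signature`, using example input and output frames for a text-classification model; pass the resulting signature to `mlflow.<flavor>.log_model`:

```python
import pandas as pd
from mlflow.models import infer_signature

# Example input and output for a text-classification model.
sample_input = pd.DataFrame({"text": ["Databricks makes MLOps easier."]})
sample_output = pd.DataFrame({"label": ["POSITIVE"], "score": [0.99]})

signature = infer_signature(sample_input, sample_output)
print(signature)

# Pass signature=signature to mlflow.<flavor>.log_model(...) so the
# model's input and output schemas are recorded explicitly.
```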
 
For more information, review the Track model development using MLflow (AWS | Azure | GCP) documentation.