Machine learning (AWS)

Conda fails to download packages from Anaconda

Conda fails to download packages with PackagesNotFoundError when you try to install packages from Anaconda....

Last updated: May 16th, 2022 by mathan.pillai

Download artifacts from MLflow

How to download artifacts from MLflow to local storage....

Last updated: May 16th, 2022 by shanmugavel.chandrakasu

How to extract feature information for tree-based Apache SparkML pipeline models

Learn how to extract feature information for tree-based ML pipeline models in Databricks....

Last updated: May 16th, 2022 by Adam Pavlacka

Fitting an Apache SparkML model throws error

Learn how to resolve errors thrown by Databricks when fitting a SparkML model or pipeline....

Last updated: May 16th, 2022 by Adam Pavlacka

H2O.ai Sparkling Water cluster not reachable

H2O.ai Sparkling Water cluster not reachable if the version of the Sparkling Water package does not match the version of Spark used on your cluster....

Last updated: May 16th, 2022 by shanmugavel.chandrakasu

How to perform group K-fold cross validation with Apache Spark

Learn how to perform group K-fold cross validation with Apache Spark on Databricks....

Last updated: February 24th, 2023 by Adam Pavlacka

Error when importing OneHotEncoderEstimator

You get an error message when trying to import OneHotEncoderEstimator....

Last updated: May 16th, 2022 by Shyamprasad Miryala

MLflow project fails to access an Apache Hive table

Resolve "Table or view not found" error when an MLflow project fails to access an Apache Hive table....

Last updated: May 16th, 2022 by vikas.yadav

How to speed up cross-validation

Learn how to improve cross-validation performance in SparkML with Databricks....

Last updated: May 16th, 2022 by Adam Pavlacka

Hyperopt fails with maxNumConcurrentTasks error

Do NOT install Hyperopt on a Databricks Runtime for Machine Learning cluster....

Last updated: May 16th, 2022 by chetan.kardekar

Incorrect results when using documents as inputs

Your model does not return expected results when documents are input using TfidfVectorizer. JSON array...

Last updated: May 16th, 2022 by pradeepkumar.palaniswamy

Errors when accessing MLflow artifacts without using the MLflow client

Resolve errors when attempting to access MLflow artifacts without using the MLflow client...

Last updated: May 16th, 2022 by Adam Pavlacka

Experiment warning when custom artifact storage location is used

Resolve experiment warnings when a custom artifact storage location is used instead of the MLflow managed location....

Last updated: May 16th, 2022 by Adam Pavlacka

Experiment warning when legacy artifact storage location is used

Resolve experiment warnings when a legacy artifact storage location is used instead of the MLflow managed location....

Last updated: May 16th, 2022 by Adam Pavlacka

KNN model using pyfunc returns ModuleNotFoundError or FileNotFoundError

Predictions using pyfunc on a KNN model returns a ModuleNotFoundError or FileNotFoundError....

Last updated: May 16th, 2022 by pradeepkumar.palaniswamy

OSError when accessing MLflow experiment artifacts

Resolve an `OSError` when trying to access, download, or log MLflow experiment artifacts....

Last updated: May 16th, 2022 by Adam Pavlacka

PERMISSION_DENIED error when accessing MLflow experiment artifact

Resolve a PERMISSION_DENIED error when trying to access MLflow experiment artifacts....

Last updated: May 16th, 2022 by Adam Pavlacka

Python commands fail on Machine Learning clusters

Python commands are failing on Databricks Runtime for Machine Learning clusters. Conda....

Last updated: May 16th, 2022 by arjun.kaimaparambilrajan

Runs are not nested when SparkTrials is enabled in Hyperopt

When SparkTrials is enabled in Hyperopt, MLflow runs are not nested under the parent run....

Last updated: May 16th, 2022 by pradeepkumar.palaniswamy

MLflow 'invalid access token' error

Long running ML tasks require an access token with an extended lifetime to ensure the tasks complete before the token expires....

Last updated: July 22nd, 2022 by shanmugavel.chandrakasu

Unable to update a serving endpoint even with necessary permissions

Find the creating user in the workspace and re-add the permissions....

Last updated: September 12th, 2024 by amrith.v

Vector search index contains incorrect number of rows

Ensure that the Delta table has a unique primary key....

Last updated: September 12th, 2024 by brock.baurer

ModuleNotFoundError: No module named 'packaging' when creating GPU Model Serving endpoint

Add pip_requirements during model logging. ...

Last updated: October 16th, 2024 by jessica.santos

Status FAILURE when attempting to update a Direct Vector Access Index

Ensure that the embedding_dimension parameter matches the length of the embedding field. ...

Last updated: September 23rd, 2024 by jessica.santos

Table not available while creating AutoML experiment model

Migrate to Unified Compute (UC), with the option to use AutoML with the Python API in the meantime....

Last updated: September 12th, 2024 by kaushal.vachhani

Failing API calls in MLflow because of Float64 column values

Explicitly log the input schema while creating the model....

Last updated: September 23rd, 2024 by nelavelli.durganagajahnavi

Disable or restrict access to the foundation model APIs

Set rate limits to zero....

Last updated: August 30th, 2024 by amrith.v

Trying to load an MLflow model using a Python script returns Py4JJavaError

Upgrade MLflow to version 2.15.0 or above. ...

Last updated: September 30th, 2024 by anshuman.sahu

TypeError with an unexpected keyword argument 'query_type' when attempting to perform hybrid similarity search using the databricks-vectorsearch package

Ensure you have the latest version of the databricks-vectorsearch package installed in your environment....

Last updated: October 17th, 2024 by jessica.santos

TABLE_ONLINE_VECTOR_INDEX_REPLICA does not support Lakehouse Federation

Use the Python SDK or the REST API for querying a vector search endpoint. ...

Last updated: October 18th, 2024 by nelavelli.durganagajahnavi

Error message when trying to send a JSON input data request to a model endpoint

Pass one more input parameter from the list of input columns in the endpoint’s model signature....

Last updated: October 23rd, 2024 by Shyamprasad Miryala

CUDA out of memory error message in GPU clusters

Change the GPU device used by your driver and/or worker nodes....

Last updated: October 24th, 2024 by jessica.santos

Slow model fitting when implementing Alternating Least Squares using Apache Spark PySpark

Override the block default to match the total cores available and consider using compute-optimized instances. ...

Last updated: November 12th, 2024 by Amruth Ashoka

Vector search index queries with the `%` character not performing partial-string matches

When using the `LIKE` operator in vector search filters, specify the exact string you want to match...

Last updated: November 12th, 2024 by Amruth Ashoka

Column name error when using Apache Spark Mlib feature transformers

When flattening the DataFrame, rename nested columns using an underscore instead of a dot....

Last updated: November 15th, 2024 by Shyamprasad Miryala

Logging a model with MLflow in a PySpark pipeline throws a TempDir class assertion error

Upgrade your MLflow version to 2.16.0 or higher....

Last updated: November 15th, 2024 by Shyamprasad Miryala

MLflow API 429 errors when transitioning models

Add retry logic with exponential backoff to avoid hitting the rate limit....

Last updated: December 2nd, 2024 by julian.campabadal

Model serving endpoint creation succeeds but deployment fails and error stack trace has message _ARRAY_API not found

Include NumPy as a pip dependency and specify the version range to be installed. ...

Last updated: December 11th, 2024 by jessica.santos

Slowness when using the foundational model API with pay-per-token mode

Switch to provisioned throughput mode for high throughput and performance guarantee requirements....

Last updated: January 7th, 2025 by kaushal.vachhani

Getting ValueError: ndarray is not supported by dataframe_to_mds when converting an Apache Spark DataFrame to MDS format using Mosaic Streaming

Properly pass the data type of the elements of the array column in the mds_kwargs....

Last updated: January 16th, 2025 by jessica.santos

Spark ML to ONNX Model Conversion does not produce the same model - predictions differ

Define the TARGET_OPSET and then pass it as the target_opset parameter of the convert_sparkml function. ...

Last updated: January 16th, 2025 by jessica.santos

MLflow exception error when trying to migrate models from Workspace Model Registry

Set the registry URI to the Workspace Model Registry before running the MLflow operations. ...

Last updated: January 16th, 2025 by jairo.prado

Loading models using MLflow causes TypeError around unexpected number of arguments

Run your cluster using a Databricks Runtime version that has the same Python version as the one used to log and register your model....

Last updated: January 22nd, 2025 by jessica.santos

ExecutionException error when trying to use Conda as an environment manager in MLflow

Update the MLflow environment manager from Conda to virtualenv. ...

Last updated: January 29th, 2025 by Amruth Ashoka

MLflow error "INVALID_PARAMETER_VALUE" during model training and logging process

Ensure each MLflow run maintains a unique set of parameters or use nested runs to log each parameter distinctly within a session. ...

Last updated: January 29th, 2025 by Amruth Ashoka

SparkException error when trying to use an Apache Spark UDF to create and dynamically pass a prompt to the ai_query() function

Use Unity Catalog (UC) UDFs instead of Spark UDFs. ...

Last updated: January 30th, 2025 by vinay.mr

MLflow models saving to local path when registered instead of to the desired registry

Databricks recommends managing your model lifecycle in Unity Catalog....

Last updated: February 7th, 2025 by manjunath.hebbar

“Connection pool is full” error when pulling models from S3 with MLflow

Increase the maximum size of the MLflow connection pool....

Last updated: February 12th, 2025 by jairo.prado

Using MLflow API call to load a model taking the same amount of time every call and artifacts downloading from scratch

Save the model artifacts locally and then load the model from a local path....

Last updated: February 19th, 2025 by anshuman.sahu

java.lang.OutOfMemoryError error when using collect() from sparklyr

Use arrow_collect() in a custom function to avoid Spark’s 2GB limit when collecting large datasets in R....

Last updated: March 3rd, 2025 by Shyamprasad Miryala

Parameter workload_size always executing SMALL when using the databricks-agents library to update existing model serving endpoints

Update the databricks-agents library to version 0.17.0 or later....

Last updated: March 25th, 2025 by kaushal.vachhani

Model serving endpoint returns INVALID_PARAMETER_VALUE error for Anthropic multimodal requests

Use “image_url” instead of “image” content type for standardized API calls....

Last updated: March 27th, 2025 by Tarun Sanjeev

'INVALID_PARAMETER_VALUE' error when creating a Google Vertex AI serving endpoint

Ensure the entire private key is used....

Last updated: April 7th, 2025 by vidya.sagamreddy

Model serving endpoint creation fails with BadRequest error

Shorten the catalog name or specify the endpoint_name explicitly when using agents.deploy....

Last updated: April 7th, 2025 by apsarpasha.a

Not enough disk space error when downloading a model from Hugging Face

Change the default download directory from the root partition to a location with available space....

Last updated: April 18th, 2025 by jairo.prado

SHAP figure not appearing in the artifacts after running the mflow.evaluate() call, despite setting log_model_explainability = True

Ensure that all features are in the expected format and re-run the call. ...

Last updated: April 24th, 2025 by Guilherme Leite

Error when trying to use VectorAssembler while on a standard access mode cluster with a non-ML Databricks Runtime

Use a Dedicated (formerly single user) access mode cluster, assigned to a group of users....

Last updated: April 24th, 2025 by jessica.santos

Getting NoModuleFound or attribute error when using the Flash Attention model in MLflow

Use mlflow.transformers.log_model with a custom wheel version of flash-attn....

Last updated: April 25th, 2025 by G Yashwanth Kiran

Lineage for the output table with predictions is not tracked in MLflow when training a model

Save the output table as a CSV file and log it as an artifact. ...

Last updated: April 25th, 2025 by Amruth Ashoka

GPU metrics indicate that the GPU is not being used during model inference

Ensure that you are sending the model to the GPU in your code....

Last updated: April 26th, 2025 by jessica.santos

AttributeError: 'ExportMetricsResponse' when retrieving serving endpoint metrics

Use a custom Python script to download and save the serving endpoint metrics to a file in the Prometheus format....

Last updated: April 26th, 2025 by vidya.sagamreddy

MlflowClient().search_runs returns only a subset of runs

Use mlflow.search_runs() instead of MlflowClient.search_runs()....

Last updated: April 26th, 2025 by G Yashwanth Kiran

Function ai_similarity failing with “Unexpected server response" error

Remove NULL values from the data before passing it to the function. ...

Last updated: April 29th, 2025 by anshuman.sahu

Tackling schema issues that arise for ML models trained outside of Databricks

Use the code from the external environment to retrain the model within Databricks before fine-tuning....

Last updated: April 29th, 2025 by Tarun Sanjeev

Serving an AutoML model failing when deployed to an endpoint with "Failed to deploy modelName: served entity creation aborted" error

Ensure the model environment includes an explicit version pin for NumPy....

Last updated: April 30th, 2025 by Amruth Ashoka

Databricks Help Center

Conda fails to download packages from Anaconda

Download artifacts from MLflow

How to extract feature information for tree-based Apache SparkML pipeline models

Fitting an Apache SparkML model throws error

H2O.ai Sparkling Water cluster not reachable

How to perform group K-fold cross validation with Apache Spark

Error when importing OneHotEncoderEstimator

MLflow project fails to access an Apache Hive table

How to speed up cross-validation

Hyperopt fails with maxNumConcurrentTasks error

Incorrect results when using documents as inputs

Errors when accessing MLflow artifacts without using the MLflow client

Experiment warning when custom artifact storage location is used

Experiment warning when legacy artifact storage location is used

KNN model using pyfunc returns ModuleNotFoundError or FileNotFoundError

OSError when accessing MLflow experiment artifacts

PERMISSION_DENIED error when accessing MLflow experiment artifact

Python commands fail on Machine Learning clusters

Runs are not nested when SparkTrials is enabled in Hyperopt

MLflow 'invalid access token' error

Unable to update a serving endpoint even with necessary permissions

Vector search index contains incorrect number of rows

ModuleNotFoundError: No module named 'packaging' when creating GPU Model Serving endpoint

Status FAILURE when attempting to update a Direct Vector Access Index

Table not available while creating AutoML experiment model

Failing API calls in MLflow because of Float64 column values

Disable or restrict access to the foundation model APIs

Trying to load an MLflow model using a Python script returns Py4JJavaError

TypeError with an unexpected keyword argument 'query_type' when attempting to perform hybrid similarity search using the databricks-vectorsearch package

TABLE_ONLINE_VECTOR_INDEX_REPLICA does not support Lakehouse Federation

Error message when trying to send a JSON input data request to a model endpoint

CUDA out of memory error message in GPU clusters

Slow model fitting when implementing Alternating Least Squares using Apache Spark PySpark

Vector search index queries with the `%` character not performing partial-string matches

Column name error when using Apache Spark Mlib feature transformers

Logging a model with MLflow in a PySpark pipeline throws a TempDir class assertion error

MLflow API 429 errors when transitioning models

Model serving endpoint creation succeeds but deployment fails and error stack trace has message _ARRAY_API not found

Slowness when using the foundational model API with pay-per-token mode

Getting ValueError: ndarray is not supported by dataframe_to_mds when converting an Apache Spark DataFrame to MDS format using Mosaic Streaming

Spark ML to ONNX Model Conversion does not produce the same model - predictions differ

MLflow exception error when trying to migrate models from Workspace Model Registry

Loading models using MLflow causes TypeError around unexpected number of arguments

ExecutionException error when trying to use Conda as an environment manager in MLflow

MLflow error "INVALID_PARAMETER_VALUE" during model training and logging process

SparkException error when trying to use an Apache Spark UDF to create and dynamically pass a prompt to the ai_query() function

MLflow models saving to local path when registered instead of to the desired registry

“Connection pool is full” error when pulling models from S3 with MLflow

Using MLflow API call to load a model taking the same amount of time every call and artifacts downloading from scratch

java.lang.OutOfMemoryError error when using collect() from sparklyr

Parameter workload_size always executing SMALL when using the databricks-agents library to update existing model serving endpoints

Model serving endpoint returns INVALID_PARAMETER_VALUE error for Anthropic multimodal requests

'INVALID_PARAMETER_VALUE' error when creating a Google Vertex AI serving endpoint

Model serving endpoint creation fails with BadRequest error

Not enough disk space error when downloading a model from Hugging Face

SHAP figure not appearing in the artifacts after running the mflow.evaluate() call, despite setting log_model_explainability = True

Error when trying to use VectorAssembler while on a standard access mode cluster with a non-ML Databricks Runtime

Getting NoModuleFound or attribute error when using the Flash Attention model in MLflow

Lineage for the output table with predictions is not tracked in MLflow when training a model

GPU metrics indicate that the GPU is not being used during model inference

AttributeError: 'ExportMetricsResponse' when retrieving serving endpoint metrics

MlflowClient().search_runs returns only a subset of runs

Function ai_similarity failing with “Unexpected server response" error

Tackling schema issues that arise for ML models trained outside of Databricks

Serving an AutoML model failing when deployed to an endpoint with "Failed to deploy modelName: served entity creation aborted" error

SSL error when invoking Databricks model serving endpoint

PERMISSION_DENIED error while running AutoML experiment with group-assigned cluster

CUDA OutOfMemoryError tried to allocate MiB while performing model training on the GPU compute

Creation failure error when trying to create a vector search index

Receiving a CuDNN version mismatch error when running TensorFlow within a 16.3 ML runtime environment

Google AI Studio key fails with Mosaic AI Model Serving through Vertex AI provider

Streamlit app deployed as Databricks App failing with JAVA_GATEWAY_EXITED error

Tag update failure on serving endpoint

Time zone conversion is not visibly applied when using display() on timezone-aware pandas datetime columns

Contact Us