Logging a model with MLflow in a PySpark pipeline throws a TempDir class assertion error

Upgrade your MLflow version to 2.16.0 or higher.

Written by Shyamprasad Miryala

Last published at: November 15th, 2024

Problem

When attempting to log a model with MLflow in a PySpark pipeline, you encounter an assertion error related to the TempDir class in MLflow. 

 

An error occurred during model logging:
 %s Traceback (most recent call last):
  File "/databricks/python/lib/python3.10/site-packages/retail_sales_data_product/training/transform.py", line 266, in log_model
    mlflow.xgboost.log_model(
  File "/databricks/python/lib/python3.10/site-packages/mlflow/xgboost/__init__.py", line 270, in log_model
    return Model.log(
  File "/databricks/python/lib/python3.10/site-packages/mlflow/models/model.py", line 620, in log
    with TempDir() as tmp:
  File "/databricks/python/lib/python3.10/site-packages/mlflow/utils/file_utils.py", line 426, in __exit__
    assert os.path.exists(os.getcwd())
AssertionError
Ending MLflow run

 

Cause

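When MLflow's TempDir context manager exits, it asserts that the current working directory still exists (assert os.path.exists(os.getcwd()), as shown in the traceback). The assertion fails because the working directory has become invalid by the time the TempDir block exits.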
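
To confirm whether an invalid working directory is what you are hitting, a small diagnostic along the following lines may help. This is a minimal sketch, not part of MLflow: the helper name cwd_is_valid is hypothetical, and it only repeats the existence check visible in the traceback, with an extra guard for the case where os.getcwd() itself raises.

```python
import os


def cwd_is_valid() -> bool:
    """Return True if the process's current working directory still exists.

    Hypothetical helper for illustration; it is not an MLflow API.
    """
    try:
        # os.getcwd() can itself raise if the directory has been removed.
        cwd = os.getcwd()
    except OSError:
        return False
    # Same existence check that appears in the traceback above.
    return os.path.exists(cwd)


if __name__ == "__main__":
    print("Working directory valid:", cwd_is_valid())
```

Running this immediately before the log_model call shows whether the working directory was already invalid at that point.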

 

Solution

Upgrade your MLflow version to 2.16.0 or higher. 

Alternatively, upgrade to Databricks Runtime 13.3 LTS or above, which comes with a more recent version of MLflow pre-installed.

For details on pre-installed library versions in Databricks Runtime, refer to the Databricks Runtime release notes versions and compatibility (AWS | Azure | GCP) documentation.
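
After upgrading, it can be useful to confirm which MLflow version the cluster is actually running. The sketch below uses only the Python standard library. The helper names (release_tuple, meets_minimum, installed_mlflow_version) are hypothetical, and the comparison is deliberately simple: it looks at the first three numeric components and ignores pre-release suffixes such as rc1.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_MLFLOW = "2.16.0"  # minimum version named in this article


def release_tuple(ver: str) -> tuple:
    """Turn '2.16.0' into (2, 16, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.16' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_MLFLOW) -> bool:
    """Return True if the installed version is at least the minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_mlflow_version():
    """Return the installed MLflow version string, or None if absent."""
    try:
        return version("mlflow")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_mlflow_version()
    if found is None:
        print("MLflow is not installed in this environment.")
    else:
        print(f"MLflow {found}: meets minimum = {meets_minimum(found)}")
```

If the check reports a version below 2.16.0, the upgrade has not taken effect in the environment where the pipeline runs.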