ModuleNotFoundError: No module named 'packaging' when creating GPU Model Serving endpoint

Add pip_requirements during model logging.

Written by jessica.santos

Last published at: October 16th, 2024

Problem

Your container build process fails during model serving endpoint creation after logging a machine learning model using MLflow. You have run the logging and registering code on a Databricks GPU cluster. 

The following error appears in the Model Serving endpoint's Build Logs, stating that the ‘packaging’ module is missing.

Installing pip dependencies: ...working... Pip subprocess error:
#20 82.39 error: subprocess-exited-with-error
#20 82.39 × python setup.py egg_info did not run successfully.
#20 82.39 │ exit code: 1
#20 82.39 ╰─> [6 lines of output]
#20 82.39 Traceback (most recent call last):
#20 82.39 File "<string>", line 2, in <module>
#20 82.39 File "<pip-setuptools-caller>", line 34, in <module>
#20 82.39 File "/tmp/pip-install-0d8nbi6j/flash-attn_729e3168ad4e43fcbebe75f2aa40d649/setup.py", line 9, in <module>
#20 82.39 from packaging.version import parse, Version
#20 82.39 ModuleNotFoundError: No module named 'packaging'
#20 82.39 [end of output]

Cause

When you log your model using mlflow.<mlflow-flavor>.log_model without declaring the pip_requirements parameter (a list of strings specifying the model's dependencies and their versions), MLflow infers the dependencies from the current notebook session and persists them in the run's requirements.txt artifact.
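
For context, here is a minimal sketch of logging that triggers this default inference, assuming a transformers flavor; the model and tokenizer objects are placeholders from your own training or loading code:

import mlflow

# No pip_requirements argument: MLflow infers the dependencies from the
# current notebook session and writes them to the run's requirements.txt.
with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
    )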

In this file, dependencies are listed in alphabetical order, which places flash-attn before packaging. However, flash-attn requires the packaging module to already be installed before its own installation can run.

Because the container build process installs the model's dependencies in the order they appear in requirements.txt, the packaging module is still missing when pip tries to build flash-attn.
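
For illustration, the inferred requirements.txt might look like the following (all versions here are hypothetical); pip processes the file top to bottom, so flash-attn's setup.py runs before packaging is available:

accelerate==0.25.0
flash-attn==2.5.6
mlflow==2.11.3
packaging==23.2
torch==2.0.1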

Solution

Re-log your model from the same notebook or source file, explicitly passing the pip_requirements parameter with the packaging module listed before flash-attn.

  1. Open the notebook or source file you used to log your model.
  2. Navigate to the Experiment page of your model, then to the run that produced the model you want to serve.
  3. Within the Run page, click the Artifacts tab, then the requirements.txt file.
  4. Select and copy the entire contents of the file.
  5. Create a list of strings from the copied content, where each string is one dependency package required by your model.
  6. Reorder the list so the packaging dependency appears before the flash-attn package. You can also simply delete flash-attn if you do not need it.
  7. In the notebook or source file used to log your model, add the pip_requirements parameter to the mlflow.<mlflow-flavor>.log_model function, setting it to the list of strings from the previous step (see the sketch after this list).
  8. Rerun your source code to re-log and register a new version of your model.
  9. After the new model version is created, proceed to serve your model using Model Serving.
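
For example, here is a minimal sketch of steps 5 through 8, assuming a transformers flavor; the package list, versions, model, and tokenizer objects below are placeholders and should be replaced with the exact contents of your own requirements.txt and logging code:

import mlflow

# Dependencies copied from the run's requirements.txt (step 5), with
# 'packaging' reordered ahead of 'flash-attn' (step 6). Versions are
# hypothetical -- use the exact pins from your own file.
pip_requirements = [
    "mlflow==2.11.3",
    "packaging==23.2",     # must be installed before flash-attn
    "flash-attn==2.5.6",   # or delete this line if you do not need it
    "torch==2.0.1",
    "transformers==4.36.2",
]

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        pip_requirements=pip_requirements,     # overrides the inferred dependencies
        registered_model_name="my_gpu_model",  # registers a new version (step 8)
    )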

If you use transformer-based models, Databricks also recommends following the optimized LLM serving approach, which is to include task metadata when logging the model:

metadata = {"task": "llm/v1/completions"} 
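
As a sketch, this metadata is passed through the metadata parameter of log_model; the flavor, model objects, and registered name below are placeholders:

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        metadata={"task": "llm/v1/completions"},  # enables optimized LLM serving
        pip_requirements=pip_requirements,        # reordered list from the steps above
        registered_model_name="my_gpu_model",
    )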

For more information, please review the Optimized large language model (LLM) serving documentation (AWS | Azure).

Redeploy the model with this metadata as a new version, then serve the model using the API.

If the above steps do not resolve the issue, specify CUDA_HOME in the Dockerfile to support flash-attn.