Problem
Your container build process fails during model serving endpoint creation after you log a machine learning model using MLflow. You ran the logging and registering code on a Databricks GPU cluster.
The following error appears in the Model Serving endpoint's Build Logs, stating that the ‘packaging’ module is missing.
Installing pip dependencies: ...working... Pip subprocess error:
#20 82.39 error: subprocess-exited-with-error
#20 82.39 × python setup.py egg_info did not run successfully.
#20 82.39 │ exit code: 1
#20 82.39 ╰─> [6 lines of output]
#20 82.39 Traceback (most recent call last):
#20 82.39 File "<string>", line 2, in <module>
#20 82.39 File "<pip-setuptools-caller>", line 34, in <module>
#20 82.39 File "/tmp/pip-install-0d8nbi6j/flash-attn_729e3168ad4e43fcbebe75f2aa40d649/setup.py", line 9, in <module>
#20 82.39 from packaging.version import parse, Version
#20 82.39 ModuleNotFoundError: No module named 'packaging'
#20 82.39 [end of output]
Cause
When you log your model using mlflow.<mlflow-flavor>.log_model without declaring the pip_requirements parameter (a list of strings specifying the model's dependencies and their versions), MLflow infers the model's dependencies from the current notebook session and persists them in the requirements.txt file.
In this file, dependencies are listed in alphabetical order. This causes an issue because flash-attn comes before the packaging module, but flash-attn requires the packaging module to be installed first. Since the model's dependencies are installed during the container build process in the alphabetical order given in requirements.txt, the packaging module is still missing when flash-attn is installed.
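As an illustration of the default behavior, the following is a minimal sketch (assuming the transformers flavor; the model object name model_pipeline is a placeholder) that logs a model without pip_requirements and then reads back the inferred requirements.txt artifact:

import mlflow

with mlflow.start_run() as run:
    # No pip_requirements given, so MLflow infers the dependencies from the
    # current notebook session and writes them to requirements.txt.
    mlflow.transformers.log_model(
        transformers_model=model_pipeline,  # placeholder model object
        artifact_path="model",
    )

# Download and print the inferred requirements. Packages are listed in
# alphabetical order, so flash-attn appears before packaging.
reqs_path = mlflow.artifacts.download_artifacts(
    run_id=run.info.run_id, artifact_path="model/requirements.txt"
)
print(open(reqs_path).read())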
Solution
Re-log your model in the same notebook/source file, explicitly including the pip_requirements parameter with the packaging module ordered first.
- Open the notebook/source file used to log your model.
- Navigate to the Experiment page of your model and then to the run that produced the model you want to serve.
- On the Run page, click the Artifacts tab and open the requirements.txt file.
- Select and copy the entire content of this file.
- Create a list of strings from the copied content of the requirements.txt file, where each string represents one dependency package required by your model.
- Reorder the list so that the packaging dependency is positioned before the flash-attn package. You can also simply delete flash-attn if you do not need it.
- In the notebook/source file used to log your model, add the pip_requirements parameter to the mlflow.<mlflow-flavor>.log_model function, setting it to the list of strings you built in the previous step (see the sketch after these steps).
- Rerun your source code to re-log and register a new version of your model.
- After the new model version is created, proceed to serve your model using Model Serving.
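The following is a minimal sketch of the re-logging step, assuming the transformers flavor; the model object name model_pipeline, the registered model name, and the package versions are placeholders, so substitute the entries copied from your own requirements.txt:

import mlflow

# Dependencies copied from the run's requirements.txt, with packaging moved
# ahead of flash-attn so it is installed first. Versions are placeholders.
pip_requirements = [
    "mlflow==2.9.2",
    "packaging==23.2",    # must install before flash-attn
    "flash-attn==2.3.3",  # or drop this entry entirely if you do not need it
    "torch==2.0.1",
    "transformers==4.36.0",
]

with mlflow.start_run():
    mlflow.transformers.log_model(          # use the flavor that matches your model
        transformers_model=model_pipeline,  # placeholder model object
        artifact_path="model",
        pip_requirements=pip_requirements,  # overrides the inferred requirements
        registered_model_name="my-model",   # registers a new model version
    )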
If you use transformer-based models, Databricks also recommends following the optimized LLM serving approach, which is to include metadata when logging the model:
metadata = {"task": "llm/v1/completions"}
For more information, please review the Optimized large language model (LLM) serving (AWS | Azure) documentation.
Redeploy the model with this metadata as a new version and serve the model using the API.
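As a sketch, and assuming the same placeholder names as in the earlier example, the metadata dictionary can be passed directly to log_model when re-logging the model:

import mlflow

metadata = {"task": "llm/v1/completions"}  # marks the model as a completions LLM

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model=model_pipeline,  # placeholder model object
        artifact_path="model",
        metadata=metadata,                  # enables optimized LLM serving
        registered_model_name="my-model",
    )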
If the above steps do not resolve the issue, specify CUDA_HOME in the Dockerfile in order to support flash-attn.