Virtualenv creation failure due to setuptools >= 71.0.0

Pin setuptools to version 70.3.0.

Written by Cedric Law

Last published at: September 12th, 2024

Problem

When you run a notebook on an interactive cluster or a job cluster through Databricks Workflows, the notebook does not run. The cluster logs show errors like the following:

20/01/01 00:00:00 ERROR Utils: Process List(virtualenv, /local_disk0/.ephemeral_nfs/envs/pythonEnv-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, -p, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python, --no-download, --no-setuptools, --no-wheel) exited with code 1, and RuntimeError: failed to query /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python with code 1 err: 'Traceback (most recent call last):\n  File "/usr/local/lib/python3.10/dist-packages/virtualenv/discovery/py_info.py", line 543, in <module>\n    info = PythonInfo()._to_json()\n  File "/usr/local/lib/python3.10/dist-packages/virtualenv/discovery/py_info.py", line 90, in __init__\n    self.distutils_install = {u(k): u(v) for k, v in self._distutils_install().items()}\n  File "/usr/local/lib/python3.10/dist-packages/virtualenv/discovery/py_info.py", line 165, in _distutils_install\n    i.finalize_options()\n  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/setuptools/command/install.py", line 57, in finalize_options\n    super().finalize_options()\n  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/setuptools/_distutils/command/install.py", line 407, in finalize_options\n    \'dist_fullname\': self.distribution.get_fullname(),\n  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/setuptools/_core_metadata.py", line 266, in get_fullname\n    return _distribution_fullname(self.get_name(), self.get_version())\n  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/setuptools/_core_metadata.py", line 284, in _distribution_fullname\n    canonicalize_version(version, strip_trailing_zero=False),\nTypeError: canonicalize_version() got an unexpected keyword argument \'strip_trailing_zero\'\n'
20/01/01 00:00:00 ERROR VirtualenvCloneHelper: Encountered error during virtualenv creation org.apache.spark.SparkException: Process List(virtualenv, /local_disk0/.ephemeral_nfs/envs/pythonEnv-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, -p, /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/python, --no-download, --no-setuptools, --no-wheel) exited with code 1.

Cause

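The setuptools Python package made a change in version 71.0.0: it now prefers installed copies of its dependencies, such as packaging, over the copies it vendors. If the installed packaging is older than what setuptools expects, the call fails. In the traceback above, setuptools calls canonicalize_version() with the strip_trailing_zero keyword argument, which the installed packaging rejects with a TypeError. As a result, Databricks Runtime cannot create the virtual environment.

The highest setuptools version included in Databricks Runtime is 68.0.0, in Databricks Runtime 15.3. This means that any setuptools version higher than 68.0.0 was installed by the user's Python job, either directly or indirectly as a dependency.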
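
To check whether an environment matches this pattern, you can run a minimal sketch like the following in a Python notebook cell. It reports the installed setuptools and packaging versions and whether the installed packaging.utils.canonicalize_version() accepts the strip_trailing_zero keyword seen in the traceback. The helper names are illustrative, not part of any Databricks API, and the sketch assumes Python 3.8 or later. It inspects only the interpreter it runs in, which may differ from the cluster_libraries interpreter shown in the log.

```python
import importlib.metadata
import inspect


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def packaging_supports_strip_trailing_zero():
    """True/False depending on whether packaging.utils.canonicalize_version
    accepts the strip_trailing_zero keyword; None if packaging is not importable."""
    try:
        from packaging.utils import canonicalize_version
    except ImportError:
        return None
    return "strip_trailing_zero" in inspect.signature(canonicalize_version).parameters


print("setuptools:", installed_version("setuptools"))
print("packaging:", installed_version("packaging"))
print("canonicalize_version accepts strip_trailing_zero:",
      packaging_supports_strip_trailing_zero())
```

If setuptools reports 71.0.0 or later and the last line prints False, the environment is consistent with the failure described above.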

Solution

As of July 17, 2024, the latest version of setuptools compatible with Databricks Runtime 9.1 LTS through 15.3 is 70.3.0. Please pin setuptools to version 70.3.0 using one of the following three options.

Notebook-level

At the start of your notebook, include the following line to pin the setuptools version to 70.3.0:

%pip install setuptools==70.3.0

For more information, please review the Notebook-scoped Python libraries (AWS | Azure | GCP) documentation.
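
After the %pip install cell has run, you can confirm which setuptools version the notebook environment now reports. The sketch below is a minimal check under the assumption of Python 3.8 or later; setuptools_pin_status is a hypothetical helper name, and the expected version is the one recommended in this article. It reads installed package metadata through importlib.metadata.

```python
import importlib.metadata

EXPECTED_SETUPTOOLS = "70.3.0"  # version recommended in this article


def setuptools_pin_status(expected=EXPECTED_SETUPTOOLS):
    """Describe whether the installed setuptools matches the expected pin."""
    try:
        installed = importlib.metadata.version("setuptools")
    except importlib.metadata.PackageNotFoundError:
        return f"setuptools is not installed (expected {expected})"
    if installed == expected:
        return f"OK: setuptools {installed} is installed"
    return f"Mismatch: setuptools {installed} is installed, expected {expected}"


print(setuptools_pin_status())
```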

Cluster library UI

From the Databricks navigation sidebar, go to Compute > (your cluster) > Libraries. There, you can pin setuptools at the cluster level by installing a library with the following package specification:

setuptools==70.3.0

For more information, please review the Cluster libraries (AWS | Azure | GCP) documentation.
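
If you manage clusters programmatically, the same cluster-level pin can in principle be applied through the Databricks Libraries API instead of the UI. The following is a minimal sketch, not an official client: it assumes the /api/2.0/libraries/install endpoint and a personal access token, and the environment variable names DATABRICKS_HOST, DATABRICKS_TOKEN, and CLUSTER_ID are assumptions for illustration. It only sends a request when all three are set.

```python
import json
import os
import urllib.request


def build_install_payload(cluster_id, package_spec="setuptools==70.3.0"):
    """Request body for the Libraries API install endpoint (one PyPI library)."""
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": package_spec}}],
    }


def install_cluster_library(host, token, cluster_id, package_spec="setuptools==70.3.0"):
    """POST the payload to /api/2.0/libraries/install; raises HTTPError on failure."""
    body = json.dumps(build_install_payload(cluster_id, package_spec)).encode("utf-8")
    request = urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Only call the API when the (assumed) environment variables are all set.
if all(os.environ.get(k) for k in ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "CLUSTER_ID")):
    status = install_cluster_library(
        os.environ["DATABRICKS_HOST"],
        os.environ["DATABRICKS_TOKEN"],
        os.environ["CLUSTER_ID"],
    )
    print("HTTP status:", status)
```

Building the payload in a separate function keeps the request body easy to inspect before anything is sent.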

(Global) init script

Init scripts run during cluster start-up, so they can ensure that setuptools is pinned to the appropriate version. Add the following lines to your init script to pin the setuptools version:

#!/bin/bash

/databricks/python/bin/pip install setuptools==70.3.0

For more information, please review the What are init scripts? (AWS | Azure | GCP) documentation.

Bonus: Find the source of the unpinned setuptools installation

To find which dependency installs an unpinned setuptools version, use pipdeptree to display the dependency tree of all packages installed on the cluster.

Install pipdeptree directly in a notebook:

%pip install pipdeptree

Display the dependency tree by executing the following command in a notebook:

%sh pipdeptree

For more information, please review the pipdeptree documentation.
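
As a standard-library-only complement to pipdeptree, the sketch below lists every installed distribution that declares a requirement on setuptools, together with the requirement string, so you can spot which ones leave the version open-ended. It assumes Python 3.8 or later, the helper names are hypothetical, and it uses a simplified name parser rather than a full PEP 508 parser. pipdeptree can show a similar reverse view with pipdeptree --reverse --packages setuptools.

```python
import importlib.metadata
import re


def normalize(name):
    """PEP 503 name normalization, e.g. 'Setup_Tools' -> 'setup-tools'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Return the normalized project name at the start of a requirement string.

    Simplified: this is not a full PEP 508 parser.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else None


def dependents_of(target="setuptools"):
    """Map each installed distribution name to its requirement strings naming target."""
    target = normalize(target)
    found = {}
    for dist in importlib.metadata.distributions():
        for req in dist.requires or []:
            if requirement_name(req) == target:
                name = dist.metadata["Name"] or "<unknown>"
                found.setdefault(name, []).append(req)
    return found


for name, reqs in sorted(dependents_of().items()):
    # Requirements with no upper bound (e.g. "setuptools" or "setuptools>=40")
    # allow setuptools 71.0.0 or later to be installed.
    print(f"{name}: {' | '.join(reqs)}")
```

Requirements guarded by an extra marker (for example ; extra == "test") are listed too, even though they only apply when that extra is installed.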