Problem
After migrating to Databricks Runtime 13.3 LTS or above (15.3 is current at the time of writing), library installations start failing with ownership-related or network-related errors.
Example
Library installation attempted on the driver node of cluster XXXX-XXXXXX-XXXXXXXX and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: org.apache.spark.SparkException: Process List(/bin/su, libraries, -c, bash /local_disk0/.ephemeral_nfs/cluster_libraries/python/python_start_clusterwide.sh /local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/pip install 'petastorm==0.12.0' --disable-pip-version-check) exited with code 1. WARNING: The directory '/home/libraries/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Cause
The Python library installation index defaults to https://pypi.org/simple.
Due to security enhancements, on Databricks Runtime 13.3 LTS and above (15.3 is current at the time of writing), cluster-scoped Python libraries are installed as a non-root user. For more information, review the Cluster-scoped Python libraries are installed using a non-root user (AWS | Azure | GCP) documentation.
If you use a global or cluster-scoped init script to replace the default index with a custom repository (for example, an artifact repository), that setting does not apply to the non-root user unless you set it as a global index.
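To see which pip configuration scope currently holds an index setting, a diagnostic along the following lines may help. This is a minimal sketch, not an official Databricks tool: the function name show_index_scopes and the PIP_BIN variable are hypothetical, and it assumes you can run shell commands on the driver (for example, from a web terminal). The user running the shell may differ from the user that installs libraries, so an index-url that appears only under --user is not necessarily visible to the library installer.

```shell
#!/bin/bash
# Diagnostic sketch (hypothetical helper): report which pip config scopes define an index URL.
# PIP_BIN defaults to the pip path used elsewhere in this article; override it if yours differs.
PIP_BIN="${PIP_BIN:-/databricks/python/bin/pip}"

show_index_scopes() {
  local scope
  for scope in --global --user; do
    echo "== pip config ${scope} =="
    # Keep only index-related entries; print a note when this scope has none.
    "$PIP_BIN" config "$scope" list | grep -i 'index-url' || echo "(no index-url set)"
  done
}

if [ -x "$PIP_BIN" ]; then
  show_index_scopes
else
  echo "pip not found at ${PIP_BIN}" >&2
fi
```

If the custom index appears only under the --user scope, that is consistent with the cause described above.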
Solution
Installed via init script
Adjust your init script so that it sets the index with the --global flag and points to your custom index URL.
/databricks/python/bin/pip config --global set global.index-url <your-custom-index-url>
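In a complete init script, the command above might look like the following. This is a minimal sketch under stated assumptions: the PIP_BIN and INDEX_URL variables and the set_global_index function are hypothetical names introduced for illustration, and <your-custom-index-url> remains a placeholder you must replace. The guard that skips the step when pip is missing is a design choice for this sketch; you may prefer the script to fail instead.

```shell
#!/bin/bash
# Init script sketch: register a custom index at pip's global scope
# so that it also applies to the non-root user that installs libraries.
set -euo pipefail

# Assumed defaults; both can be overridden through the environment.
PIP_BIN="${PIP_BIN:-/databricks/python/bin/pip}"
INDEX_URL="${INDEX_URL:-<your-custom-index-url>}"  # placeholder, replace with your index

set_global_index() {
  # Same command as in this article, with the URL passed as an argument.
  "$PIP_BIN" config --global set global.index-url "$1"
}

if [ -x "$PIP_BIN" ]; then
  set_global_index "$INDEX_URL"
  # Print the resulting global configuration to the init script log.
  "$PIP_BIN" config --global list
else
  echo "pip not found at ${PIP_BIN}; index not configured" >&2
fi
```

Printing the global configuration at the end makes it easier to confirm from the init script logs that the setting was applied.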
Installed via workspace UI
- Open the cluster properties and click on Libraries.
- Select a library, or click Install new to install a new library.
- Set the custom index URL in the Index URL field.
Note
When installing via the workspace UI, you must set the custom index URL individually for each library that requires one.