Library unavailability causing job failures

Learn how to resolve Databricks job failures caused by unavailable libraries.

Written by Adam Pavlacka

Last published at: May 11th, 2022

Problem

You are launching jobs that import external libraries and get an ImportError.

When a job causes a node to restart, the job fails with the following error message:

ImportError: No module named XXX

Cause

The Cluster Manager is part of the Databricks service that manages customer Apache Spark clusters. It sends commands to install Python and R libraries when it restarts each node. Sometimes, installing libraries or downloading artifacts from the internet takes longer than expected, either because of network latency or because the library being attached to the cluster has many dependencies.

The library installation mechanism is intended to guarantee that a notebook can import installed libraries as soon as it attaches to a cluster. However, when library installation through PyPI takes too long, the notebook attaches to the cluster before the installation completes. In that case, the notebook cannot import the library.

Solution

Method 1

Use notebook-scoped library installation commands in the notebook. You can enter the following commands in one cell, which ensures that all of the specified libraries are installed.

dbutils.library.installPyPI("mlflow")  # install the required library from PyPI
dbutils.library.restartPython()        # restart Python so the newly installed library can be imported
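
On newer Databricks Runtime versions, the %pip magic command provides an equivalent notebook-scoped installation; as a minimal sketch using the same mlflow example:

%pip install mlflow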

Method 2

AWS

To avoid delays when downloading libraries from internet repositories, you can cache the libraries in DBFS or S3.

For example, you can download the wheel or egg file for a Python library to a DBFS or S3 location. You can use the REST API or cluster-scoped init scripts to install libraries from DBFS or S3.

First, download the wheel or egg file from the internet to the DBFS or S3 location. This can be performed in a notebook as follows:

%sh

cd /dbfs/mnt/library
wget <whl/egg-file-location-from-pypi-repository>

Azure

To avoid delays when downloading libraries from internet repositories, you can cache the libraries in DBFS or Azure Blob Storage.

For example, you can download the wheel or egg file for a Python library to a DBFS or Azure Blob Storage location. You can use the REST API or cluster-scoped init scripts to install libraries from DBFS or Azure Blob Storage.

First, download the wheel or egg file from the internet to the DBFS or Azure Blob Storage location. This can be performed in a notebook as follows:

%sh

cd /dbfs/mnt/library
wget <whl/egg-file-location-from-pypi-repository>

After the wheel or egg file download completes, you can install the library on the cluster using the REST API, the UI, or a cluster-scoped init script.
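
For example, this is a minimal sketch of a Libraries API (api/2.0/libraries/install) request that installs the cached wheel on a running cluster; the workspace URL, access token, cluster ID, and wheel file name are placeholders to substitute with your own values:

curl -X POST https://<databricks-instance>/api/2.0/libraries/install \
  -H "Authorization: Bearer <personal-access-token>" \
  -H "Content-Type: application/json" \
  -d '{
        "cluster_id": "<cluster-id>",
        "libraries": [
          { "whl": "dbfs:/mnt/library/<library-name>.whl" }
        ]
      }'

Alternatively, a cluster-scoped init script can install the cached wheel with the cluster's own pip, for example /databricks/python/bin/pip install /dbfs/mnt/library/<library-name>.whl (again, the path and file name are placeholders).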
