Apache Spark jobs fail with Environment directory not found error

Spark jobs fail after you install a library because security rules prevent the workers from communicating with the driver's NFS server, so the Python executable path cannot be resolved.

Written by Adam Pavlacka

Last published at: July 1st, 2022

Problem

After you install a Python library (via the cluster UI or by using pip), your Apache Spark jobs fail with an Environment directory not found error message.

org.apache.spark.SparkException: Environment directory not found at
/local_disk0/.ephemeral_nfs/cluster_libraries/python
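For example, the library may have been installed from a notebook with a pip magic command like the following (the package name is only illustrative):

%pip install numpy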

Cause

Python libraries are installed on a Network File System (NFS) share hosted on the cluster's driver node. If security group rules prevent the worker nodes from communicating with that NFS server, Spark commands cannot resolve the Python executable path.
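As a minimal diagnostic sketch, assuming you are in a Databricks notebook where the spark session is already defined, you can run a small task on the executors to check whether each worker can see the NFS-backed library directory. If worker-to-driver NFS traffic is blocked, the check returns False or the job fails with the same SparkException shown above.

import os

NFS_PATH = "/local_disk0/.ephemeral_nfs/cluster_libraries/python"

def nfs_visible(_):
    # Runs on an executor; True means this worker can see the NFS mount.
    return os.path.exists(NFS_PATH)

# One boolean per task; any False value points at a worker that cannot
# reach the NFS export on the driver node.
print(spark.sparkContext.parallelize(range(8), 8).map(nfs_visible).collect())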

Solution

Make sure that your security groups are configured with security rules that allow the worker nodes to communicate with the driver node (AWS | Azure | GCP).
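On AWS, for example, one way to review the rules attached to the cluster's security group is with boto3; the region and security group ID below are placeholders that you would replace with your own values:

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # placeholder region
response = ec2.describe_security_groups(GroupIds=["sg-0123456789abcdef0"])  # placeholder ID

for group in response["SecurityGroups"]:
    print(group["GroupId"], group["GroupName"])
    for rule in group["IpPermissions"]:
        # A self-referencing rule (the group appearing in its own UserIdGroupPairs)
        # is what typically allows the cluster nodes to talk to each other.
        peers = [pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])]
        print("  ", rule.get("IpProtocol"), rule.get("FromPort"), rule.get("ToPort"), peers)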