Apache Spark jobs fail or stall when importing .whl from /Workspace on multi-node clusters

Allow all traffic on ports 1017, 1015, and 1021.

Written by anudeep.konaboina

Last published at: August 22nd, 2025

Problem

You are trying to import a module from a wheel file (.whl) installed as a library. When you run the import from a workspace path (/Workspace) on a multi-node cluster, the Apache Spark job stalls or fails. You receive the following error.

2025/08/06 09:35:18.473363 [#6664677558339676] workspace.ERROR wsfs/workspace/workspace.go:329 logUnexpectedError:[Driver] CheckRetry(err != nil) statusCode:-1 err:Get "https://10.53.193.60:1017/api/2.0/workspace-files/get-safe-flags": dial tcp 10.53.193.60:1017: i/o timeout

Cause

On executors, /Workspace is not locally mounted. Workspace Filesystem (WSFS) resolves the import through remote API calls to the driver over ports 1017, 1015, and 1021.

When network latency or a firewall rule delays or blocks traffic on ports 1017, 1015, and 1021, the import retries multiple times. These retries can cause the job to stall or fail.

On the driver, /Workspace is locally mounted, so imports there do not depend on these remote calls. This is why imports resolve quickly when you run a job on a single-node cluster.

Solution

Allow all traffic on ports 1017, 1015, and 1021.

Check your workspace network, firewall, or security group configuration. Ensure ports 1017, 1015, and 1021 are open between the executors and the driver, and remove any firewall rules that block traffic on these ports.
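
To confirm whether the ports are reachable before or after changing firewall rules, a basic TCP connection test can help. The following is a minimal sketch, not an official diagnostic tool. The helper name check_ports and the timeout value are hypothetical choices for illustration. The commented usage assumes a notebook where spark and sc are available and that the spark.driver.host setting returns an address the executors can reach, which may not hold on every cluster type or access mode.

```python
import socket

# Ports named in this article for WSFS calls from executors to the driver.
WSFS_PORTS = (1017, 1015, 1021)


def check_ports(host, ports=WSFS_PORTS, timeout=3.0):
    """Return {port: bool}, where True means a TCP connection to host:port succeeded.

    Hypothetical helper for illustration. It only tests TCP reachability
    and does not call any workspace API.
    """
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError (including timeouts) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example usage in a notebook (assumption: `spark` and `sc` exist and
# `spark.driver.host` is reachable from the executors):
#
# driver_host = spark.conf.get("spark.driver.host")
# report = (
#     sc.parallelize(range(8), 8)
#       .map(lambda _: (socket.gethostname(), check_ports(driver_host)))
#       .collect()
# )
# for executor_host, ports in report:
#     print(executor_host, ports)
```

A False value for any port from an executor suggests that traffic on that port is blocked or timing out between that executor and the driver. A True value only shows that the TCP connection opened. It does not prove the import will succeed.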