Python commands fail on high concurrency clusters

Python commands fail with a WARN message on high concurrency clusters that have both Apache Spark process isolation and shared session enabled.

Written by xin.wang

Last published at: May 19th, 2022

Problem

You are attempting to run Python commands on a high concurrency cluster.

All Python commands fail with a WARN message similar to the following:

WARN PythonDriverWrapper: Failed to start repl ReplId-61bef-9fc33-1f8f6-2
ExitCodeException exitCode=1: chown: invalid user: ‘spark-9fcdf4d2-045d-4f3b-9293-0f’

Cause

Both spark.databricks.pyspark.enableProcessIsolation true and spark.databricks.session.share true are set in the Apache Spark configuration on the cluster.

These two Spark properties conflict with each other and prevent the cluster from running Python commands.
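The check described above can be sketched in plain Python. This is a minimal illustration, not a Databricks API: it assumes you have copied the cluster's Spark configuration as text with one space-separated key and value per line, and it needs to run somewhere other than the affected cluster, since Python commands fail there. The property names come from this article; the helper names parse_spark_conf and has_conflict are hypothetical.

```python
# Property names are taken from this article; the helper names are hypothetical.
ISOLATION = "spark.databricks.pyspark.enableProcessIsolation"
SHARED_SESSION = "spark.databricks.session.share"


def parse_spark_conf(text):
    """Parse 'key value' lines into a dict, skipping blank or malformed lines."""
    conf = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def has_conflict(conf):
    """Return True when both conflicting properties are set to true."""
    return all(
        conf.get(key, "").lower() == "true"
        for key in (ISOLATION, SHARED_SESSION)
    )


# Assumed example input mirroring the configuration described in the Cause section.
example = """\
spark.databricks.pyspark.enableProcessIsolation true
spark.databricks.session.share true
"""

print(has_conflict(parse_spark_conf(example)))
```

If the function reports a conflict for your cluster's configuration, the Solution section applies.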

Solution

You can only have one of these two Spark properties enabled on your cluster at a time.

Choose either process isolation or a shared Spark session based on your needs, and disable the other property in the cluster's Spark configuration.
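As a rough sketch of that choice, the following Python helper rewrites a copy of the Spark configuration text so that only the property you keep stays enabled. It assumes the same one-pair-per-line, space-separated format, and it treats setting the other property to false as the way to disable it. The function name resolve_conflict is hypothetical and is not part of any Databricks tooling; you would still paste the result into the cluster's Spark configuration yourself.

```python
# Property names are taken from this article; resolve_conflict is a hypothetical helper.
ISOLATION = "spark.databricks.pyspark.enableProcessIsolation"
SHARED_SESSION = "spark.databricks.session.share"


def resolve_conflict(conf_text, keep):
    """Return the config text with the property you did not choose set to false."""
    if keep not in (ISOLATION, SHARED_SESSION):
        raise ValueError(f"keep must be {ISOLATION} or {SHARED_SESSION}")
    drop = SHARED_SESSION if keep == ISOLATION else ISOLATION

    fixed_lines = []
    for line in conf_text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == drop:
            # Disable the property that conflicts with the one being kept.
            fixed_lines.append(f"{drop} false")
        else:
            # Leave every other line untouched.
            fixed_lines.append(line)
    return "\n".join(fixed_lines)


# Assumed example input: both properties enabled, as in the Cause section.
conflicting = (
    "spark.databricks.pyspark.enableProcessIsolation true\n"
    "spark.databricks.session.share true"
)

# Keep process isolation and disable the shared session.
print(resolve_conflict(conflicting, keep=ISOLATION))
```

Passing keep=SHARED_SESSION instead keeps the shared session and disables process isolation.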