Problem
While running an Apache Spark job in either a workflow or a notebook, you receive a "Cannot operate on a handle that is closed" error.
When you check the stack trace, you see the following output.
org.apache.spark.sql.execution.streaming.sources.ForeachBatchUserFuncException: [FOREACH_BATCH_USER_FUNCTION_ERROR] An error occurred in the user provided function in foreach batch sink. Reason: An exception was raised by the Python Proxy
py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.expr. Trace:
org.apache.spark.SparkException: Cannot operate on a handle that is closed.
at com.databricks.unity.HandleImpl.assertValid(UCSHandle.scala:98)
at com.databricks.unity.HandleImpl.setupThreadLocals(UCSHandle.scala:116)
at com.databricks.backend.daemon.driver.SparkThreadLocalUtils$$anon$1.run(SparkThreadLocalUtils.scala:48)
at java.lang.Iterable.forEach(Iterable.java:75)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:198)
at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
at java.lang.Thread.run(Thread.java:750)
Cause
You’re using multithreading to access Unity Catalog (UC) objects inside the ForEachBatch function of a streaming job, typically through Python’s ThreadPoolExecutor class (from the `concurrent.futures` module).
The UC credentials are not passed on to the threads that ThreadPoolExecutor creates, as they should be. When one of those threads tries to access a UC object without the credentials, the call fails with the "Cannot operate on a handle that is closed" error.
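For reference, the pattern described above tends to look like the following minimal sketch. The table names and the `write_to_table` and `process_batch` helpers are hypothetical and exist only for illustration; your own code may differ in detail while still following the same shape.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Unity Catalog target tables, for illustration only.
TARGET_TABLES = ["main.demo.clicks", "main.demo.views"]


def write_to_table(batch_df, table_name):
    # Runs on a pool thread, where the UC credentials may not be available.
    batch_df.write.mode("append").saveAsTable(table_name)
    return table_name


def process_batch(batch_df, batch_id):
    # Pattern to avoid: fanning out UC writes across ThreadPoolExecutor
    # threads from inside the function passed to foreachBatch.
    with ThreadPoolExecutor(max_workers=len(TARGET_TABLES)) as pool:
        futures = [
            pool.submit(write_to_table, batch_df, table_name)
            for table_name in TARGET_TABLES
        ]
        return [future.result() for future in futures]


# Usage, assuming a streaming DataFrame named stream_df (hypothetical):
# stream_df.writeStream.foreachBatch(process_batch).start()
```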
Solution
Databricks does not recommend using ThreadPoolExecutor in Python and ForEachBatch together when accessing UC objects. Instead, consider the following options:
- If you’re using multithreading for fan-out operations, write the operations using multiple streams instead.
- Move to Scala and use supported thread pool types. You can use the special thread pools in `org.apache.spark.util.ThreadUtils`, such as `org.apache.spark.util.ThreadUtils.newDaemonFixedThreadPool`.
For more information about Scala thread pools in Unity Catalog, refer to the “Limitations” section of the What is Unity Catalog? (AWS | Azure | GCP) documentation.
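The first option, replacing thread-based fan-out with multiple streams, could be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the `start_fan_out_streams` helper, the `routes` mapping, and the checkpoint layout are all assumptions made for the example. The idea is that each target table gets its own streaming query, so no thread pool is created inside ForEachBatch.

```python
def start_fan_out_streams(source_df, routes, checkpoint_root):
    """Start one streaming query per target table.

    source_df: a streaming DataFrame.
    routes: hypothetical mapping of target table name -> SQL filter expression.
    checkpoint_root: base path; each query gets its own subdirectory.
    """
    queries = []
    for table_name, condition in routes.items():
        # Query names and checkpoint paths are derived from the table name
        # (an assumption of this sketch, not a required convention).
        safe_name = table_name.replace(".", "_")
        query = (
            source_df.filter(condition)
            .writeStream
            .queryName(f"fan_out_{safe_name}")
            # Each streaming query needs its own checkpoint location.
            .option("checkpointLocation", f"{checkpoint_root}/{safe_name}")
            .toTable(table_name)  # starts the query and returns a StreamingQuery
        )
        queries.append(query)
    return queries


# Usage, assuming a streaming DataFrame named stream_df (hypothetical):
# queries = start_fan_out_streams(
#     stream_df,
#     {"main.demo.clicks": "event_type = 'click'",
#      "main.demo.views": "event_type = 'view'"},
#     "/Volumes/main/demo/checkpoints",
# )
```

Because every query reads the source independently, this trades some duplicated reads for not having to manage threads yourself.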