Problem
You’re trying to use a broadcast variable on a shared access mode cluster and receive error messages such as BROADCAST_VARIABLE_NOT_LOADED or JVM_ATTRIBUTE_NOT_SUPPORTED.
Cause
Databricks shared access mode clusters do not support broadcast variables due to their enhanced isolation architecture. Trying to use broadcast variables leads to the error BROADCAST_VARIABLE_NOT_LOADED.
If you are using shared clusters on Databricks Runtime 14.0 and above, you instead see JVM_ATTRIBUTE_NOT_SUPPORTED in PySpark, or value sparkContext is not a member of org.apache.spark.sql.SparkSession in Scala.
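As an illustration of the cause, a minimal sketch of the pattern that fails on shared access mode clusters (the function name, lookup dictionary, and column names are hypothetical, not from the original article):

```python
# Hypothetical example of the failing pattern on a shared access mode
# cluster. The pyspark imports are kept inside the function so the
# definition itself does not require a Spark installation.
def broadcast_country_names(spark, country_map):
    """Build a UDF backed by a broadcast variable -- this approach is
    NOT supported on shared access mode clusters."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # On Databricks Runtime 14.0+ shared clusters, this attribute
    # access itself fails with JVM_ATTRIBUTE_NOT_SUPPORTED.
    sc = spark.sparkContext

    # On shared clusters where sparkContext is reachable, using the
    # broadcast variable fails with BROADCAST_VARIABLE_NOT_LOADED.
    bc = sc.broadcast(country_map)

    return udf(lambda code: bc.value.get(code, "Unknown"), StringType())
```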
Solution
If you need to use broadcast variables, Databricks recommends running such workloads on single-user clusters, which do not enforce the same isolation and therefore support broadcast variables.
If you prefer to continue using a shared cluster, pass the variables into functions as a state parameter instead of using broadcast variables.
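A minimal sketch of this alternative, assuming a small serializable lookup dictionary and a hypothetical country_code column (all names here are illustrative): instead of broadcasting the value, pass it into the function so it is captured in the UDF's closure and shipped to executors with the serialized task.

```python
# Pure mapping logic, defined at module scope so Spark can pickle it
# and so it can be tested without a Spark installation.
def lookup_country(code, country_map):
    """Return the full country name for a code, defaulting to 'Unknown'."""
    return country_map.get(code, "Unknown")

def add_country_name(df, country_map):
    """Add a country_name column by passing the lookup dict into the
    UDF as a parameter instead of broadcasting it."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # country_map is captured in the lambda's closure and serialized
    # with the task -- no sparkContext or broadcast variable is
    # involved, so this runs on shared access mode clusters.
    country_udf = udf(lambda code: lookup_country(code, country_map), StringType())
    return df.withColumn("country_name", country_udf(df["country_code"]))
```

For example, add_country_name(df, {"US": "United States", "DE": "Germany"}) adds the column without ever touching sparkContext. Note that the dictionary is re-serialized with each task rather than cached once per executor, so this is best suited to small lookup values.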
For more information on shared access mode limitations, see the Compute access mode limitations for Unity Catalog (AWS | Azure | GCP) documentation.