Cannot access Apache SparkContext object using addPyFile

Leverage the addArtifact API instead.

Written by Raghavan Vaidhyaraman

Last published at: January 17th, 2025

Problem

While using Databricks Connect with VSCode, you notice you can’t directly access the Apache SparkContext object to add simple files with the addPyFile API. The error stack trace indicates the attribute is not supported.

 

[JVM_ATTRIBUTE_NOT_SUPPORTED] Directly accessing the underlying Spark driver JVM using the attribute 'sparkContext' is not supported on shared clusters.
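
 

For reference, the error surfaces when code reaches for the SparkContext attribute of the Spark Connect session. A minimal sketch of such a call (the file path is a placeholder):

# Accessing the driver's SparkContext through a Databricks Connect session
# raises the error above; the path below is a placeholder.
spark.sparkContext.addPyFile("/path/to/your/file.py")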

 

Cause

The SparkContext object is not available through Databricks Connect on clusters running Databricks Runtime 14.0 and above, so the addPyFile API cannot be used.

 

Solution

Leverage the addArtifact API to add simple files instead. Start by setting up your Spark Connect session.
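
 

If you are connecting through Databricks Connect specifically, the session is typically created with DatabricksSession rather than a raw sc:// URL. A minimal sketch, assuming your workspace credentials are already configured (for example, in a .databrickscfg profile):

from databricks.connect import DatabricksSession

# Builds a Spark Connect session from your configured Databricks credentials.
spark = DatabricksSession.builder.getOrCreate()

The addArtifact calls shown in the following examples work the same way on this session.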

 

Example in Python

After setting up your Spark Connect session, upload your files. The following code shows the options for different file types.

 

from pyspark.sql import SparkSession

# 1. Set up Spark Connect session
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# 2. Upload files

# Arbitrary files
spark.addArtifact("<path-to-your-files>", file=True)

# .py, .egg, .zip, or .jar files, automatically added to PYTHONPATH
spark.addArtifact("<path-to-your-py-files>", pyfile=True)

# .zip, .jar, .tar.gz, .tgz, or .tar files, automatically untarred in the current working directory of UDF execution
spark.addArtifact("<path-to-py-archives>", archive=True)
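
 

Once a .py file has been added with pyfile=True, code that runs on the cluster, such as a UDF, can import it by module name. A minimal sketch, assuming a hypothetical helpers.py exposing a greeting() function was uploaded with addArtifact:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(StringType())
def greet(name):
    # helpers is the hypothetical module uploaded via addArtifact(..., pyfile=True)
    import helpers
    return helpers.greeting(name)

spark.range(1).selectExpr("'world' AS name").select(greet("name")).show()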

 

 

Example in Scala

After setting up your Spark Connect session, register a ClassFinder to monitor and upload the class files from the build output, then upload your JAR dependencies. 

 

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connect.client.REPLClassDirMonitor

// 1. Set up Spark Connect session
val spark = SparkSession.builder().remote("sc://localhost").build()

// 2. Register a ClassFinder to monitor and upload the class files from the build output.
val classFinder = new REPLClassDirMonitor("<your-absolute-path-to-build-output-dir>")
spark.registerClassFinder(classFinder)

// 3. Upload JAR dependencies
spark.addArtifact("<your-absolute-path-to-jar-dep>")

 

For more information, refer to the pyspark.sql.SparkSession.addArtifact documentation.