Databricks Apps - Spark Context initialization fails with Java gateway error

Use the Databricks SQL Connector for Python to query tables in Databricks Apps.

Written by kaushal.vachhani

Last published at: April 16th, 2025

Problem

You are trying to initialize an Apache Spark Context in Databricks Apps using getActiveSession() or builder.appName("APP").getOrCreate() to query or pull data from Databricks tables, but you keep receiving Spark Context or Java gateway errors such as the following.

  File "/app/python/source_code/.venv/lib/python3.11/site-packages/pyspark/sql/session.py", line 477, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/python/source_code/.venv/lib/python3.11/site-packages/pyspark/context.py", line 512, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/app/python/source_code/.venv/lib/python3.11/site-packages/pyspark/context.py", line 198, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/app/python/source_code/.venv/lib/python3.11/site-packages/pyspark/context.py", line 432, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/python/source_code/.venv/lib/python3.11/site-packages/pyspark/java_gateway.py", line 106, in launch_gateway
    raise RuntimeError("Java gateway process exited before sending its port number")


Cause

Databricks Apps do not support Spark Context or JVM-based operations. Attempts to initialize Spark Context within an App fail, leading to the "Java gateway process exited before sending its port number" error. This is expected behavior.


Solution

Instead of using Spark Context, use the Databricks SQL Connector for Python (AWS | Azure | GCP). This connector lets you run SQL queries against Databricks tables through Databricks clusters and SQL warehouses.

For more information, you can also review the Databricks SQL Connector for Python page on PyPI.


Example code

You must set the following placeholder values before using this code snippet.

  • <databricks-host> is the Server Hostname for your cluster or SQL warehouse.
  • <http-path> is the HTTP Path value for your cluster or SQL warehouse.
  • <access-token> is your Databricks personal access token.
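Rather than hard-coding these values, you can read them from environment variables. A minimal sketch, assuming you expose them to your App as DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN (these variable names are illustrative, not required by the connector):

```python
import os

# Illustrative environment variable names (an assumption, not required by the
# connector); fall back to the placeholder values if the variables are not set.
server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME", "<databricks-host>")
http_path       = os.getenv("DATABRICKS_HTTP_PATH", "<http-path>")
access_token    = os.getenv("DATABRICKS_TOKEN", "<access-token>")
```

Reading credentials from the environment keeps the access token out of your source code.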


Info

You can find the hostname and path values for a cluster in the JDBC/ODBC tab in the Advanced options on a cluster’s configuration page. You can find the same values for a SQL warehouse in the Connection details tab on the warehouse’s configuration page.


%python

from databricks import sql

# Open a connection to a Databricks cluster or SQL warehouse.
connection = sql.connect(
    server_hostname = "<databricks-host>",
    http_path       = "<http-path>",
    access_token    = "<access-token>"
)

# Run a query and fetch the results.
cursor = connection.cursor()
cursor.execute("SELECT * FROM my_table LIMIT 10")
results = cursor.fetchall()

for row in results:
    print(row)

# Close the cursor and connection to release resources.
cursor.close()
connection.close()
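Each row returned by fetchall() behaves like a tuple. If you prefer plain dictionaries keyed by column name, one approach is to zip each row with the column names from cursor.description (a standard DB-API cursor attribute). A minimal sketch using hypothetical sample rows in place of real query results:

```python
# Hypothetical sample rows standing in for cursor.fetchall() output.
results = [(1, "alpha"), (2, "beta")]

# In real code, the column names come from the cursor:
#   columns = [desc[0] for desc in cursor.description]
columns = ["id", "name"]

# Convert each positional row into a dict keyed by column name.
rows_as_dicts = [dict(zip(columns, row)) for row in results]

print(rows_as_dicts[0])  # {'id': 1, 'name': 'alpha'}
```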