Apache Spark session is null in DBConnect

A `sparkSession is null while trying to executeCollectResult` error message occurs when using DBConnect.

Written by Jose Gonzalez

Last published at: April 1st, 2022

Problem

You are trying to run your code using Databricks Connect (AWS | Azure | GCP) and you get a `sparkSession is null` error message.

java.lang.AssertionError: assertion failed: sparkSession is null while trying to executeCollectResult
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:323)
at org.apache.spark.sql.Dataset$$anonfun$50.apply(Dataset.scala:3351)
at org.apache.spark.sql.Dataset$$anonfun$50.apply(Dataset.scala:3350)
at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3485)
at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:3480)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:111)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:240)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:97)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:170)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3480)
at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3350)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)

Cause

You get the `sparkSession is null` error message if no Spark session is active on your cluster when you run your code using DBConnect.

Solution

You must ensure that a Spark session is active on your cluster before you attempt to run your code locally using DBConnect.

You can use the following Python example code to check for a Spark session and create one if it does not exist.

%python

from pyspark.sql import SparkSession
# Returns the existing Spark session, or creates one if none exists.
spark = SparkSession.builder.getOrCreate()
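
To confirm that the session can actually return results, you can follow the snippet above with a small smoke test. The following is a minimal sketch, not an official DBConnect utility: the helper names `get_spark` and `session_is_usable` are hypothetical, and the check assumes that a trivial `range(...).count()` job is enough to exercise the same collect path that fails in the stack trace.

```python
def get_spark():
    """Return the existing SparkSession, or create one if none exists."""
    # Imported lazily so this module can be loaded before pyspark is configured.
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()


def session_is_usable(spark):
    """Run a tiny job and report whether the session can return a result."""
    try:
        # A three-row range is cheap but still forces a job to run.
        return spark.range(3).count() == 3
    except Exception as exc:  # broad on purpose: this is only a smoke test
        print(f"Spark session check failed: {exc}")
        return False
```

Calling `session_is_usable(get_spark())` before the rest of your code should return `True` when the session is working. If it returns `False`, the printed message may help you tell a missing session apart from other connection problems.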

Warning

DBConnect only works with supported Databricks Runtime versions. Ensure that you are using a supported runtime on your cluster before using DBConnect.