Databricks Knowledge Base


Developer tools

These articles can help you with the tools you use to develop and manage Databricks applications outside the Databricks environment.

9 Articles in this category


Apache Spark session is null in DBConnect

Problem You are trying to run your code using Databricks Connect (AWS | Azure | GCP) when you get a sparkSession is null error message. java.lang.AssertionError: assertion failed: sparkSession is null while trying to executeCollectResult at scala.Predef$.assert(Predef.scala:170) at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(...

Last updated: April 1st, 2022 by Jose Gonzalez

Databricks Connect reports version error with Databricks Runtime 6.4

Problem You are using the Databricks Connect client with Databricks Runtime 6.4 and receive an error message which states that the client does not support the cluster. Caused by: java.lang.IllegalArgumentException: The cluster is running server version `dbr-6.4` but this client only supports Set(dbr-5.5). You can find a list of client releases at ht...

Last updated: May 9th, 2022 by rakesh.parija

Failed to create process error with Databricks CLI in Windows

Problem While trying to access the Databricks CLI (AWS | Azure | GCP) in Windows, you get a failed to create process error message. Cause This can happen if multiple instances of the Databricks CLI are installed on the system, or if the Python path on your Windows system includes a space. Info There is a known issue in pip which causes pip installed s...

Last updated: May 9th, 2022 by John.Lourdu

GeoSpark undefined function error with DBConnect

Problem You are trying to use the GeoSpark function st_geofromwkt with DBConnect (AWS | Azure | GCP) and you get an Apache Spark error message. Error: org.apache.spark.sql.AnalysisException: Undefined function: 'st_geomfromwkt'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; T...

Last updated: June 1st, 2022 by arjun.kaimaparambilrajan

Get Apache Spark config in DBConnect

You can always view the Spark configuration (AWS | Azure | GCP) for your cluster by reviewing the cluster details in the workspace. If you are using DBConnect (AWS | Azure | GCP) you may want to quickly review the current Spark configuration details without switching over to the workspace UI. This example code shows you how to get the current Spark ...
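A minimal sketch of the kind of client-side dump the article describes, assuming a configured DBConnect client and the standard PySpark `SparkSession`/`SparkConf` API; the `format_conf` helper and its name are illustrative additions, not part of the article.

```python
# Sketch: dumping the current Spark configuration from a DBConnect client
# session instead of the workspace UI. The Spark calls are only made inside
# dump_conf(), so this module imports cleanly without a cluster connection.

def format_conf(pairs):
    """Render (key, value) config pairs as sorted 'key=value' lines."""
    return "\n".join(f"{k}={v}" for k, v in sorted(pairs))

def dump_conf():
    """Print the connected cluster's Spark configuration (requires DBConnect)."""
    from pyspark.sql import SparkSession  # assumes a configured DBConnect client
    spark = SparkSession.builder.getOrCreate()
    print(format_conf(spark.sparkContext.getConf().getAll()))
```

Calling `dump_conf()` from the machine where DBConnect is set up prints the same settings you would see in the cluster details page.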

Last updated: May 9th, 2022 by arvind.ravish

How to Sort S3 files By Modification Time in Databricks Notebooks

Problem When you use the dbutils utility to list the files in an S3 location, the files are returned in arbitrary order. However, dbutils doesn’t provide any method to sort the files based on their modification time, and it doesn’t list a modification time either. Solution Use the Hadoop filesystem API to sort the S3 files, as shown here: %scala import org....

Last updated: May 9th, 2022 by Adam Pavlacka

Invalid Access Token error when running jobs with Airflow

Problem When you run scheduled Airflow Databricks jobs, you get this error: Invalid Access Token: 403 Forbidden Error. Cause To run or schedule Databricks jobs through Airflow, you need to configure the Databricks connection using the Airflow web UI. Any of the following incorrect settings can cause the error: Set the host field to the Databricks wo...

Last updated: May 9th, 2022 by Adam Pavlacka

ProtoSerializer stack overflow error in DBConnect

Problem You are using DBConnect (AWS | Azure | GCP) to run a PySpark transformation on a DataFrame with more than 100 columns when you get a stack overflow error. py4j.protocol.Py4JJavaError: An error occurred while calling o945.count. : java.lang.StackOverflowError     at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)     at java.lang.Clas...

Last updated: May 9th, 2022 by ashritha.laxminarayana

Use tcpdump to create pcap files

If you want to analyze the network traffic between nodes on a specific cluster, you can install tcpdump on the cluster and use it to dump the network packet details to pcap files. The pcap files can then be downloaded to a local machine for analysis. Create the tcpdump init script Run this sample script in a notebook on the cluster to create the ini...
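A hedged sketch of the idea: generate an init script that installs tcpdump and writes rotating pcap files. The script body, paths, and tcpdump flags below are illustrative assumptions, not the article's actual script.

```python
# Illustration: build the text of a cluster init script that installs tcpdump
# and captures traffic to rotating pcap files. All paths/flags are assumptions.

INIT_SCRIPT = """#!/bin/bash
# Install tcpdump and capture on all interfaces, rotating 10 files of 100 MB.
apt-get update && apt-get install -y tcpdump
nohup tcpdump -i any -W 10 -C 100 -w /tmp/capture-$(hostname).pcap &
"""

def init_script_text():
    """Return the init script body to be written out from a notebook."""
    return INIT_SCRIPT

# From a notebook, you would then write it somewhere the cluster reads init
# scripts from, e.g. (hypothetical path):
# dbutils.fs.put("dbfs:/databricks/init-scripts/tcpdump.sh", init_script_text(), True)
```

The resulting pcap files under /tmp can then be copied to a local machine for analysis in a tool such as Wireshark.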

Last updated: May 9th, 2022 by pavan.kumarchalamcharla


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
