Databricks Knowledge Base

Developer tools (GCP)

These articles can help you with the tools you use to develop and manage Databricks applications outside the Databricks environment.

7 Articles in this category

Apache Spark session is null in DBConnect

Problem You are trying to run your code using Databricks Connect (AWS | Azure | GCP) when you get a sparkSession is null error message. java.lang.AssertionError: assertion failed: sparkSession is null while trying to executeCollectResult at scala.Predef$.assert(Predef.scala:170) at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(...
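
For reference, a minimal Databricks Connect (classic) sketch showing the usual way to obtain the session; the cause suggested in the comments is an assumption, not the article's confirmed diagnosis:

    from pyspark.sql import SparkSession

    # With Databricks Connect configured (databricks-connect configure),
    # getOrCreate() returns a session bound to the remote cluster. Holding
    # on to a session after the remote context has stopped is one way a
    # "sparkSession is null" assertion can surface, so create it fresh here.
    spark = SparkSession.builder.getOrCreate()

    # Trivial remote action to confirm the session is live.
    print(spark.range(5).count())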

Last updated: April 1st, 2022 by Jose Gonzalez

Failed to create process error with Databricks CLI in Windows

Problem While trying to access the Databricks CLI (AWS | Azure | GCP) in Windows, you get a failed to create process error message. Cause This can happen: If multiple instances of the Databricks CLI are installed on the system. If the Python path on your Windows system includes a space. Info There is a known issue in pip which causes pip installed s...
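
A quick way to check for the duplicate-install cause from a Windows command prompt, assuming the CLI was installed with pip (paths and output will vary by machine):

    :: List every databricks executable on the PATH; more than one hit
    :: suggests multiple CLI installs that can trigger this error.
    where databricks

    :: Remove duplicates and reinstall a single copy.
    pip uninstall -y databricks-cli
    pip install --upgrade databricks-cli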

Last updated: May 9th, 2022 by John.Lourdu

GeoSpark undefined function error with DBConnect

Problem You are trying to use the GeoSpark function st_geomfromwkt with DBConnect (AWS | Azure | GCP) and you get an Apache Spark error message. Error: org.apache.spark.sql.AnalysisException: Undefined function: 'st_geomfromwkt'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; T...
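
One plausible fix, assuming the geospark Python package is available to the session, is to register GeoSpark's SQL functions before calling them; this sketch may differ from the article's actual resolution:

    from pyspark.sql import SparkSession
    from geospark.register import GeoSparkRegistrator

    spark = SparkSession.builder.getOrCreate()

    # Register GeoSpark's SQL functions (st_geomfromwkt and friends) with
    # this session; without this step Spark reports them as undefined.
    GeoSparkRegistrator.registerAll(spark)

    spark.sql("SELECT st_geomfromwkt('POINT (1.0 1.0)') AS geom").show()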

Last updated: June 1st, 2022 by arjun.kaimaparambilrajan

Get Apache Spark config in DBConnect

You can always view the Spark configuration (AWS | Azure | GCP) for your cluster by reviewing the cluster details in the workspace. If you are using DBConnect (AWS | Azure | GCP) you may want to quickly review the current Spark configuration details without switching over to the workspace UI. This example code shows you how to get the current Spark ...
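
A minimal sketch of the idea, assuming a classic Databricks Connect session (the article's own example may differ in detail):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # getConf().getAll() returns the (key, value) pairs the remote cluster
    # is running with, so you can inspect them without opening the UI.
    for key, value in sorted(spark.sparkContext.getConf().getAll()):
        print(f"{key} = {value}")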

Last updated: May 9th, 2022 by arvind.ravish

ProtoSerializer stack overflow error in DBConnect

Problem You are using DBConnect (AWS | Azure | GCP) to run a PySpark transformation on a DataFrame with more than 100 columns when you get a stack overflow error. py4j.protocol.Py4JJavaError: An error occurred while calling o945.count. : java.lang.StackOverflowError     at java.lang.Class.getEnclosingMethodInfo(Class.java:1072)     at java.lang.Clas...
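
For context, a hypothetical reproduction of the pattern described: a long chain of per-column transformations yields a deeply nested plan for the client-side serializer to walk.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Each withColumn wraps the previous plan; with 100+ columns the nesting
    # gets deep enough that serialization can recurse past the JVM stack.
    df = spark.range(10)
    for i in range(150):
        df = df.withColumn(f"c{i}", F.lit(i))

    df.count()  # may fail with java.lang.StackOverflowError under DBConnect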

Last updated: May 9th, 2022 by ashritha.laxminarayana

Use tcpdump to create pcap files

If you want to analyze the network traffic between nodes on a specific cluster, you can install tcpdump on the cluster and use it to dump the network packet details to pcap files. The pcap files can then be downloaded to a local machine for analysis. Create the tcpdump init script Run this sample script in a notebook on the cluster to create the ini...
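
A sketch of the first step, run from a notebook; the init-script path and tcpdump flags below are illustrative choices, not necessarily the article's exact script:

    # Write a cluster init script that installs tcpdump and starts a
    # capture on every node. Paths and flags are placeholders.
    dbutils.fs.put(
        "dbfs:/databricks/init-scripts/install-tcpdump.sh",
        """#!/bin/bash
    set -e
    apt-get update -y
    apt-get install -y tcpdump
    # Capture all interfaces in the background; copy the pcap file off the
    # node (or to DBFS) later for download and analysis.
    nohup tcpdump -i any -w /tmp/${HOSTNAME}.pcap >/dev/null 2>&1 &
    """,
        True,
    )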

Last updated: May 9th, 2022 by pavan.kumarchalamcharla

Terraform registry does not have a provider error

Problem You are installing the Databricks Terraform provider (AWS | Azure | GCP) and get a Databricks provider registry error. Error while installing hashicorp/databricks: provider registry registry.terraform.io does not have a provider named registry.terraform.io/hashicorp/databricks Cause This error occurs when the required_providers block is not...
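
The missing piece the error points at is an explicit provider source declaration; a minimal sketch in HCL (the source address shown is the commonly documented one, and no version constraint is pinned here):

    terraform {
      required_providers {
        databricks = {
          # Without this block Terraform assumes hashicorp/databricks,
          # which does not exist in the public registry.
          source = "databricks/databricks"
        }
      }
    }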

Last updated: July 1st, 2022 by prabakar.ammeappin


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
