Apache Spark UI is not in sync with job
Problem: The status of your Spark jobs is not correctly shown in the Spark UI. Some jobs that are confirmed to be in the Completed state are shown as Active/Running in the Spark UI. In some cases the Spark UI may appear blank. When you review the driver logs, you see an AsyncEventQueue warning. Logs: 20/12/23 21:20:26 ...
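When the AsyncEventQueue warning reports dropped events, the UI falls out of sync because listener events never reach it. One common mitigation, offered here as a hedged sketch rather than the article's own resolution (the excerpt is truncated before the fix), is to enlarge the listener bus queue in the cluster's Spark config before startup; the value below is an illustrative assumption (the Spark default is 10000):

spark.scheduler.listenerbus.eventqueue.capacity 20000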
Apache Spark job fails with Parquet column cannot be converted error
Problem: You are reading data in Parquet format and writing it to a Delta table when you get a Parquet column cannot be converted error message. The cluster is running Databricks Runtime 7.3 LTS or above. org.apache.spark.SparkException: Task failed while writing rows. Caused by: com.databricks.sql.io.FileReadException: Error while reading file s3://buc...
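This error typically surfaces when a Parquet column's physical type does not match the type Spark expects (for example, a decimal stored where a double is expected). A hedged workaround sketch, assuming the mismatch is triggered by the vectorized Parquet reader; the path and table name are hypothetical:

%scala
// Disable the vectorized Parquet reader for this session only (slower,
// but tolerant of type mismatches in existing Parquet files).
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
val df = spark.read.parquet("s3://my-bucket/path/to/data")           // hypothetical path
df.write.format("delta").mode("append").saveAsTable("my_delta_table") // hypothetical table

Re-enable the vectorized reader afterwards, since it is significantly faster for healthy data.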
Best practice for cache(), count(), and take()
cache() is a lazily evaluated Apache Spark operation that can be used on a DataFrame, Dataset, or RDD when you want to perform more than one action. cache() caches the specified DataFrame, Dataset, or RDD in the memory of your cluster’s workers. Because cache() is lazy, the caching operation takes place only when a Spark action (for example, count(),...
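Because the cache is only materialized by an action, the action you choose matters: count() scans every partition and therefore caches the whole dataset, while take(n) may read, and cache, only the partitions needed to produce n rows. A minimal sketch with a hypothetical table name:

%scala
val df = spark.table("events") // hypothetical table
df.cache()                     // lazy: nothing is cached yet
df.count()                     // action: scans and caches all partitions
df.take(1)                     // by contrast, this alone may cache only part of the data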
Cannot import timestamp_millis or unix_millis
Problem: You are trying to import timestamp_millis or unix_millis into a Scala notebook, but get an error message. %scala import org.apache.spark.sql.functions.{timestamp_millis, unix_millis} error: value timestamp_millis is not a member of object org.apache.spark.sql.functions import org.apache.spark.sql.functions.{timestamp_millis, unix_millis} Cau...
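If these functions are not exposed as Scala wrappers in org.apache.spark.sql.functions on your runtime, one workaround sketch is to call the equivalent Spark SQL functions through selectExpr instead of importing them. This assumes the SQL functions are available on your Spark version; the table and column names are hypothetical:

%scala
// Call the SQL functions directly rather than importing Scala wrappers.
val df = spark.table("events") // hypothetical table with a long column "millis"
df.selectExpr(
  "timestamp_millis(millis) AS ts",
  "unix_millis(current_timestamp()) AS now_millis"
).show()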
Cannot modify the value of an Apache Spark config
Problem: You are trying to SET the value of a Spark config in a notebook and get a Cannot modify the value of a Spark config error. For example: %sql SET spark.serializer=org.apache.spark.serializer.KryoSerializer Error in SQL statement: AnalysisException: Cannot modify the value of a Spark config: spark.serializer; Cause: The SET command does not wor...
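The distinction at work is between runtime SQL configurations, which you can change from a running notebook, and core Spark configurations such as spark.serializer, which must be set in the cluster's Spark config before startup. A minimal sketch; the shuffle-partitions value is just an illustrative assumption:

%scala
// Runtime SQL configs can be modified from a running notebook:
spark.conf.set("spark.sql.shuffle.partitions", "64")
// Core configs cannot; set them in the cluster's Spark config instead, e.g.:
//   spark.serializer org.apache.spark.serializer.KryoSerializer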
Convert nested JSON to a flattened DataFrame
This article shows you how to flatten nested JSON using only the $"column.*" syntax and the explode method. Sample JSON file: Pass the sample JSON string to the reader. %scala val json = """ { "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": ...
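As a sketch of the technique the article builds up to (the column names follow the sample JSON above, assuming the full string defines "batters" as a struct containing a "batter" array; the full article's exact steps may differ):

%scala
import org.apache.spark.sql.functions.explode
import spark.implicits._

val df = spark.read.json(Seq(json).toDS)
// $"batters.*" promotes the nested struct's fields to top-level columns...
val flat = df.select($"id", $"type", $"name", $"ppu", $"batters.*")
// ...and explode() turns each element of the "batter" array into its own row.
val rows = flat.select($"id", explode($"batter").as("batter"))
rows.show(false)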
Create a DataFrame from a JSON string or Python dictionary
In this article we review how you can create an Apache Spark DataFrame from a variable containing a JSON string or a Python dictionary. Create a Spark DataFrame from a JSON string: Add the JSON content from the variable to a list. %scala import scala.collection.mutable.ListBuffer val json_content1 = "{'json_col1': 'hello', 'json_col2': 32...
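A compact alternative sketch that skips the intermediate list: convert the string into a one-element Dataset and let the JSON reader infer the schema. The variable contents mirror the excerpt, but with double quotes so the string is valid JSON:

%scala
import spark.implicits._

val jsonContent = """{"json_col1": "hello", "json_col2": 32}"""
// spark.read.json accepts a Dataset[String]; the schema is inferred per record.
val df = spark.read.json(Seq(jsonContent).toDS)
df.show()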
Decimal$DecimalIsFractional assertion error
Problem: You are running a job on Databricks Runtime 7.x or above when you get a java.lang.AssertionError: assertion failed: Decimal$DecimalIsFractional error message. Example stack trace: java.lang.AssertionError: assertion failed: Decimal$DecimalIsFractional while compiling: <notebook> during phase: globalPhase=terminal, enteringPhase=j...
from_json returns null in Apache Spark 3.0
Problem: The from_json function is used to parse a JSON string and return a struct of values. For example, if you have the JSON string [{"id":"001","name":"peter"}], you can pass it to from_json with a schema and get parsed struct values in return. %python from pyspark.sql.functions import col, from_json display( df.select(col('value'), from_json(c...
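In Spark 3.0, parsing of top-level JSON arrays is stricter than some 2.x code assumed: an array input needs an ArrayType schema, and passing a bare StructType yields null. A sketch of the distinction, using the sample string from the excerpt (written in Scala to match the other sketches in this section):

%scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

val df = Seq("""[{"id":"001","name":"peter"}]""").toDF("value")
val struct = new StructType().add("id", StringType).add("name", StringType)
// Returns null in Spark 3.0: the input is an array, but the schema is a bare struct.
df.select(from_json($"value", struct).as("parsed")).show(false)
// Parses correctly: wrap the struct in an ArrayType to match the input.
df.select(from_json($"value", ArrayType(struct)).as("parsed")).show(false)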
Manage the size of Delta tables
Delta tables are different from traditional tables. Delta tables include ACID transactions and time travel features, which means they maintain transaction logs and stale data files. These additional features require storage space. In this article we discuss recommendations that can help you manage the size of your Delta tables. Enable file system ve...
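One standard lever for reclaiming that space (the excerpt is truncated before the article's own recommendations) is VACUUM, which removes data files no longer referenced by the transaction log once they are older than the retention window. A hedged sketch with a hypothetical table name; shortening retention below the 7-day default limits time travel and should be done with care:

%scala
import io.delta.tables.DeltaTable

// Remove unreferenced files older than the default retention period (7 days).
DeltaTable.forName(spark, "my_delta_table").vacuum()
// Or specify the retention window in hours explicitly:
DeltaTable.forName(spark, "my_delta_table").vacuum(168)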
Select files using a pattern match
When selecting files, a common requirement is to only read specific files from a folder. For example, if you are processing logs, you may want to read files from a specific month. Instead of enumerating each file and folder to find the desired files, you can use a glob pattern to match multiple files with a single expression. This article uses examp...
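A sketch of the idea with a hypothetical directory layout, where one glob matches every day in March without enumerating the files:

%scala
// Layout assumed (hypothetical): /mnt/logs/<year>/<month>/<day>/*.json
val march = spark.read.format("json").load("/mnt/logs/2021/03/*/*.json")
march.count()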