Updated May 16th, 2022 by saritha.shivakumar

JSON reader parses values as null

Problem: You are attempting to read a JSON file. You know the file has data in it, but the Apache Spark JSON reader is returning a null value.
Example code: You can use this example code to reproduce the problem. Create a test JSON file in DBFS.
%python
dbutils.fs.rm("dbfs:/tmp/json/parse_test.txt")
dbutils.fs.put("dbfs:/tmp/json/parse_test.txt", """ {...

Updated May 20th, 2022 by saritha.shivakumar

Cannot import timestamp_millis or unix_millis

Problem: You are trying to import timestamp_millis or unix_millis into a Scala notebook, but get an error message.
%scala
import org.apache.spark.sql.functions.{timestamp_millis, unix_millis}
error: value timestamp_millis is not a member of object org.apache.spark.sql.functions
import org.apache.spark.sql.functions.{timestamp_millis, unix_millis}
Cau...
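For readers unfamiliar with the two functions named in this entry: in Spark SQL, timestamp_millis turns a count of milliseconds since the Unix epoch into a timestamp, and unix_millis goes the other way. The plain-Python sketch below only illustrates that arithmetic. The helper names millis_to_timestamp and timestamp_to_millis are hypothetical, and this is not the Spark API or the article's solution.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_timestamp(ms: int) -> datetime:
    """Hypothetical helper: milliseconds since the epoch -> UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def timestamp_to_millis(ts: datetime) -> int:
    """Hypothetical helper: UTC datetime -> whole milliseconds since the epoch."""
    # Integer division by a one-millisecond timedelta drops sub-millisecond precision.
    return (ts - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    ts = millis_to_timestamp(1_652_000_000_123)
    print(ts.isoformat())
    print(timestamp_to_millis(ts))
```

Using timedelta arithmetic instead of dividing by 1000.0 keeps the conversion exact, so a round trip returns the original millisecond count.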

Updated July 1st, 2022 by saritha.shivakumar

Cannot view table SerDe properties

Problem: You are trying to view the SerDe properties on an Apache Hive table, but SHOW CREATE TABLE just returns the Apache Spark DDL. It does not show the SerDe properties.
For example, given this sample code:
%sql
SHOW CREATE TABLE <table-identifier>
You get a result that does not show the SerDe properties:
Cause: You are using Databricks Runt...

Updated October 29th, 2022 by saritha.shivakumar

ANSI compliant DECIMAL precision and scale

Problem: You are trying to cast a value of one or greater as a DECIMAL using equal values for both precision and scale. A null value is returned instead of the expected value.
This sample code:
%sql
SELECT CAST (5.345 AS DECIMAL(20,20))
Returns:
Cause: The DECIMAL type (AWS | Azure | GCP) is declared as DECIMAL(precision, scale), where precision and s...
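As a rough illustration of why this cast cannot succeed: precision is the total number of digits and scale is the number of digits to the right of the decimal point, so DECIMAL(20,20) leaves no digits for the integer part. The Python sketch below models that range check with the standard decimal module. The helper name fits_decimal is hypothetical, the half-up rounding is an assumption, and this is a model of the rule rather than Spark's implementation.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Hypothetical helper: can `value` be stored as DECIMAL(precision, scale)?"""
    with localcontext() as ctx:
        # Generous working precision so quantize() never overflows the context.
        ctx.prec = 80
        # Smallest representable step, e.g. scale=2 -> Decimal('0.01').
        quantum = Decimal(1).scaleb(-scale)
        # Assumption: round to the target scale using half-up rounding.
        rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
        # Only (precision - scale) digits are available left of the decimal point.
        return abs(rounded) < Decimal(10) ** (precision - scale)


if __name__ == "__main__":
    print(fits_decimal("5.345", 20, 20))  # no room for the integer digit
    print(fits_decimal("0.345", 20, 20))  # value below one
    print(fits_decimal("5.345", 21, 20))  # one integer digit available
```

Under this model, widening the precision by one digit, as in DECIMAL(21,20), is enough to hold 5.345, while any value below one already fits DECIMAL(20,20).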

Updated November 8th, 2022 by saritha.shivakumar

Use custom classes and objects in a schema

Problem: You are trying to create a dataset using a schema that contains Scala enumeration fields (classes and objects). When you run your code in a notebook cell, you get a ClassNotFoundException error.
Sample code:
%scala
object TestEnum extends Enumeration {
  type TestEnum = Value
  val E1, E2, E3 = Value
}
import spark.implicits._
import TestEnum._
c...

Updated May 12th, 2023 by saritha.shivakumar

Job fails with "not enough memory to build the hash map" error

Info: This article applies to Databricks Runtime 11.3 LTS and above.
Problem: You are running Spark SQL or PySpark code that uses broadcast hints. It takes longer to run than on previous Databricks Runtime versions, fails with an out-of-memory error message, or both.
Example code:
df.join(broadcast(bigDf)).write.mode('overwrite').parquet("path")
Error message: Job ...
