Job fails with "not enough memory to build the hash map" error
Info: This article applies to Databricks Runtime 11.3 LTS and above. Problem: You are running Spark SQL or PySpark code that uses broadcast hints. It takes longer to run than on previous Databricks Runtime versions, fails with an out-of-memory error message, or both. Example code: df.join(broadcast(bigDf)).write.mode('overwrite').parquet("path") Error message: Job ...
1 min reading time
ANSI compliant DECIMAL precision and scale
Problem: You are trying to cast a value of one or greater as a DECIMAL using equal values for both precision and scale. A null value is returned instead of the expected value. This sample code: %sql SELECT CAST(5.345 AS DECIMAL(20,20)) returns null. Cause: The DECIMAL type (AWS | Azure | GCP) is declared as DECIMAL(precision, scale), where precision and s...
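As a rough illustration of why equal precision and scale leave no room for the integer part: a DECIMAL(precision, scale) value has precision - scale digits available before the decimal point, so DECIMAL(20,20) has none. The sketch below uses Python's standard decimal module rather than Spark, and fits_decimal is a hypothetical helper introduced only for this example; it assumes half-up rounding to the target scale.

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Return True if value can be stored as DECIMAL(precision, scale).

    Hypothetical helper for illustration only; this is not a Spark API.
    """
    with localcontext() as ctx:
        # Give the arithmetic enough working digits for the sizes used here.
        ctx.prec = 2 * precision + 10
        # Round to the requested number of fractional digits.
        step = Decimal(1).scaleb(-scale)
        rounded = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
        # Only (precision - scale) digits remain for the integer part.
        limit = Decimal(10) ** (precision - scale)
        return abs(rounded) < limit


print(fits_decimal("5.345", 20, 20))  # False: no integer digits available
print(fits_decimal("0.345", 20, 20))  # True: the value is below one
print(fits_decimal("5.345", 21, 20))  # True: one integer digit available
```

Under these assumptions, a value such as 5.345 needs at least one more digit of precision than scale.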
1 min reading time
JSON reader parses values as null
Problem: You are attempting to read a JSON file. You know the file has data in it, but the Apache Spark JSON reader is returning a null value. Example code: You can use this example code to reproduce the problem. Create a test JSON file in DBFS. %python dbutils.fs.rm("dbfs:/tmp/json/parse_test.txt") dbutils.fs.put("dbfs:/tmp/json/parse_test.txt", """ {...
0 min reading time
Cannot import timestamp_millis or unix_millis
Problem: You are trying to import timestamp_millis or unix_millis into a Scala notebook, but get an error message. %scala import org.apache.spark.sql.functions.{timestamp_millis, unix_millis} error: value timestamp_millis is not a member of object org.apache.spark.sql.functions import org.apache.spark.sql.functions.{timestamp_millis, unix_millis} Cau...
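For orientation, the two functions named above convert between a timestamp and the number of milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC). The sketch below is a pure-Python analogue of that conversion, not Spark code; millis_to_timestamp and timestamp_to_millis are hypothetical names chosen for this example, and it works in UTC rather than a session time zone.

```python
from datetime import datetime, timedelta, timezone

# Reference point for both conversions: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_timestamp(millis: int) -> datetime:
    """Convert milliseconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def timestamp_to_millis(ts: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


ts = millis_to_timestamp(1230219000123)
print(ts.isoformat())           # 2008-12-25T15:30:00.123000+00:00
print(timestamp_to_millis(ts))  # 1230219000123
```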
0 min reading time
Cannot view table SerDe properties
Problem: You are trying to view the SerDe properties on an Apache Hive table, but SHOW CREATE TABLE just returns the Apache Spark DDL. It does not show the SerDe properties. For example, given this sample code: %sql SHOW CREATE TABLE <table-identifier> You get a result that does not show the SerDe properties. Cause: You are using Databricks Runt...
0 min reading time
Use custom classes and objects in a schema
Problem: You are trying to create a dataset using a schema that contains Scala enumeration fields (classes and objects). When you run your code in a notebook cell, you get a ClassNotFoundException error. Sample code: %scala object TestEnum extends Enumeration { type TestEnum = Value val E1, E2, E3 = Value } import spark.implicits._ import TestEnum._ c...
1 min reading time