Updated October 14th, 2022 by deepak.bhutada

Different tables with same data generate different plans when used in same query

Problem: Assume you have two Delta tables, test_table_1 and test_table_2. Both tables have the same schema, data volume, and partitions, and contain the same number of files. You are doing a join transformation with another Delta table, test_table_join, which has a million records. When you run the join queries below using test_table_1 and test_...

1 min reading time
Updated October 26th, 2022 by deepak.bhutada

Using datetime values in Spark 3.0 and above

Problem: You are migrating jobs from unsupported clusters running Databricks Runtime 6.6 and below with Apache Spark 2.4.5 and below to clusters running a current version of the Databricks Runtime. If your jobs or notebooks process date conversions, they may fail with a SparkUpgradeException error message when run on the upgraded clusters. ...

1 min reading time
Updated July 28th, 2023 by deepak.bhutada

Found duplicate columns error blocks creation of a Delta table

Problem: You have an array of struct columns with one or more duplicate column names in a DataFrame. If you try to create a Delta table, you get a Found duplicate column(s) in the data to save: error. Example code: You can reproduce the error with this example code. 1) The first step sets up an array with duplicate column names. The duplicate columns a...

1 min reading time