Updated May 24th, 2022 by DD Sharma

Data is incorrect when read from Snowflake

Problem: You have a job that uses Apache Spark to read from a Snowflake table, but the time data that appears in the DataFrame is incorrect. If you run the same query directly on Snowflake, the correct time data is returned. Cause: The time zone value was not correctly set. A mismatch between the time zone value of the Databricks cluster and Snowf...
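A minimal sketch of the usual fix, assuming the mismatch is between the Spark session time zone and the one Snowflake's session uses (the table name below is hypothetical):

```sql
-- Align the Spark session time zone with Snowflake's session time zone
-- (equivalent to setting the cluster config spark.sql.session.timeZone).
SET TIME ZONE 'America/Los_Angeles';

-- Re-read the data; timestamps should now match what Snowflake returns.
SELECT * FROM snowflake_table_view;  -- hypothetical view over the Snowflake table
```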

0 min reading time
Updated December 1st, 2022 by DD Sharma

Get last modification time for all files in Auto Loader and batch jobs

You are running a streaming job with Auto Loader (AWS | Azure | GCP) and want to get the last modification time for each file from the storage account. Instructions: The Get the path of files consumed by Auto Loader article describes how to get the filenames and paths for all files consumed by Auto Loader. In this article, we build on that founda...
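As a sketch of the idea, assuming a Databricks Runtime that exposes the hidden `_metadata` column on file sources (the input path and file format here are hypothetical):

```sql
-- _metadata is a hidden column; it must be selected explicitly.
-- file_modification_time is the last modification time recorded by the storage account.
SELECT
  _metadata.file_path,
  _metadata.file_modification_time
FROM parquet.`/mnt/source-data/`;  -- hypothetical input path
```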

1 min reading time
Updated May 10th, 2022 by DD Sharma

Unable to cast string to varchar

Problem: You are trying to cast a string type column to varchar, but it isn't working. Info: The varchar data type (AWS | Azure | GCP) is available in Databricks Runtime 8.0 and above. Create a simple Delta table with one column of type string. %sql CREATE OR REPLACE TABLE delta_table1 (`col1` string) USING DELTA; Use SHOW TABLE on the newly created ta...
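A hedged sketch of what the resolution likely involves on Databricks Runtime 8.0 and above: varchar requires an explicit maximum length, both in a column definition and in a cast (the second table name and the length of 10 are illustrative; `delta_table1` is the table from the snippet above):

```sql
-- varchar must be declared with a maximum length.
CREATE OR REPLACE TABLE delta_table2 (`col1` varchar(10)) USING DELTA;

-- Likewise, a cast needs an explicit length; CAST(... AS VARCHAR) alone fails.
SELECT CAST(col1 AS VARCHAR(10)) FROM delta_table1;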

0 min reading time
Updated October 7th, 2022 by DD Sharma

Vacuuming with zero retention results in data loss

Problem: You add data to a Delta table, but the data disappears without warning. There is no obvious error message. Cause: This can happen when spark.databricks.delta.retentionDurationCheck.enabled is set to false and VACUUM is configured to retain 0 hours. %sql VACUUM <name-of-delta-table> RETAIN 0 HOURS or %sql VACUUM delta.`<delta_table_pa...
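A sketch of the safer pattern: leave the retention check enabled (its default) and retain at least the default seven days of history, so files still needed by concurrent readers or recent writes are not deleted (the table name is hypothetical):

```sql
-- Keep the safety check on; this is the default setting.
SET spark.databricks.delta.retentionDurationCheck.enabled = true;

-- Retain at least the default 7 days (168 hours) of history.
VACUUM my_delta_table RETAIN 168 HOURS;  -- hypothetical table name
```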

1 min reading time