Updated September 12th, 2024 by lakshay.goel

Column drift when reading multiple delimited files

Problem: You notice column drift while reading multiple delimited files in a single spark.read operation. This problem manifests as columns being incorrectly mapped, leading to data integrity issues. Example: spark.read.format("csv").load(<source_directory>/*), where <source_directory> contains multiple CSV files. Cause: When multiple files wit...
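The excerpt above is cut off before it states the cause, so the following is only a minimal sketch of what "column drift" can look like, not the article's own diagnosis. It assumes, purely for illustration, two files that share the same columns in a different order, and it uses Python's standard csv module rather than Spark. The file contents and the helper names read_positional and read_by_header are hypothetical.

```python
import csv
import io

# Hypothetical file contents (an assumption for illustration):
# the same three columns, but in a different order in each file.
FILE_A = "id,name,city\n1,Ana,Lisbon\n"
FILE_B = "id,city,name\n2,Oslo,Ben\n"


def read_positional(texts, columns):
    """Map every data row onto one fixed column list, ignoring each file's own header."""
    rows = []
    for text in texts:
        reader = csv.reader(io.StringIO(text))
        next(reader)  # skip the header line of this file
        for values in reader:
            rows.append(dict(zip(columns, values)))
    return rows


def read_by_header(texts):
    """Map every data row using the header of the file it came from."""
    rows = []
    for text in texts:
        rows.extend(dict(row) for row in csv.DictReader(io.StringIO(text)))
    return rows


# Positional mapping puts FILE_B's city value into the name column (drift).
drifted = read_positional([FILE_A, FILE_B], ["id", "name", "city"])

# Header-based mapping keeps each value under its own column name.
aligned = read_by_header([FILE_A, FILE_B])
```

Comparing drifted[1] with aligned[1] shows the mismatch: the positional reader swaps the name and city values for the second file, while the header-based reader does not.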

Updated September 10th, 2024 by lakshay.goel

Increased wait times between micro-batches in Auto Loader

Problem: When running an Auto Loader job in directory listing mode, you may experience increased wait times between micro-batches. Cause: When the input file path is a nested directory path, the job takes time to list all the nested directories. Thus, the job has to wait for worker threads to make progress before processing the next batch, leading to...

Updated August 29th, 2024 by lakshay.goel

'CREATE OR REPLACE' SQL error in a Delta table

Problem: When trying to run a CREATE OR REPLACE statement against a Delta table, you may encounter the following error: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view <catalog>.<schema>.<table_name> because it already exists. Choose a different name, drop or replace the existing object, add the IF NOT EXISTS clause t...
