Recover from a DELTA_LOG corruption error
Problem: You are attempting to query a Delta table and get an IllegalStateException error saying that the metadata could not be recovered. Error in SQL statement: IllegalStateException: The metadata of your Delta table couldn't be recovered while Reconstructing version: 691193. Did you manually delete files in the _delta_log directory? Set spar...
Handling WARN Message: 'Could not turn on CDF for table (table-name)' in Delta Live Tables Pipeline
Problem: While running a Databricks Delta Live Tables (DLT) pipeline, you encounter a WARN message in the DLT event logs: Could not turn on CDF for table <table-name>. The table contains reserved columns [_change_type, _commit_version, _commit_timestamp] that will be used internally as metadata for the table's Change Data Feed. Change Data Feed ...
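A quick way to spot this condition ahead of time is to compare a table's column names against the three reserved names quoted in the WARN message. The following is a minimal sketch in plain Python, not a Databricks API: the function name and the example schema are hypothetical, and the comparison is an exact, case-sensitive match, which is a simplifying assumption.

```python
# Reserved names copied from the WARN message quoted above.
RESERVED_CDF_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}


def find_reserved_cdf_columns(column_names):
    """Return the names that collide with CDF metadata columns, in input order.

    Assumption: exact, case-sensitive comparison (a simplification).
    """
    return [name for name in column_names if name in RESERVED_CDF_COLUMNS]


# Hypothetical schema, for illustration only.
example_columns = ["id", "amount", "_change_type", "updated_at"]
conflicts = find_reserved_cdf_columns(example_columns)
print(conflicts)
```

With a Spark DataFrame, the same helper could be called with `df.columns`, which returns the column names as a list of strings.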
Streaming application missing data from a Delta table when writing to a given destination
Problem: When using a streaming application to stream data from a Delta table and write to a given destination, you notice data loss. Cause: If you address a failed streaming job by restarting it with startingVersion=latest, the tradeoff is possible data loss. The restarted query will read only from the latest available Delta version of the so...
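The gap can be pictured with a toy model: any table version committed after the last one the failed job processed, but before the version the restarted query begins at, is never read. The sketch below is plain Python with hypothetical version numbers. It does not call Spark or Delta, and it assumes the restart version itself is read.

```python
def skipped_versions(last_processed_version, restart_version):
    """List the versions that fall between the failure and the restart point.

    Toy model only: assumes the restarted query begins reading at
    `restart_version` (inclusive), so everything strictly between the
    last processed version and the restart version is skipped.
    """
    return list(range(last_processed_version + 1, restart_version))


# Hypothetical numbers: the job failed after version 5, and the table
# was at version 9 when the query was restarted from "latest".
lost = skipped_versions(last_processed_version=5, restart_version=9)
print(lost)  # [6, 7, 8]
```

If the job had instead resumed from its last processed version, the same function would return an empty list, which is the no-loss case.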
Structured streaming jobs slow down on every 10th batch
Problem: You are running a series of structured streaming jobs and writing to a file sink. Every 10th run appears to be slower than the previous runs. Cause: The file sink creates a _spark_metadata folder in the target path. This metadata folder stores information about each batch, including which files are part of the batch. This is required to prov...
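The excerpt is cut off before it names the mechanism, so the following is an assumption rather than the article's stated cause: in open source Spark, the file sink's metadata log is compacted periodically, and the interval is controlled by `spark.sql.streaming.fileSink.log.compactInterval`, which defaults to 10. Under that assumption, a batch is a compaction batch when `(batch_id + 1)` is a multiple of the interval. This sketch reproduces that rule in plain Python to show which batches would carry the extra work.

```python
def is_compaction_batch(batch_id, compact_interval=10):
    """Return True if this zero-based batch ID would trigger a metadata log compaction.

    Mirrors the rule (batch_id + 1) % compact_interval == 0; the default
    of 10 is an assumption based on Spark's default compact interval.
    """
    return (batch_id + 1) % compact_interval == 0


# Batch IDs start at 0, so the 10th, 20th, and 30th batches are IDs 9, 19, 29.
compaction_batches = [b for b in range(30) if is_compaction_batch(b)]
print(compaction_batches)  # [9, 19, 29]
```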
Uncommitted files causing data duplication
Problem: You had a network issue (or similar) while a write operation was in progress. You are rerunning the job, but files that were partially written and left uncommitted by the failed run are causing unwanted data duplication. Cause: How the Databricks commit protocol works. The DBIO commit protocol is transactional. Files are only committed after a trans...