Problem
Your job fails with the error message "A file referenced in the transaction log cannot be found."
Example stack trace:
Error in SQL statement: SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 106, XXX.XXX.XXX.XXX, executor 0): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/<path>/part-00000-da504c51-3bb4-4406-bb99-3566c0e2f743-c000.snappy.parquet. A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement. For more information, see https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions ... Caused by: java.io.FileNotFoundException: dbfs:/mnt/<path>/part-00000-da504c51-3bb4-4406-bb99-3566c0e2f743-c000.snappy.parquet ...
Cause
There are three common causes for this error message.
- Cause 1: You start the Delta streaming job, but before the streaming job starts processing, the underlying data is deleted.
- Cause 2: You perform updates to the Delta table, but the transaction files are not updated with the latest details.
- Cause 3: You attempt multi-cluster read or update operations on the same Delta table, so one cluster ends up referring to files from a table that was deleted and recreated.
Solution
- Cause 1: You should use a new checkpoint directory, or set the Spark property spark.sql.files.ignoreMissingFiles to true in the cluster’s Spark Config.
- Cause 2: Wait for the data to load, then refresh the table. You can also run FSCK REPAIR TABLE to update the transaction files with the latest details.
- Cause 3: When tables have been deleted and recreated, the metadata cache in the driver is incorrect. Do not delete a table; always overwrite it instead. If you do delete a table, clear the metadata cache to mitigate the issue. You can use either of the following Python or Scala notebook commands to clear the cache:
%python
spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog.clearCache()

%scala
com.databricks.sql.transaction.tahoe.DeltaLog.clearCache()
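The fixes for Cause 1 and Cause 2 can also be scripted from a Python notebook. The following is a minimal sketch, not a definitive implementation. The helper names repair_statements and run_repair and the table name events are hypothetical. Setting spark.sql.files.ignoreMissingFiles on the session, rather than in the cluster's Spark Config as described above, is an assumption made to keep the example short.

```python
def repair_statements(table_name, dry_run=True):
    """Return the SQL statements for the Cause 2 fix, in the order to run them."""
    fsck = f"FSCK REPAIR TABLE {table_name}"
    if dry_run:
        # DRY RUN reports the missing-file entries without removing them.
        fsck += " DRY RUN"
    return [f"REFRESH TABLE {table_name}", fsck]


def run_repair(spark, table_name, dry_run=True, ignore_missing_files=False):
    """Optionally apply the Cause 1 setting, then run the Cause 2 statements.

    `spark` is the SparkSession that a notebook already provides.
    Returns whatever spark.sql() returned for each statement.
    """
    if ignore_missing_files:
        # Session-level alternative to the cluster Spark Config entry (assumption).
        spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")
    return [spark.sql(statement) for statement in repair_statements(table_name, dry_run)]
```

In a notebook, run_repair(spark, "events") would refresh the hypothetical events table and then run the repair as a dry run. The last element of the returned list holds the dry-run result, which you can inspect before repeating the call with dry_run=False.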