Problem
While running a streaming job or reading a Delta table, you receive the following error even though you have set ignoreMissingFiles.
Job aborted due to stage failure: Error while reading file dbfs:/mnt/...snappy.parquet.
Caused by: IOException: java.io.FileNotFoundException: Operation failed: 'The specified path does not exist.', 404, GET
Cause
There is a discrepancy between the table metadata and the data files in the storage location.
It is also possible that your Delta logs contain stale metadata entries that reference files that no longer exist in the storage location.
Solution
Use the FSCK REPAIR TABLE command to synchronize the metadata with the data files. This command removes metadata entries for files that are not present in the underlying file system. Execute the following command in your Databricks notebook.
FSCK REPAIR TABLE delta.`<your-table>`
Further, ensure all files referenced in the Delta logs are present in the storage location. Manually check the storage directory or use automated scripts to verify file existence.
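One way to automate the file-existence check is sketched below in Python. It is a minimal illustration, not an official tool. The function name find_missing_files is hypothetical. The sketch assumes the table directory is reachable through a local file path (for example, a mounted path), that the add and remove entries in the Delta log use paths relative to the table root, and that all JSON commit files are still present in _delta_log. It does not read Parquet checkpoints, so on a table whose older commits have been cleaned up the result may be incomplete.

```python
import json
import os
import re
from urllib.parse import unquote

# Plain commit files are named with a 20-digit, zero-padded version number.
COMMIT_FILE = re.compile(r"\d{20}\.json")


def find_missing_files(table_path):
    """Return data files that the JSON commits reference but that are absent on disk.

    Assumption: relative file paths and a complete set of JSON commits
    (checkpoints are ignored in this sketch).
    """
    log_dir = os.path.join(table_path, "_delta_log")
    active = set()
    # Zero-padded names mean a lexical sort replays commits in version order.
    commits = sorted(f for f in os.listdir(log_dir) if COMMIT_FILE.fullmatch(f))
    for name in commits:
        with open(os.path.join(log_dir, name), encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "add" in action:
                    # Paths in the log are URL-encoded.
                    active.add(unquote(action["add"]["path"]))
                elif "remove" in action:
                    active.discard(unquote(action["remove"]["path"]))
    # Keep only the active files that cannot be found under the table root.
    return sorted(
        p for p in active if not os.path.exists(os.path.join(table_path, p))
    )
```

Calling find_missing_files with the local path of your table returns a sorted list of the referenced files that could not be found. An empty list suggests that the JSON commits and the storage location agree.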