FileNotFoundException error while running a streaming job or reading a Delta table even with ignoreMissingFiles set

Use the FSCK repair command to synchronize the metadata with the actual data files.

Written by mounika.tarigopula

Last published at: January 16th, 2025

Problem

While running a streaming job or reading a Delta table, you receive the following error even though you have set ignoreMissingFiles.
 

Job aborted due to stage failure: Error while reading file dbfs:/mnt/...snappy.parquet.
Caused by: IOException: java.io.FileNotFoundException: Operation failed: 'The specified path does not exist.', 404, GET

 

Cause

There is a discrepancy between the table metadata (the Delta transaction log) and the data files in the storage location.

It is also possible that your Delta log contains stale metadata entries that reference files that are no longer in the storage location.

 

Solution

Use the FSCK REPAIR TABLE command to synchronize the metadata with the data files. This command removes metadata entries for files that are not present in the underlying file system. To preview which entries would be removed without changing the table, append DRY RUN to the command. Execute the following command in your Databricks notebook.

 

FSCK REPAIR TABLE delta.`<your-table>`

 

After the repair, verify that all files referenced in the Delta log are present in the storage location. You can check the storage directory manually or use an automated script to verify that each file exists.
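
The file-existence check can be scripted. The following is a minimal sketch, not an official Databricks utility. The function names active_files and missing_files are hypothetical. It assumes the table directory is reachable as a local path (for example through a /dbfs mount), that every JSON commit file is still present in _delta_log (checkpoint files are not read), and that the log stores data file paths relative to the table root.

```python
import json
import re
from pathlib import Path
from urllib.parse import unquote

# Delta commit files are named as a 20-digit, zero-padded version number.
COMMIT_NAME = re.compile(r"^\d{20}\.json$")


def active_files(table_root):
    """Replay the JSON commits and return the data file paths the log still references.

    Assumption: all commits are available as JSON; checkpoints are ignored.
    """
    log_dir = Path(table_root) / "_delta_log"
    active = set()
    if not log_dir.is_dir():
        return active
    commits = sorted(p for p in log_dir.iterdir() if COMMIT_NAME.match(p.name))
    # Zero-padded names mean lexical order equals commit order.
    for commit in commits:
        with open(commit, encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    # Paths in the log are URL-encoded.
                    active.add(unquote(action["add"]["path"]))
                elif "remove" in action:
                    active.discard(unquote(action["remove"]["path"]))
    return active


def missing_files(table_root):
    """Return the referenced relative paths that do not exist under table_root."""
    root = Path(table_root)
    missing = []
    for rel_path in active_files(root):
        if "://" in rel_path:
            # Absolute URIs cannot be checked on the local file system; skip them.
            continue
        if not (root / rel_path).exists():
            missing.append(rel_path)
    return sorted(missing)
```

Calling missing_files on the table's root directory returns the list of referenced files that are absent. An empty list suggests the log and the storage location agree, within the assumptions above.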