Auto Loader failures with java.io.FileNotFoundException for SST and log files

Use a separate checkpoint folder outside of the Delta directory.

Written by kuldeep.mishra

Last published at: November 4th, 2024

Problem

You notice that jobs which previously ran successfully start failing without any recent changes to the code or environment. On investigation, you find that pipelines using Auto Loader to ingest Delta changes for multiple tables suddenly fail with a java.io.FileNotFoundException error for SST and log files.

Cause

Your table’s checkpoint and Delta paths are misconfigured. This can happen when the checkpoint path is set to the same location as the table’s Delta path. When the VACUUM command runs on the table, it deletes all files in the directory that are not tracked by the _delta_log, including the checkpoint files.
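The misconfiguration can be illustrated with a minimal sketch. The paths below are hypothetical examples, not taken from a real environment:

```python
# Hypothetical Delta table path.
delta_path = "/mnt/datalake/sales_table"

# Problematic: the checkpoint lives inside the Delta path. Checkpoint files
# are not tracked by the table's _delta_log, so VACUUM will delete them.
bad_checkpoint = "/mnt/datalake/sales_table/_checkpoint"

# Safe: the checkpoint lives in a separate directory outside the Delta path,
# so VACUUM on the table never touches it.
good_checkpoint = "/mnt/checkpoints/sales_table"
```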

The issue can also occur when multiple streams or jobs share the same checkpoint directory, leading to conflicts and file deletions that cause pipeline failures.

Solution

First, create a separate checkpoint folder outside of the Delta directory. This ensures that the checkpoint files are not deleted during the VACUUM operation. 
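As a sketch of this setup, the following defines an Auto Loader stream whose checkpoint is kept outside the Delta directory, with a small guard that checks the paths before starting. The function name, source format, and paths are illustrative assumptions, not from the article:

```python
def checkpoint_outside_delta(checkpoint_path: str, delta_path: str) -> bool:
    """Return True if the checkpoint directory is not inside the Delta path."""
    cp = checkpoint_path.rstrip("/") + "/"
    dp = delta_path.rstrip("/") + "/"
    return not cp.startswith(dp)

def start_ingest(spark, source_path, delta_path, checkpoint_path):
    # Sketch only: refuse to start if the checkpoint would be vacuumed away.
    assert checkpoint_outside_delta(checkpoint_path, delta_path), \
        "checkpoint must live outside the Delta directory"
    return (
        spark.readStream
        .format("cloudFiles")                        # Auto Loader source
        .option("cloudFiles.format", "json")         # assumed source format
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .start(delta_path)
    )
```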

Next, for successful runs, copy the existing checkpoint files to this new path. For failed runs, start with a fresh checkpoint and use the modifiedAfter option on the stream to ingest only files whose modification timestamp is later than a given timestamp.

For more detail on the modifiedAfter option, refer to the Auto Loader options (AWS | Azure | GCP) documentation.
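A minimal sketch of building the modifiedAfter option when restarting with a fresh checkpoint. The helper name and cutoff date are hypothetical, and the timestamp format follows the pattern shown in the Auto Loader options documentation:

```python
from datetime import datetime, timezone

def modified_after_option(cutoff: datetime) -> dict:
    """Build the Auto Loader option that skips files modified before `cutoff`.

    Format assumption: "YYYY-MM-DD HH:MM:SS.ffffff UTC+0", matching examples
    in the Auto Loader options documentation.
    """
    ts = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f UTC+0")
    return {"modifiedAfter": ts}

# Hypothetical restart that reingests files modified after 2024-11-01:
opts = modified_after_option(datetime(2024, 11, 1, tzinfo=timezone.utc))
# On a cluster, these options would be passed to the stream, e.g.:
# spark.readStream.format("cloudFiles").options(**opts)...
```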

Finally, run VACUUM with DRY RUN to preview which files will be deleted in the next vacuum execution. This helps ensure that no necessary files are removed.
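The dry-run check can be sketched as follows. The table path and retention period are illustrative assumptions; DRY RUN only lists the files that would be deleted, without deleting anything:

```python
def vacuum_dry_run_sql(table_path: str, retain_hours: int = 168) -> str:
    """Build a VACUUM ... DRY RUN statement that previews deletions
    for a path-based Delta table before the real VACUUM runs."""
    return f"VACUUM delta.`{table_path}` RETAIN {retain_hours} HOURS DRY RUN"

stmt = vacuum_dry_run_sql("/mnt/datalake/sales_table")
# On a cluster, inspect the list of files that the next VACUUM would delete:
# spark.sql(stmt).show(truncate=False)
```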