DeltaFileNotFoundException when reading a table

Ensure that Delta log files are not deleted prematurely.

Written by lucas.rocha

Last published at: January 14th, 2025

Problem

When you try to read a Delta table, the read fails with a DeltaFileNotFoundException error.

com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: dbfs:/VolumesOrMount/path/to/deltaTable/_delta_log/00000000000000000000.json: Unable to reconstruct state at version {version} as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)

Cause

DeltaFileNotFoundException occurs when Delta Lake cannot access the transaction log files or checkpoint files required to reconstruct the state of the table at a specified version.

This can be caused by manual deletion of log files, retention policies, or storage lifecycle policies.

Manual deletion of log files

Files in the _delta_log directory were manually deleted, either intentionally or unintentionally, leaving the transaction log history incomplete.
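One way to confirm a gap is to list the _delta_log directory and check which commit files are absent for the version you need. The following is a minimal sketch, not a Databricks utility: the function name missing_commits and the sample listing are hypothetical, and it assumes the standard naming of 20-digit zero-padded commit files (.json) and single-file or multi-part checkpoints (.checkpoint.parquet). It does not verify that every part of a multi-part checkpoint is present, and it ignores other checkpoint naming schemes.

```python
import re

# Commit files use a 20-digit, zero-padded version, e.g. 00000000000000000007.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Checkpoints are <version>.checkpoint.parquet or <version>.checkpoint.<part>.<total>.parquet
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(?:\.\d+\.\d+)?\.parquet$")


def missing_commits(file_names, target_version):
    """Return commit versions needed to rebuild target_version that are absent.

    Hypothetical helper: state is assumed to be rebuilt from the newest
    checkpoint at or before target_version plus every later commit, or from
    version 0 onward when no such checkpoint exists.
    """
    commits = set()
    checkpoints = set()
    for name in file_names:
        match = COMMIT_RE.match(name)
        if match:
            commits.add(int(match.group(1)))
            continue
        match = CHECKPOINT_RE.match(name)
        if match:
            checkpoints.add(int(match.group(1)))

    usable = [v for v in checkpoints if v <= target_version]
    start = max(usable) + 1 if usable else 0
    return [v for v in range(start, target_version + 1) if v not in commits]


# Illustrative listing (file names only) with commit 2 deleted and no checkpoint.
listing = [
    "00000000000000000000.json",
    "00000000000000000001.json",
    "00000000000000000003.json",
    "_last_checkpoint",
]
print(missing_commits(listing, 3))  # [2]
```

An empty result suggests the log is complete for that version under these assumptions. A non-empty result lists the commits that would need to be recovered.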

Retention policies

Delta Lake enforces retention policies to manage the size of the transaction logs and checkpoint files: 

  • Checkpoint Retention (delta.checkpointRetentionDuration): Default is 2 days.
  • Transaction Log Retention (delta.logRetentionDuration): Default is 30 days.

If the requested version is older than the retention period, the log or checkpoint files needed to reconstruct it may no longer exist, which results in a DeltaFileNotFoundException error.
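As a rough illustration of the age check, the sketch below compares the age of a commit against a log retention window. The function name past_log_retention and the sample timestamps are hypothetical. The 30-day default mirrors the delta.logRetentionDuration default listed above. This is only an approximation: it shows when a file becomes eligible for removal, not when Delta Lake actually removes it.

```python
from datetime import datetime, timedelta, timezone


def past_log_retention(commit_time, now, retention_days=30):
    """Return True when a commit is older than the log retention window.

    Hypothetical helper: a True result means the log file may already have
    been cleaned up, not that it definitely has been.
    """
    return now - commit_time > timedelta(days=retention_days)


# Illustrative timestamps: a commit written 45 days before the read attempt.
commit_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
read_time = datetime(2025, 2, 15, tzinfo=timezone.utc)
print(past_log_retention(commit_time, read_time))  # True
```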

Storage lifecycle policies

Your underlying cloud storage service may have lifecycle policies that delete or archive files before the Delta retention policies expire. As a result, the requested files no longer exist.

Solution

To resolve this issue, you can either recreate the table or attempt to recover the missing files.

Info

If you have S3 Versioning enabled on AWS, Soft Delete enabled on Azure, Soft Delete enabled on GCP, or a similar backup mechanism that periodically saves a copy of the files, you should be able to recover your files.

To prevent this issue from occurring, take steps to prevent manual deletion of files in the _delta_log directory. Delta Lake needs these log files to reconstruct the table state and maintain table consistency.

You can also increase the retention durations (delta.logRetentionDuration and delta.checkpointRetentionDuration) if you regularly require access to older versions or perform historical queries.
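Retention durations are table properties, so they can be changed with an ALTER TABLE ... SET TBLPROPERTIES statement. The sketch below only builds that statement. The helper name retention_statement, the table name my_catalog.my_schema.my_table, and the 60-day and 7-day values are illustrative assumptions, not recommendations.

```python
def retention_statement(table_name, log_days, checkpoint_days):
    """Build an ALTER TABLE statement that sets both Delta retention properties.

    Hypothetical helper: table_name is inserted as-is, so pass only a
    trusted, fully qualified table name.
    """
    if log_days <= 0 or checkpoint_days <= 0:
        raise ValueError("retention durations must be positive")
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = 'interval {log_days} days', "
        f"'delta.checkpointRetentionDuration' = 'interval {checkpoint_days} days')"
    )


# Illustrative table name and durations.
statement = retention_statement("my_catalog.my_schema.my_table", 60, 7)
print(statement)

# In a Databricks notebook, where `spark` is already defined, the statement
# could then be run with:
# spark.sql(statement)
```

Longer retention keeps more log and checkpoint files in storage, so choose durations that match how far back you actually need to query.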

Finally, you should review the storage policies on your underlying cloud storage account to ensure files are not deleted prematurely. Your cloud storage settings should align with your Delta Lake retention policy settings.