Problem
When you try to read a Delta table, you get a DeltaFileNotFoundException error.
com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: dbfs:/VolumesOrMount/path/to/deltaTable/_delta_log/00000000000000000000.json: Unable to reconstruct state at version {version} as the transaction log has been truncated due to manual deletion or the log retention policy (delta.logRetentionDuration=30 days) and checkpoint retention policy (delta.checkpointRetentionDuration=2 days)
Cause
DeltaFileNotFoundException occurs when Delta Lake cannot access the transaction log files or checkpoint files required to reconstruct the state of the table at a specified version.
This can be caused by manual deletion of log files, retention policies, or storage lifecycle policies.
Manual deletion of log files
You have manually deleted files in the _delta_log directory, either intentionally or unintentionally, resulting in an incomplete transaction log history.
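One way to spot an incomplete log history is to look for gaps in the commit file versions. The following is a minimal sketch, not a definitive check: it assumes you have already obtained a list of file names from the _delta_log directory by whatever means your environment offers, and the function name missing_commit_versions and the sample listing are hypothetical. It relies only on the commit file naming shown in the error message above (a 20-digit, zero-padded version followed by .json).

```python
import re

# Commit files are named with a 20-digit, zero-padded version,
# for example 00000000000000000007.json.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def missing_commit_versions(file_names):
    """Return commit versions absent between the lowest and highest commit present.

    Hypothetical helper: checkpoint files and anything else that does not
    match the commit pattern are ignored.
    """
    matches = (COMMIT_RE.match(name) for name in file_names)
    versions = sorted(int(m.group(1)) for m in matches if m)
    if not versions:
        return []
    present = set(versions)
    return [v for v in range(versions[0], versions[-1] + 1) if v not in present]


# Hypothetical listing of a _delta_log directory with one commit removed.
listing = [
    "00000000000000000003.json",
    "00000000000000000004.json",
    "00000000000000000006.json",
    "00000000000000000005.checkpoint.parquet",
]
print(missing_commit_versions(listing))  # [5]
```

A gap is a hint rather than proof of a problem: a checkpoint at or after the missing version may still let newer versions be read, while versions that depend on the missing commit cannot be reconstructed.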
Retention policies
Delta Lake enforces retention policies to manage the size of the transaction logs and checkpoint files:
- Checkpoint retention (delta.checkpointRetentionDuration): Default is 2 days.
- Transaction log retention (delta.logRetentionDuration): Default is 30 days.
If the requested version is older than the retention period, the log and checkpoint files needed to reconstruct it may no longer exist, which results in a DeltaFileNotFoundException error.
Storage lifecycle policies
Your underlying cloud storage service may have lifecycle policies that delete or archive files before the Delta retention policies expire. As a result, the requested files no longer exist.
Solution
To resolve this issue, you can either recreate the table or attempt to recover the missing files.
Info
If you have S3 Versioning enabled on AWS, Soft Delete enabled on Azure, Soft Delete enabled on GCP, or a similar backup mechanism that periodically saves a copy of the files, you should be able to recover your files.
To prevent this issue from occurring, you should take steps to prevent manual deletion of files in the _delta_log directory. The log files are important for maintaining table consistency.
You can also increase the retention duration if you regularly require access to older versions or perform historical queries.
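As an illustration, the retention durations can be raised by setting the two table properties named above with ALTER TABLE ... SET TBLPROPERTIES. The sketch below only builds the statement as a string; the helper name retention_statement, the table name main.default.events, and the 60-day and 7-day values are assumptions chosen for the example, not recommendations.

```python
def retention_statement(table_name, log_days, checkpoint_days):
    """Build an ALTER TABLE statement that sets both Delta retention properties.

    Hypothetical helper: table_name is inserted as given, so pass only
    trusted identifiers.
    """
    if log_days <= 0 or checkpoint_days <= 0:
        raise ValueError("retention durations must be positive")
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = 'interval {log_days} days', "
        f"'delta.checkpointRetentionDuration' = 'interval {checkpoint_days} days')"
    )


# Example values only; pick durations that match how far back you query.
stmt = retention_statement("main.default.events", 60, 7)
print(stmt)

# In a notebook with an active SparkSession named spark, the statement
# could then be run with: spark.sql(stmt)
```

Longer retention keeps more log and checkpoint files in storage, so the values are a trade-off between history depth and storage use.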
Finally, you should review the storage policies on your underlying cloud storage account to ensure files are not deleted prematurely. Your cloud storage settings should align with your Delta Lake retention policy settings.
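The alignment check can be reduced to a simple comparison. The sketch below assumes you have looked up, by hand or through your cloud provider's tooling, the number of days after which a lifecycle rule deletes or archives objects under the table path; the function name lifecycle_conflicts and its parameters are hypothetical.

```python
def lifecycle_conflicts(log_retention_days, lifecycle_expiration_days):
    """Return True if a storage rule would act on files before the Delta
    log retention period has elapsed.

    lifecycle_expiration_days is None when no lifecycle rule applies to
    the table path (an assumption of this sketch).
    """
    if lifecycle_expiration_days is None:
        return False
    return lifecycle_expiration_days < log_retention_days


# With the default 30-day log retention, a 14-day expiration rule conflicts.
print(lifecycle_conflicts(30, 14))    # True
print(lifecycle_conflicts(30, 90))    # False
print(lifecycle_conflicts(30, None))  # False
```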