Problem
After you run VACUUM
successfully, you notice you can still access and query data further back than the default seven days of retained history with time travel.
Cause
When you run VACUUM
, it removes stale files from the file system but it may not remove all files immediately. When there are still active files at a particular version, time travel can still access those files even though they are beyond the expected retention period.
Solution
This is expected VACUUM
behavior. To be sure VACUUM
did execute when you ran the command, you can test by reading a table’s state as of "<older-date>"
and writing it to a temporary table using the following query.
INSERT OVERWRITE <your-temp-table> AS SELECT * FROM <catalog>.<schema>.<table-name> TIMESTAMP AS OF '<older-date>'
The query should fail, because it tries to read every Parquet data file at that state, and if some files are already removed by VACUUM
, it can’t read those files.
Additionally, you can check the VACUUM
history to see if it has indeed deleted the stale files from storage as expected.
For more information, review the Work with Delta Lake table history (AWS | Azure | GCP) and VACUUM
(AWS | Azure | GCP) documentation.