Problem
When working with Delta Lake tables, you encounter the following error message:

java.lang.IllegalStateException: Versions (Vector(0, 3)) are not contiguous

This error typically occurs when there is a gap in the Delta transaction log: one or more version files are missing. This can happen when log files have been manually removed, or due to S3 eventual consistency when a table is deleted and recreated at the same location.
Cause
The Delta table is corrupted. Tables can become corrupted if you:
- Manually remove underlying files from the Delta log.
- Run rm commands or other non-Delta operations that delete files from the Delta log.
- Drop and immediately recreate a table on top of the same location.
When transaction files are removed or not removed correctly, the version numbers in the Delta log become non-contiguous and the table can no longer be read.
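You can confirm the gap by listing the transaction log directory, as in the following sketch; the <table-path> placeholder is hypothetical, so substitute your table's storage location. Commit files are named by zero-padded version number, so a jump from 00000000000000000000.json to 00000000000000000003.json indicates missing versions.
%scala
// List the JSON commit files in the transaction log, sorted by version.
// In a healthy log, the version numbers are contiguous: 0, 1, 2, ...
dbutils.fs.ls("<table-path>/_delta_log/")
  .map(_.name)
  .filter(_.endsWith(".json"))
  .sorted
  .foreach(println)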
Solution
- Use the following Scala code to get the corrupted table's storage location.
%scala
// Look up the table's location in the metastore.
val metastore = spark.sharedState.externalCatalog
val location = metastore.getTable("<database-name>", "<table-name>").location
println(location)
- Remove the table's base folder, which contains the data files and the _delta_log folder.
%sh
rm -r /dbfs/<table-storage-location>
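If the path is not accessible through the /dbfs FUSE mount, a minimal alternative sketch using the Databricks file system utilities follows; <table-storage-location> is the path returned by the lookup above.
%scala
// Recursively delete the table's base folder through dbutils.
dbutils.fs.rm("<table-storage-location>", true)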
- Drop the table.
%sql
DROP TABLE <table-name>
- Use CREATE OR REPLACE TABLE to recreate the table if the underlying location remains the same. For more information, please review the CREATE TABLE [USING] (AWS | Azure | GCP) documentation.
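As a sketch, recreating the table over the same location might look like the following; the column definitions are hypothetical placeholders for your actual schema.
%scala
// Recreate the Delta table at the original storage location.
// Replace the placeholder names and columns with your own.
spark.sql("""
  CREATE OR REPLACE TABLE <database-name>.<table-name> (
    id BIGINT,
    year STRING
  )
  USING DELTA
  LOCATION '<table-storage-location>'
""")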
Alternatively, if you have only run append operations (without OPTIMIZE) on your Delta table, you can recover the table without dropping it by converting it back to Delta. This works because in an append-only table every Parquet file in the directory holds live data; operations such as OPTIMIZE or DELETE leave behind files that are logically removed but still physically present, and those would incorrectly reappear after conversion.
- Remove the _delta_log folder. Without _delta_log, the table is treated as a plain Parquet table.
%sh
# Delete only the transaction log; the Parquet data files remain in place.
rm -r /dbfs/<corrupt-table-name>/_delta_log
- Convert the Parquet table back to a Delta table. The following command creates a fresh _delta_log folder, so the table becomes queryable again without losing any data.
%sql
-- Include the PARTITIONED BY clause only if the table is partitioned.
CONVERT TO DELTA parquet.`<table-path>/` PARTITIONED BY (year STRING);
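To verify the conversion, one option is to read the table back by path and confirm the data is intact; this sketch assumes the same hypothetical <table-path> placeholder.
%scala
// Load the converted Delta table and confirm it is readable.
val df = spark.read.format("delta").load("<table-path>")
println(s"Row count after conversion: ${df.count()}")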
For more information on best practices for dropping Delta Lake tables, please review the Best practices for dropping a managed Delta Lake table knowledge base article.