Problem
While reviewing your data path, you find a file with "deletion_vector" in its name and a .bin extension. Because of the name, you might assume it is a Delta Lake deletion vector (AWS | Azure | GCP), but it is not. This file can be created even if you do not use Delta Lake deletion vectors.
dbutils.fs.ls("s3://bucket_name/table_name/")
[FileInfo(path='s3://bucket_name/table_name/deletion_vector_1112222-33333-44444-5555-123455.bin', name='deletion_vector_1112222-33333-44444-5555-123455.bin', size=1024),
FileInfo(path='s3://bucket_name/table_name/date=delta_log', name='date=delta_log/', size=0),
FileInfo(path='s3://bucket_name/table_name/date=20241010', name='date=20241010/', size=0),
]
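To check whether a table directory contains such files, you can filter the listing by name. The following is a minimal sketch: the helper name find_leftover_dv_files is hypothetical, and the pattern deletion_vector_*.bin is an assumption based only on the example file name shown above.

```python
import fnmatch
from typing import Iterable, List

# Assumed naming pattern, inferred from the example listing above.
LEFTOVER_PATTERN = "deletion_vector_*.bin"


def find_leftover_dv_files(names: Iterable[str]) -> List[str]:
    """Return the file names that match the assumed temporary-file pattern."""
    return [n for n in names if fnmatch.fnmatchcase(n, LEFTOVER_PATTERN)]


# In a Databricks notebook, where `dbutils` is predefined, usage might look like:
#   names = [f.name for f in dbutils.fs.ls("s3://bucket_name/table_name/")]
#   print(find_leftover_dv_files(names))
```

The helper only inspects names. It does not read the Delta transaction log, so a match indicates a candidate file, not proof that the file is a leftover.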
Cause
This file is generated by an internal process called low shuffle merge (AWS | Azure | GCP). Introduced in Databricks Runtime 10.4 LTS, low shuffle merge uses its own internal deletion vector mechanism, which is separate from the Delta Lake deletion vectors feature.
The file in question is a temporary file created during a merge operation. Typically, this file is deleted once the merge operation completes. If the merge operation fails, this file may remain in place.
Solution
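You do not have to do anything. The file is automatically deleted on the next VACUUM run. Note that VACUUM only removes unreferenced files older than the retention threshold, which defaults to 7 days.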
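If you want to preview what VACUUM would remove before running it, a DRY RUN lists the candidate files without deleting anything. The following is a minimal sketch: the helper name build_vacuum_sql is hypothetical, and the table path is the placeholder from the listing above. It assumes a Databricks notebook where `spark` is predefined.

```python
def build_vacuum_sql(table_path: str, dry_run: bool = True) -> str:
    """Build a VACUUM statement for a path-based Delta table."""
    if "`" in table_path:
        # A backtick would break the quoted identifier below.
        raise ValueError("table_path must not contain backticks")
    stmt = f"VACUUM delta.`{table_path}`"
    if dry_run:
        # DRY RUN only reports files eligible for deletion; nothing is removed.
        stmt += " DRY RUN"
    return stmt


# In a Databricks notebook, where `spark` is predefined, usage might look like:
#   spark.sql(build_vacuum_sql("s3://bucket_name/table_name/")).show(truncate=False)
```

The sketch defaults to dry_run=True so that accidental calls do not delete files. Pass dry_run=False only when you intend to run the actual cleanup.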