You do not use deletion vectors, but see a file with "deletion_vector" in its name in your data path

It is an artifact of low shuffle merge and is removed on the next VACUUM run.

Written by avi.yehuda

Last published at: December 13th, 2024

Problem

You are reviewing your data path and come across a file with "deletion_vector" in its name and a .bin extension. Because of the name, you may assume it is a Delta Lake deletion vector (AWS | Azure | GCP), but this is incorrect. This file can be created even if you do not use Delta Lake deletion vectors.

dbutils.fs.ls("s3://bucket_name/table_name/")

[FileInfo(path='s3://bucket_name/table_name/deletion_vector_1112222-33333-44444-5555-123455.bin', name='deletion_vector_1112222-33333-44444-5555-123455.bin', size=1024),
 FileInfo(path='s3://bucket_name/table_name/_delta_log', name='_delta_log/', size=0),
 FileInfo(path='s3://bucket_name/table_name/date=20241010', name='date=20241010/', size=0),
 ]
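
To pick such files out of a longer listing, a small filter can help. The following is a minimal sketch, not an official utility: the helper name find_leftover_merge_files and the file name pattern are assumptions based only on the example listing above, and the FileInfo namedtuple is a stand-in for the objects that dbutils.fs.ls returns so that the snippet also runs outside a notebook.

```python
import re
from collections import namedtuple

# Stand-in for the entries returned by dbutils.fs.ls (assumption: only the
# path, name, and size attributes are needed for this check).
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])

# Assumed naming pattern, inferred from the example listing in this article.
LEFTOVER_PATTERN = re.compile(r"^deletion_vector_.*\.bin$")


def find_leftover_merge_files(entries):
    """Return the entries whose name matches the assumed leftover-file pattern."""
    return [entry for entry in entries if LEFTOVER_PATTERN.match(entry.name)]


# Sample data copied from the listing above. In a notebook you could pass
# dbutils.fs.ls("s3://bucket_name/table_name/") instead.
sample_listing = [
    FileInfo(
        path="s3://bucket_name/table_name/deletion_vector_1112222-33333-44444-5555-123455.bin",
        name="deletion_vector_1112222-33333-44444-5555-123455.bin",
        size=1024,
    ),
    FileInfo(
        path="s3://bucket_name/table_name/date=20241010",
        name="date=20241010/",
        size=0,
    ),
]

leftovers = find_leftover_merge_files(sample_listing)
for entry in leftovers:
    print(entry.path, entry.size)
```

The filter only reports matching files. It does not delete anything, which is consistent with leaving cleanup to VACUUM.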

Cause

This file is generated by an internal process called low shuffle merge (AWS | Azure | GCP). Introduced in Databricks Runtime 10.4 LTS, low shuffle merge uses a type of deletion vector mechanism.

The file in question is a temporary file created during a merge operation. Typically, this file is deleted once the merge operation completes. If the merge operation fails, this file may remain in place.

Solution

You do not have to do anything. The file is automatically deleted on the next VACUUM run.
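
If you want to preview what VACUUM would remove before it runs, a dry run is one option. The sketch below only builds the SQL statement for a path-based Delta table. The helper name build_vacuum_statement is hypothetical, and the path is the placeholder used earlier in this article.

```python
def build_vacuum_statement(table_path, dry_run=False):
    """Build a VACUUM statement for a Delta table addressed by its path."""
    statement = f"VACUUM delta.`{table_path}`"
    if dry_run:
        # DRY RUN lists files that would be deleted without deleting them.
        statement += " DRY RUN"
    return statement


stmt = build_vacuum_statement("s3://bucket_name/table_name/", dry_run=True)
print(stmt)

# In a Databricks notebook, where a `spark` session is available, you could run:
# spark.sql(stmt).show(truncate=False)
```

Keep in mind that VACUUM only removes unreferenced files older than the table's retention threshold (7 days by default), so a recently created leftover file may still be present after a VACUUM run and should disappear on a later one.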