Slow S3 API responses when using Delta Lake with versioning-enabled S3 buckets

Disable S3 bucket versioning.

Written by Raghavan Vaidhyaraman

Last published at: January 17th, 2025

Problem

While using Delta Lake on AWS S3 buckets with versioning enabled, you notice slower S3 API responses and increased storage costs.

 

Cause

When Delta Lake runs VACUUM to remove obsolete data files from a versioning-enabled bucket, S3 does not permanently delete them. Each delete adds a delete marker and keeps the previous object as a noncurrent version. Over time, noncurrent versions and delete markers accumulate. You continue to pay for their storage, and list requests can slow down because S3 has to process them alongside the current objects.
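
To gauge how much a table path is affected, you can count the noncurrent versions and delete markers under its prefix. The following is a minimal sketch, assuming the AWS CLI is already configured with access to the bucket. The function name count_noncurrent_versions and its two arguments are hypothetical and used only for illustration.

```shell
# Hypothetical helper: count noncurrent versions and delete markers under a prefix.
# $1 = bucket name, $2 = key prefix (for example, the Delta table path).
count_noncurrent_versions() {
  aws s3api list-object-versions \
    --bucket "$1" \
    --prefix "$2" \
    --query '{noncurrent: length(Versions[?IsLatest==`false`] || `[]`), delete_markers: length(DeleteMarkers || `[]`)}' \
    --output json
}
```

For example, count_noncurrent_versions my-bucket path/to/table/ prints a small JSON object with the two counts. On prefixes with a very large number of versions the listing can take a long time, so start with a narrow prefix.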

 

Solution

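Databricks recommends disabling S3 bucket versioning. After versioning has been enabled on a bucket it cannot be turned off completely, only suspended, so use the put-bucket-versioning command with Status=Suspended. Suspending versioning does not remove object versions that already exist. They remain in the bucket until they are deleted or expired.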

aws s3api put-bucket-versioning \
    --profile <your-profile> \
    --bucket <your-bucket> \
    --versioning-configuration Status=Suspended \
    --endpoint-url https://<your-endpointurl>.com
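
To confirm that the change took effect, you can read the versioning status back from the bucket. This is a minimal sketch, assuming the same AWS CLI credentials as above. The function name versioning_status is hypothetical, and the global options --profile and --endpoint-url can be added to the call if you need them.

```shell
# Hypothetical helper: print the versioning status of a bucket.
# $1 = bucket name.
# Prints "Enabled" or "Suspended"; prints "None" if versioning was never configured.
versioning_status() {
  aws s3api get-bucket-versioning \
    --bucket "$1" \
    --query 'Status' \
    --output text
}
```

After the put-bucket-versioning command above, versioning_status my-bucket should report Suspended.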

 

If you need to keep versioning enabled, add a lifecycle rule that expires noncurrent object versions after a short period, such as seven days or less. Databricks recommends retaining no more than three versions of an object.

 

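Example command to implement a lifecycle management policy

The following command applies a lifecycle rule to every object in the bucket (the empty prefix matches all keys). It permanently deletes noncurrent versions seven days after they become noncurrent, except for the three most recent noncurrent versions of each object, which are retained.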

aws s3api put-bucket-lifecycle-configuration \
  --bucket <your-bucket-name> \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "LimitNumberOfVersions",
        "Status": "Enabled",
        "Filter": {
          "Prefix": ""
        },
        "NoncurrentVersionExpiration": {
          "NoncurrentDays": 7,
          "NewerNoncurrentVersions": 3
        }
      }
    ]
  }'
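
After applying the policy, you can read it back to check that the rule is in place. This is a minimal sketch, assuming the same AWS CLI credentials as above. The function name show_noncurrent_expiration is hypothetical.

```shell
# Hypothetical helper: list each lifecycle rule with its noncurrent-version expiration settings.
# $1 = bucket name.
show_noncurrent_expiration() {
  aws s3api get-bucket-lifecycle-configuration \
    --bucket "$1" \
    --query 'Rules[].{id: ID, status: Status, expiration: NoncurrentVersionExpiration}' \
    --output json
}
```

For the example policy, the output should show the LimitNumberOfVersions rule with a status of Enabled. S3 applies lifecycle rules asynchronously, so existing noncurrent versions may not disappear immediately after the rule is created.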