Problem
You notice that queries using the MAX() function on Delta tables with string or timestamp columns perform slowly.
Cause
Metadata-based query optimizations are not applied to string or timestamp columns by default, so these queries fall back to full table scans instead of being answered from Delta Lake statistics.
Additional context
Delta Lake uses collected statistics to optimize queries by skipping data files based on their min/max values. However, for string and timestamp columns the collected statistics may not capture exact values; for strings, Delta records only the first 32 characters, which may not represent the full range of values in those columns.
Consequently, the Delta Lake optimizer skips these columns when evaluating whether to perform metadata-based query optimizations (like data skipping or statistics pruning).
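To illustrate the behavior described above, here is a minimal, self-contained Python sketch. The file layout and statistic names are invented for illustration only (they are not Delta internals); it shows why exact min/max stats let MAX() be answered from metadata alone, while 32-character string prefixes cannot safely answer the same question.

```python
# Hypothetical per-file statistics, modeled after the idea of Delta's
# per-file min/max stats. Names like "max_id" are illustrative only.
files = [
    {"path": "part-0", "min_id": 1,   "max_id": 500,
     "max_name": "aardvark"[:32]},
    {"path": "part-1", "min_id": 501, "max_id": 900,
     # The true stored value is 40 chars; the stat keeps only a 32-char prefix.
     "max_name": ("z" * 40)[:32]},
]

def max_from_metadata(files, stat_key):
    """Answer MAX() using only file-level stats -- no data files are read."""
    return max(f[stat_key] for f in files)

# Integer stats are exact, so the metadata-only answer is correct.
print(max_from_metadata(files, "max_id"))  # 900

# String stats are truncated prefixes: the recorded max ("z" * 32) is
# not the true column max ("z" * 40), so the optimizer cannot return
# it as the answer and must scan the data instead.
truncated = max_from_metadata(files, "max_name")
print(truncated == "z" * 40)  # False
```

This is why exact statistics permit a metadata-only answer for numeric columns, while truncated string statistics force a scan unless the optimization discussed below is enabled.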
Solution
Enable metadata query optimizations for these column types by setting the following Apache Spark configuration. Run the following command in a notebook.
SET spark.databricks.delta.optimizeMetadataQuery.clusteredTable.enabled = TRUE;
This setting allows the Delta optimizer to use file-level metadata for string or timestamp columns when the table is clustered, even though full statistics are not available.
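As a hedged usage sketch, the setting can be applied in the same session as the affected query (the table and column names below are hypothetical, and assume a clustered Delta table):

```sql
-- Assumes a hypothetical Delta table `events` clustered by `event_ts`.
SET spark.databricks.delta.optimizeMetadataQuery.clusteredTable.enabled = TRUE;

-- With the setting enabled, this MAX() can use file-level metadata
-- instead of scanning the full table.
SELECT MAX(event_ts) FROM events;
```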