DELTA_CLUSTERING_COLUMN_MISSING_STATS error when attempting to define liquid clustering for a delta table

Ensure that you have generated Delta statistics for the columns used as clustering keys.

Written by jessica.santos

Last published at: December 11th, 2024

Problem

You’re attempting to run an SQL command to enable liquid clustering on your existing Delta table.

 

ALTER TABLE <table_name> CLUSTER BY (<clustering_columns>)

 

The command results in an error. 

 

[DELTA_CLUSTERING_COLUMN_MISSING_STATS] Liquid clustering requires clustering columns to have stats. Couldn't find clustering column(s) <clustering_columns>

 

Cause

The table is missing Delta statistics for the specific set of columns <clustering_columns> used in the CLUSTER BY clause.

 

Solution

To generate Delta statistics, run the following ANALYZE command. This command generates Delta statistics on the first 32 columns defined in your table schema. 

 

ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS

 

To select specific columns (column1,column2,column3) for Delta statistics generation, and to reduce the ANALYZE execution time, set the following configuration before running ANALYZE. This set can also be the same as the clustering keys.   

 

ALTER TABLE <table_name>
SET TBLPROPERTIES (
  'delta.dataSkippingStatsColumns' = 'column1,column2,column3'
);