Problem
You are trying to optimize a Delta table by Z-Ordering and receive an error about not collecting stats for the columns.
AnalysisException: Z-Ordering on [col1, col2] will be ineffective, because we currently do not collect stats for these columns.
Cause
Delta Lake collects statistics on the first 32 columns defined in your table schema. If the columns you are attempting to Z-Order are not in the first 32 columns, no statistics are collected for those columns.
Solution
Reorder the columns in your table, so the columns you are attempting to Z-Order are in the first 32 columns in your table.
You can use an ALTER TABLE statement to reorder the columns.
%sql ALTER TABLE table_name CHANGE [COLUMN] col_name col_name data_type [COMMENT col_comment] [FIRST|AFTER colA_name]
For example, this statement brings the column with <column-name> to the first column in the table.
%sql ALTER TABLE <delta-table-name> CHANGE COLUMN <column-name> <column-name> <data-type> FIRST
Recompute the statistics after you have reordered the columns in the table.
%scala import com.databricks.sql.transaction.tahoe._ import org.apache.spark.sql.catalyst.TableIdentifier import com.databricks.sql.transaction.tahoe.stats.StatisticsCollection val tableName = "<name of table>" val deltaLog = DeltaLog.forTable(spark, TableIdentifier(tableName)) StatisticsCollection.recompute(spark, deltaLog)
Rerun the Z-Order on the table and it should complete successfully.