ZORDER results in "Hilbert indexing can only be used on 9 or fewer columns" error

The OPTIMIZE ZORDER BY command has a hard limit of nine columns.

Written by emad.rizkallah

Last published at: March 15th, 2023

Problem

You are running an OPTIMIZE ZORDER BY (AWS | Azure | GCP) command in Databricks SQL (AWS | Azure | GCP) and get an Apache Spark exception: Hilbert indexing can only be used on 9 or fewer columns.

Error in SQL statement: ExecutionException: org.apache.spark.SparkException: Hilbert indexing can only be used on 9 or fewer columns
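For example, a command like the following, which lists ten ZORDER columns, exceeds the limit and raises this error (the table and column names here are hypothetical, for illustration only):

OPTIMIZE events ZORDER BY (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10);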

Cause

The OPTIMIZE ZORDER BY command has a hard limit of nine columns. This is by design.

Solution

You must reduce the number of ZORDER columns to nine or fewer. The best practice is to use ZORDER on a maximum of three columns. When you use ZORDER on four or more columns, effectiveness decreases with each additional column.

Use ZORDER on the most commonly used query predicate columns (the columns in the query's WHERE clause). Databricks recommends using ZORDER on high cardinality columns. A corrected example follows.
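A minimal sketch of a corrected command, assuming a hypothetical events table where user_id, event_date, and device_id are high cardinality columns that appear most often in query WHERE clauses:

OPTIMIZE events ZORDER BY (user_id, event_date, device_id);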


Info

Delta Lake on Databricks collects statistics on the first 32 columns defined in your table schema.
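ZORDER relies on these statistics for data skipping, so the columns you ZORDER on should be among the columns with statistics. If they fall outside the first 32 columns, one option is to adjust the delta.dataSkippingNumIndexedCols table property. This is a sketch with a hypothetical table name and value; statistics are collected at write time, so the change applies to data written after it is set:

ALTER TABLE events SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = 40);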