Problem
You want to partition your Delta table by a date column. This creates a subfolder for each partition value in the root path of the Delta table. For example, date=2023-01-01, date=2023-01-02, and so on.
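As a minimal sketch, a table partitioned this way can be written with the DataFrame API. The DataFrame contents, table path, and column name below are placeholders for illustration only.

# `spark` is provided automatically in Databricks notebooks.
# Illustrative DataFrame with a `date` column; in practice this is your own data.
df = spark.createDataFrame(
    [("2023-01-01", 1), ("2023-01-02", 2)],
    ["date", "value"],
)

# partitionBy("date") creates date=<value> subfolders under the table's root path.
(
    df.write.format("delta")
    .partitionBy("date")
    .mode("overwrite")
    .save("/tmp/example_delta_table")  # hypothetical path
)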
You enable Delta Lake column mapping, but when you try to list the subfolders, the names are not what you expect (date=2023-01-01) because those date partition folders are no longer available. Instead, you see subfolders with random names that do not look like valid date partitions. For example, you see partitions like date=xx.
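For reference, column mapping is typically enabled through table properties, and the table's root path can then be listed from a notebook. The table name and path below are placeholders; the protocol versions shown are the standard minimums required for column mapping.

# Enable column mapping in 'name' mode; this also upgrades the table protocol.
spark.sql("""
    ALTER TABLE example_delta_table SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")

# List the table's root path to inspect the current folder layout.
display(dbutils.fs.ls("/tmp/example_delta_table"))  # hypothetical path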
Cause
When Delta Lake column mapping is enabled on a table, Delta Lake writes data files with random prefixes, which removes the ability to explore data using Hive-style partitioning.
Solution
This is expected behavior when column mapping is enabled. You can still query your data. In this example, if you want to retrieve data for a single date, apply a filter in your where clause.
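A minimal sketch of such a filtered read, assuming the same illustrative table path and a partition column named date (both are placeholders):

# `spark` and `display` are provided automatically in Databricks notebooks.
# Filter on the partition column instead of browsing the date=... subfolders directly.
df = (
    spark.read.format("delta")
    .load("/tmp/example_delta_table")  # hypothetical path
    .where("date = '2023-01-01'")
)
display(df)

Delta Lake still prunes files based on the partition column, so filtering in the query gives you the same targeted reads that browsing the date=... folders used to provide.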
For more information, please review the Do Delta Lake and Parquet share partitioning strategies? section in the When to partition tables on Databricks documentation (AWS | Azure | GCP).