Problem
In Databricks Runtime 13.3 LTS through 15.3, when you use sortWithinPartitions to order the rows in each partition by the specified columns, the sorted DataFrame looks correct when displayed, but the ordering is lost after you save the data and read it back.
Cause
There is a bug in which the planned write's local sort is placed after the sortWithinPartitions local sort, and EliminateSorts then drops the first sort (the one from sortWithinPartitions) as unnecessary. The bug exists with or without Photon.
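The interaction described above can be pictured with a toy model in plain Python. This is only an illustrative sketch, not Spark's actual optimizer code: the LocalSort class, the eliminate_sorts function, and the column names event_time and date are all hypothetical. It shows why a local sort that is immediately followed by another local sort looks redundant to such a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSort:
    """Hypothetical stand-in for a per-partition sort node in a query plan."""
    columns: tuple  # sort keys (illustrative names only)
    origin: str     # which part of the query requested the sort


def eliminate_sorts(plan):
    """Toy rule: drop a local sort that is directly followed by another
    local sort, because the later sort reorders the rows anyway."""
    kept = []
    for i, node in enumerate(plan):
        followed_by_sort = i + 1 < len(plan) and isinstance(plan[i + 1], LocalSort)
        if isinstance(node, LocalSort) and followed_by_sort:
            continue  # treated as unnecessary and removed
        kept.append(node)
    return kept


# Plan in execution order: the user's sort first, then the sort that the
# planned write adds on top of it.
plan = [
    LocalSort(("event_time",), "sortWithinPartitions"),
    LocalSort(("date",), "planned write"),
]

optimized = eliminate_sorts(plan)
print([node.origin for node in optimized])  # ['planned write']
```

In this model only the planned write's sort survives, which mirrors the symptom: the data is written without the ordering requested through sortWithinPartitions.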
This issue is fixed in Databricks Runtime 15.4 LTS.
Solution
As a workaround, set the following Apache Spark configuration before writing the data.
spark.conf.set("spark.sql.optimizer.plannedWrite.enabled", "false")
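If the same code runs on several runtime versions, the workaround can be applied only where this article says it is needed. The following is a minimal sketch under stated assumptions: the helper names is_affected_runtime and apply_workaround are hypothetical, the version is assumed to be available as a string that starts with major.minor (for example "14.3" or "14.3.x-scala2.12"), and the affected range is taken directly from this article (13.3 LTS through 15.3).

```python
import re

AFFECTED_MIN = (13, 3)  # first affected runtime named in this article
AFFECTED_MAX = (15, 3)  # last affected runtime; 15.4 LTS contains the fix


def is_affected_runtime(version: str) -> bool:
    """Return True if the version string falls in the 13.3 to 15.3 range."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized runtime version: {version!r}")
    parsed = (int(match.group(1)), int(match.group(2)))
    return AFFECTED_MIN <= parsed <= AFFECTED_MAX


def apply_workaround(spark, version: str) -> bool:
    """Disable planned writes on affected runtimes; return True if applied.

    `spark` is assumed to be an active SparkSession.
    """
    if is_affected_runtime(version):
        spark.conf.set("spark.sql.optimizer.plannedWrite.enabled", "false")
        return True
    return False
```

A call such as apply_workaround(spark, "14.3") would set the configuration, while apply_workaround(spark, "15.4") would leave it untouched. How the version string is obtained is left to the caller.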