Problem
In Databricks Runtime 13.3 LTS to 15.3, when you use sortWithinPartitions to order the rows in each partition by the specified columns, the sorted DataFrame looks correct when displayed, but the ordering is lost after you save the data and read it back.
Cause
The planned write adds its own local sort after the sortWithinPartitions local sort. The EliminateSorts optimizer rule then treats the first sort as unnecessary and drops it, so the sortWithinPartitions ordering never reaches the written files. This behavior occurs with or without Photon.
Solution
This issue is fixed in Databricks Runtime 15.4 LTS.
If upgrading is not an option, set the following Apache Spark configuration as a workaround before you write the data.
spark.conf.set("spark.sql.optimizer.plannedWrite.enabled", "false")
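If you prefer not to leave planned writes disabled for the whole session, one option is to apply the workaround around a single write and then restore the earlier value. The following is a minimal sketch, not an official Databricks helper: the function name write_sorted_within_partitions and its fmt and mode parameters are hypothetical, and it assumes spark is an active SparkSession and df is the DataFrame to write.

```python
# Configuration key taken from the workaround above.
PLANNED_WRITE_KEY = "spark.sql.optimizer.plannedWrite.enabled"


def write_sorted_within_partitions(spark, df, sort_cols, path,
                                   fmt="delta", mode="overwrite"):
    """Write df sorted within partitions with planned write disabled.

    Hypothetical helper: the name, fmt, and mode are illustrative
    assumptions, not part of any Databricks or Spark API.
    """
    # Remember the current setting (None if it was never set explicitly).
    previous = spark.conf.get(PLANNED_WRITE_KEY, None)
    spark.conf.set(PLANNED_WRITE_KEY, "false")
    try:
        (df.sortWithinPartitions(*sort_cols)
           .write.format(fmt)
           .mode(mode)
           .save(path))
    finally:
        # Restore the session to its earlier state, even if the write fails.
        if previous is None:
            spark.conf.unset(PLANNED_WRITE_KEY)
        else:
            spark.conf.set(PLANNED_WRITE_KEY, previous)
```

For example, write_sorted_within_partitions(spark, df, ["col_a", "col_b"], "/tmp/sorted_output") would write df ordered by the placeholder columns col_a and col_b within each partition. The path and column names here are illustrative only.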