Problem
When using a map transformation on an RDD with Databricks Runtime 9.1 LTS and above, the resulting schema's field order differs from the same map transformation run on Databricks Runtime 7.3 LTS.
Cause
Databricks Runtime 9.1 LTS and above incorporate Apache Spark 3.x. Starting with Spark 3.0.0, rows created from named arguments no longer have their field names sorted alphabetically; instead, the fields keep the order in which they were entered.
Solution
To restore the Spark 2.x style row field sorting, set the PYSPARK_ROW_FIELD_SORTING_ENABLED environment variable to true in your cluster's configuration (AWS | Azure | GCP).
PYSPARK_ROW_FIELD_SORTING_ENABLED=true
For Python versions below 3.6, field names can only be sorted alphabetically, because keyword-argument order is not preserved.
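The Python 3.6 restriction comes from the language itself, not from Spark: since Python 3.6 (PEP 468), **kwargs preserves the caller's argument order, which is what lets Spark 3.x keep fields as entered. A small pure-Python illustration, with a hypothetical capture helper:

```python
def capture(**kwargs):
    # Since Python 3.6 (PEP 468), **kwargs is an ordered mapping,
    # so iterating it yields the names in call order.
    return list(kwargs)

print(capture(zip="94105", city="San Francisco"))  # ['zip', 'city']
```

On Python versions before 3.6, kwargs arrived as an unordered dict, so sorting the names alphabetically was the only way to get a deterministic field order.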