Problem
When working with Parquet files in Delta Lake on a Photon-enabled cluster, your jobs may fail with the following error:
Schema conversion error: cannot convert Parquet type INT32 to Photon type long
Cause
The Photon engine uses a different set of data types than the traditional Apache Spark execution engine. The INT32 type used in the Parquet files is not directly convertible to Photon's native long data type. This mismatch causes a schema conversion error.
Solution
Disable the Photon reader by setting the configuration property spark.databricks.photon.scan.enabled to false.
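As a sketch, you can apply this setting for a single notebook session via the Spark configuration API (the same key can also be added cluster-wide under the cluster's Spark config). The Parquet path below is a hypothetical placeholder.

```python
# Assumes an existing SparkSession on a Databricks cluster
# (e.g. the built-in `spark` object in a notebook).

# Disable the Photon reader so Spark falls back to its traditional
# Parquet reader, avoiding the INT32 -> long conversion error.
spark.conf.set("spark.databricks.photon.scan.enabled", "false")

# Re-run the failing read; the path is a hypothetical example.
df = spark.read.parquet("/path/to/parquet")
```

Setting the property at the session level affects only the current job; set it in the cluster's Spark configuration if every workload on the cluster hits the same error.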
This configuration change bypasses the Photon engine's reader, the component responsible for the schema conversion error. Spark then falls back to its traditional reader, which handles the INT32 data type without issue.
Additionally, ensure you’re using a Databricks Runtime that supports mixed types in Parquet files, such as Databricks Runtime 14.3 LTS or above.
Important
Disabling Photon on a cluster may decrease the efficiency of some queries, resulting in slower executions. However, disabling it allows the job to run successfully in this case.