Jobs failing with schema conversion error: cannot convert Parquet type INT32 to Photon type long

Set spark.databricks.photon.scan.enabled to false.

Written by Guilherme Leite

Last published at: January 16th, 2025

Problem

When working with Parquet files in Delta Lake on a Photon-enabled cluster, you notice your jobs fail with the following error. 


Schema conversion error: cannot convert Parquet type INT32 to Photon type long


Cause

The Photon engine uses a different set of data types than the traditional Apache Spark execution engine. The INT32 type used in the Parquet files is not directly convertible to Photon's native long data type. This mismatch causes a schema conversion error.


Solution

Disable Photon's reader by setting the configuration property spark.databricks.photon.scan.enabled to false.
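As a sketch, you can set this property for the current session in a notebook cell; the table name shown is a hypothetical placeholder. (You can also set the same key under the cluster's Spark configuration to apply it cluster-wide.)

```python
# Disable Photon's vectorized Parquet scan for the current session so
# subsequent reads fall back to the traditional Spark reader.
spark.conf.set("spark.databricks.photon.scan.enabled", "false")

# Reads of the affected Delta table now use the Spark reader, which
# handles Parquet INT32 columns without the schema conversion error.
df = spark.read.table("my_catalog.my_schema.my_table")  # hypothetical table
df.show()
```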


This configuration change bypasses the Photon engine's reader, the component responsible for the schema conversion error. Spark reverts to its traditional reader, which handles the INT32 data type without issues.


Additionally, ensure you’re using a Databricks Runtime that supports mixed types in Parquet files, such as Databricks Runtime 14.3 LTS or above. 


Important

Disabling Photon on a cluster may decrease the efficiency of some queries, resulting in slower execution. However, disabling it allows the affected job to run successfully in this case.