Problem
When you read data from a source directory that contains multiple Parquet files, the read fails with an error similar to the following.
s3://<file_path>/test_file.PARQUET. Schema conversion error: cannot convert Parquet type INT32 to Photon type string(0)
Cause
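The Parquet files in the source directory do not share a single schema: at least two files store the same column with different data types.
When Databricks reads the files and tries to unify their schemas, it cannot reconcile the conflicting types and raises the error. In the example error, a column stored as INT32 in one file is expected to be a string.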
Solution
Fix the file schema. Identify the columns with schema discrepancies and modify them to have a consistent data type across all files.
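One way to locate the mismatched columns is to read each file on its own and compare the reported column types. The following is a minimal sketch, not a definitive implementation. It assumes an active `SparkSession` named `spark` and a list of file paths that you supply; the helper names `find_type_conflicts` and `schemas_from_parquet` are hypothetical.

```python
from collections import defaultdict


def find_type_conflicts(schemas):
    """Return {column: {type: [paths]}} for columns whose type differs across files.

    schemas -- dict of file path -> {column name: type string}
    """
    seen = defaultdict(lambda: defaultdict(list))
    for path, columns in schemas.items():
        for column, dtype in columns.items():
            seen[column][dtype].append(path)
    # Keep only the columns that appear with more than one type.
    return {
        column: dict(by_type)
        for column, by_type in seen.items()
        if len(by_type) > 1
    }


def schemas_from_parquet(spark, paths):
    """Read each file separately so no cross-file schema unification happens.

    Assumes `spark` is an active SparkSession; DataFrame.dtypes yields
    (column name, type string) pairs.
    """
    return {path: dict(spark.read.parquet(path).dtypes) for path in paths}
```

Calling `find_type_conflicts(schemas_from_parquet(spark, paths))` with your own list of paths returns only the columns that need attention, grouped by type, so you can see which files to fix.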
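If modifying the files is not an option, read each file (or each group of files that share a schema) separately and then union the resulting DataFrames. Each read then uses only that file's own schema, so the type difference can be reconciled when the DataFrames are combined.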
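The read-separately-then-union approach can be sketched as follows. This is an illustration under stated assumptions rather than the only way to do it: `spark` is an active `SparkSession`, the helper `read_and_union` is hypothetical, and the explicit cast to a common type is an added step that you choose per column (for example, casting an int column to string).

```python
from functools import reduce


def read_and_union(spark, paths, target_types):
    """Read each Parquet path on its own, cast chosen columns, and union by name.

    spark        -- an active SparkSession (assumption)
    paths        -- list of Parquet file paths supplied by the caller
    target_types -- dict of column name -> Spark SQL type string,
                    e.g. {"id": "string"} (hypothetical column name)
    """
    if not paths:
        raise ValueError("paths must contain at least one file")

    frames = []
    for path in paths:
        df = spark.read.parquet(path)  # schema comes from this file only
        for column, target in target_types.items():
            if column in df.columns:
                # Cast so every DataFrame agrees on the column type.
                df = df.withColumn(column, df[column].cast(target))
        frames.append(df)

    # unionByName matches columns by name rather than by position.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

The sketch uses `unionByName` instead of `union` so that a different column order between files does not silently misalign the data.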
Note
Reading the files separately and unioning them does not work for incompatible data types, such as timestamp and int. In that case, correct the files or load the data into two separate tables.