Problem
When you try to read Parquet files that contain datetime values, the read fails with a java.lang.UnsupportedOperationException error.
The error message is: "LEGACY datetime rebase mode is only supported for files written in UTC timezone. Actual file timezone: Asia/Kolkata." It occurs when you read data files that were written in a timezone other than UTC while the LEGACY datetime rebase mode is in use.
Cause
The data files were written in a timezone other than UTC. The LEGACY datetime rebase mode in Apache Spark is designed to handle datetime values based on the UTC timezone. When files are written in a different timezone, such as Asia/Kolkata, the rebase mode cannot correctly interpret the datetime values, and the read fails with the UnsupportedOperationException.
Solution
Configure your Spark cluster’s datetime rebase mode.
Setting spark.sql.legacy.parquet.datetimeRebaseModeInRead to LEGACY allows Spark to read the datetime values in the legacy rebase mode, even if the files were written in a timezone other than UTC.
- Navigate to the cluster configuration page in your Databricks workspace.
- Click the Advanced Options toggle.
- Add the following configuration in the Spark configuration tab.
spark.sql.legacy.parquet.datetimeRebaseModeInRead LEGACY
For more information, review the Spark Parquet Files documentation.