Problem
When using Apache Spark and SQL Warehouse, you encounter time zone conversion discrepancies. For example, when converting timestamps from the 'Australia/Sydney' time zone to UTC and then back to 'Australia/Sydney', the resulting timestamps may not match the original values.
Cause
The configuration parameter spark.sql.datetime.java8API.enabled is enabled by default in SQL Warehouse but not in interactive clusters. When this parameter is disabled, Spark uses the legacy java.sql.Timestamp and java.sql.Date classes instead of the Java 8 java.time API. The two APIs handle time zone conversions differently, which leads to the inconsistencies.
Additionally, historical changes in time zone offsets, such as the shift from UTC+10:05 to UTC+10:00 in 1896, contribute to the observed discrepancies.
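You can see this historical offset change outside of Spark with a small sketch using Python's standard zoneinfo module, which reads the IANA tz database. Note that tzdata records Sydney's pre-1895 local mean time offset as +10:04:52, which the +10:05 figure above rounds to:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

sydney = ZoneInfo("Australia/Sydney")

# Before the switch to standard time, tzdata uses Sydney's local mean
# time offset of +10:04:52 (roughly +10:05); afterwards it is +10:00.
before = datetime(1890, 1, 1, tzinfo=sydney)
after = datetime(1900, 1, 1, tzinfo=sydney)

print(before.utcoffset())  # 10:04:52
print(after.utcoffset())   # 10:00:00
```

Any conversion logic that rounds or ignores this sub-minute offset will not round-trip pre-1896 Sydney timestamps exactly.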
Solution
- Navigate to your cluster settings.
- Under Advanced options > Spark, enter the following in the Spark config box:
spark.sql.datetime.java8API.enabled true
- Test the timestamp conversion again to ensure that the issue is resolved.
Alternatively, you can set the parameter in a notebook. Run the following code.
spark.conf.set("spark.sql.datetime.java8API.enabled", "true")
If the issue persists, verify that the configuration change has been applied correctly and that there are no other conflicting settings.
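One way to verify the change from a notebook is to read the value back from the session. This sketch assumes a live SparkSession named spark, as provided in Databricks notebooks, so it is shown without standalone output:

```python
# Assumes a live SparkSession named `spark` (provided in Databricks notebooks).
spark.conf.set("spark.sql.datetime.java8API.enabled", "true")

# Read the value back to confirm it was applied to this session.
assert spark.conf.get("spark.sql.datetime.java8API.enabled") == "true"
```

Note that spark.conf.get reflects the current session only; a cluster-level Spark config entry still requires a cluster restart to take effect.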
For further reference, consult the Dates and timestamps (AWS | Azure | GCP) documentation.