Timestamps converted from a local time zone to UTC and back not reverting to original values in Apache Spark and SQL Warehouse

Set spark.sql.datetime.java8API.enabled to true on the cluster.

Written by allan.soares

Last published at: January 30th, 2025

Problem

When using Apache Spark and SQL Warehouse, you encounter time zone conversion discrepancies. For example, when converting timestamps from the 'Australia/Sydney' time zone to UTC and then back to 'Australia/Sydney', the resulting timestamps may not match the original values. 
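The following is a minimal sketch of the round trip in a notebook, assuming an active spark session; the 1890 timestamp and column names are illustrative, not taken from a specific workload.

from pyspark.sql import functions as F

# Round-trip an old Sydney-local timestamp through UTC and back.
df = spark.createDataFrame([("1890-01-01 00:00:00",)], ["ts"])
result = (
    df.withColumn("original", F.to_timestamp("ts"))
      .withColumn("utc", F.to_utc_timestamp("original", "Australia/Sydney"))
      .withColumn("round_trip", F.from_utc_timestamp("utc", "Australia/Sydney"))
)

# Without spark.sql.datetime.java8API.enabled set to true, original and
# round_trip can differ for timestamps this old.
result.select("original", "round_trip").show(truncate=False)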

Cause

The configuration parameter spark.sql.datetime.java8API.enabled is enabled by default in SQL Warehouse but not on interactive clusters. When the parameter is disabled, Spark uses the legacy java.sql.Timestamp and java.sql.Date classes as the external timestamp and date types, which handle some time zone conversions differently and can lead to inconsistencies.
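To check which behavior an interactive cluster session is using, you can read the flag back in a notebook. This is a minimal probe; the second argument is the default returned when the flag is unset.

# "false" means the legacy java.sql.Timestamp and java.sql.Date path is in use.
print(spark.conf.get("spark.sql.datetime.java8API.enabled", "false"))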

Additionally, historical changes in time zone offsets contribute to the observed discrepancies. Australia/Sydney, for example, shifted from local mean time (approximately UTC+10:05) to UTC+10:00 in 1896, and the legacy classes can resolve such pre-standardization offsets differently from the Java 8 time API.
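For illustration, the historical offset change is visible in the time zone database itself. The following sketch uses Python's standard zoneinfo module (Python 3.9+); the exact cutover date and sub-minute offset depend on the installed tzdata.

from datetime import datetime
from zoneinfo import ZoneInfo

sydney = ZoneInfo("Australia/Sydney")

# Before standard time was adopted, Sydney used local mean time
# (roughly UTC+10:05); afterwards it used UTC+10:00.
print(datetime(1890, 1, 1, tzinfo=sydney).utcoffset())  # e.g. 10:04:52
print(datetime(1900, 1, 1, tzinfo=sydney).utcoffset())  # 10:00:00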

Solution

  1. Navigate to your cluster settings. 
  2. Under the Advanced options > Spark tab, enter spark.sql.datetime.java8API.enabled true in the Spark config box. 
  3. Test the timestamp conversion again to ensure that the issue is resolved.

Alternatively, you can set the parameter in a notebook. Run the following code.  

# Enable the Java 8 time API for the current Spark session.
spark.conf.set("spark.sql.datetime.java8API.enabled", "true")

If the issue persists, verify that the configuration change has been applied correctly and that there are no other conflicting settings.
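As a quick check, you can assert the value from the same session before re-running the conversion; this sketch assumes the flag was set as shown above.

# The flag should now read back as "true" in the current session.
assert spark.conf.get("spark.sql.datetime.java8API.enabled") == "true"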

For further reference, consult the Dates and timestamps (AWS | Azure | GCP) documentation.