Problem
When using JDBC applications with Databricks clusters you see inconsistent java.sql.Timestamp results when switching between standard time and daylight saving time.
Cause
Databricks clusters use UTC by default.
java.sql.Timestamp uses the JVM’s local time zone.
If a Databricks cluster returns 2021-07-12 21:43:08 as a string, the JVM parses it as 2021-07-12 21:43:08 and assumes the time zone is local.
This works normally for most of the year, but when the local time zone has a DST adjustment, it causes an issue as UTC does not change.
For example, on March 14, 2021, the US switched from standard time to daylight saving time. This means that local time went from 1:59 am to 3:00 am.
If a Databricks cluster returns 2021-03-14 02:10:55, the JVM automatically converts it to 2021-03-14 03:10:55 because 02:10:55 does not exist in local time on that date.
Solution
Option 1: Configure the JVM time zone to UTC.
Set the user.timezone property to GMT.
Review the Java time zone settings documentation for more information.
Option 2: Use ODBC instead of JDBC. ODBC interprets timestamps as UTC.
- Install the Databricks ODBC Driver.
- Connect pyodbc (AWS | Azure | GCP) to Databricks.
You can also use turbodbc.
Option 3: Set the local time zone to UTC in your JDBC application.
Review the documentation for your JDBC application to learn how to configure the local time zone settings.