Problem
When performing SQL transformations involving timestamp columns, you get different results on an interactive cluster than on serverless compute.
Cause
Your interactive cluster is running on a version of Databricks Runtime preconfigured to use Java Development Kit (JDK) 8. Serverless compute runs on Databricks Runtime 16.0, which is preconfigured with JDK 17.
Starting in JDK 9, the Java Development Kit (JDK) captures timestamp values with microsecond precision instead of millisecond precision.
Context
OpenJDK introduced an enhancement, Increase the precision of the implementation of java.time.Clock.systemUTC(), starting in Java 9. This enhancement allows capturing time to the microsecond (yyyy-MM-dd HH:mm:ss.SSSSSS). In Java 8, capturing the current moment is limited to milliseconds (yyyy-MM-dd HH:mm:ss.SSS).
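The difference shows up when capturing the current timestamp on each compute type. A minimal sketch of what the same query might return on each JDK (the timestamp values shown are illustrative, not real output):

```sql
-- Illustrative only: the same query run on each compute type.
SELECT current_timestamp() AS ts;

-- On JDK 8 (interactive cluster), the moment is captured to milliseconds:
--   2024-06-01 12:34:56.123
-- On JDK 17 (serverless), the moment is captured to microseconds:
--   2024-06-01 12:34:56.123456
```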
Solution
Use SQL’s type casting to handle precision, or upgrade your JDK version.
Use SQL’s type casting
Use the date_format("timestamp", "yyyy-MM-dd HH:mm:ss.SSS") function to keep the precision at milliseconds, allowing the new data to match the preexisting data.
Examples
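As a hedged sketch of the before/after pattern, assuming a hypothetical target table `events` with a timestamp column `event_ts` populated from a hypothetical `staging_events` table:

```sql
-- Before: current_timestamp() is captured with microsecond precision
-- on JDK 17, so new rows no longer match older millisecond-precision data.
INSERT INTO events
SELECT id, current_timestamp() AS event_ts
FROM staging_events;

-- After: format the timestamp to milliseconds with date_format, then
-- cast the resulting string back to TIMESTAMP so new rows match the
-- preexisting data. Table and column names are illustrative.
INSERT INTO events
SELECT id,
       CAST(date_format(current_timestamp(), 'yyyy-MM-dd HH:mm:ss.SSS') AS TIMESTAMP) AS event_ts
FROM staging_events;
```

Note that date_format returns a string, so the CAST back to TIMESTAMP is needed when the target column is a timestamp type.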
Upgrade your JDK version
Refer to the Databricks SDK for Java (AWS | Azure | GCP) for instructions on creating a cluster with JDK 17.
Note
For Databricks Runtime versions 13.1 to 15.4, JDK 8 is the default, and JDK 17 is in Public Preview.