Problem
When performing SQL transformations involving timestamp columns, you get different results on an interactive cluster than on serverless compute.
Cause
Your interactive cluster is running on a version of Databricks Runtime preconfigured to use Java Development Kit (JDK) 8. Serverless compute runs on Databricks Runtime 16.0, which is preconfigured with JDK 17.
Starting in JDK 9, the Java Development Kit (JDK) captures timestamp values with microsecond precision instead of millisecond precision.
Context
OpenJDK introduced an enhancement, Increase the precision of the implementation of java.time.Clock.systemUTC(), starting in Java 9. This enhancement allows capturing time to the microsecond (yyyy-MM-dd HH:mm:ss.SSSSSS). In Java 8, capturing the current moment is limited to milliseconds (yyyy-MM-dd HH:mm:ss.SSS).
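The difference shows up when capturing the current timestamp on each compute type. A minimal sketch of what the same query might return on each JDK (the timestamp values shown are illustrative, not real output):

```sql
-- Illustrative only: the same query run on each compute type.
SELECT current_timestamp() AS ts;

-- On JDK 8 (interactive cluster), the moment is captured to milliseconds:
--   2024-06-01 12:34:56.123
-- On JDK 17 (serverless), the moment is captured to microseconds:
--   2024-06-01 12:34:56.123456
```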
Solution
Use SQL’s type casting to handle precision, or upgrade your JDK version.
Use SQL’s type casting
Use the date_format("timestamp", "yyyy-MM-dd HH:mm:ss.SSS") function to keep the precision at milliseconds, allowing the new data to match the preexisting data.
Examples
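As a hedged sketch of the before/after pattern, assuming a hypothetical target table `events` with a timestamp column `event_ts` populated from a hypothetical `staging_events` table:

```sql
-- Before: current_timestamp() is captured with microsecond precision
-- on JDK 17, so new rows no longer match older millisecond-precision data.
INSERT INTO events
SELECT id, current_timestamp() AS event_ts
FROM staging_events;

-- After: format the timestamp to milliseconds with date_format, then
-- cast the resulting string back to TIMESTAMP so new rows match the
-- preexisting data. Table and column names are illustrative.
INSERT INTO events
SELECT id,
       CAST(date_format(current_timestamp(), 'yyyy-MM-dd HH:mm:ss.SSS') AS TIMESTAMP) AS event_ts
FROM staging_events;
```

Note that date_format returns a string, so the CAST back to TIMESTAMP is needed when the target column is a timestamp type.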
Upgrade your JDK version
Refer to the Databricks SDK for Java (AWS | Azure | GCP) for instructions on creating a cluster with JDK 17.
Note
For Databricks Runtime versions 13.1 to 15.4, JDK 8 is the default, and JDK 17 is in Public Preview.