Problem
You have a job that uses Apache Spark to read from a Snowflake table, but the time data in the resulting DataFrame is incorrect.
Running the same query directly in Snowflake returns the correct time data.
Cause
The time zone was not set consistently. A mismatch between the time zone of the Databricks cluster and the Snowflake session can result in incorrect time values, as explained in Snowflake’s working with timestamps and time zones documentation.
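To illustrate the mismatch, the sketch below renders a single instant in two different time zones using Python's standard library. The instant and zone names are examples; Snowflake and Spark hold the same underlying value, but each renders it in its own session time zone.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One instant in time, stored internally as UTC.
instant = datetime(2023, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

# The same instant rendered under two different session time zones.
as_utc = instant.astimezone(ZoneInfo("UTC"))
as_la = instant.astimezone(ZoneInfo("America/Los_Angeles"))

print(as_utc.strftime("%Y-%m-%d %H:%M"))  # 2023-06-01 12:00
print(as_la.strftime("%Y-%m-%d %H:%M"))   # 2023-06-01 05:00
```

If one system renders in UTC and the other in a local zone, the displayed times differ even though the stored instant is identical.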
Solution
Set the time zone in Databricks and do not explicitly set a time zone in Snowflake.
Option 1: Set the time zone for SQL statements in Databricks
- Open the Databricks workspace.
- Select Clusters.
- Select the cluster you want to modify.
- Select Edit.
- Select Advanced Options.
- In the Spark config field, enter `spark.sql.session.timeZone <timezone>` (for example, `spark.sql.session.timeZone America/Los_Angeles`).
- Select Confirm.
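As an alternative to the cluster-level setting, the same property can be set from a notebook for the current session only. A minimal sketch, assuming a running Databricks notebook where `spark` is already defined:

```python
# Set the session time zone for this SparkSession only.
# This affects how TIMESTAMP values are rendered when collected or displayed.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

# Confirm the value took effect.
print(spark.conf.get("spark.sql.session.timeZone"))
```

The cluster-level Spark config is preferable when every job on the cluster should use the same time zone; the per-session call is useful for testing the fix before editing the cluster.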
Option 2: Set the time zone for all nodes with an init script
- Create the init script by running the following in a Python notebook cell:

```python
%python
dbutils.fs.put("/databricks/scripts/set_timezone.sh", """#!/bin/bash
timedatectl set-timezone America/Los_Angeles
""", True)
```
- Verify the full path of the init script:

```python
%fs ls /databricks/scripts/set_timezone.sh
```
- Open the Databricks workspace.
- Select Clusters.
- Select the cluster you want to modify.
- Select Edit.
- Select Advanced Options.
- Select Init Scripts.
- Enter the Init Script Path.
- Select Add.
- Select Confirm.
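After the cluster restarts with the init script attached, every process on the nodes sees the new OS-level time zone. The effect can be sketched locally with the `TZ` environment variable; this is a simulation of the outcome, not the init script itself:

```python
import os
import time

# Simulate the node-level change: point this process at the zone the
# init script configures, then re-read the time zone settings (Unix only).
os.environ["TZ"] = "America/Los_Angeles"
time.tzset()

# Local time formatting now follows the configured zone.
print(time.strftime("%Z"))  # PST or PDT, depending on the date
```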