LEFT JOIN resulting in null values when joining timestamp column and date column

Cast the value of the timestamp column to date datatype when joining it with a column of 'date' datatype.

Written by ram.sankarasubramanian

Last published at: September 12th, 2024

Problem 

When joining two dataframes, joining a timestamp column with a date column results in null values.

Example 

In this example, start_timestamp is of timestamp data type, and start_date is of date data type.

select * from table1
left join table2
on
table1.start_timestamp = table2.start_date

Cause

A join between a timestamp and a date column will produce non-null results only if the time in the timestamp column is 00:00:00 UTC. 

Additionally, if spark.sql.session.timeZone is set to a timezone other than UTC, 00:00:00 UTC is converted to the time as per the set timezone, leading to null results during the join.

Solution

Cast the value of the timestamp column to date datatype when joining it with a column of 'date' datatype.

select * from table1 t1
left join table2 t2
on
to_date(t1.start_timestamp, 'yyyy-MM-dd') = t2.start_date