Problem
When joining two dataframes, joining a timestamp column with a date column results in null values.
Example
In this example, start_timestamp
is of timestamp data type, and start_date
is of date data type.
select * from table1
left join table2
on
table1.start_timestamp = table2.start_date
Cause
A join between a timestamp and a date column will produce non-null results only if the time in the timestamp column is 00:00:00 UTC.
Additionally, if spark.sql.session.timeZone
is set to a timezone other than UTC, 00:00:00 UTC is converted to the time as per the set timezone, leading to null results during the join.
Solution
Cast the value of the timestamp column to date datatype when joining it with a column of 'date' datatype.
select * from table1 t1
left join table2 t2
on
to_date(t1.start_timestamp, 'yyyy-MM-dd') = t2.start_date