Problem
When working with Delta Live Tables, you would expect an error when an invalid string value is cast to a date/timestamp datatype, but no error occurs.
Example
A sample DLT pipeline definition:
import dlt

@dlt.table(
    name='test',
    temporary=True
)
def df_src():
    return (
        spark.sql("select 101 as id, cast('test' as Date) as dt from <source-table>")
    )
Cause
The Delta Live Tables pipeline inserts NULLs into the target table for each string cast to a date/timestamp datatype, instead of failing at the analysis phase.
Note
This occurs in Delta Live Tables in both CURRENT and PREVIEW channels but not on Databricks Runtime.
Solution
Configure the Delta Live Tables pipeline to enforce ANSI SQL compliance by setting the spark.sql.ansi.enabled parameter to true.
spark.conf.set("spark.sql.ansi.enabled", "true")
As an alternative, you can add this configuration in your pipeline settings by clicking Add Configuration under the Advanced section.
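For reference, when added through the pipeline settings the key appears under the configuration object of the pipeline's JSON definition. A minimal sketch (all other pipeline fields omitted):

```json
{
  "configuration": {
    "spark.sql.ansi.enabled": "true"
  }
}
```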
The pipeline now fails as it should.
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value 'test' of the type STRING cannot be cast to DATE because it is malformed. Correct the value as per the syntax, or change its target type. Use try_cast to tolerate malformed input and return NULL instead. If necessary, set spark.sql.ansi.enabled to false to bypass this error.
select 101 as id, cast('test' as Date) as dt from <source-table>
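If you want malformed values to become NULLs deliberately rather than failing the pipeline, the try_cast suggested in the error message can be used in the source query instead. A sketch of the same query with try_cast (the <source-table> placeholder is kept from the example above):

```sql
select 101 as id, try_cast('test' as date) as dt from <source-table>
```

With ANSI mode enabled, try_cast returns NULL for the malformed value while genuinely invalid casts elsewhere in the pipeline still fail, so the NULLs in the target table are explicit rather than silent.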