Casting string to date/timestamp in DLT pipeline does not throw an error

Configure the Delta Live Tables pipeline to enforce ANSI SQL compliance.

Written by anudeep.konaboina

Last published at: September 23rd, 2024

Problem

When working with Delta Live Tables, you expect an error when an invalid string value is cast to a date/timestamp datatype, but no error is thrown.

Example

A sample DLT pipeline definition:

import dlt

@dlt.table(
    name='test',
    temporary=True
)
def df_src():
    # 'test' is not a valid date, yet this cast does not raise an error
    return (
        spark.sql("select 101 as id, cast('test' as Date) as dt from <source-table>")
    )

Cause

Instead of failing at the analysis phase, the Delta Live Tables pipeline inserts NULL into the target table for each string value that cannot be cast to a date/timestamp datatype.
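This matches standard Spark cast semantics when ANSI mode is disabled. As a minimal sketch (assuming a notebook with a SparkSession named spark), explicitly disabling ANSI mode reproduces the silent NULL:

# With ANSI mode off, an invalid string-to-date cast yields NULL instead of an error
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("select 101 as id, cast('test' as date) as dt").show()
# +---+----+
# | id|  dt|
# +---+----+
# |101|null|
# +---+----+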

Note 

This occurs in Delta Live Tables in both CURRENT and PREVIEW channels but not on Databricks Runtime.

 

Solution

Configure the Delta Live Tables pipeline to enforce ANSI SQL compliance by setting the spark.sql.ansi.enabled parameter to true.

spark.conf.set("spark.sql.ansi.enabled", "true")

As an alternative, you can add this configuration in your pipeline settings by clicking Add Configuration under the Advanced section.
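For reference, the equivalent entry in the pipeline's JSON settings would look like the following sketch (all other settings omitted):

{
  "configuration": {
    "spark.sql.ansi.enabled": "true"
  }
}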

The pipeline now fails as expected, with an error like the following:

org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value 'test' of the type STRING cannot be cast to DATE because it is malformed. Correct the value as per the syntax, or change its target type. Use try_cast to tolerate malformed input and return NULL instead. If necessary, set spark.sql.ansi.enabled to false to bypass this error.

select 101 as id, cast('test' as Date) as dt from <source-table>
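If malformed values are expected and NULL is the desired result, the try_cast function suggested in the error message makes that intent explicit even with ANSI mode enabled. A minimal sketch (assuming a SparkSession named spark):

# try_cast returns NULL for malformed input instead of raising an error,
# even when spark.sql.ansi.enabled is true
spark.sql("select 101 as id, try_cast('test' as date) as dt").show()
# +---+----+
# | id|  dt|
# +---+----+
# |101|NULL|
# +---+----+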