Handling WARN Message: 'Could not turn on CDF for table (table-name)' in Delta Live Tables Pipeline

Exclude and then drop reserved columns.

Written by gopinath.chandrasekaran

Last published at: September 23rd, 2024

Problem

While running a Databricks Delta Live Tables (DLT) pipeline, you see the following WARN message in the DLT event log.

Could not turn on CDF for table <table-name>. The table contains reserved columns  [_change_type, _commit_version, _commit_timestamp] that will be used internally as metadata for the table's Change Data Feed. Change Data Feed is required for certain features in DLT. If you wish to turn it on, please rename/drop the reserved columns.

Cause

By default, DLT creates all tables with Change Data Feed (CDF) enabled. When you read source data with CDF enabled (readChangeFeed=true), the source DataFrame includes the reserved columns _change_type, _commit_version, and _commit_timestamp.

When DLT attempts to create the target table with CDF enabled while those reserved columns are still present in the schema, the column names conflict with the metadata columns CDF uses internally, and DLT logs a WARN message. DLT then falls back and creates the table without CDF enabled.

This WARN message does not fail the pipeline but indicates that the table creation process with CDF enabled encountered an issue with reserved columns.
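As a rough illustration of the conflict described above, the following sketch checks a list of column names for the three reserved CDF names. The helper name find_reserved_columns and the sample column list are hypothetical and are not part of any Databricks API. Matching is done case-insensitively here on the assumption that column-name resolution is case-insensitive, which is the Spark default.

```python
# Reserved column names taken from the WARN message.
RESERVED_CDF_COLUMNS = ("_change_type", "_commit_version", "_commit_timestamp")


def find_reserved_columns(columns):
    """Return the entries of `columns` that collide with reserved CDF names.

    Hypothetical helper for illustration only. Comparison is case-insensitive
    (an assumption, matching Spark's default column-name resolution).
    """
    reserved = set(RESERVED_CDF_COLUMNS)
    return [name for name in columns if name.lower() in reserved]


# Hypothetical schema of a DataFrame read with readChangeFeed=true.
source_columns = ["id", "name", "_change_type", "_commit_version", "_commit_timestamp"]
conflicts = find_reserved_columns(source_columns)
print(conflicts)
```

Inside a pipeline, the same check could be applied to df.columns before the DataFrame is returned from a table definition.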

Solution

To avoid the WARN message when the target table is populated with dlt.apply_changes(), use the except_column_list parameter to exclude the reserved columns from the target.

except_column_list = ["_change_type", "_commit_version", "_commit_timestamp"]
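To show where this parameter fits, here is a minimal sketch that assembles the keyword arguments for a dlt.apply_changes() call. The helper apply_changes_args and the table, view, and key names are hypothetical. Using _commit_version as the sequencing column is an assumption about the source data, not a requirement. The dlt module is only available inside a DLT pipeline, so the actual call is shown in comments.

```python
# Reserved CDF columns to keep out of the target table.
RESERVED_CDF_COLUMNS = ["_change_type", "_commit_version", "_commit_timestamp"]


def apply_changes_args(target, source, keys, sequence_by):
    """Build keyword arguments for dlt.apply_changes() (hypothetical helper)."""
    return {
        "target": target,
        "source": source,
        "keys": keys,
        "sequence_by": sequence_by,
        # Exclude the reserved columns so DLT can enable CDF on the target.
        "except_column_list": list(RESERVED_CDF_COLUMNS),
    }


# Hypothetical names; replace them with the ones used in your pipeline.
args = apply_changes_args(
    target="customers_target",
    source="customers_cdf_view",
    keys=["id"],
    sequence_by="_commit_version",  # assumption: order changes by commit version
)

# Inside a DLT pipeline notebook:
#   import dlt
#   dlt.create_streaming_table("customers_target")
#   dlt.apply_changes(**args)
```

The excluded columns stay available on the source for keys and sequencing; they are only omitted from the columns written to the target.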

For append-only DLT streaming tables, drop the reserved columns from the DataFrame before returning it.

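import dlt

@dlt.table(
    name="<table-name>"
)
def table():
    # Reserved CDF metadata columns that readChangeFeed adds to the source.
    exclude_columns = ["_change_type", "_commit_version", "_commit_timestamp"]
    # `spark` is the SparkSession that Databricks provides in the pipeline.
    df = spark.readStream.format("delta").option("readChangeFeed", "true").table("<source-table>")
    # Drop the reserved columns so DLT can enable CDF on the target table.
    return df.drop(*exclude_columns)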

 

For more information, review the APPLY CHANGES APIs: Simplify change data capture with Delta Live Tables (AWS | Azure | GCP) documentation.