Delta Live Tables job fails when using collect()

You should not use functions such as collect(), count(), toPandas(), save(), and saveAsTable() within the table and view function definitions.

Written by Jose Gonzalez

Last published at: May 10th, 2023

Problem

You are using collect() in your Delta Live Tables (DLT) pipeline code and you get an error. When you review the stack trace, you see a DataFrame.collect error that says the function is going to be deprecated soon.

"message": "Notebook:/path/to/your/notebook used `DataFrame.collect` function that will be deprecated soon. Please fix the notebook.",
"details": {
    "unsupported_operation": {
        "operation": "COLLECT_TO_DRIVER"
    }
},
"event_type": "unsupported_operation"

Cause

When using Delta Live Tables, the Python table and view functions must return a DataFrame. Functions such as collect(), count(), and toPandas() return driver-side results (a list of rows, an integer, and a pandas DataFrame, respectively), while save() and saveAsTable() write data out and return nothing. None of them returns a Spark DataFrame, so they should not be used within the table and view function definitions.

Please review the Delta Live Tables Python limitations (AWS | Azure | GCP) documentation for more information.

Solution

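Remove calls to collect(), count(), toPandas(), save(), and saveAsTable() from your table and view function definitions. Express the same logic with DataFrame transformations (for example, filters, joins, or aggregations) so that each function returns a DataFrame.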