Unable to use dynamic variable passing to create a function in Databricks SQL Warehouse

Use a Python UDF in a notebook to dynamically pass the table name as a variable, then access the function in a notebook or DBSQL.

Written by shanmugavel.chandrakasu

Last published at: September 23rd, 2024

Problem 

You want to pass a table name as a dynamic variable to a Python user-defined function (UDF) that returns the table version associated with the operations CREATE, MERGE, INSERT, DELETE, UPSERT, UPDATE, WRITE, REPLACE, and CLONE within the DBSQL warehouse. However, you are unable to create such a function unless you use a notebook, which restricts running your SQL queries to the SQL editor.

Cause

Databricks SQL Warehouse does not support dynamic variable passing within SQL when creating functions. (This is distinct from executing queries with dynamically passed variables.)
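Because the limitation applies only to function creation, the workaround is to build the complete CREATE FUNCTION statement as an ordinary string in Python, so the warehouse only ever receives finished DDL text. The following minimal sketch illustrates the idea; the catalog, schema, function, and table names, and the template itself, are placeholders, not the full function from the example below.

```python
# Illustrative only: substitute the table name into the DDL text in Python,
# so DBSQL receives a fully rendered statement with no dynamic variable.
DDL_TEMPLATE = """
CREATE OR REPLACE FUNCTION my_catalog.my_schema.last_tracked_version()
RETURNS STRING
READS SQL DATA
RETURN (SELECT CONCAT('{table}', '@v', MAX(version)) FROM (DESCRIBE HISTORY {table}))
"""

def render_ddl(table_name):
    """Return the CREATE FUNCTION statement with the table name filled in."""
    return DDL_TEMPLATE.format(table=table_name)

ddl = render_ddl("my_catalog.my_schema.sales")
# In a notebook, you would then run: spark.sql(ddl)
```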

Solution

Use a Python UDF in a notebook to dynamically pass the table name as a variable, then access the function in a notebook or DBSQL.

Example

First, retrieve the version number of the most recent change, excluding the latest change, from the table history where one of the tracked operations (such as CREATE, MERGE, INSERT, or DELETE) was performed.

Given a Delta table, the retrieval returns the version along with the table name, for example table_name@v2.

 

def sayHello(*args):
    # Build the CREATE FUNCTION statement as a plain string, substituting
    # the table name with Python string formatting before it reaches DBSQL.
    query = """
        CREATE OR REPLACE FUNCTION <catalog-name>.<schema-name>.<function-name> (table_name STRING)
        RETURNS STRING
        READS SQL DATA
        RETURN (
          WITH hist AS (
            -- Full history of the table
            SELECT *
            FROM (DESCRIBE HISTORY {0})
          ),
          last_version AS (
            -- Latest version produced by one of the tracked operations
            SELECT version
            FROM hist
            WHERE operation IN ('CREATE', 'MERGE', 'INSERT', 'DELETE', 'UPSERT', 'UPDATE', 'WRITE', 'REPLACE', 'CLONE')
            ORDER BY version DESC
            LIMIT 1
          )
          -- Most recent version strictly older than that latest tracked version
          SELECT CONCAT('{0}', '@v', CAST(a.version AS STRING))
          FROM hist a, last_version b
          WHERE a.version < b.version
          ORDER BY a.version DESC
          LIMIT 1
        )
        """
    df = spark.sql(query.format(*args))
    df.show(truncate=False)
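For reference, the version-selection logic that the SQL body implements can be sketched in plain Python. The history rows below are hypothetical; in the real function they come from DESCRIBE HISTORY.

```python
# Operations whose versions the function treats as "tracked" changes.
TRACKED_OPS = {"CREATE", "MERGE", "INSERT", "DELETE", "UPSERT",
               "UPDATE", "WRITE", "REPLACE", "CLONE"}

def previous_tracked_version(table_name, history):
    """history: list of (version, operation) tuples, in any order.
    Returns '<table>@v<N>' for the most recent version strictly older than
    the latest tracked operation, or None if no such version exists."""
    tracked = sorted(v for v, op in history if op in TRACKED_OPS)
    if not tracked:
        return None
    latest = tracked[-1]  # latest version produced by a tracked operation
    older = [v for v, _ in history if v < latest]
    if not older:
        return None
    return f"{table_name}@v{max(older)}"

# Version 3 is the latest tracked change, so version 2 is returned.
history = [(0, "CREATE"), (1, "WRITE"), (2, "OPTIMIZE"), (3, "MERGE")]
print(previous_tracked_version("table_name", history))  # table_name@v2
```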

 

Then call the function using DBSQL.

 

SELECT <catalog-name>.<schema-name>.<function-name>('<table-name>');