Problem
You want to pass a dynamic variable to a Python user-defined function (UDF) and get the table versioning required for the CREATE, MERGE, INSERT, DELETE, UPSERT, UPDATE, WRITE, REPLACE, and CLONE operations within the DBSQL warehouse, but you are unable to create such a function unless you use a notebook and restrict running your SQL queries to the SQL editor.
Cause
Databricks SQL Warehouse does not allow dynamic variable passing within SQL to create functions. (This is distinct from executing queries by dynamically passing variables.)
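For illustration, the following minimal sketch, run from a notebook rather than the SQL warehouse, shows the distinction. The catalog, schema, table, and column names are hypothetical, and the parameter-marker form of spark.sql requires Apache Spark 3.4 or above (a recent Databricks Runtime).
# Dynamically passing a variable when *executing* a query is supported.
# Hypothetical table and column names; adjust to your environment.
table_name = "my_catalog.my_schema.my_table"

# Option 1: substitute the variable into the query text before execution.
spark.sql(f"DESCRIBE HISTORY {table_name}").show(truncate=False)

# Option 2: named parameter markers for values (Spark 3.4+).
spark.sql(
    "SELECT * FROM my_catalog.my_schema.my_table WHERE id > :min_id",
    args={"min_id": 10},
).show()

# By contrast, a CREATE FUNCTION statement submitted to a SQL warehouse
# cannot take a dynamic variable in its body, which is why the function
# in the example below is created from a notebook instead.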
Solution
Use a Python UDF in a notebook to dynamically pass the table name as a variable, then access the function in a notebook or DBSQL.
Example
First, retrieve the version number of the most recent change (excluding the latest change) in the table history where certain operations (such as CREATE, MERGE, INSERT, DELETE, and so on) were performed. Given the Delta table, the retrieval returns the version along with the table name, for example table_name@v2.
def sayHello(*args):
    # args[0] is the fully qualified table name; it is substituted into the
    # CREATE FUNCTION statement (and its body) at creation time via {0}.
    query = """
    CREATE OR REPLACE FUNCTION <catalog-name>.<schema-name>.<function-name> (table_name STRING)
    RETURNS STRING
    READS SQL DATA
    RETURN (
      WITH hist AS (
        SELECT *
        FROM (DESCRIBE HISTORY {0})
      ),
      last_version AS (
        SELECT version
        FROM hist
        WHERE operation IN ('CREATE', 'MERGE', 'INSERT', 'DELETE', 'UPSERT', 'UPDATE', 'WRITE', 'REPLACE', 'CLONE')
        ORDER BY version DESC
        LIMIT 1
      )
      SELECT CONCAT('{0}', '@v', CAST(a.version AS STRING))
      FROM hist a, last_version b
      WHERE a.version < b.version
      ORDER BY a.version DESC
      LIMIT 1
    )
    """
    # Create the function; show() prints the result, so no extra print() is needed.
    df = spark.sql(query.format(*args))
    df.show(truncate=False)
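Run the Python function from a notebook to create the SQL function, passing the fully qualified table name as the dynamic variable. The name below is a placeholder; substitute your own catalog, schema, and table.
# Placeholder table name; this creates <catalog-name>.<schema-name>.<function-name>
# with the table name baked into the function body.
sayHello("my_catalog.my_schema.my_table")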
Then call the created function from DBSQL.
SELECT <catalog-name>.<schema-name>.<function-name>('<table-name>');
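You can also call the function from a notebook instead of DBSQL. A minimal sketch, using the same placeholder names as the calls above:
# Placeholder catalog, schema, function, and table names.
spark.sql(
    "SELECT my_catalog.my_schema.my_function('my_catalog.my_schema.my_table') AS version"
).show(truncate=False)
# Returns the version along with the table name, e.g. my_catalog.my_schema.my_table@v2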