Problem
You're running a job orchestrated by dbt-databricks that invokes a Python User-Defined Function (UDF). The job fails consistently, particularly when multiple UDFs are being created or replaced concurrently.
You receive the following error.
[ROUTINE_NOT_FOUND] The routine `<catalog>`.`<schema>`.`<UC-function>` cannot be found. Verify the spelling and correctness of the schema and catalog. If you did not qualify the name with a schema and catalog, verify the current_schema() output, or qualify the name with the correct schema and catalog. To tolerate the error on drop use DROP ... IF EXISTS. SQLSTATE: 42883
Cause
The CREATE OR REPLACE FUNCTION statements execute concurrently. These operations are not atomic: they involve an implicit drop of the existing function followed by the creation of the new one. Delta Lake does not provide ACID guarantees for concurrent DDL statements of this type.
When multiple CREATE OR REPLACE FUNCTION commands for the same function execute simultaneously, one execution drops the function while another is still referencing or re-creating it. This creates a transient state in which the function is neither available nor consistently visible to the query engine, leading to the ROUTINE_NOT_FOUND error.
This issue commonly arises in parallelized pipelines or orchestrators like dbt, where concurrent runs attempt to redefine the same function.
Solution
Avoid recreating UDFs frequently, implement retry logic, or use explicit DROP and CREATE statements for UDFs.
Avoid recreating UDFs frequently
Instead of redefining functions every run, define the UDFs once during environment setup and reuse them. This approach is both safer and more efficient.
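One way to script this setup step is to guard the definition with CREATE FUNCTION IF NOT EXISTS, which is a no-op when the routine already exists, so repeated runs never drop and re-create it. The sketch below is illustrative: the `ensure_udf` helper and the recording cursor are hypothetical stand-ins for a real databricks-sql-connector cursor, and the function name and body are examples.

```python
def ensure_udf(cursor, qualified_name, signature_and_body):
    # Create the function only if it is absent, so repeated pipeline
    # runs reuse the existing definition instead of replacing it.
    cursor.execute(
        f"CREATE FUNCTION IF NOT EXISTS {qualified_name}{signature_and_body}"
    )

class _RecordingCursor:
    # Stand-in for a databricks-sql-connector cursor, used here only
    # to show which statement would be sent to the warehouse.
    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)

cursor = _RecordingCursor()
ensure_udf(
    cursor,
    "main.default.clean_value",  # example fully qualified name
    "(v STRING) RETURNS STRING LANGUAGE PYTHON AS $$\n"
    "return v.strip() if v else None\n"
    "$$",
)
```

In a real pipeline, call `ensure_udf` once from a serialized setup task (for example, a dbt on-run-start hook) rather than from each concurrent model.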
Implement retry logic
If you can’t avoid concurrency, incorporate retry logic with exponential backoff into your UDF calls. This can help mitigate transient errors due to visibility issues in the system catalog. You can use the following example code, which sets the retries to 3 and the backoff factor to 0.5.
import time
from databricks import sql

def execute_query_with_retry(query, max_retries=3, backoff_factor=0.5):
    retry_count = 0
    while retry_count < max_retries:
        try:
            with sql.connect(
                server_hostname="<your-host>",
                http_path="<your-http-path>",
                access_token="<your-token>",
            ) as connection:
                with connection.cursor() as cursor:
                    cursor.execute(query)
                    return cursor.fetchall()
        except sql.exc.Error as e:
            if "ROUTINE_NOT_FOUND" in str(e):
                # Transient visibility error: back off exponentially, then retry
                retry_count += 1
                time.sleep(backoff_factor * (2 ** retry_count))
            else:
                raise e
    raise Exception(f"Failed after {max_retries} retries")
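With these defaults, the wait before each retry doubles: because retry_count is incremented before the sleep, the delays are 0.5 × 2¹ = 1 s, then 2 s, then 4 s. A small sketch of that schedule:

```python
def backoff_schedule(max_retries=3, backoff_factor=0.5):
    # Mirrors the sleep computation in execute_query_with_retry:
    # retry_count is incremented before each sleep, so n runs 1..max_retries.
    return [backoff_factor * (2 ** n) for n in range(1, max_retries + 1)]

print(backoff_schedule())  # delays in seconds: [1.0, 2.0, 4.0]
```

Tune max_retries and backoff_factor to your pipeline's tolerance; three attempts spanning roughly seven seconds is usually enough to outlast a concurrent function replacement.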
Use explicit DROP and CREATE for UDFs
Instead of relying on CREATE OR REPLACE FUNCTION, use an explicit DROP FUNCTION IF EXISTS followed by CREATE FUNCTION to avoid the transient states that can lead to visibility issues during concurrent execution. You can use the following example code. Be sure to provide your own Python logic.
DROP FUNCTION IF EXISTS <catalog>.<schema>.<function-name>;
CREATE FUNCTION <catalog>.<schema>.<function-name>(<input-variable> <input-type>)
RETURNS <output-type>
LANGUAGE PYTHON
DETERMINISTIC
AS $$
# Your Python logic here
def transform(value):
    # Example transformation
    return some_transformation(value) if value else None
return transform(<input-variable>)
$$;
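If you script this pattern, issue the two statements back to back from a single, non-concurrent setup step; running them from parallel jobs reintroduces the window in which the function does not exist. The sketch below is illustrative only: the `recreate_udf` helper and the recording cursor are hypothetical stand-ins for a real databricks-sql-connector cursor, and the function name and body are examples.

```python
def recreate_udf(cursor, qualified_name, signature_and_body):
    # Explicit two-step replacement. Run this from one serialized setup
    # task only: between the DROP and the CREATE the function does not
    # exist, so concurrent callers would still hit ROUTINE_NOT_FOUND.
    cursor.execute(f"DROP FUNCTION IF EXISTS {qualified_name}")
    cursor.execute(f"CREATE FUNCTION {qualified_name}{signature_and_body}")

class _RecordingCursor:
    # Stand-in for a databricks-sql-connector cursor, used here only
    # to show the order of the statements that would be sent.
    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)

cursor = _RecordingCursor()
recreate_udf(
    cursor,
    "main.default.clean_value",  # example fully qualified name
    "(v STRING) RETURNS STRING LANGUAGE PYTHON AS $$\n"
    "return v.strip() if v else None\n"
    "$$",
)
```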