Problem
When you try to migrate a job from classic compute to serverless, the job fails with the following error.
pyspark.errors.exceptions.connect.ArrayIndexOutOfBoundsException: [INVALID_ARRAY_INDEX] The index 0 is out of bounds. The array has 0 elements. Use the SQL function `get()` to tolerate accessing element at invalid index and return NULL instead. SQLSTATE: 22003
Cause
In serverless compute, ANSI SQL mode is enabled by default and strictly enforces valid SQL operations, raising errors when invalid operations are encountered.
In classic compute, ANSI SQL mode is disabled by default, so invalid operations proceed without errors. For example, the following code snippet tries to access the first element of an array column; when the array is empty, the attempt triggers the ArrayIndexOutOfBoundsException on serverless compute.
from pyspark.sql.functions import col
df1 = df.filter(col("<your-keywords>").getItem(0).isNotNull())
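The following minimal sketch reproduces the behavior in a notebook, where spark is the preconfigured SparkSession. The keywords column name and sample data are hypothetical.
from pyspark.sql.functions import col

# Hypothetical sample data: one row with keywords, one row with an empty array.
df = spark.createDataFrame([(["spark", "sql"],), ([],)], "keywords array<string>")

# On classic compute (ANSI SQL disabled), getItem(0) returns NULL for the empty
# array and that row is filtered out. On serverless compute (ANSI SQL enabled),
# the same call raises ArrayIndexOutOfBoundsException (INVALID_ARRAY_INDEX).
df1 = df.filter(col("keywords").getItem(0).isNotNull())
df1.show()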
Solution
You can either rewrite your code using the get() function or disable ANSI SQL.
Rewrite the code using the get() function
Replace the code snippet causing the issue with the get() function, which returns the element at the given index if it exists, or returns NULL if the index is out of bounds. The following code provides an example.
from pyspark.sql.functions import col, get
df1 = df.filter(get(col("<your-keywords>"), 0).isNotNull())
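Applied to the same hypothetical DataFrame from the earlier sketch, get() lets the filter run without errors on serverless compute.
from pyspark.sql.functions import col, get

# get() returns NULL for the empty array instead of raising an error, so the
# row with no keywords is simply filtered out.
df1 = df.filter(get(col("keywords"), 0).isNotNull())
df1.show()  # only the row with a non-empty keywords array remains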
Disable ANSI SQL
Set the spark.sql.ansi.enabled property to false when using serverless compute to disable the stricter ANSI SQL behavior.
You can set the following config using a notebook.
spark.conf.set("spark.sql.ansi.enabled", "false")
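A short sketch of how this might look in a notebook, reusing the hypothetical df and the col import from the earlier example.
# Disable ANSI SQL for the current session and confirm the setting took effect.
spark.conf.set("spark.sql.ansi.enabled", "false")
print(spark.conf.get("spark.sql.ansi.enabled"))  # false

# With ANSI SQL disabled, the original getItem(0) call returns NULL for the
# empty array instead of raising an error, matching classic compute behavior.
df1 = df.filter(col("keywords").getItem(0).isNotNull())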