Problem
When you run aggregation queries that perform arithmetic on string columns on serverless compute, you receive the following error message.
`[DATATYPE_MISMATCH.BINARY_OP_WRONG_TYPE] Cannot resolve '(col1 + col2)' due to data type mismatch: the binary operator requires the input type ('NUMERIC' or 'INTERVAL DAY TO SECOND' or 'INTERVAL YEAR TO MONTH' or 'INTERVAL'), not 'STRING'.`
You notice the same queries work on all-purpose compute.
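As a minimal illustration (a sketch assuming a hypothetical temporary view t with two STRING columns), the following query reproduces the error on serverless compute.
%sql
-- Hypothetical setup: a temp view with two STRING columns
CREATE OR REPLACE TEMP VIEW t AS SELECT '1' AS col1, '2' AS col2;
-- Fails under ANSI_MODE because the + operator does not accept STRING inputs
SELECT SUM(col1 + col2) FROM t;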
Cause
The ANSI_MODE setting behaves differently on serverless compute than on all-purpose compute. On serverless compute, ANSI_MODE is enabled by default, which enforces stricter data type rules and prevents implicit cross-casting between incompatible types, such as STRING and numeric types.
On all-purpose compute, ANSI_MODE is disabled by default, which allows implicit cross-casting of data types during query execution.
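If you are unsure which behavior your compute is using, one way to check is to display the current value of the setting in your notebook session.
%sql
-- Displays the current ANSI_MODE value for the session
SET ansi_mode
%python
# Returns "true" when ANSI mode is enabled for the session
spark.conf.get("spark.sql.ansi.enabled")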
Solution
There are two ways to resolve this issue. Consider which best applies to your use case.
Explicitly cast the string columns to the desired numeric data type before performing arithmetic operations. The following code provides an example.
%sql
SELECT SUM(CAST(col1 AS DOUBLE) + CAST(col2 AS DOUBLE)) FROM <table-name>
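If the string columns may contain malformed values, note that CAST fails on input it cannot convert while ANSI_MODE is enabled. As a variation, TRY_CAST returns NULL for unconvertible values instead of failing the query.
%sql
-- TRY_CAST yields NULL (rather than an error) for values that cannot be converted
SELECT SUM(TRY_CAST(col1 AS DOUBLE) + TRY_CAST(col2 AS DOUBLE)) FROM <table-name>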
Alternatively, run one of the following commands (SQL or Python) in your notebook to disable ANSI_MODE on serverless compute for the duration of the session.
%sql
SET ansi_mode = false
%python
spark.conf.set("spark.sql.ansi.enabled", False)
Then rerun your query on the table.
%sql
SELECT SUM(col1 + col2) FROM <table-name>
Important
Note that disabling ANSI_MODE changes how overflow is handled. Overflowing values do not fail the query; instead they may silently wrap around or return NULL, which can lead to undetected data quality issues.
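As an illustration (a sketch based on documented Spark overflow semantics), adding 1 to the maximum BIGINT value with ANSI_MODE disabled wraps around rather than raising an error.
%sql
-- With ANSI_MODE disabled, BIGINT overflow wraps silently:
-- returns -9223372036854775808 instead of failing
SELECT CAST(9223372036854775807 AS BIGINT) + 1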
For more information, refer to the ANSI_MODE (AWS | Azure | GCP) documentation.