String aggregation queries failing with data type mismatch error on serverless compute

Use explicit casting or disable ANSI_MODE.

Written by nikhil.jain

Last published at: April 8th, 2025

Problem

When running string aggregation queries on a serverless compute, you receive the following error message. 

[DATATYPE_MISMATCH.BINARY_OP_WRONG_TYPE] `Cannot resolve '(col1 + col2)' due to data type mismatch: the binary operator requires the input type ('NUMERIC' or 'INTERVAL DAY TO SECOND' or 'INTERVAL YEAR TO MONTH' or 'INTERVAL'), not 'STRING'.`

 

You notice the same queries work on an all-purpose compute. 
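
For example, a query of the following shape, where col1 and col2 are STRING columns (a hypothetical table, shown here only to illustrate the failure), reproduces the error on a serverless compute.

%sql
-- col1 and col2 are STRING columns; with ANSI_MODE enabled,
-- the + operator rejects STRING inputs and the query fails.
SELECT SUM(col1 + col2) FROM <table-name>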

 

Cause

The ANSI_MODE setting has a different default on a serverless compute than on an all-purpose compute. On a serverless compute, ANSI_MODE is enabled by default, enforcing stricter data type rules and preventing implicit cross-casting.

 

On an all-purpose compute, ANSI_MODE is disabled by default, allowing for implicit cross-casting of data types during query execution. 
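
To confirm which default your compute is using, you can read the underlying Apache Spark configuration in a notebook. This quick check is not part of the fix, just a way to verify the behavior described above.

%python
# Returns "true" on serverless compute (ANSI_MODE enabled by default)
# and "false" on an all-purpose compute with default settings.
spark.conf.get("spark.sql.ansi.enabled")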

 

Solution

There are two options available to resolve the issue. Consider which best applies to your use case.

 

Explicitly cast the string columns to the desired numeric data type before performing arithmetic operations. The following code provides an example.

%sql 
SELECT SUM(CAST(col1 AS DOUBLE) + CAST(col2 AS DOUBLE)) FROM <table-name>
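
If the string columns can contain values that do not parse as numbers, a plain CAST itself fails under ANSI_MODE. In that case, TRY_CAST, which returns NULL instead of raising an error for unparsable values, can be a safer variant. The following sketch assumes the same hypothetical columns.

%sql
-- TRY_CAST yields NULL for strings that cannot be parsed as DOUBLE,
-- and SUM ignores the resulting NULL values instead of failing.
SELECT SUM(TRY_CAST(col1 AS DOUBLE) + TRY_CAST(col2 AS DOUBLE)) FROM <table-name>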

 

Alternatively, run the following code (either SQL or Python) in your notebook to disable ANSI_MODE on the serverless compute for the duration of the session. 

%sql
SET ansi_mode = false

%python
spark.conf.set("spark.sql.ansi.enabled", False)

 

Then rerun your query on the table.

%sql
SELECT SUM(col1 + col2) FROM <table-name>
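
Because the setting is session-scoped, it resets automatically when the session ends. If you want to restore strict type checking within the same session after the query completes, you can set the configuration back. This step is optional.

%python
# Restore the serverless default (ANSI_MODE enabled) for the
# remainder of the session.
spark.conf.set("spark.sql.ansi.enabled", True)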

 

Important

If data overflows when ANSI_MODE is disabled, the job may fail with an overflow error. 


For more information, refer to the ANSI_MODE documentation (AWS, Azure, GCP).