Purpose
This document facilitates enabling ANSI for your existing Databricks workloads. It enumerates behavior changes and explains how to detect and mitigate unintended effects.
Enabling ANSI mode has the following benefits:
- Aligns SQL on Databricks with the ANSI SQL Standard, making your SQL code and SQL skills more transferable
 - Acts as a safety net for your application, shielding it from unexpected and sometimes incorrect results.
 
When you enable ANSI mode, most computations run the same. Some unsafe operations will error out, but you can use the descriptive error messages and this guide to update your code.
ANSI_MODE
ANSI mode strengthens and clarifies the behavior for operators, functions, and explicit and implicit type conversions.
You can enable ANSI mode by setting spark.sql.ansi.enabled to true.
spark.conf.set("spark.sql.ansi.enabled", "true")
Enabling ANSI mode causes the following behavior changes:
Domain overflows
An arithmetic overflow occurs when an operation produces a result that exceeds the storage capacity of the data type used to hold the result.
Behavior with ANSI mode enabled
An error is thrown and the SQL statement does not run.
Behavior without ANSI mode enabled
Results in one of the following outcomes:
- Incorrect values and results caused by domain wrapping. Because the value exceeds the maximum limit of its data type, it wraps around to the lowest possible value. This is common in some programming languages but can produce incorrect results in data processing systems. 
- Example: For an 8-bit signed integer (
TINYINT), the range is from -128 to 127:- If you add 1 to 127, instead of generating an error, the value wraps around to -128.
 - Similarly, subtracting 1 from -128 would wrap around to 127.
 
 
 - Example: For an 8-bit signed integer (
 - Returns 
NULL. This behavior can result in undercounting values, especially when aggregated.- Example: The largest or smallest value cannot be more than 38 digits for 
DECIMALarithmetic. If you add 1 to 99,999,999,999,999,999,999,999,999,999,999,999,999.0 the result isNULL 
 - Example: The largest or smallest value cannot be more than 38 digits for 
 
For each affected operator and many affected functions, Databricks provides TRY functions that return NULL instead of errors for cases where you prefer incomplete data to a failing query. TRY functions never return wrapped values. 
Division by zero
Zero has been passed into a division operation as the divisor.
Behavior with ANSI mode enabled
Databricks returns an error when encountering a division-by-zero during an arithmetic operation. Note that division-by-zero can occur in other functions besides the division operator, such as the mod function.
Behavior without ANSI mode enabled
Division-by-zero returns NULL. To preserve this behavior in select cases, use the respective TRY function. For example, the try_divide function might work well for this case.
Out of bound array index
Behavior with ANSI mode enabled
The indexing into an array operation returns an error if the index is negative or beyond the last element in the array.
Behavior without ANSI mode enabled
Both situations return NULL. To return NULL in ANSI mode, you can use get function instead of brackets ([ ]).
Explicit casting
Behavior with ANSI mode enabled
The casting operation returns an error if attempted between incompatible types. Additionally, out-of-domain data is no longer implicitly tolerated, meaning that operations with such data will result in errors rather than being adjusted silently.
Protects against:
- 
Invalid Input: Passing a non-numeric string to a cast from 
STRINGto a numeric fails. - 
Overflow: Passing a large integer string to a cast from 
STRINGtoBYTE. - 
Numeric truncation: Passing a non-integral numeric string to a cast from 
STRINGtoINTEGER. 
To return NULL instead of an error for invalid input, overflow, or numeric truncation, you can use the try_cast expression.
Behavior without ANSI mode enabled
Cast operations between BOOLEAN and DATE or TIMESTAMP are permitted. Casting from integral numerics (BYTE, SMALLINT, INT, BIGINT) to BINARY is also allowed.
Considerations:
- The 
BOOLEANconversions are considered user errors, given that no conceivable semantics exist. Blocking them improves productivity. - Integer to binary conversion is semantically dangerous. The result depends on the hardware representation of those base types.
 
Implicit type conversion
Implicit type conversion is aligning different datatypes to operate or deriving a single result type when combining types.
Behavior with ANSI mode enabled
In ANSI mode, rules are more strictly enforced, protecting you from unexpected behavior.
- When using a 
STRINGtype with any arithmetic expression, the string will only be cast toDOUBLEif the numeric type is aDECIMAL,FLOAT, orDOUBLE. Otherwise, it will beBIGINT. - The previous rule applies to comparison operations. Overflow errors are raised where appropriate while defaulting to the type with the largest domain (
BIGINTorDOUBLE). - When deriving the least common type amongst a set of types, which includes 
STRINGs, theSTRINGs will be cast to the most general type of the other type family instead of casting all other operands toSTRING. This protects subsequent sort orders and comparison operations. 
Behavior without ANSI mode enabled
- 
STRINGs are cast to the exact numeric type of the other operand. This can result in silent overflows, truncation, and, therefore, incorrect results. 
Detection and mitigation
In most cases, switching from non-ANSI to ANSI mode will result in few, if any, observable differences. In this section, we discuss what to do when encountering differences based on the following observations.
If you do encounter any of the below error conditions, we strongly encourage you to carefully review the relevant operations and input data to determine if a more systematic fix is needed to ensure that downstream operations are acting on correct data. For example, if you encounter a DIVIDE_BY_ZERO error, it may mean you would be better off filtering out zeros from your input data rather than just doing the mechanical try_divide function fix.
Error conditions
Most frequently, you will see error conditions where there were none before. The following list identifies the error type and message, explains the error, and provides suggestions to resolve the error where possible.
DIVIDE_BY_ZERO
[DIVIDE_BY_ZERO] Division by zero. Use try_divide to tolerate the divisor being 0 and return NULL instead. SQLSTATE: 22012
Problem: The data contained a 0, fed as the divisor to a division operation. The error message displays the operation in question. That might not be a top-level division. A modulo operation, for example, can result in this error.
Suggested resolutions:
- To fix the error mechanically, replace the operator or function with the proposed 
try_function. For example, thetry_dividefunction. - Investigate whether the logic or the data has a flaw and 0s need to be cleaned or specially processed.
 
CANNOT_PARSE_TIMESTAMP
[CANNOT_PARSE_TIMESTAMP] <message> SQLSTATE: 22007
Problem: The string value provided to the to_date, to_timestamp, unix_timestamp, or to_unix_timestamp is not a valid datetime value or does not align with the provided format string.
Examples:
- Parsing a February 29 for a non-leap year in to_timestamp
 - Parsing 
2020-01-27T20:06:11.847using theyyyy-MM-dd HH:mm:ss.SSSformat. Note the extraTwhich is not part of the expected format. 
Suggested resolution:
- Clean the input value or adjust the format string
 - If you decide the function should silently fail by producing a 
NULL, replace it with its matching try function. For example, replaceto_timestampwithtry_to_timestamp. 
CAST_OVERFLOW
[CAST_OVERFLOW] The value <value> of the type <sourceType> cannot be cast to <targetType> due to an overflow. Use try_cast to tolerate overflow and return NULL instead. SQLSTATE: 22003
Problem: The value that needs to be cast is outside of the domain supported by the target type.
Examples:
- Casing the 
BIGINTvalue 2147483648 toINTresults in an overflow. 
Suggested resolution:
- If you believe the value should be handled, inject an explicit cast to an appropriate type. For example: 
CAST(... AS BIGINT). - If you decide this cast should silently fail by producing a 
NULL, inject atry_cast(.. AS <targetType>). 
ARITHMETIC_OVERFLOW
[ARITHMETIC_OVERFLOW] <message>.<alternative> SQLSTATE: 22003
Problem: The source data needs to be cleaned, or you must widen the type domain.
Example:
- Adding one to max integer: 2147483647 + 1
 
Suggested resolutions:
- Cast at least one of the operands to a sufficiently large type. For example, 
cast(2147483647 AS BIGINT) + 1. - Use the operator's 
try_version to tolerate the overflow as aNULL.
Other examples include the*,+, and-operator, which can be replaced withtry_multiply,try_add, andtry_subtract, respectively. 
INVALID_FRACTION_OF_SECOND
[INVALID_FRACTION_OF_SECOND] The fraction of sec must be zero. Valid range is [0, 60]. SQLSTATE: 22023
Problem: The second value is greater than 60 (and less than 61) The make_timestamp function returns this error.
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
make_timestamp(<year>, <month>, <day>, <hour>, <minute>, <second>, <timezone>) ==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> || ‘ ‘ || 
         <hour> || ‘:’ || <minute> || ‘:’ || <second> || ‘ ’ || <timezone>
         As timestamp)NUMERIC_VALUE_OUT_OF_RANGE
[NUMERIC_VALUE_OUT_OF_RANGE] <value> cannot be represented as Decimal(<precision>, <scale>). SQLSTATE: 22003
Problem: The numeric value cannot be represented as a DECIMAL of the given precision and scale.
Example: Casting the value 170 to a DECIMAL(2, 0) fails because 170 has three digits.
Suggested resolution:
- Increase the precision of the target type to ensure the value is within the supported range.
 - If you decide this cast should silently fail by producing a 
NULL, inject atry_cast(.. AS DECIMAL(<precision>, <scale>). 
CAST_INVALID_INPUT
[CAST_INVALID_INPUT] The value <value> of the type <sourceType> cannot be cast to <targetType> because it is malformed. Correct the value as per the syntax, or change its target type. Use try_cast to tolerate malformed input and return NULL instead. SQLSTATE: 22018
Problem: Although the source type can generally be cast to the target type, this specific value cannot be safely cast. The source type is often a STRING.
Examples:
- 
Truncation: Casting the 
STRING 10.1toBIGINTwould result in a loss of 0.1. - 
Overflow: Casting the 
STRING 128toBYTEresults in an overflow. - 
Invalid characters: Casting the 
STRING 128-toBIGINTis invalid. It is unclear what the meaning of a trailing dash is. - 
Invalid strings for dates or timestamps: Casting 
2015-04-31to aDATEorTIMESTAMPfails because April only has 30 days. 
Suggested resolutions:
- If the value is truncated, do one of the following:
- If you believe the value should be handled, inject an explicit cast to an appropriate type. For example: 
CAST(... AS DOUBLE). - If you conclude that the value should be truncated, inject an explicit cast to a type handling the domain and then to the desired final type.
For example:CAST(CAST(... AS DOUBLE) AS BIGINT).
The second deliberate cast fromDOUBLEtoBIGINTwill direct Databricks to truncate. - If you decide this cast should silently fail by producing a 
NULL, inject atry_cast(.. AS <targetType>). - You may conclude that the data or SQL is wrong, so you need to clean the data or correct the SQL expression.
 
 - If you believe the value should be handled, inject an explicit cast to an appropriate type. For example: 
 - If casting the value results in an overflow, do one of the following:
- If you believe the value should be handled, inject an explicit cast to an appropriate type. For example: 
CAST(... AS DOUBLE). - If you decide this cast should silently fail by producing a 
NULL, inject atry_cast(.. AS <targetType>). - You may conclude that the data or SQL is wrong, so you need to clean the data or correct the SQL expression.
 
 - If you believe the value should be handled, inject an explicit cast to an appropriate type. For example: 
 - If the value contains invalid characters:
- This error is most likely caused by 
STRINGnot being the least common type anymore due to aUNION,CASE,COALESCE, orIN. These operations require that all values involved be cast to a common data type. - To fix it, explicitly convert the non-string values to 
STRING. For example,COALESCE(‘128-’, CAST(1 AS STRING)). 
 - This error is most likely caused by 
 - If the value contains invalid strings for dates or timestamps, do one of the following:
- If you decide this cast should silently fail by producing a 
NULL, inject atry_cast(.. AS <targetType>). - If you conclude that the data or SQL is wrong, clean the data or correct the SQL expression.
 
 - If you decide this cast should silently fail by producing a 
 
BINARY_ARITHMETIC_OVERFLOW
[BINARY_ARITHMETIC_OVERFLOW] <value1> <symbol> <value2> caused overflow. SQLSTATE: 22003
Problem: The source data needs to be cleaned, or you must widen the type domain.
Suggested resolutions:
- Cast at least one of the operands to a sufficiently large type. For example, 
sum(cast(<intArg> AS BIGINT). - Use the operator's 
try_version to tolerate the overflow as aNULL.
For example, an overflowingsumfunction can be replaced with atry_sumfunction.
Other examples include the*,+, and-operator, which can be replaced withtry_multiply,try_add, andtry_subtract, respectively. 
Info
Databricks does not provide a means to preserve the overflow behavior.
INVALID_ARRAY_INDEX
[INVALID_ARRAY_INDEX] The index <idx> is out of bounds. The array has <numElem> elements. Use the SQL function get() to tolerate accessing elements at invalid index and return NULL instead. SQLSTATE: 22003
Problem: Before enabling ANSI mode, statements that index into an array return NULL when indices are too large or negative. 
Suggested resolutions:
- Use 
get(<arrayExpr>, <idx>)to replicate the original behavior. - Especially when the index is negative, you should investigate the cause, as it may indicate a logic error in your SQL.
 
Info
The elt function can also return this error. You can rewrite elt(<idx>, <expressions>) to try_element_at(array(<expressions>), <idx>) to tolerate out of bounds indices.
Note that get and [] are 0 based, while elt, element_at and try_elemet_at are 1-based.
INVALID_ARRAY_INDEX_IN_ELEMENT_AT
[INVALID_ARRAY_INDEX_IN_ELEMENT_AT] The index <idx> is out of bounds. The array has  <numElem> elements. Use try_element_at to tolerate accessing element at invalid index and return NULL instead. SQLSTATE: 22003
Problem: Before enabling ANSI mode, statements that index into an array return NULL when indices are too large or negative. 
Suggested resolutions:
- Use 
try_element_at(<arrayExpr>, <idx>)to replicate the original behavior. - Especially when the index is negative, you should investigate the cause, as it may indicate a logic error in your SQL.
 
DATATYPE_MISMATCH
[DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION] Cannot resolve "<expr>" due to data type mismatch: cannot cast "<sourceType>" to "<targetType>" with ANSI mode on. SQLSTATE: 42K09
Problem: A data type mismatch is causing the casting operation to fail.
Suggested resolutions:
- If casting from an integral numeric, use the 
unhex(hex(<intExpr>))expression to convert the integral numeric to a hexadecimal string and then into aBINARY. - If you encounter this condition when casting between datetime types and 
BOOLEAN, investigate the reason and fix the logic, as there is no reasonable semantic on what the result would be. 
Invalid value for MonthOfYear (valid values 1 - 12)
Problem: The month value is out of range. The make_date and make_timestamp functions return this error. 
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
make_date(<year>, <month>, <day>)
==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> as date)
make_timestamp(<year>, <month>, <day>, <hour>, <minute>, <second>, <timezone>) ==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> || ‘ ‘ || 
         <hour> || ‘:’ || <minute> || ‘:’ || <second> || ‘ ’ || <timezone>
         As timestamp)Invalid value for DayOfMonth (valid values 1 - 28/31)
Problem: The day value is out of range. The make_date and make_timestamp functions return this error. 
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
make_date(<year>, <month>, <day>)
==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> as date)
make_timestamp(<year>, <month>, <day>, <hour>, <minute>, <second>, <timezone>) ==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> || ‘ ‘ || 
         <hour> || ‘:’ || <minute> || ‘:’ || <second> || ‘ ’ || <timezone>
         As timestamp)Invalid value for HourOfDay (valid values 0 - 23)
Problem: The hour value is out of range. The make_date and make_timestamp functions return this error. 
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
make_timestamp(<year>, <month>, <day>, <hour>, <minute>, <second>, <timezone>) ==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> || ‘ ‘ || 
         <hour> || ‘:’ || <minute> || ‘:’ || <second> || ‘ ’ || <timezone>
         As timestamp)Invalid value for MinuteOfHour (valid values 0 - 59)
Problem: The month value is out of range. The make_date and make_timestamp functions return this error. 
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
make_timestamp(<year>, <month>, <day>, <hour>, <minute>, <second>, <timezone>) ==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> || ‘ ‘ || 
         <hour> || ‘:’ || <minute> || ‘:’ || <second> || ‘ ’ || <timezone>
         As timestamp)Invalid value for SecondOfMinute (valid values 0 - 59)
Problem: The second value is out of range. The make_date and make_timestamp functions return this error. 
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
make_timestamp(<year>, <month>, <day>, <hour>, <minute>, <second>, <timezone>) ==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> || ‘ ‘ || 
         <hour> || ‘:’ || <minute> || ‘:’ || <second> || ‘ ’ || <timezone>
         As timestamp)Invalid date 'February 29' as '<year>' is not a leap year
Problem: The day value is out of range. The make_date and make_timestamp functions return this error. 
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
make_date(<year>, <month>, <day>)
==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> as date)
make_timestamp(<year>, <month>, <day>, <hour>, <minute>, <second>, <timezone>) ==
try_cast(<year> || ‘-’ || <month> || ‘-’ || <day> || ‘ ‘ || 
         <hour> || ‘:’ || <minute> || ‘:’ || <second> || ‘ ’ || <timezone>
         As timestamp)ILLEGAL_DAY_OF_WEEK
[ILLEGAL_DAY_OF_WEEK] Illegal input for day of week: <dayName>. SQLSTATE: 22009
Problem: This error is returned by the next_day function when the requested day is malformed.
Suggested resolution: Use the following example to revert to non-ANSI mode behavior.
next_day(<dateExpr>, 
         case when <dayName> in (
         'SU', 'SUN', 'SUNDAY',
         'MO', 'MON', 'MONDAY',
         'TU', 'TUE', 'TUESDAY',
         'WE', 'WED', 'WEDNESDAY',
         'TH', 'THU', 'THURSDAY',
         'FR', 'FRI', 'FRIDAY',
         'SA', 'SAT', 'SATURDAY') THEN <dayname> end)INVALID_URL
[INVALID_URL] The url is invalid: <url>. SQLSTATE: 22P02
Problem: This error is the result of an invalid URL string passed to parse_url.
Suggested resolution: To remedy this error, clean the input data.
Behavior changes not returning obvious errors
The following section discusses some functions that change behavior in more subtle ways.
Size() function
The size function returns the cardinality of an array or map.
Behavior with ANSI mode enabled
The size of a NULL array or NULL map is NULL.
Behavior without ANSI mode enabled
A NULL array or NULL map size is -1.
This function is rare. Search for it in your assets and investigate its usage.
Nested errors
Runtime errors can occur within the body of a view or a SQL UDF that is called from a top-level query. In this case, the error message context will include the fully qualified name of the view or UDF and which kind of object it is.
Retrieve the object's definition using one of the following:
- 
SHOW CREATE TABLEto retrieve the view definition. - 
DESCRIBE FUNCTION EXTENDEDto find the body of a SQL UDF. 
Then, correct the SQL statement using one of the following commands:
- 
CREATE OR REPLACE VIEWto redefine the view. - 
CREATE OR REPLACE FUNCTIONto redefine the function. 
Remember that replacing views and functions does NOT preserve any GRANTs issued on the object.
You must collect existing privileges using SHOW GRANTS and re-issue them with GRANT.