Problem
When performing MERGE or upsert operations, you observe that large numeric values stored in DOUBLE type columns are implicitly converted to scientific (exponential) notation when merged into target columns of type STRING. This conversion results in data inconsistencies in the target tables, especially when the original precision is critical (for example, the value 731220381673091 appears as 7.31220381673091E14 in the resulting table).
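The following minimal sketch reproduces the symptom described above. The table and column names are placeholders for illustration only.

-- Hypothetical demo tables; names are placeholders.
CREATE OR REPLACE TABLE target_demo (id INT, amount_str STRING);
CREATE OR REPLACE TABLE source_demo (id INT, amount_dbl DOUBLE);

INSERT INTO target_demo VALUES (1, NULL);
INSERT INTO source_demo VALUES (1, 731220381673091);

-- The DOUBLE source value is implicitly converted to STRING during the MERGE.
MERGE INTO target_demo t
USING source_demo s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.amount_str = s.amount_dbl;

-- Shows '7.31220381673091E14' instead of '731220381673091'.
SELECT amount_str FROM target_demo;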
Cause
Casting a DOUBLE to a STRING causes the JVM to invoke the Double.toString() method. This method automatically converts large values to scientific notation for readability and efficiency, unless explicitly formatted otherwise.
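You can observe the same conversion outside of a MERGE by casting a large DOUBLE value directly to STRING. The literal below is for illustration only.

-- Returns '7.31220381673091E14' rather than the plain digits.
SELECT CAST(CAST(731220381673091 AS DOUBLE) AS STRING);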
Solution
Explicitly cast or format the DOUBLE value to STRING before executing any MERGE or upsert operations. You can use either the format_number() function or the DECIMAL type.
Use format_number()
The format_number() function rounds the value to the number of decimal places specified by the second parameter (in this example, 0). Run the following query in a notebook or in the SQL editor to format the DOUBLE value as a STRING.
SELECT format_number(<column-name>, 0) AS <column-name> FROM <source-table>
Note
Use this option only if you do not need to preserve fractional precision.
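As a quick check (the literal value is for illustration only), applying format_number() to the value from the Problem statement returns all of the digits. Keep in mind that format_number() also inserts grouping separators (commas) into the result, so confirm that this format is acceptable for your target column.

-- Typically returns '731,220,381,673,091' (no exponent, but with grouping commas).
SELECT format_number(CAST(731220381673091 AS DOUBLE), 0);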
Use DECIMAL type
Define the source and target columns as DECIMAL instead of DOUBLE or STRING. Decimal types retain full precision.
Run the following SQL command either in a notebook or in the Databricks SQL editor.
ALTER TABLE <target-table> CHANGE COLUMN <column-name> TYPE DECIMAL(20, 0);
Using the DECIMAL type ensures that large numeric values are stored and displayed in their full form without loss of precision.
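As a quick illustration (literal value only), routing the DOUBLE value through DECIMAL(20, 0) before it is rendered as a STRING preserves the plain notation.

-- Returns '731220381673091' with no exponent.
SELECT CAST(CAST(CAST(731220381673091 AS DOUBLE) AS DECIMAL(20, 0)) AS STRING);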