Problem
When you perform MERGE or upsert operations, large numeric values stored in DOUBLE columns are implicitly converted to scientific (exponential) notation as they are written to target columns of type STRING. This conversion results in data inconsistencies in the target tables, especially when the original precision is critical (for example, the value 731220381673091 appears as 7.31220381673091E14 in the resulting table).
Cause
Casting a DOUBLE to a STRING causes the JVM to invoke the Double.toString() method, which automatically renders large values in scientific notation. The plain numeric form is lost unless you explicitly format the value before the cast.
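For example, running the following query in a notebook or in the SQL editor reproduces the behavior with the value from the problem statement.
-- Cast the value to DOUBLE, then to STRING, to reproduce the implicit conversion.
SELECT CAST(CAST(731220381673091 AS DOUBLE) AS STRING) AS string_value;
-- Returns 7.31220381673091E14 instead of 731220381673091.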
Solution
Explicitly format the DOUBLE value as a STRING before executing any MERGE or upsert operation, or store the value in a type that preserves its plain form. You can either use format_number() or the DECIMAL type.
Use format_number()
The format_number() function rounds values to the number of decimal places specified in its second parameter (in this example, 0). Run the following query in a notebook or in the SQL editor to format the DOUBLE value as a STRING.
SELECT format_number(<column-name>, 0) AS <column-name> FROM <source-table>
Note
Use this option only if you do not need to preserve fractional precision.
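For example, a MERGE of the following shape applies format_number() to the source column before the value reaches the STRING target column. The table names target_table and source_table and the columns id and amount are placeholders; adjust them to match your schema.
MERGE INTO target_table AS t
USING (
  -- Format the DOUBLE column as a STRING in the merge source.
  SELECT id, format_number(amount, 0) AS amount
  FROM source_table
) AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN INSERT (id, amount) VALUES (s.id, s.amount);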
Use the DECIMAL type
Define the source and target columns as DECIMAL instead of DOUBLE or STRING. Decimal types retain full precision.
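For a new table, you can declare the column as DECIMAL directly. The following sketch uses placeholder names (target_table, id, amount); adjust the precision and scale to fit your data.
-- Declare the numeric column as DECIMAL so large values keep their plain form.
CREATE TABLE target_table (
  id BIGINT,
  amount DECIMAL(20, 0)
) USING DELTA;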
To change the column type of an existing table, run the following SQL command either in a notebook or in the Databricks SQL editor.
ALTER TABLE <target-table> ALTER COLUMN <column-name> TYPE DECIMAL(20, 0);
Using the DECIMAL type ensures that large numeric values are stored and displayed in their full form without loss of precision.
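As a quick check with the value from the problem statement, casting through DECIMAL(20, 0) keeps the plain form.
-- A DECIMAL value converts to STRING without scientific notation.
SELECT CAST(CAST(731220381673091 AS DECIMAL(20, 0)) AS STRING) AS string_value;
-- Returns 731220381673091.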