Incorrect aggregation results when working with high decimal precision numbers in Databricks SQL

Use an RDD with Python's decimal module instead of SUM.

Written by Vidhi Khaitan

Last published at: October 7th, 2025

Problem

When working with numbers that have more than 38 digits of precision (high-precision numbers) in Databricks SQL, you notice that a SUM operation returns an incorrect result.

 

Example

The reported aggregated sum is 11111111101111.111111110111111111011112, but you expect 11111111101111.11111111011111111101111111110111111110.

The difference arises only in cases where the source system stores values with extremely high precision.
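
For example, the following sketch reproduces this kind of discrepancy in a notebook. It assumes the source column is stored as DECIMAL(38, 24); the exact digits you see depend on the precision and scale declared for your table.

spark.sql("""
    SELECT SUM(CAST(value AS DECIMAL(38, 24))) AS total
    FROM VALUES
        ('1234567890123.12345678901234567890123456789012345678'),
        ('9876543210987.98765432109876543210987654321098765432') AS t(value)
""").show(truncate=False)

Each value is rounded to fit the declared precision before it is summed, so the total differs from the exact sum in its last digits.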

 

Cause

Apache Spark SQL uses the DECIMAL type, which has a maximum precision of 38 digits. This means the total number of digits you can store (both sides of the decimal point combined) is 38. Values with more digits are rounded to fit this limit, so the aggregated result differs from the exact sum.
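
To see the cap in isolation, the following sketch casts a single high-precision string to DECIMAL. The DECIMAL(38, 25) type here is only an illustration; with 13 digits before the decimal point, at most 25 fractional digits can be kept.

spark.sql("""
    SELECT CAST(
        '1234567890123.12345678901234567890123456789012345678'
        AS DECIMAL(38, 25)
    ) AS capped_value
""").show(truncate=False)

The string carries 51 significant digits, but only 38 survive the cast. The remaining fractional digits are rounded away before any SUM runs.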

 

Solution

Use a Resilient Distributed Dataset (RDD) with Python's decimal module instead. Python's Decimal arithmetic is not bound by the 38-digit limit, so the full precision of the source values is preserved.

Note

The Decimal type used in the following code comes from Python's decimal package. It is not the Databricks or Spark DECIMAL data type.

 

Run the following code in a notebook.

from decimal import Decimal, getcontext

# Raise the context precision so Decimal arithmetic keeps every digit.
# (Python's default precision is 28 significant digits, which would round this sum.)
getcontext().prec = 60

# Replace the data with your values. These values are examples.
data = [
    ("1234567890123.12345678901234567890123456789012345678",),
    ("9876543210987.98765432109876543210987654321098765432",),
]

# Create the RDD directly (bypassing the DataFrame API)
rdd = spark.sparkContext.parallelize(data)

# Convert the strings to Decimal and reduce to a sum
decimal_sum = rdd.map(lambda x: Decimal(x[0])).reduce(lambda a, b: a + b)

# Print the result
print("High-precision Decimal Sum:", decimal_sum)

 

The sum of the two example values displays as the following output.

11111111101111.11111111011111111101111111110111111110
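
The context precision set on the driver does not propagate to the executors, where reduce computes partial sums for each partition. If a partition holds many values, those partial sums fall back to Python's default 28 significant digits. The following variant is a minimal sketch of one way to keep full precision at scale: it sets the precision inside the function that runs on each partition. The rdd variable and the illustrative 60-digit precision come from the example above.

from decimal import Decimal, getcontext

def sum_partition(rows):
    # Runs on the executors, so raise the precision here as well.
    getcontext().prec = 60
    total = Decimal(0)
    for row in rows:
        total += Decimal(row[0])
    yield total

# Sum each partition on the executors, then merge the partial sums on the driver.
getcontext().prec = 60
decimal_sum = sum(rdd.mapPartitions(sum_partition).collect(), Decimal(0))
print("High-precision Decimal Sum:", decimal_sum)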