ETL jobs failing in Databricks SQL when casting Oracle NUMBER columns to Delta tables

Store numeric values with precision greater than DECIMAL(38,8) as strings and use Python’s decimal module via UDFs for high-precision arithmetic.

Written by joel.robin

Last published at: May 27th, 2025

Problem

While working in a Databricks SQL environment, your ETL jobs fail when Oracle NUMBER columns with precision up to 40 digits are cast to DECIMAL(38,8) in Delta tables. 

Cause

The numeric value exceeds the representable limit for the specified precision and scale.

Databricks SQL strictly enforces the precision and scale limits of the DECIMAL type. The maximum value that DECIMAL(38,8) can store is 999999999999999999999999999999.99999999 (30 digits before the decimal point and eight after).

When a value exceeds this limit, for example because it has more than 38 significant digits or more than 30 integer digits, the cast fails with an overflow error. This overflow is common when ingesting high-precision Oracle NUMBER columns.
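
To see why such values overflow, you can use Python's decimal module to compare a 40-digit value (the value below is made up for illustration) against the DECIMAL(38,8) maximum:

```python
from decimal import Decimal

# A hypothetical 40-digit value as it might arrive from an Oracle NUMBER column
value = Decimal("1234567890123456789012345678901234567890")

# Largest value DECIMAL(38,8) can hold: 30 integer digits, 8 fractional digits
max_decimal_38_8 = Decimal("9" * 30 + "." + "9" * 8)

print(value > max_decimal_38_8)  # True: casting this value to DECIMAL(38,8) overflows
```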

Solution

To handle such large values without precision loss or overflow, store the affected columns as type STRING in the Delta table. 
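
Storing the value as a STRING keeps every digit intact, whereas a binary double would silently lose precision. A quick local illustration (the value is made up):

```python
from decimal import Decimal

raw = "1234567890123456789012345678901234567890.12345678"

# A STRING column round-trips the value exactly
assert str(Decimal(raw)) == raw

# A double keeps only about 15-17 significant digits, so trailing digits are lost
print(float(raw))
```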

Then, register a UDF that uses Python's decimal module to perform arithmetic with extended precision. This approach allows accurate computations on values exceeding DECIMAL(38,8) limits. The following example registers such a UDF; adapt the column logic to your own data.

```python
from decimal import Decimal, getcontext

from pyspark.sql.types import StringType


def udf_calculate_high_precision(col_a, col_b, col_c, col_d):
    # Allow up to 50 significant digits, well beyond the DECIMAL(38,8) limit
    getcontext().prec = 50
    a = Decimal(col_a)
    b = Decimal(col_b)
    c = Decimal(col_c)
    d = Decimal(col_d)
    result = (a - b) + (c - d)
    # Return a string so the full precision survives in the Delta table
    return str(result)


# Register the function as a SQL UDF that returns STRING
spark.udf.register("udf_calculate_high_precision", udf_calculate_high_precision, StringType())
```
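
You can sanity-check the UDF body locally, without a Spark session, by running the same logic on plain strings (the inputs below are made up):

```python
from decimal import Decimal, getcontext


def calculate_high_precision(col_a, col_b, col_c, col_d):
    # Same logic as the registered UDF, usable outside Spark
    getcontext().prec = 50
    return str((Decimal(col_a) - Decimal(col_b)) + (Decimal(col_c) - Decimal(col_d)))


# 40-digit operand: (a - b) + (c - d) stays exact at 50-digit precision
result = calculate_high_precision(
    "1234567890123456789012345678901234567890",
    "1",
    "2",
    "1",
)
print(result)  # 1234567890123456789012345678901234567890
```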

The following SQL query then calls the registered UDF.

```sql
SELECT
  udf_calculate_high_precision(
    t.test_col_a,
    t.test_col_b,
    t.test_col_c,
    t.test_col_d
  ) AS calculated_result
FROM test_db.test_table t
WHERE t.test_filter_col = 'test_value';
```

As an additional consideration, clearly document any column type changes (such as NUMBER to STRING) when migrating from Oracle to Delta Lake, so downstream consumers know to use the high-precision UDF instead of native DECIMAL arithmetic.