Job fails when using Spark-Avro to write decimal values to AWS Redshift

Learn how to resolve job failures when writing decimal values to AWS Redshift with Spark-Avro.

Written by Adam Pavlacka

Last published at: May 31st, 2022


In Databricks Runtime versions 5.x and above, when writing decimals to Amazon Redshift using Spark-Avro as the default temp file format, either the write operation fails with the exception:

Error (code 1207) while loading data into Redshift: "Invalid digit, Value '"', Pos 0, Type: Decimal"

or the write operation writes nulls in place of the decimal values.


When writing to Redshift, data is first stored in a temp folder in S3 before being loaded into Redshift. The default format used for storing temp data between Apache Spark and Redshift is Spark-Avro. However, Spark-Avro stores a decimal as a binary, which is interpreted by Redshift as empty strings or nulls.


Change the temp file format to CSV using the tempformat option. You can use this sample Scala code:


//Create sample data
case class createDec(value: BigDecimal)
val df = Seq(createDec(45.24)).toDS
//Write to Redshift
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "testtable")
  .option("aws_iam_role", "your_aws_iam_role")
  .option("tempformat", "CSV")

Was this article helpful?