Job Fails When Using Spark-Avro to Write Decimal Values to AWS Redshift

Problem

In Databricks Runtime 5.x and above, when you write decimal values to Amazon Redshift with Spark-Avro as the default temp file format, the write operation either fails with the exception:

Error (code 1207) while loading data into Redshift: "Invalid digit, Value '"', Pos 0, Type: Decimal"

or writes nulls in place of the decimal values.

Cause

When Spark writes to Redshift, the data is first staged in a temp folder in S3 and then loaded into Redshift with a COPY command. The default format for this temp data between Apache Spark and Redshift is Spark-Avro. However, Spark-Avro stores decimals in a binary Avro encoding, which Redshift interprets as empty strings or nulls.
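
You can see this encoding for yourself by downloading one of the staged Avro part files and printing its embedded schema with the Avro Java API. This is only a diagnostic sketch; the local file path below is a hypothetical example, and you would substitute the path of a part file copied from your tempdir:

import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

// Hypothetical local copy of one Avro part file staged by the connector
val avroPart = new File("/tmp/redshift_temp/part-00000.avro")
val reader = new DataFileReader[GenericRecord](avroPart, new GenericDatumReader[GenericRecord]())

// The embedded writer schema shows the decimal column encoded as Avro bytes/fixed
// (logicalType "decimal"), not as a plain numeric string Redshift can load
println(reader.getSchema.toString(true))
reader.close()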

Solution

Change the temp file format to CSV by setting the tempformat option, so the connector stages the data as CSV instead of Avro. The following sample Scala code shows how:

// Required for .toDS on a Seq
import spark.implicits._

// Create sample data with a decimal column
case class CreateDec(value: BigDecimal)
val df = Seq(CreateDec(BigDecimal("45.24"))).toDS

// Write to Redshift, staging the temp data as CSV instead of Avro
(df.write
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "testtable")
  .option("aws_iam_role", "your_aws_iam_role")
  .option("tempformat", "CSV")
  .mode("append")
  .save())
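
To confirm the fix, you can read the table back through the same connector and check that the decimal values arrived intact. This is a minimal sketch that reuses the jdbcUrl, tempDir, and IAM role placeholders from the write example above:

// Read the table back from Redshift and verify the decimal values
val readBack = (spark.read
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "testtable")
  .option("aws_iam_role", "your_aws_iam_role")
  .load())

readBack.show()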