Problem
In Databricks Runtime 5.x and above, when writing decimals to Amazon Redshift with Spark-Avro as the default temp file format, the write operation either fails with the exception:
Error (code 1207) while loading data into Redshift: "Invalid digit, Value '"', Pos 0, Type: Decimal"
or writes nulls in place of the decimal values.
Cause
When Spark writes to Redshift, the data is first staged in a temp folder in S3 and then loaded into Redshift. The default format for this staged data is Spark-Avro. However, Spark-Avro stores decimals as binary, which Redshift interprets as empty strings or nulls.
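For comparison, the following sketch shows a write that relies on the default Avro temp format; this default path is what produces the 1207 error or the null decimals described above. Here jdbcUrl, tempDir, the table name, and the IAM role are placeholder values.

%scala
// Sketch of a write that relies on the default Spark-Avro temp format (no tempformat option).
// jdbcUrl, tempDir, "testtable", and the IAM role are placeholder values.
case class createDec(value: BigDecimal)
val df = Seq(createDec(45.24)).toDS

(df.write
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "testtable")
  .option("aws_iam_role", "your_aws_iam_role")
  .mode("append")
  .save())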
Solution
Change the temp file format to CSV using the tempformat option. You can use this sample Scala code:
%scala
// Create sample data
case class createDec(value: BigDecimal)
val df = Seq(createDec(45.24)).toDS

// Write to Redshift, staging the temp data as CSV
(df.write
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "testtable")
  .option("aws_iam_role", "your_aws_iam_role")
  .option("tempformat", "CSV")
  .mode("append")
  .save())
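After the write completes, you can read the table back to confirm the decimal values arrived intact. This is a minimal verification sketch; jdbcUrl, tempDir, the table name, and the IAM role are the same placeholders used above.

%scala
// Minimal verification sketch: read the table back and display the decimal values.
// jdbcUrl, tempDir, "testtable", and the IAM role are placeholders, as in the write example.
val result = (spark.read
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "testtable")
  .option("aws_iam_role", "your_aws_iam_role")
  .load())

display(result)

The connector also accepts CSV GZIP as a tempformat value if you want the staged files compressed before loading.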