CharConversionException when importing non-UTF-8 data from IBM Db2 to Databricks

Set driver charset configs.

Written by aimee.gonzalezcameron

Last published at: July 17th, 2025

Problem

When importing views from IBM Db2 using Apache Spark, you encounter the following error in the Spark driver logs or job failure details.

Caught java.io.CharConversionException ERRORCODE=-4220, SQLSTATE=null


Cause

The IBM Db2 JCC (JDBC) driver expects character column data to conform to the database's UTF-8 code page. If any column contains invalid or malformed UTF-8 byte sequences (for example, bytes outside the valid Unicode range or text that was incorrectly encoded), the driver throws a SqlException that wraps a java.io.CharConversionException.
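The failure mode is analogous to strict UTF-8 decoding in any language: a byte sequence that is not valid UTF-8 raises an error unless the decoder is told to substitute a replacement character instead. The following Python sketch is illustrative only (the driver performs the equivalent decoding in Java) and uses a made-up byte string.

# Illustration only: the Db2 JCC driver decodes column bytes in Java, but the
# failure is analogous to strict UTF-8 decoding shown here.
bad_bytes = b"caf\xe9"  # 0xE9 is Latin-1 "e acute", not a valid UTF-8 sequence

try:
    bad_bytes.decode("utf-8")  # strict decoding, comparable to the driver's default
except UnicodeDecodeError as err:
    print(f"Strict decode fails: {err}")

# Tolerant decoding substitutes the Unicode replacement character instead of
# failing, which is conceptually what db2.jcc.charsetDecoderEncoder=3 asks the
# driver to do.
print(bad_bytes.decode("utf-8", errors="replace"))  # prints 'caf\ufffd'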


Example stack trace

org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.ibm.db2.jcc.am.SqlException: [jcc][t4][XX][XX][X.X.X] Caught java.io.CharConversionException. See attached Throwable for details. ERRORCODE=-4220, SQLSTATE=null
	at com.ibm.db2.jcc.am.fd.a(fd.java:731)


Solution

To handle extended or invalid characters more gracefully, configure your cluster with the following Spark configurations so they apply to all notebooks and jobs on that cluster. These settings modify the Db2 JCC driver’s behavior to tolerate character encoding issues without failing the entire query: with db2.jcc.charsetDecoderEncoder=3, the driver substitutes malformed byte sequences with the Unicode replacement character instead of throwing an exception.

spark.driver.extraJavaOptions -Ddb2.jcc.charsetDecoderEncoder=3
spark.executor.extraJavaOptions -Ddb2.jcc.charsetDecoderEncoder=3
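After the cluster is configured with these options, a standard JDBC read of the Db2 view should return rows even when some columns contain malformed character data. The following sketch assumes a Databricks notebook (where spark and dbutils are predefined) and uses placeholder values for the hostname, database, schema, view, and secret scope; substitute your own connection details.

# Minimal sketch: read a Db2 view over JDBC on a cluster that already has the
# charsetDecoderEncoder Java options set. Hostname, database, view, and secret
# scope names below are placeholders.
jdbc_url = "jdbc:db2://db2-host.example.com:50000/SAMPLEDB"

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "MYSCHEMA.MYVIEW")
    .option("user", dbutils.secrets.get(scope="db2", key="user"))
    .option("password", dbutils.secrets.get(scope="db2", key="password"))
    .load()
)

display(df)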


For details on how to apply Spark configs, refer to the “Spark configuration” section of the Compute configuration reference (AWS | Azure | GCP) documentation.


For additional reference on the related Db2 JCC driver properties, see IBM’s “JDBC throws java.io.CharConversionException” documentation.