Problem
When trying to write to a Delta table in Delta Lake, you encounter an error indicating that the data exceeds the char/varchar type length limitation.
com.databricks.sql.transaction.tahoe.schema.DeltaInvariantViolationException: [DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (expression)
Cause
There is a mismatch between the length of the data being written and the length defined in the schema. In Databricks Runtime versions 11.3 LTS and above, strings are treated as fixed-length character types (CHAR) with a maximum length of 255 characters. When a schema defines VARCHAR or CHAR columns, Databricks Runtime strictly enforces the specified length constraints.
Solution
1. Navigate to the Databricks workspace and select the cluster where the notebook is running.
2. Click the Edit button to modify the cluster configuration.
3. Scroll down to the Spark Config section and add the following configuration.
spark.sql.legacy.charVarcharAsString true
4. Save the changes and restart the cluster.
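Alternatively, if you only need the change for the current notebook session, you can set the same configuration at runtime instead of editing the cluster. This is a sketch and assumes the flag is accepted as a session-level setting in your runtime version.

# Session-level alternative to the cluster-level Spark config.
spark.conf.set("spark.sql.legacy.charVarcharAsString", "true")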
Note
The configuration spark.sql.legacy.charVarcharAsString in Apache Spark controls how CHAR and VARCHAR types are handled. When it is set to true, CHAR and VARCHAR types are treated as STRING types in Spark. This can help avoid issues related to the strict length limitations and padding that are typically associated with CHAR and VARCHAR types.