Problem
You’re using a Databricks notebook to transfer data from a source to a sink location, particularly in the context of Delta tables. When you run the COPY INTO
command on a column designated for partitioning that has data type STRING
and whose values begin with a number, you receive an error. For example, a column “Date” has data type STRING
and holds values such as 2025-01-01.
ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
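As an illustration, a COPY INTO statement like the following can trigger the error when the source files are partitioned by the string Date column. The table name and path here are hypothetical placeholders, not part of the original report.

```sql
-- Hypothetical example; replace the table name and path with your own.
-- The source files are partitioned by a STRING column "Date" whose
-- values look like dates (for example, Date=2025-01-01).
COPY INTO my_catalog.my_schema.target_table
FROM 'abfss://container@account.dfs.core.windows.net/source/path'
FILEFORMAT = PARQUET;
```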
Cause
The file index logic infers partition column types by default. A partition column whose values are formatted “yyyy-mm-dd” is inferred as DateType, and Spark represents DateType values internally as integers. When the Parquet reader expects a UTF8String partition value and instead receives an integer, it fails with the ClassCastException shown above.
Solution
In the same notebook, disable partition column type inference before reading the data by setting the following configuration.
SET spark.sql.sources.partitionColumnTypeInference.enabled = false
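For example, you can disable the inference and then rerun the copy in the same notebook session. The table name and path below are hypothetical placeholders.

```sql
-- Disable partition column type inference for this session so the
-- "Date" partition column is kept as STRING instead of DateType.
SET spark.sql.sources.partitionColumnTypeInference.enabled = false;

-- Hypothetical COPY INTO; replace the table and path with your own.
COPY INTO my_catalog.my_schema.target_table
FROM 'abfss://container@account.dfs.core.windows.net/source/path'
FILEFORMAT = PARQUET;
```

If you are working in a Python cell instead, the equivalent is spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false").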