COPY INTO command failing on partition columns with STRING data types that start with an integer

Disable partition column type inference.

Written by shubham.bhusate

Last published at: January 22nd, 2025

Problem

You’re using a Databricks notebook to transfer data from a source to a sink location, particularly in the context of Delta tables. When you run the COPY INTO command on data whose partition column has data type STRING and values that begin with a number, you receive an error. For example, a column “Date” has data type STRING and a value that starts with a number, such as 2025-01-01.
 

ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
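
 

The following is a minimal sketch of the failing scenario. The target Delta table sales_bronze and the source path /mnt/raw/sales (partitioned on the STRING column Date) are hypothetical names used for illustration only.

-- Source layout (hypothetical): /mnt/raw/sales/Date=2025-01-01/part-00000.parquet
COPY INTO sales_bronze
FROM '/mnt/raw/sales'
FILEFORMAT = PARQUET;
-- Fails with the ClassCastException above because the Date partition values are inferred as DateType.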

 

Cause

By default, the file index logic infers partition column types. A partition column value formatted as yyyy-MM-dd is inferred as DateType, and Spark represents DateType values internally as integers (days since the Unix epoch), so casting the StringType value to DateType produces an integer. When the Parquet reader then tries to get a UTF8String partition value and finds an integer instead, it fails with the error.
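
You can observe the inference with a quick query. This sketch assumes the same hypothetical source path as above; typeof is a built-in Spark SQL function.

SELECT typeof(Date) AS inferred_type
FROM parquet.`/mnt/raw/sales`
LIMIT 1;
-- With inference enabled (the default), this returns 'date' rather than 'string'.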

 

Solution

In the same notebook, disable partition column type inference before reading the data by setting the following configuration.

 

SET spark.sql.sources.partitionColumnTypeInference.enabled = false
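
 

A short sketch of the full sequence in a notebook cell, reusing the hypothetical table and path from the example above:

SET spark.sql.sources.partitionColumnTypeInference.enabled = false;

COPY INTO sales_bronze
FROM '/mnt/raw/sales'
FILEFORMAT = PARQUET;
-- With inference disabled, the Date partition column is read as STRING and the command succeeds.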