Duplicate columns in the metadata error

Problem

Your Apache Spark job is processing a Delta table when it fails with the following error message.

org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the metadata update: col1, col2...

Cause

The Delta table contains duplicate column names. Column names that differ only by case are considered duplicates.

Delta Lake is case preserving, but case insensitive, when storing a schema.

Parquet is case sensitive when storing and returning column information.

Spark can be case sensitive, but it is case insensitive by default.

To avoid potential data corruption or data loss, duplicate column names are not allowed.
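
For example, the following sketch reproduces the error in PySpark, assuming a Spark session with Delta Lake support and a writable path (the path and sample values are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "col1" and "Col1" differ only by case, so Delta Lake treats them as duplicates.
df = spark.createDataFrame([(1, 2)], ["col1", "Col1"])

# With Spark's default case-insensitive behavior, this write fails with:
# org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the metadata update: col1
df.write.format("delta").mode("overwrite").save("/tmp/duplicate_columns_example")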

Solution

Delta tables must not contain duplicate column names.

Before writing, ensure that every column name is unique, including names that differ only by case.
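
Continuing the sketch above, one way to make the names unique is to rename all columns positionally with toDF before writing (the replacement name is hypothetical):

# Assign unique names by position so no two names collide, even case-insensitively.
df_fixed = df.toDF("col1", "col1_renamed")

# The write now succeeds because the Delta metadata contains no duplicate columns.
df_fixed.write.format("delta").mode("overwrite").save("/tmp/duplicate_columns_example")

Renaming positionally with toDF avoids ambiguity that name-based renaming can hit when Spark's default case-insensitive resolution matches more than one column.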