InvalidSchemaException error when trying to insert data into a Delta table

Define at least one nested field for every StructType used within a StructField.

Written by lucas.rocha

Last published at: January 30th, 2025

Problem

When inserting data into a Delta table whose schema contains a StructField with an empty StructType (a STRUCT with no fields), you encounter an InvalidSchemaException.

 

Example Error Message

Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 22) (10.101.191.43 executor 0): org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: optional group <field-name> {}

 

Cause

Delta tables store their data in Parquet files, and the Parquet format does not permit empty STRUCT fields (an "empty group" in the error message).

The issue arises when a StructField is defined with an empty StructType. In the following example, the col3 field is defined as a STRUCT with no fields. 

 

from pyspark.sql.types import StructType, StructField, FloatType
schema = StructType([
    StructField("col1", FloatType(), nullable=True),
    StructField("col2", FloatType(), nullable=True),
    StructField("col3", StructType([]), nullable=True)
])

 

Solution

Make sure every StructType used within a StructField defines at least one nested field, each with an explicit data type.

 

Example

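from pyspark.sql.types import StructType, StructField, FloatType, StringType
schema = StructType([
    StructField("col1", FloatType(), nullable=True),
    StructField("col2", FloatType(), nullable=True),
    # col3 now declares one nested field, so the STRUCT is no longer empty
    StructField("col3", StructType([StructField("nested_col", StringType())]), nullable=True)
])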
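
To catch empty STRUCT fields before a write fails, it can help to scan the schema first. The following is a minimal sketch, not part of any Databricks or PySpark API: find_empty_structs is a hypothetical helper name, and it assumes the schema is passed in the JSON form returned by StructType.jsonValue(). The sample dictionary is a hand-written stand-in for the failing schema shown in the Cause section, so the snippet runs without a Spark session.

```python
def find_empty_structs(type_json, path="<root>"):
    """Return the paths of all STRUCT types that declare no fields.

    type_json is assumed to be the JSON form of a Spark data type,
    as produced by DataType.jsonValue().
    """
    # Atomic types such as "float" are plain strings in the JSON form.
    if not isinstance(type_json, dict):
        return []

    kind = type_json.get("type")
    found = []
    if kind == "struct":
        fields = type_json.get("fields", [])
        if not fields:
            found.append(path)
        for field in fields:
            child = field["name"] if path == "<root>" else f"{path}.{field['name']}"
            found.extend(find_empty_structs(field["type"], child))
    elif kind == "array":
        # Structs can also hide inside array elements.
        found.extend(find_empty_structs(type_json["elementType"], f"{path}[]"))
    elif kind == "map":
        # ... and inside map keys or values.
        found.extend(find_empty_structs(type_json["keyType"], f"{path}.key"))
        found.extend(find_empty_structs(type_json["valueType"], f"{path}.value"))
    return found


# Hand-written JSON form of a schema whose col3 is an empty STRUCT.
bad_schema_json = {
    "type": "struct",
    "fields": [
        {"name": "col1", "type": "float", "nullable": True, "metadata": {}},
        {"name": "col2", "type": "float", "nullable": True, "metadata": {}},
        {"name": "col3", "type": {"type": "struct", "fields": []},
         "nullable": True, "metadata": {}},
    ],
}

print(find_empty_structs(bad_schema_json))  # ['col3']
```

With a real DataFrame, the same check would be called as find_empty_structs(df.schema.jsonValue()), where df is whatever DataFrame you are about to write. An empty list means no empty STRUCT fields were found.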

 

For more information, refer to the What is a view? (AWS | Azure | GCP) documentation.