Auto Loader streaming query failure with UnknownFieldException error

Use schema evolution to avoid streaming query failures when new columns are added to your data.

Written by harikrishnan.kunhumveettil

Last published at: February 29th, 2024

Problem

Your Auto Loader streaming job fails with an UnknownFieldException error when a new column appears in the source files that the stream reads.

Exception: org.apache.spark.sql.catalyst.util.UnknownFieldException: Encountered unknown field(s) during parsing: <column name>

Cause

Auto Loader raises an UnknownFieldException error when it detects columns in the incoming data that are not part of the stream's current schema.

The exception stops the stream, so no further data is processed until the stream is started again.

Solution

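Configure your Auto Loader stream to use schema evolution by setting the cloudFiles.schemaEvolutionMode option together with a cloudFiles.schemaLocation.

In addNewColumns mode, the stream still stops once when a new column is detected, but Auto Loader first saves the updated schema to the schema location, so the stream picks up the new column when it restarts. If the stream must not stop at all, use rescue mode, which keeps the existing schema and records unexpected fields in the rescued data column.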
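
As a minimal sketch of what such a configuration might look like in PySpark: cloudFiles.format, cloudFiles.schemaLocation, and cloudFiles.schemaEvolutionMode are Auto Loader options, while the helper names, the JSON source format, and every path and table name passed in are hypothetical placeholders. The mergeSchema writer option assumes the target is a Delta table.

```python
# Illustrative sketch only: helper names and arguments are hypothetical.

SCHEMA_EVOLUTION_MODES = {"addNewColumns", "rescue", "failOnNewColumns", "none"}


def auto_loader_options(schema_location, file_format="json",
                        evolution_mode="addNewColumns"):
    """Build the Auto Loader reader options as a plain dict."""
    if evolution_mode not in SCHEMA_EVOLUTION_MODES:
        raise ValueError(f"Unsupported schema evolution mode: {evolution_mode}")
    return {
        "cloudFiles.format": file_format,
        # Auto Loader persists the inferred schema here and updates it
        # when new columns are detected.
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaEvolutionMode": evolution_mode,
    }


def start_stream(spark, source_path, schema_location,
                 checkpoint_location, target_table):
    """Start an Auto Loader stream; `spark` is an existing SparkSession."""
    df = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options(schema_location))
        .load(source_path)
    )
    return (
        df.writeStream
        .option("checkpointLocation", checkpoint_location)
        # Assumes a Delta sink: lets the target table accept new columns.
        .option("mergeSchema", "true")
        .toTable(target_table)
    )
```

Keeping the options in a separate function makes the chosen evolution mode easy to inspect or change without touching the streaming code.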

For more information, review the How does Auto Loader schema evolution work? (AWS | Azure | GCP) documentation.
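
Because a stream that evolves its schema can still stop when a new column first appears, it may help to restart it automatically. The following is a rough sketch under stated assumptions: start_query is a hypothetical zero-argument function that starts the stream and returns the streaming query, and matching the exception by the text "UnknownFieldException" in its message is an assumption about how the error surfaces in Python, not a documented contract. Configuring retries on the job that runs the stream is an alternative to a loop like this.

```python
# Illustrative sketch only: function names are hypothetical.

def is_unknown_field_error(exc):
    """Heuristic check (assumption): the error text names the exception."""
    return "UnknownFieldException" in str(exc)


def run_with_restarts(start_query, max_restarts=3):
    """Run a stream, restarting it when it stops on a new column.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            query = start_query()
            # Blocks until the query stops; re-raises the failure if it failed.
            query.awaitTermination()
            return restarts
        except Exception as exc:
            # Unrelated errors, or too many restarts, are passed on unchanged.
            if not is_unknown_field_error(exc) or restarts >= max_restarts:
                raise
            restarts += 1
```

The restart cap keeps a persistent failure from looping forever; any error other than the schema change is re-raised immediately.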