writeStream/readStream leads to an error when the schema contains “NullType”

Replace NullType with a literal None value and then cast it to StringType.

Written by G Yashwanth Kiran

Last published at: December 24th, 2024

Problem

You run a streaming job (readStream or writeStream) against a Delta table and it fails with the following error.

 

TypeError: cannot unpack non-iterable NoneType object.

 

The error stack trace shows that your requested schema contains special columns that can only be produced by VectorizedParquetRecordReader, together with data types that this reader doesn’t support.

 

Cause

Parquet doesn’t support NullType. When you cast a NullType column, the Delta reader must first read that column from the underlying Parquet file, so the job fails at the read stage before it ever reaches the cast.

 

Solution

When reading the data, replace the NullType column with a literal None value cast to StringType, rather than casting the existing column. The column then has a valid data type (StringType), so further processing can proceed.

 

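from pyspark.sql.functions import lit
from pyspark.sql.types import StringType

# Overwrite the NullType column with a null literal typed as StringType.
streaming_df = streaming_df.withColumn('<column-containing-NullType>', lit(None).cast(StringType()))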