Problem
You’re running a streaming job against a Delta Lake table and encounter the following error.
TypeError: cannot unpack non-iterable NoneType object.
When you review the stack trace, you see that the requested schema contains special columns that only VectorizedParquetRecordReader can produce, along with data types that the reader doesn’t support.
Cause
Parquet doesn't support NullType. When you cast a NullType column, you’re asking the Delta reader to first read that column from a Parquet file, so the job fails at the read stage, before it reaches the cast.
Solution
Rather than reading the NullType column from the Parquet files, replace it with a literal None value cast to StringType. The column then has a valid data type (StringType), so further processing can proceed.
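from pyspark.sql.functions import lit
from pyspark.sql.types import StringType
# Overwrite the NullType column with a null literal typed as StringType.
streaming_df = streaming_df.withColumn('<column-containing-NullType>', lit(None).cast(StringType()))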