Stateful Streaming query failing with SparkSecurityException error after restarting on a shared cluster or serverless

Use the same cluster access mode when restarting the query.

Written by shanmugavel.chandrakasu

Last published at: February 4th, 2025

Problem

You have a stateful streaming query with dropDuplicates() or dropDuplicatesWithinWatermark() specified, which previously ran on a single user cluster or a No Isolation Shared cluster. When you try to restart the query on a shared cluster or on serverless compute, the query fails. You receive the following error (the exact details depend on your context).


org.apache.spark.SparkSecurityException: Could not verify permissions for logical node ~WriteToMicroBatchDataSourceV1 ForeachBatchSink, XXXX, [checkpointLocation=s3://XXXX], Append, 0


When you check the error stack trace, you see the following message.


Caused by: org.apache.spark.sql.execution.streaming.state.InvalidUnsafeRowException: The streaming query failed by state format invalidation. The following reasons may cause this: 1. An old Spark version wrote the checkpoint that is incompatible with the current one; 2. Broken checkpoint files; 3. The query is changed among restart. For the first case, you can try to restart the application without checkpoint or use the legacy Spark version to process the streaming state. Error message is: Variable-length field validation error: field: XXXX,XXXX


Cause

Changing the cluster access mode causes this issue when using Spark Connect on Databricks Runtime 15.4 LTS and above. There is a known Spark Connect limitation on switching cluster access modes for Stateful Streaming queries. For more information, review SPARK-49722.


Solution

When you restart Stateful Streaming queries, use the same cluster access mode that the query originally ran under.