Problem
You have a stateful streaming query with dropDuplicates()
or dropDuplicatesWithinWatermark()
specified, which was previously running on a single user cluster or a No Isolation Shared cluster. When you try to restart the query on a shared cluster or serverless, the query fails. You receive the following error (the exact details depend on your context).
org.apache.spark.SparkSecurityException: Could not verify permissions for logical node ~WriteToMicroBatchDataSourceV1 ForeachBatchSink, XXXX, [checkpointLocation=s3://XXXX], Append, 0
When you check the error stack trace, you see the following message.
Caused by: org.apache.spark.sql.execution.streaming.state.InvalidUnsafeRowException: The streaming query failed by state format invalidation. The following reasons may cause this: 1. An old Spark version wrote the checkpoint that is incompatible with the current one; 2. Broken checkpoint files; 3. The query is changed among restart. For the first case, you can try to restart the application without checkpoint or use the legacy Spark version to process the streaming state.Error message is: Variable-length field validation error: field: XXXX,XXXX
Cause
The cluster mode change causes the issue when using Spark Connect in Databricks Runtime 15.4 LTS and above. There is known Spark Connect limitation on switching cluster modes on Stateful Streaming queries. For more information, review SPARK-49722.
Solution
When you restart Stateful Streaming queries, use the same cluster access mode.