Problem
When you run a job on dedicated compute, it fails with the following error.
SparkException: Job aborted due to stage failure: Total size of serialized results of 2817 tasks (4.0 GiB) is bigger than spark.driver.maxResultSize 4.0 GiB.
The job still fails after you increase spark.driver.maxResultSize and the driver memory to higher values.
Stacktrace
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at com.databricks.sql.execution.arrowcollect.RDDBatchCollector.runSparkJobs(RDDBatchCollector.scala:261)
at com.databricks.sql.execution.arrowcollect.RDDBatchCollector.collect(RDDBatchCollector.scala:347)
at com.databricks.sql.execution.arrowcollect.CloudStoreCollector$.hybridCollect(CloudStoreCollector.scala:159)
at com.databricks.sql.execution.arrowcollect.CloudStoreCollector$.hybridCollect(CloudStoreCollector.scala:206)
at org.apache.spark.sql.execution.qrc.CompressedHybridCloudStoreFormat.collect(cachedSparkResults.scala:170)
at org.apache.spark.sql.execution.qrc.CompressedHybridCloudStoreFormat.collect(cachedSparkResults.scala:160)
at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.processAsRemoteBatches(SparkConnectPlanExecution.scala:475)
at org.apache.spark.sql.connect.execution.SparkConnectPlanExecution.handlePlan(SparkConnectPlanExecution.scala:141)
Cause
When Fine-Grained Access Control (FGAC) is enabled, queries involving restricted data, such as those protected by row-level security, column masking, or secure views, are offloaded to serverless compute for enforcement. The resulting data must then be fully materialized and transferred back to the dedicated cluster’s driver.
When the query spans a large number of small partitions, Apache Spark triggers an optimized execution path where executors send results directly to the driver, which aggregates and uploads them to cloud storage. If the total serialized result exceeds Spark’s internal 4 GiB driver-side limit, the job fails deterministically, regardless of driver memory or spark.driver.maxResultSize settings.
For details, refer to the Fine-grained access control on dedicated compute (AWS | Azure | GCP) documentation.
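To check whether a failure matches the "large number of small partitions" pattern described above, you can estimate the average serialized result per task from the error message itself. The following is a minimal Python sketch, not part of any Databricks or Spark API. The helper name average_result_per_task_mib and the regular expression are hypothetical, and the sketch assumes the message has the same format as the error quoted in the Problem section.

```python
import re

# Assumed message format, taken from the error text in the Problem section.
_PATTERN = re.compile(
    r"serialized results of (?P<tasks>\d+) tasks "
    r"\((?P<size>[\d.]+) (?P<unit>[KMGT]iB)\)"
)
_UNIT_BYTES = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def average_result_per_task_mib(message: str) -> float:
    """Return the average serialized result size per task, in MiB.

    Hypothetical helper: it only parses the message text and does not
    call Spark. The size in the message is rounded, so the result is
    an estimate.
    """
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    total_bytes = float(match["size"]) * _UNIT_BYTES[match["unit"]]
    return total_bytes / int(match["tasks"]) / 1024**2


ERROR = (
    "SparkException: Job aborted due to stage failure: Total size of "
    "serialized results of 2817 tasks (4.0 GiB) is bigger than "
    "spark.driver.maxResultSize 4.0 GiB."
)

average_mib = average_result_per_task_mib(ERROR)
print(f"{average_mib:.2f} MiB per task")
```

For the error above, the estimate comes to roughly 1.45 MiB per task. Each task returns a small result, but the sum across 2817 tasks reaches the 4 GiB limit.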
Solution
Execute queries involving FGAC on standard compute, where data filtering and access control enforcement are handled within the same compute environment.
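Before you move a workload, it can help to confirm which access mode a cluster uses. The following is a minimal Python sketch that classifies a cluster specification held in a dictionary. It assumes the specification exposes a data_security_mode field in which SINGLE_USER corresponds to dedicated compute and USER_ISOLATION corresponds to standard compute. Verify these values against the Clusters API reference for your workspace, because newer API versions may report different values. The function name and the sample dictionaries are hypothetical.

```python
# Assumed mapping from data_security_mode values to access modes.
# Verify against the Clusters API reference before relying on it.
_DEDICATED_MODES = {"SINGLE_USER"}
_STANDARD_MODES = {"USER_ISOLATION"}


def classify_access_mode(cluster_spec: dict) -> str:
    """Return 'dedicated', 'standard', or 'unknown' for a cluster spec.

    Hypothetical helper: it inspects a plain dictionary and makes no
    API calls.
    """
    mode = cluster_spec.get("data_security_mode")
    if mode in _DEDICATED_MODES:
        return "dedicated"
    if mode in _STANDARD_MODES:
        return "standard"
    return "unknown"


# Illustrative specs, not taken from a real workspace.
failing_cluster = {"cluster_name": "etl-job", "data_security_mode": "SINGLE_USER"}
target_cluster = {"cluster_name": "etl-job", "data_security_mode": "USER_ISOLATION"}

print(classify_access_mode(failing_cluster))
print(classify_access_mode(target_cluster))
```

A result of "dedicated" for the cluster that runs the failing job suggests that the FGAC offload path described in the Cause section applies.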