Problem
When using Tableau to connect to Databricks and extract data from large tables, the extract refresh process fails with the following error:
Total size of serialized results of tasks is bigger than spark.driver.maxResultSize.
This issue can occur with any type of compute, but comes up most often on serverless compute, where the spark.driver.maxResultSize configuration cannot be changed.
Cause
The amount of data being extracted from the Databricks tables exceeds the default spark.driver.maxResultSize limit. This typically happens when an ODBC connection parameter disables Cloud Fetch: without Cloud Fetch, query results are funneled through the driver instead of being downloaded directly from cloud storage, so large result sets exceed the driver's limit.
Solution
In your connection parameters, locate and delete the EnableQueryResultDownload=0 parameter to re-enable Cloud Fetch, which is on by default.
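For illustration, a minimal sketch of what a corrected connection looks like, assuming the Simba Spark ODBC driver and the pyodbc library; the host, HTTP path, and token are placeholders you would replace with your own workspace values:

```python
import pyodbc

# Hypothetical connection string for illustration. Note that there is no
# EnableQueryResultDownload=0 entry, so Cloud Fetch remains enabled.
conn_str = (
    "Driver=Simba Spark ODBC Driver;"
    "Host=<workspace-host>;"          # your workspace hostname
    "Port=443;"
    "HTTPPath=<http-path>;"           # from the compute's JDBC/ODBC tab
    "SSL=1;"
    "ThriftTransport=2;"
    "AuthMech=3;"                     # token-based authentication
    "UID=token;"
    "PWD=<personal-access-token>"
)
conn = pyodbc.connect(conn_str, autocommit=True)
```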
Preventative measures
Optimize your queries to retrieve only the necessary data from Databricks tables, reducing the amount of data transferred and processed.
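Continuing the pyodbc sketch above, a hypothetical example of narrowing an extract to only the columns and rows it needs (the table and column names are made up):

```python
# Select only the needed columns and filter rows server-side, rather than
# running SELECT * over the whole table. Names here are hypothetical.
cursor = conn.cursor()
cursor.execute(
    """
    SELECT order_id, order_date, amount
    FROM sales.orders
    WHERE order_date >= DATE '2024-01-01'
    """
)
rows = cursor.fetchmany(1000)  # page through results instead of fetchall()
```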
If using classic compute, you can increase spark.driver.maxResultSize to a larger limit.
- Click your compute and navigate to the Advanced options section.
- Click to expand it. Under the Spark tab, in the Spark config field, add the following config:
spark.driver.maxResultSize <size>
The value of <size> depends on your driver size and the current value. To check the current value, open the Spark UI, navigate to the Environment tab, and search for the spark.driver.maxResultSize configuration. When making this change, choose a value that is higher than the current value you see in the Spark UI.