Problem
You notice that when the number of tasks in a job decreases, parallelism decreases and the job runs more slowly. Conversely, when the number of tasks increases, parallelism increases and the job runs faster.
Cause
You have adaptive parallelism enabled.
Adaptive parallelism allows fewer tasks to be planned based on the number of concurrent queries. When the task count is adjusted several times within one job (for example, because one query runs much longer than another), adaptive parallelism may not perform as well as expected.
Solution
Disable adaptive parallelism.
- Navigate to the cluster in question.
- In the cluster configuration page, click the Edit button.
- Scroll down to Advanced and click to expand.
- Click the Spark tab, and in the Spark config field add the following line:
spark.databricks.execution.adaptiveParallelism.enabled false
- Click the Save button at the bottom of the page to apply the change to the cluster settings.
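If you manage cluster definitions as JSON instead of editing them in the UI, the same setting can be placed in the `spark_conf` map of the cluster specification. The following is a minimal sketch, assuming a cluster definition shaped like a Clusters API payload; the helper name `disable_adaptive_parallelism` and the example cluster values are hypothetical and used only for illustration.

```python
import copy
import json

# Configuration key taken from the steps above.
ADAPTIVE_PARALLELISM_KEY = "spark.databricks.execution.adaptiveParallelism.enabled"


def disable_adaptive_parallelism(cluster_spec: dict) -> dict:
    """Return a copy of cluster_spec with adaptive parallelism set to false.

    Hypothetical helper: it only edits a dictionary and does not call
    any Databricks API.
    """
    updated = copy.deepcopy(cluster_spec)  # leave the caller's dict untouched
    spark_conf = updated.setdefault("spark_conf", {})
    spark_conf[ADAPTIVE_PARALLELISM_KEY] = "false"  # Spark conf values are strings
    return updated


# Hypothetical cluster definition, for illustration only.
example_spec = {
    "cluster_name": "example-cluster",
    "spark_conf": {"spark.sql.shuffle.partitions": "200"},
}

updated_spec = disable_adaptive_parallelism(example_spec)
print(json.dumps(updated_spec["spark_conf"], indent=2))
```

The helper returns a modified copy, so the original definition stays available for comparison. Existing `spark_conf` entries are preserved.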