Problem
Databricks SQL uses cloud fetch by default to increase query performance.
Instead of fetching results over a single thread, cloud fetch retrieves data in parallel from cloud storage buckets (such as AWS S3 and Azure Data Lake Storage). Compared to a standard, single-threaded fetch, you can see up to a 10x increase in performance using cloud fetch.
If you are seeing slowness when fetching results in Databricks SQL, it is likely that cloud fetch is disabled.
The following symptoms indicate an issue with cloud fetch:
- Slowness when retrieving results over ODBC/JDBC
- BI tools frequently time out while waiting for query results
- The SQL warehouse query editor is slow
Causes
The following common issues can result in cloud fetch being disabled:
- Using an ODBC driver version below 2.6.17
- Using a JDBC driver version below 2.6.18
- Firewall/ACL issues between your workspace and cloud storage
- Cloud provider versioning is enabled on the cloud storage you are using
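The two driver version thresholds above are easy to misjudge with a plain string comparison (as text, "2.6.9" sorts after "2.6.17"). The following is a minimal sketch of a numeric check. The helper names parse_version and supports_cloud_fetch are hypothetical, and the sketch assumes you already know the installed driver's version string.

```python
# Minimum driver versions taken from the list above.
MIN_VERSIONS = {"odbc": (2, 6, 17), "jdbc": (2, 6, 18)}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '2.6.17' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def supports_cloud_fetch(driver: str, version: str) -> bool:
    """Return True if the driver version meets the minimum listed above.

    Only the first three components are compared, so a build suffix
    (a fourth numeric component) is ignored.
    """
    return parse_version(version)[:3] >= MIN_VERSIONS[driver.lower()]
```

For example, supports_cloud_fetch("jdbc", "2.6.17") returns False because the JDBC minimum is 2.6.18, while supports_cloud_fetch("odbc", "2.6.17") returns True.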
Solution
- Ensure you are using a Databricks ODBC driver version 2.6.17 or above.
- Ensure you are using a Databricks JDBC driver version 2.6.18 or above.
- Ensure your ODBC/JDBC authentication requirements (AWS | Azure | GCP) are properly configured.
- Disable storage bucket versioning on the cloud storage you are using to store your data.
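On AWS, one way to confirm the bucket versioning item is to inspect the bucket's versioning status. The sketch below assumes the boto3 package is installed and AWS credentials are configured. The helper names are hypothetical. S3 omits the Status field for buckets that have never had versioning enabled, and reports "Suspended" after versioning has been turned off, so only "Enabled" is treated as active.

```python
def versioning_enabled(response: dict) -> bool:
    """Interpret a GetBucketVersioning response from S3.

    Status is "Enabled" or "Suspended"; the key is absent when
    versioning was never turned on for the bucket.
    """
    return response.get("Status") == "Enabled"


def check_bucket(bucket_name: str) -> bool:
    """Ask S3 whether versioning is active on the bucket.

    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported lazily so versioning_enabled works without boto3

    s3 = boto3.client("s3")
    return versioning_enabled(s3.get_bucket_versioning(Bucket=bucket_name))
```

Calling check_bucket with the name of the bucket that stores your data returns True when versioning is still active and should be turned off. Azure and GCP expose comparable settings through their own tools, which this sketch does not cover.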