This article provides information to help you troubleshoot the connection between your Databricks JDBC/ODBC server and BI tools and data sources.
Fetching result set is slow after statement execution
After a query execution, you can fetch result rows by calling the next() method on the returned ResultSet repeatedly. This method triggers a request to the driver Thrift server to fetch a batch of rows back if the buffered ones are exhausted. We found the size of the batch significantly affects the performance. The default value in the most of the JDBC/ODBC drivers is too conservative, and we recommend that you set it to at least 100,000. Contact the BI tool provider if you cannot access this configuration.
Timeout/Exception when creating the connection
Once you have the server hostname, you can run the following tests from a terminal to check for connectivity to the warehouse.
curl https://<server-hostname>:<port>/sql/protocolv1/o/0/<cluster-id> -H "Authorization: Basic $(echo -n 'token:<personal-access-token>' | base64)"
If the connection times out, check whether your network settings of the connection are correct.
If the response contains a TTransportException (the error is expected) like the following, it means that the gateway is functioning properly and you have passed in valid credentials. If you are not able to connect with the same credentials, check that the client you are using is properly configured and is using the latest Simba drivers (version >= 2.6.22):
<h2>HTTP ERROR: 500</h2> <p>Problem accessing /cliservice. Reason: <pre> javax.servlet.ServletException: org.apache.thrift.transport.TTransportException</pre></p>
Referencing temporary views
If the response contains the message Table or view not found: SPARK..temp_view it means that a temporary view is not properly referenced in the client application. Simba has an internal configuration parameter called UseNativeQuery that decides whether the query is translated or not before being submitted to the Thrift server. By default, the parameter is set to 0, in which case Simba can modify the query. In particular, Simba creates a custom #temp schema for temporary views and it expects the client application to reference a temporary view with this schema. You can avoid using this special alias by setting UseNativeQuery=1, which prevents Simba from modifying the query. In this case, Simba sends the query directly to the Thrift server. However, the client needs to make sure that the queries are written in the dialect that Spark expects, that is, HiveQL.
To sum up, you have the following options to handle temporary views over Simba and Spark:
- UseNativeQuery=0 and reference the view by prefixing its name with #temp.
- UseNativeQuery=1 and make sure the query is written in the dialect that Spark expects.
- If you get the error 401 Unauthorized, check the credentials you are using:
If you use a personal access token to authenticate, verify that the username is token (not your username) and the password is a personal access token (the token should start with dapi).
<h2>HTTP ERROR: 401</h2> <p>Problem accessing /sql/protocolv1/o/0/test-cluster. Reason: <pre> Unauthorized</pre></p>
- Responses such as 404, Not Foundusually indicate problems with locating the specified cluster:
<h2>HTTP ERROR: 404</h2> <p>Problem accessing /sql/protocolv1/o/0/missing-cluster. Reason: <pre> RESOURCE_DOES_NOT_EXIST: No cluster found matching: missing-cluster</pre></p>
- If you see the following errors in your application log4j logs:
You can ignore these errors. The Simba internal log4j library is shaded to avoid conflicts with the log4j library in your application. However, Simba may still load the log4j configuration of your application, and attempt to use some custom log4j appenders. This attempt fails with the shaded library. Relevant information is still captured in the logs.
log4j:ERROR A "org.apache.log4j.FileAppender" object is not assignable to a "com.simba.spark.jdbc42.internal.apache.log4j.Appender" variable.