Trouble reading external JDBC tables after upgrading from Databricks Runtime 5.5
Problem
Reading external tables via JDBC works fine on Databricks Runtime 5.5, but the same table reads fail on Databricks Runtime 6.0 and above.
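For example, an external JDBC table defined on Databricks Runtime 5.5 and then queried on Databricks Runtime 6.0 and above can reproduce the failure. This is a minimal sketch; the table name, JDBC URL, and credentials are hypothetical placeholders for a SQL Server source:

// Created on Databricks Runtime 5.5 and below; the schema inferred at
// creation time is stored with the table definition.
spark.sql("""
  CREATE TABLE sessions_ext
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url "jdbc:sqlserver://<server>:1433;database=<db>",
    dbtable "dbo.sessions",
    user "<user>",
    password "<password>"
  )
""")

// Reads succeed on Databricks Runtime 5.5 but fail on 6.0 and above.
spark.table("sessions_ext").show()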
You see an error similar to the following:
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.jdbc does not allow user-specified schemas.
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.jdbc does not allow user-specified schemas.;
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
Cause
Databricks Runtime 5.5 and below infers the session_id attribute as a smallint. Databricks Runtime 6.0 and above infers the same attribute as an int.
This change in the inferred type of the session_id attribute causes queries against the table to fail with the schema mismatch error shown above.
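You can confirm the difference by checking the schema each runtime infers for the source table. This is a minimal sketch; the connection options are hypothetical placeholders:

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://<server>:1433;database=<db>")
  .option("dbtable", "dbo.sessions")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()

// Databricks Runtime 5.5:  |-- session_id: short (nullable = true)
// Databricks Runtime 6.0+: |-- session_id: integer (nullable = true)
df.printSchema()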
Solution
If you are using external tables that were created on Databricks Runtime 5.5 and below with Databricks Runtime 6.0 and above, you must set the Apache Spark configuration spark.sql.legacy.mssqlserver.numericMapping.enabled to true. This ensures that Databricks Runtime 6.0 and above infers the session_id attribute as a smallint, matching the schema stored when the table was created.
- Open the Clusters page.
- Select a cluster.
- Click Edit.
- Click Advanced Options.
- Click Spark.
- In the Spark Config field, enter spark.sql.legacy.mssqlserver.numericMapping.enabled true.
- Save the change and start, or restart, the cluster.
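If editing the cluster configuration is not convenient, the flag may also be accepted as a session-level SQL configuration set from a notebook. This is a minimal sketch, assuming the legacy flag can be set per session and reusing the hypothetical sessions_ext table from the example above:

// Assumption: the legacy flag is accepted as a session-level SQL configuration.
spark.conf.set("spark.sql.legacy.mssqlserver.numericMapping.enabled", "true")

// With the flag enabled, session_id is inferred as smallint again and the read succeeds.
spark.table("sessions_ext").show()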