Problem
Reading external tables over JDBC works on Databricks Runtime 5.5 and below, but the same table reads fail on Databricks Runtime 6.0 and above.
You see an error similar to the following:
com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.jdbc does not allow user-specified schemas.
  at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
  at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.jdbc does not allow user-specified schemas.;
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
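For illustration only, the following minimal PySpark sketch shows the kind of setup that hits this error. The table name sessions_ext, the JDBC URL, the credentials, and the dbtable value are placeholders, not details from the original report:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical external table backed by the JDBC data source.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sessions_ext
    USING org.apache.spark.sql.jdbc
    OPTIONS (
      url 'jdbc:sqlserver://<host>:1433;database=<db>',
      dbtable 'dbo.sessions',
      user '<user>',
      password '<password>'
    )
""")

# Reading the table works on Databricks Runtime 5.5 and below, but raises the
# AnalysisException shown above on Databricks Runtime 6.0 and above.
spark.table("sessions_ext").show()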
Cause
Databricks Runtime 5.5 and below infers the session_id attribute of the SQL Server source as a smallint, while Databricks Runtime 6.0 and above infers it as an int. Because the external table's schema was saved to the metastore when the table was created, the stored smallint no longer matches the newly inferred int, and the read fails with the schema error shown above.
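One way to see the difference is to read the same source directly over JDBC and print the inferred schema. This is a sketch only; it assumes the spark session from the example above (or the one provided in a Databricks notebook), and the connection details are placeholders:

# Inspect the schema that the JDBC data source infers for the source table.
df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<host>:1433;database=<db>")
      .option("dbtable", "dbo.sessions")
      .option("user", "<user>")
      .option("password", "<password>")
      .load())

df.printSchema()
# Databricks Runtime 5.5 and below: |-- session_id: short (nullable = true)
# Databricks Runtime 6.0 and above: |-- session_id: integer (nullable = true)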
Solution
If you are using external tables that were created on Databricks Runtime 5.5 and below while running Databricks Runtime 6.0 and above, you must set the Apache Spark configuration spark.sql.legacy.mssqlserver.numericMapping.enabled to true. This ensures that Databricks Runtime 6.0 and above infers the session_id attribute as a smallint, matching the schema stored in the metastore.
- Open the Clusters page.
- Select a cluster.
- Click Edit.
- Click Advanced Options.
- Click Spark.
- In the Spark config field, enter spark.sql.legacy.mssqlserver.numericMapping.enabled true.
- Save the change and start, or restart, the cluster.
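After the cluster restarts, you can confirm the setting from a notebook with a quick check like the following sketch; sessions_ext is the hypothetical table from the earlier example, not a name from the original article:

# Should return "true" once the cluster picks up the new Spark config.
print(spark.conf.get("spark.sql.legacy.mssqlserver.numericMapping.enabled"))

# The external table read that previously failed should now resolve,
# because session_id is again inferred as a smallint.
spark.table("sessions_ext").show()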