Troubleshooting JDBC/ODBC access to Azure Data Lake Storage Gen2

Learn how to troubleshoot JDBC and ODBC access to Azure Data Lake Storage Gen2 from Databricks.

Written by Adam Pavlacka

Last published at: June 1st, 2022

Problem

Info

In general, you should use Databricks Runtime 5.2 or above, which includes a built-in Azure Blob File System (ABFS) driver, when you want to access Azure Data Lake Storage Gen2 (ADLS Gen2). This article applies to users who access ADLS Gen2 storage over JDBC/ODBC instead.
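
For reference, direct access through the ABFS driver looks like the following. This is a minimal sketch; the container, storage account, and path names are hypothetical placeholders.

# Minimal sketch: read ADLS Gen2 data directly with the built-in ABFS
# driver (Databricks Runtime 5.2+). The container, account, and path
# below are hypothetical placeholders.
df = spark.read.csv("abfss://mycontainer@mystorageaccount.dfs.core.windows.net/phonecalls")
display(df)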

When you run a SQL query from a JDBC or ODBC client to access ADLS Gen2, the following error occurs:

com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: No value for dfs.adls.oauth2.access.token.provider found in conf file.

18/10/23 21:03:28 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.util.concurrent.ExecutionException: java.io.IOException: There is no primary group for UGI (Basic token)chris.stevens+dbadmin (auth:SIMPLE)
  at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
  at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
  at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
  at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2344)
  at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
  at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
  at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:158)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:257)
  at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:313)
  ...
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:87)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:79)

When you run the query from a SQL client, you get the following error:

An error occurred when executing the SQL command:
select * from test_databricks limit 50

[Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: com.google.common.util.concurrent.UncheckedExecutionException: com.databricks.backend.daemon.data.common.InvalidMountException: Error while using path /mnt/crm_gen2/phonecalls for resolving path '/phonecalls' within mount at '/mnt/crm_gen2'., Query: SELECT * FROM `default`.`test_databricks` `default_test_databricks` LIMIT 50. [SQL State=HY000, DB Errorcode=500051]

Warnings:
[Simba][SparkJDBCDriver](500100) Error getting table information from database.
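
To confirm that the mount point referenced in the error exists on the cluster, you can list the configured mounts with the standard Databricks utility:

# List all configured mount points to verify that /mnt/crm_gen2 is present.
display(dbutils.fs.mounts())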

Cause

The root cause is an incorrect cluster configuration for JDBC and ODBC connections to ADLS Gen2 through ABFS. With Hive user impersonation (hive.server2.enable.doAs) enabled, queries run under an impersonated identity that cannot resolve the credentials for the ADLS Gen2 mount, so they fail.
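
For context, an ADLS Gen2 mount such as /mnt/crm_gen2 is typically created with OAuth credentials passed in the extra_configs map. The following is a minimal sketch; the service principal, tenant, secret scope, storage account, and container names are all hypothetical placeholders.

# Minimal sketch of creating an ADLS Gen2 mount with OAuth credentials.
# All identifiers below are hypothetical placeholders.
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type":
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<application-id>",
  "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
  "fs.azure.account.oauth2.client.endpoint":
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
  source="abfss://mycontainer@mystorageaccount.dfs.core.windows.net/",
  mount_point="/mnt/crm_gen2",
  extra_configs=configs,
)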

Solution

Set spark.hadoop.hive.server2.enable.doAs to false in the cluster's Spark configuration settings.
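
For example, add the following line to the Spark config field under the cluster's Advanced Options:

spark.hadoop.hive.server2.enable.doAs false

Restart the cluster for the new configuration to take effect.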