Problem: AnalysisException When Dropping Table on Azure-backed Metastore

Problem

When you try to drop a table in an external Hive version 2.0 or 2.1 metastore that is deployed on Azure SQL Database, Databricks throws the following exception:

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Exception thrown when executing query : SELECT 'org.apache.hadoop.hive.metastore.model.MStorageDescriptor' AS NUCLEUS_TYPE,A0.INPUT_FORMAT,A0.IS_COMPRESSED,A0.IS_STOREDASSUBDIRECTORIES,A0.LOCATION,A0.NUM_BUCKETS,A0.OUTPUT_FORMAT,A0.SD_ID FROM SDS A0 WHERE A0.CD_ID = ? OFFSET 0 ROWS FETCH NEXT ROW ONLY );
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:107)
  at org.apache.spark.sql.hive.HiveExternalCatalog.doDropTable(HiveExternalCatalog.scala:483)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.dropTable(ExternalCatalog.scala:122)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.dropTable(SessionCatalog.scala:638)
  at org.apache.spark.sql.execution.command.DropTableCommand.run(ddl.scala:212)

Cause

This is a known Hive bug (HIVE-14698), caused by another known bug with the datanucleus-rdbms module in the package. It is fixed in datanucleus-rdbms 4.1.16. However, Hive 2.0 and 2.1 metastores use version 4.1.7 and these versions are affected.

Solution

Do one of the following:

  • Upgrade the Hive metastore to version 2.3.0. This also resolves problems due to any other Hive bug that is fixed in version 2.3.0.
  • Import the following notebook to your workspace and follow the instructions to replace the datanucleus-rdbms JAR. This notebook is written to upgrade the metastore to version 2.1.1. You might want to have a similar version in your server side.

External metastore upgrade notebook