Drop Tables with Corrupted Metadata from the Metastore

This article explains how to drop tables with corrupted metadata.

Problem

Sometimes you cannot drop a table from the Databricks UI. Using %sql or spark.sql to drop table doesn’t work either.

Cause

The metadata (table schema) stored in the metastore is corrupted. When you run Drop table command, Spark checks whether table exists or not before dropping the table. Since the metadata is corrupted for the table Spark can’t drop the table and fails with following exception.

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: The metadata is corrupted

Solution

Use a Hive client to drop the table since the Hive client doesn’t check for the table existence as Spark does. To drop a table:

  1. Create a function inside Hive package.
package org.apache.spark.sql.hive {
import org.apache.spark.sql.hive.HiveUtils
import org.apache.spark.SparkContext

object utils {
    def dropTable(sc: SparkContext, dbName: String, tableName: String, ignoreIfNotExists: Boolean, purge: Boolean): Unit = {
      HiveUtils
          .newClientForMetadata(sc.getConf, sc.hadoopConfiguration)
          .dropTable(dbName, tableName, ignoreIfNotExists, false)
    }
  }
}
  1. Drop corrupted tables.
import org.apache.spark.sql.hive.utils
utils.dropTable(sc, "default", "my_table", true, true)