Drop tables with corrupted metadata from the metastore

Learn how to drop tables that contain corrupted metadata from a metastore.

Written by Adam Pavlacka

Last published at: May 16th, 2022

Problem

Sometimes you cannot drop a table from the Databricks UI. Using %sql or spark.sql to drop the table doesn’t work either.

Cause

The metadata (table schema) stored in the metastore is corrupted. When you run a DROP TABLE command, Spark first checks whether the table exists before dropping it. Because the table’s metadata is corrupted, this check fails, so Spark cannot drop the table and raises the following exception.

%scala

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: The metadata is corrupted
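The failure can be reproduced from a notebook. A minimal sketch, assuming a table named `my_table` in the `default` database has corrupted metadata (this snippet runs only on a Databricks cluster with an active `spark` session):

```scala
// Sketch: attempt the drop and surface the underlying error.
// Assumes "default.my_table" is the table with corrupted metadata.
import org.apache.spark.sql.AnalysisException

try {
  spark.sql("DROP TABLE default.my_table")
} catch {
  case e: AnalysisException =>
    // Spark's pre-drop existence check reads the corrupted metadata
    // and reports it as an AnalysisException.
    println(s"Drop failed: ${e.getMessage}")
}
```

Catching the `AnalysisException` confirms the failure happens during Spark’s metadata lookup, not during the drop itself, which is why the Hive-client workaround below succeeds.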

Solution

Use a Hive client to drop the table, since the Hive client does not check for table existence the way Spark does. To drop the table:

  1. Create a function inside the Hive package.
    %scala
    
    package org.apache.spark.sql.hive {
      import org.apache.spark.sql.hive.HiveUtils
      import org.apache.spark.SparkContext
    
      // Defining this object inside the org.apache.spark.sql.hive package
      // gives it access to Spark's internal HiveUtils API.
      object utils {
        def dropTable(sc: SparkContext, dbName: String, tableName: String, ignoreIfNotExists: Boolean, purge: Boolean): Unit = {
          // Talk to the metastore through a Hive client directly,
          // bypassing Spark's existence check on the corrupted table.
          HiveUtils
            .newClientForMetadata(sc.getConf, sc.hadoopConfiguration)
            .dropTable(dbName, tableName, ignoreIfNotExists, purge)
        }
      }
    }
  2. Drop the corrupted table.
    %scala
    
    import org.apache.spark.sql.hive.utils
    // ignoreIfNotExists = true, purge = true (permanently delete the data
    // instead of moving it to the trash directory).
    utils.dropTable(sc, "default", "my_table", true, true)