Delete table when underlying S3 bucket is deleted

Do not delete the contents of a S3 bucket before dropping a table that stores data in the bucket.

Written by Jose Gonzalez

Last published at: May 31st, 2022

Problem

You are trying to drop or alter a table when you get an error.

Error in SQL statement: IOException: Bucket_name … does not exist

You can reproduce the error with a DROP TABLE or ALTER TABLE command.

%sql

DROP TABLE <database-name.table-name>;
%sql

ALTER TABLE <database-name.table-name> SET LOCATION "<file-system-location>";

Cause

You deleted the contents of the underlying S3 bucket before dropping the tables.

Because the data no longer exists, you get an error when trying to drop the table.

Solution

You can use spark.sessionState.catalog.externalCatalog.dropTable to delete the table.

%scala
import org.apache.spark.sql.hive.HiveUtils
spark.sessionState.catalog.externalCatalog.dropTable("<database-name>", "<table-name>", ignoreIfNotExists = false, purge = false)