Databricks supports using external metastores instead of the default Hive metastore.
You can export all table metadata from the Hive metastore to the external metastore in three steps:
- Use the Apache Spark Catalog API to list the tables in the databases contained in the metastore.
- Use the SHOW CREATE TABLE statement to generate the DDLs and store them in a file.
- Use the file to import the table DDLs into the external metastore.
The following code accomplishes the first two steps.
```python
%python
dbs = spark.catalog.listDatabases()
for db in dbs:
    # Write one DDL file per database.
    f = open("your_file_name_{}.ddl".format(db.name), "w")
    tables = spark.catalog.listTables(db.name)
    for t in tables:
        # SHOW CREATE TABLE returns a single-row result containing the DDL statement.
        DDL = spark.sql("SHOW CREATE TABLE {}.{}".format(db.name, t.name))
        f.write(DDL.first()[0])
        f.write("\n")
    f.close()
```
You can then run the DDLs in the resulting file against a cluster that is configured to use the external metastore to complete the import.
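As a minimal sketch of that import step, the snippet below reads the exported file and replays each statement with `spark.sql`. It assumes the cluster is already configured to use the external metastore, that the file name matches the export step (the `default` database is used as a hypothetical example), and that each DDL statement in the file is terminated with a `;` delimiter, which the export code above would need to append for this split to work.

```python
%python
# Assumption: statements in the exported file are separated by ';'
# (append the delimiter in the export step if you use this approach).
with open("your_file_name_default.ddl", "r") as f:
    statements = f.read().split(";")

for stmt in statements:
    stmt = stmt.strip()
    if stmt:  # skip empty fragments between delimiters
        spark.sql(stmt)
```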