Autoscaling is slow with an external metastore
Problem You have an external metastore configured on your cluster and autoscaling is enabled, but the cluster is not autoscaling effectively. Cause You are copying the metastore jars to every executor, when they are only needed on the driver. Initializing and loading the jars takes time every time a new executor spins up. As a result, adding more ex...
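One common remedy, sketched below: stage the metastore client jars once at a fixed DBFS path and point the cluster's Spark config at that path, so new nodes load the jars from there instead of rebuilding the client. The version and path are placeholders; match them to your own metastore.

    spark.sql.hive.metastore.version 2.3.7
    spark.sql.hive.metastore.jars /dbfs/hive_metastore_jars/*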
Data too long for column error
Problem You are trying to insert a struct into a table, but you get a java.sql.SQLException: Data too long for column error. Caused by: java.sql.SQLException: Data too long for column 'TYPE_NAME' at row 1 Query is: INSERT INTO COLUMNS_V2 (CD_ID,COMMENT,`COLUMN_NAME`,TYPE_NAME,INTEGER_IDX) VALUES (?,?,?,?,?) , parameters [103182,<null>,'address...
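To confirm this is the problem, a quick hedged check (the table name is illustrative) is to measure how long each column's type string is; the default Hive metastore schema stores it in a VARCHAR(4000) column (COLUMNS_V2.TYPE_NAME), so a deeply nested struct can overflow it.

    # Print the length of each column's type string; anything near or over
    # 4000 characters will not fit in the default TYPE_NAME column.
    df = spark.table("default.my_table")  # illustrative table name
    for field in df.schema.fields:
        print(field.name, len(field.dataType.simpleString()))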
Drop database without deletion
By default, the DROP DATABASE (AWS | Azure | GCP) command drops the database and deletes the directory associated with the database from the file system. Sometimes you may want to drop the database, but keep the underlying database directory intact. Example code You can use this example code to drop the database without dropping the underlying stora...
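One possible approach, sketched below under two assumptions: your metastore supports ALTER DATABASE ... SET LOCATION (Hive 2.2.1+ / Spark 3.0+), and the database's tables have already been dropped or are external, so no table data is deleted along the way. Repointing the database at a scratch directory makes DROP DATABASE remove only that directory.

    # Metadata-only change: the original directory and its files stay put.
    spark.sql("ALTER DATABASE my_db SET LOCATION 'dbfs:/tmp/my_db_scratch'")
    # Dropping now deletes only the scratch directory, not the original one.
    spark.sql("DROP DATABASE my_db")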
How to create table DDLs to import into an external metastore
Databricks supports using external metastores instead of the default Hive metastore. You can export all table metadata from Hive to the external metastore. Use the Apache Spark Catalog API to list the tables in the databases contained in the metastore. Use the SHOW CREATE TABLE statement to generate the DDLs and store them in a file. Use the file to...
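A sketch of those steps in one loop (the output path is a placeholder; views and table properties may need extra handling):

    # Collect a CREATE TABLE statement for every table in every database.
    ddls = []
    for db in spark.catalog.listDatabases():
        for t in spark.catalog.listTables(db.name):
            stmt = spark.sql(f"SHOW CREATE TABLE {db.name}.{t.name}").first()[0]
            ddls.append(stmt + ";")

    # Write all statements to one file for import into the new metastore.
    dbutils.fs.put("dbfs:/tmp/exported_ddls.sql", "\n\n".join(ddls), True)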
Drop tables with corrupted metadata from the metastore
Problem Sometimes you cannot drop a table from the Databricks UI. Using %sql or spark.sql to drop the table doesn't work either. Cause The metadata (table schema) stored in the metastore is corrupted. When you run the DROP TABLE command, Spark checks whether the table exists before dropping it. Since the metadata for the table is corrupted, Spark c...
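One way around the existence check, as a heavily hedged sketch: call the underlying Hive client directly through the JVM gateway. These are internal, version-dependent Spark APIs, so treat this as a last resort (the database and table names are illustrative).

    # Unwrap the Hive client that backs Spark's external catalog.
    client = (spark._jsparkSession.sharedState().externalCatalog()
              .unwrapped().client())
    # dropTable(db, table, ignoreIfNotExists, purge) skips Spark-side checks.
    client.dropTable("default", "corrupted_table", True, True)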
Error in CREATE TABLE with external Hive metastore
Problem You are connecting to an external MySQL metastore and attempting to create a table when you get an error. AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:An exception was thrown while adding/validating class(es) : (conn=21) Column length too big for column 'PARAM_VALUE' (max = 16383); use BLOB or TE...
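The underlying issue is that utf8mb4 uses up to four bytes per character, which shrinks the maximum VARCHAR width MySQL allows. One hedged fix is to switch the metastore database's default charset to a single-byte one before the metastore tables are created. The sketch below runs against the metastore database itself, not Spark (host, credentials, and database name are placeholders; assumes the pymysql package is available).

    import pymysql

    # Connect to the external metastore database directly.
    conn = pymysql.connect(host="metastore-host", user="admin",
                           password="REPLACE_ME", database="hive_metastore")
    with conn.cursor() as cur:
        # latin1 is 1 byte/char, so VARCHAR(4000) columns fit within limits.
        cur.execute("ALTER DATABASE hive_metastore CHARACTER SET latin1")
    conn.commit()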
AnalysisException when dropping table on Azure-backed metastore
Problem When you try to drop a table in an external Hive version 2.0 or 2.1 metastore that is deployed on Azure SQL Database, Databricks throws the following exception: com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(...
How to troubleshoot several Apache Hive metastore problems
Problem 1: External metastore tables not available When you inspect the driver logs, you see a stack trace that includes the error Required table missing: WARN Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates Required table missing: "DBS" in Catalog "" Schema "". DataNu...
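If the root cause is an uninitialized metastore schema, one commonly documented remedy (hedged; the exact property prefix varies by setup) is to let DataNucleus create the missing tables via the cluster's Spark config:

    datanucleus.autoCreateSchema true
    datanucleus.fixedDatastore false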
Listing table names
Problem To fetch all the table names from the metastore, you can use either spark.catalog.listTables() or %sql show tables. If you compare how long each takes, you can see that spark.catalog.listTables() usually takes longer than %sql show tables. Cause spark.catalog.listTables() tries to fetch every table's metadata first and then show the reque...
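A quick way to see the difference yourself (the database name is illustrative):

    import time

    # SHOW TABLES returns names only, typically a single metastore call.
    start = time.time()
    names = spark.sql("SHOW TABLES IN default").collect()
    print(f"SHOW TABLES: {time.time() - start:.2f}s for {len(names)} tables")

    # listTables() also fetches each table's metadata, so it is slower.
    start = time.time()
    tables = spark.catalog.listTables("default")
    print(f"listTables(): {time.time() - start:.2f}s for {len(tables)} tables")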
How to set up an embedded Apache Hive metastore
You can set up a Databricks cluster to use an embedded metastore. You can use an embedded metastore when you only need to retain table metadata during the life of the cluster. If the cluster is restarted, the metadata is lost. If you need to persist the table metadata or other data after a cluster restart, then you should use the default metastore o...
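A hedged sketch of the cluster Spark config for an embedded metastore, backed by an in-memory Derby database (the database name is a placeholder); because it lives in memory, the metadata disappears when the cluster restarts:

    spark.hadoop.javax.jdo.option.ConnectionURL jdbc:derby:memory:myInMemDB;create=true
    spark.hadoop.javax.jdo.option.ConnectionDriverName org.apache.derby.jdbc.EmbeddedDriver
    datanucleus.autoCreateSchema true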
Japanese character support in external metastore
Problem You are trying to use Japanese characters in your tables, but keep getting errors. Create a table with the OPTIONS keyword. OPTIONS provides extra metadata to the table. You try creating a table with OPTIONS, specifying the charset as utf8mb4. %sql CREATE TABLE default.JPN_COLUMN_NAMES('作成年月' string ,'計上年月' string ,'所属コード' string ,'生保代理店コード_8...
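One hedged approach, mirroring the charset fix above: the metastore's own column-name storage must use a Unicode charset for Japanese identifiers to round-trip. The sketch alters the relevant metastore table directly (connection details are placeholders; 767 is the default Hive schema width for COLUMN_NAME).

    import pymysql

    conn = pymysql.connect(host="metastore-host", user="admin",
                           password="REPLACE_ME", database="hive_metastore")
    with conn.cursor() as cur:
        # Store column names in utf8mb4 so Japanese characters are preserved.
        cur.execute("ALTER TABLE COLUMNS_V2 "
                    "MODIFY COLUMN_NAME varchar(767) CHARACTER SET utf8mb4")
    conn.commit()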
Parquet timestamp requires Hive metastore 1.2 or above
Problem You are trying to create a Parquet table using TIMESTAMP, but you get an error message. Error in SQL statement: QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384 Example code %sql CREATE EXTERN...
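A hedged sketch of the usual fix: point the cluster at a Hive metastore client of version 1.2 or above via Spark config. With the value maven, Spark downloads client jars matching the requested version; for production, staging the jars at a fixed path is preferable.

    spark.sql.hive.metastore.version 1.2.1
    spark.sql.hive.metastore.jars maven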