Problem 1: External metastore tables not available
When you inspect the driver logs, you see a stack trace that includes the error Required table missing:
WARN Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates
Required table missing: "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing: "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
    at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3385)
Cause
The database is present, but there are no metastore tables.
Solution
If the external metastore version is Hive 2.0 or above, use the Hive Schema Tool to create the metastore tables (see the example below). For versions below Hive 2.0, add the metastore tables with the following configurations in your existing init script:
spark.hadoop.datanucleus.autoCreateSchema=true
spark.hadoop.datanucleus.fixedDatastore=false
You can also set these configurations in the Apache Spark config (AWS | Azure) directly:
datanucleus.autoCreateSchema true
datanucleus.fixedDatastore false
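For the Hive 2.0 or above path, the Hive Schema Tool is typically run against the metastore database before starting the cluster. A minimal sketch, assuming a MySQL-backed metastore (adjust -dbType for your database):
schematool -dbType mysql -initSchema
Run the tool from a Hive installation that matches your metastore version; it creates the DBS table and the rest of the metastore schema.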
Problem 2: Hive metastore verification failed
When you inspect the driver logs, you see a stack trace that includes an error like the following:
18/09/24 14:51:07 ERROR RetryingHMSHandler: HMSHandler Fatal error: MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:7564)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:7542)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Cause
The VERSION table in the metastore is empty.
Solution
Do one of the following:
- Populate the VERSION table with the correct version values using an INSERT query (see the example after this list).
- Set the following configurations to turn off the metastore verification in the Spark configuration of the cluster:
hive.metastore.schema.verification false
hive.metastore.schema.verification.record.version false
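If you choose to populate the VERSION table, the statement is a plain INSERT against the metastore database. A minimal sketch, assuming a Hive 2.3 metastore schema; substitute the schema version that matches your metastore:
INSERT INTO VERSION (VER_ID, SCHEMA_VERSION, VERSION_COMMENT)
VALUES (1, '2.3.0', 'Hive release version 2.3.0');
The VERSION table should contain exactly one row, so remove any stale rows before inserting.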
Problem 3: Metastore connection limit exceeded
Commands run on the cluster fail with the following stack trace in the driver logs:
Unable to open a test connection to the given database. JDBC url = jdbc:<jdbcURL>?trustServerCertificate=true&useSSL=true, username = <REDACTED>.
Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLSyntaxErrorException: User '<userId>' has exceeded the 'max_user_connections' resource (current value: 100)
    at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:163)
    at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:106)
    at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:1036)
Cause
The metastore configuration allows only 100 connections. When the connection limit is reached, new connections are not allowed and commands fail with this error. Each cluster in the Databricks workspace establishes a connection with the metastore, so this issue can occur if you have a large number of clusters running. Additionally, incorrect configurations can leak connections, so the number of open connections keeps growing until the limit is reached.
Solution
Correct the problem with one of the following actions:
- If you are using an external metastore and you have a large number of clusters running, then increase the connection limit on your external metastore (see the example after this list).
- If you are not using an external metastore, ensure that you do not have any custom Hive metastore configurations on your cluster. When using the metastore provided by Databricks, you should use the default configurations on the cluster for the Hive metastore.
- If you are using the default configuration and still encounter this issue, contact Databricks Support. Depending on the configuration of your Databricks workspace, it might be possible to increase the number of connections allowed to the internal metastore.
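If your external metastore runs on MySQL or MariaDB, you can check and raise the per-user connection limit directly on the database server. A minimal sketch, assuming administrative access; SET GLOBAL does not persist across a server restart, so also update the server configuration file:
SHOW VARIABLES LIKE 'max_user_connections';
SET GLOBAL max_user_connections = 200;
Choose a limit based on the number of clusters that connect to the metastore concurrently.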
Problem 4: Table actions fail because column has too much metadata
When the quantity of metadata for a single column exceeds 4000 characters, table actions fail with an error like this:
Error in SQL statement: IllegalArgumentException: Error: type expected at the position 3998 of 'struct<num_ad_accounts:bigint,num_benchmarks:bigint,num_days_range:string,num_days_in_history:string,num_fb_pages:bigint,num_g_profiles:bigint,num_ga_views:bigint,num_groups:bigint,num_ig_profiles:bigint,num_li_pages:bigint,num_labels:string,num_labels_added:bigint,num_labels_
Cause
This is a bug that was fixed in Hive Metastore version 2.3.0 (HIVE-12274). Databricks uses an earlier version of Hive Metastore (version 0.13), so this bug occurs when there is too much metadata for a column, such as an imported JSON schema.
Solution
As a workaround, set up an external Hive metastore (AWS | Azure) that uses version 2.3.0 or above. Then delete the existing table with the following command:
%scala
spark.sessionState
  .catalog
  .externalCatalog
  .dropTable("default", "test_table_tabledrop_1", ignoreIfNotExists = false, purge = false)
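For reference, pointing the cluster at the newer external metastore typically involves Spark configuration along these lines. A minimal sketch with placeholder connection values; the exact JDBC URL, credentials, and driver depend on your metastore database:
spark.sql.hive.metastore.version 2.3.0
spark.sql.hive.metastore.jars maven
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<host>:3306/<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName <user>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>
Setting spark.sql.hive.metastore.jars to maven downloads the matching Hive client jars at every cluster start; for production clusters, a pre-downloaded copy of the jars referenced by path is usually preferred.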