Problem
When establishing a connection to an Oracle database through an external Hive metastore, you receive an error message, “exceeded simultaneous SESSIONS_PER_USER limit”
.
You may notice a high number of Databricks compute resources establishing sessions to the external Hive metastore, persisting connections even when the cluster scales down to a minimal configuration, and/or connections remaining active until a cluster is terminated.
Cause
You have more simultaneous sessions per user than the Oracle database allows.
Solution
Increase the SESSIONS_PER_USER
limit on your Oracle database. Oracle recommends the same action. To increase the session limit per user, execute the below query.
ALTER PROFILE your_profile_name LIMIT SESSIONS_PER_USER new_limit;
If increasing the session limit on your Oracle database directly is not possible, use an Apache Spark cluster configuration to set your pool size.
spark.databricks.hive.metastore.client.pool.size {max_numer_sessions}.
Important
Limiting the connection pool size may impact performance, especially in multithreaded situations.
Note
When possible, Databricks recommends using Unity Catalog as the primary metadata store for your workloads. This can help reduce the dependency on the external Hive metastore and minimize the number of connections required. For more information, review the What is Unity Catalog? (AWS | Azure | GCP ) documentation.
Additional preventative measures
- If you are using Databricks Runtime versions below 10.4 LTS, and have the Spark setting
Spark.databricks.hive.metastore.client.pool.type
you are likely using BoneCP. Consider using HikariCP for connection pooling instead. To set up HikariCP, use the Spark settingspark.databricks.hive.metastore.client.pool.type HikariCP
. HikariCP is the default option as of Databricks Runtime 10.4 LTS. For more information, review the Databricks Runtime 10.4 LTS (AWS | Azure | GCP) documentation. - Regularly monitor the number of connections to the external Hive metastore and terminate idle connections. This can help reduce resource contention and improve the overall performance of the system.
- Implement a cluster management strategy that terminates clusters when they are no longer in use. This can help reduce the number of active connections to the external Hive metastore and free up resources for other workloads.