Databricks updates Glue Data Catalog during read/write operations

Upgrade to Unity Catalog. As an interim solution, turn off delta.catalog.update in Databricks.

Written by caio.cominato

Last published at: October 15th, 2024

Problem

When you use AWS Glue Data Catalog as your external Hive metastore with Delta tables, you may find that Databricks updates the Glue Data Catalog during read and write operations instead of treating it as read-only.

This can cause confusion if you expect Databricks to read from the Glue Data Catalog without modifying it.
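If you want to confirm the behavior, you can list the table's versions in Glue before and after a Databricks read or write and check whether new versions appear. The following is a minimal sketch using boto3; the region, database, and table names are placeholders.

import boto3

# Placeholder region and names: replace with your own Glue database and table.
glue = boto3.client("glue", region_name="us-east-1")

response = glue.get_table_versions(
    DatabaseName="my_database",
    TableName="my_delta_table",
)

# Each catalog update triggered by Databricks can show up as an additional version here.
for version in response["TableVersions"]:
    print(version["VersionId"], version["Table"]["UpdateTime"])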

Info

This issue occurs only when you use the Hive metastore. Databricks recommends upgrading to Unity Catalog.

 

Cause

The delta.catalog.update feature (controlled by spark.databricks.delta.catalog.update.enabled) is enabled by default in Databricks. When Databricks interacts with the AWS Glue Data Catalog, it may create new table versions in Glue or update the Hive metastore with the latest Delta version.
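A quick way to see whether this behavior is active on your cluster is to read the configuration value from a notebook. This is a minimal sketch; the "true" fallback shown for an unset key is an assumption.

# Read the current value of the catalog update setting from a notebook.
# The "true" fallback is an assumption for clusters where the key is not set explicitly.
current = spark.conf.get("spark.databricks.delta.catalog.update.enabled", "true")
print(f"spark.databricks.delta.catalog.update.enabled = {current}")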

Solution

You should upgrade to Unity Catalog as a long-term solution.

As an interim workaround, turn off delta.catalog.update in Databricks.

  1. Open your cluster configuration.
  2. Add the following line to the Spark configuration: spark.databricks.delta.catalog.update.enabled false
  3. Restart the cluster to apply the changes.
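If you want to experiment before editing the cluster configuration, the following notebook-level sketch sets the same property at session scope. This assumes the property is honored when set at the session level, which may not hold for every code path; the cluster-level Spark configuration described in the steps above is the documented approach.

# Session-level override (assumption: the property is read at session scope).
spark.conf.set("spark.databricks.delta.catalog.update.enabled", "false")

# Confirm the value the session now reports.
print(spark.conf.get("spark.databricks.delta.catalog.update.enabled"))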

You should also ensure that any operations performed on the tables do not unintentionally alter Delta properties that could trigger version changes.
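One way to check for unintended property changes is to snapshot the table's properties before and after an operation and compare them, and to review the table history for new versions. The sketch below uses standard Spark SQL commands for Delta tables; the table name is a placeholder.

# Placeholder table name: replace with your own.
table_name = "my_database.my_delta_table"

# Snapshot the table properties before your operation.
before = {r["key"]: r["value"] for r in spark.sql(f"SHOW TBLPROPERTIES {table_name}").collect()}

# ... run your read/write operation here ...

# Snapshot again and report any properties that changed.
after = {r["key"]: r["value"] for r in spark.sql(f"SHOW TBLPROPERTIES {table_name}").collect()}
changed = {k: (before.get(k), after.get(k)) for k in set(before) | set(after) if before.get(k) != after.get(k)}
print(changed)

# DESCRIBE HISTORY shows whether the operation created a new Delta table version.
spark.sql(f"DESCRIBE HISTORY {table_name}").select("version", "operation", "timestamp").show(truncate=False)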

For more information, please review the External Apache Hive metastore (legacy) documentation.