Databricks jobs using AWS Glue Data Catalog failing due to inability to reach cluster drivers

Ensure the Databricks cluster's IAM role has necessary permissions to access AWS Glue Data Catalog, update the IAM policy, and restart the cluster.

Written by raphael.balogo

Last published at: October 15th, 2024

Problem

You are using AWS Glue Data Catalog as a metastore and a Databricks job fails with the error Could not reach driver of cluster <cluster-id>.

Important

Using AWS Glue Data Catalog as a metastore is no longer supported. Databricks recommends Unity Catalog instead. For details on this change, see the Databricks blog post “Prepare Your Journey to Migrate from AWS Glue Data Catalog to Databricks Unity Catalog.”

Cause

The driver REPL (Read-Eval-Print Loop) crashes because the IAM role associated with the cluster is missing permissions required to access AWS Glue Data Catalog.

Solution

Verify the IAM role associated with the Databricks cluster. Ensure that the role has the necessary permissions to access the AWS Glue Data Catalog.

Update the IAM policy attached to the role to include the required permissions and restart the cluster to apply the changes. Then rerun the job to verify that the issue is resolved.
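
As a rough aid for the verification step above, the following Python sketch checks an IAM policy document (already loaded as a dictionary) for Glue actions that no Allow statement grants. The function name missing_glue_actions and the REQUIRED_GLUE_ACTIONS list are hypothetical and only illustrative: the list is a small read-only subset, and the authoritative set of permissions is in the legacy Glue metastore documentation. The sketch also ignores Deny statements, Resource scoping, and Condition blocks, so treat an empty result as a first check and not as proof that access works.

```python
from fnmatch import fnmatchcase

# Illustrative subset of Glue read actions (an assumption for this sketch).
# Consult the legacy Glue metastore documentation for the full list.
REQUIRED_GLUE_ACTIONS = [
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartition",
    "glue:GetPartitions",
]


def _as_list(value):
    """IAM accepts a single string or a list for Action; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def missing_glue_actions(policy_document, required=REQUIRED_GLUE_ACTIONS):
    """Return the required actions that no Allow statement in the policy grants.

    Deny statements, Resource scoping, and Condition blocks are not evaluated.
    """
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]

    allowed_patterns = []
    for statement in statements:
        if statement.get("Effect") == "Allow":
            allowed_patterns.extend(_as_list(statement.get("Action")))

    # IAM action names are case-insensitive and may contain * and ? wildcards.
    return [
        action
        for action in required
        if not any(
            fnmatchcase(action.lower(), pattern.lower())
            for pattern in allowed_patterns
        )
    ]


# Hypothetical policy used only to demonstrate the check.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["glue:GetDatabase", "glue:GetTable*"],
            "Resource": "*",
        }
    ],
}

print(missing_glue_actions(example_policy))
```

Any action the function returns is a candidate to add to the policy before you restart the cluster and rerun the job.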

For more information, please review the Use AWS Glue Data Catalog as a metastore (legacy) documentation.

Note

As a general preventive measure, regularly review and update your IAM policies to ensure they grant the permissions your jobs require. Additionally, monitor job logs for permission-related errors so that you can address them proactively.
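
One way to approach the log monitoring mentioned in the note is a simple text scan, sketched below in Python. The function name find_permission_errors and the pattern list are assumptions made for illustration. The patterns cover the error message from this article plus two strings that commonly appear in AWS authorization failures, and you would likely need to adjust them to match what your own logs contain.

```python
import re

# Assumed markers of permission problems; extend to match your own logs.
PERMISSION_ERROR_PATTERNS = [
    r"AccessDeniedException",
    r"is not authorized to perform",
    r"Could not reach driver of cluster",
]

_PERMISSION_ERROR_RE = re.compile(
    "|".join(PERMISSION_ERROR_PATTERNS), re.IGNORECASE
)


def find_permission_errors(log_text):
    """Return (line_number, line) pairs for log lines matching any pattern."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if _PERMISSION_ERROR_RE.search(line)
    ]


# Hypothetical log excerpt used only to demonstrate the scan.
sample_log = "\n".join(
    [
        "INFO Starting job run",
        "ERROR User is not authorized to perform: glue:GetDatabase",
        "INFO Retrying",
        "ERROR Could not reach driver of cluster <cluster-id>",
    ]
)

for number, line in find_permission_errors(sample_log):
    print(f"line {number}: {line}")
```

The function works on plain text, so it can be applied to any log output you have already exported or downloaded.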