Updated May 16th, 2022 by John.Lourdu

Python command fails with AssertionError: wrong color format

Problem You run a Python notebook and it fails with an AssertionError: wrong color format message. An example stack trace:   File "/local_disk0/tmp/1599775649524-0/PythonShell.py", line 39, in <module>     from IPython.nbconvert.filters.ansi import ansi2html   File "<frozen importlib._bootstrap>", line 983, in _find_and_load   File "<...

1 min reading time
Updated May 16th, 2022 by John.Lourdu

Access notebooks owned by a deleted user

When you remove a user (AWS | Azure) from Databricks, a special backup folder is created in the workspace. This backup folder contains all of the deleted user’s content. Backup folders appear in the workspace as <deleted username>-backup-#. Info Only an admin user can access a backup folder. To access a backup folder: Log in to Databricks as an...

0 min reading time
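
As a rough illustration of the naming pattern described above, this Python sketch picks backup folders out of a list of workspace folder names. It assumes the trailing `#` stands for a run of digits; the function name and sample folder names are hypothetical and not part of any Databricks API.

```python
import re

# Assumption: the "#" in "<deleted username>-backup-#" is one or more digits.
BACKUP_FOLDER = re.compile(r"^(?P<username>.+)-backup-(?P<index>\d+)$")


def find_backup_folders(folder_names):
    """Return (username, index) pairs for names that look like backup folders."""
    matches = []
    for name in folder_names:
        m = BACKUP_FOLDER.match(name)
        if m:
            matches.append((m.group("username"), int(m.group("index"))))
    return matches


# Hypothetical folder names, for illustration only.
folders = ["alice@example.com-backup-1", "Shared", "bob-backup"]
print(find_backup_folders(folders))
```

Only the first name matches here, because the other two lack the numeric suffix.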
Updated October 31st, 2022 by John.Lourdu

Use audit logs to identify who deleted a cluster

By default, all-purpose cluster configurations are deleted 30 days after the cluster was last terminated. It is possible to keep a cluster configuration for longer than 30 days if an administrator pins the cluster. In either situation, it is possible for an administrator to manually delete a cluster configuration at any time. If you try to run a job...

1 min reading time
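
The 30-day default described above can be turned into a date with simple arithmetic. This is a minimal sketch that assumes you already know when the cluster was last terminated; the function name is hypothetical, and a pinned cluster is treated as having no automatic deletion date.

```python
from datetime import datetime, timedelta, timezone

# Default retention for an unpinned all-purpose cluster configuration, per the text above.
RETENTION = timedelta(days=30)


def config_deletion_date(last_terminated, pinned=False):
    """Return when the configuration would be removed by default, or None if pinned."""
    if pinned:
        # A pinned cluster is kept beyond the 30-day default.
        return None
    return last_terminated + RETENTION


# Hypothetical termination time, for illustration only.
terminated = datetime(2022, 10, 1, tzinfo=timezone.utc)
print(config_deletion_date(terminated))
```

An administrator can still delete the configuration manually at any time, so the returned date is only the default upper bound for unpinned clusters.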
Updated May 11th, 2022 by John.Lourdu

Install Turbodbc via init script

Turbodbc is a Python module that uses the ODBC interface to access relational databases. It has dependencies on libboost-all-dev, unixodbc-dev, and python-dev packages, which need to be installed in order. You can install these manually, or you can use an init script to automate the install. Create the init script Run this sample script in a noteboo...

0 min reading time
Updated February 24th, 2023 by John.Lourdu

Add custom tags to a Delta Live Tables pipeline

When managing Delta Live Tables pipelines on your clusters, you may want to use custom tags for internal tracking. For example, you may want to use tags to allocate cost across different departments. Or your organization might have a global cluster policy that requires tags on the instances. Failure to comply with a cluster policy can result in clus...

0 min reading time
Updated January 20th, 2023 by John.Lourdu

SSL exception when connecting to GCP secret manager

Info This article applies to clusters using Databricks Runtime 7.3 LTS and 9.1 LTS.  Problem Secrets stored in the GCP secret manager service can be retrieved using the google-cloud-secret-manager client library. Your code may fail with an SSLHandshakeException error message on Databricks Runtime 9.1 LTS and below. Sample code: import com.google.clo...

1 min reading time
Updated January 20th, 2023 by John.Lourdu

SQL access control error when using Snowflake as a data source

Problem The Snowflake Connector for Spark is used to read data from, and write data to, Snowflake while working in Databricks. The connector makes Snowflake look like another Spark data source. When you try to query Snowflake, you get a SnowflakeSQLException error message. SnowflakeSQLException: SQL access control error: Insufficient privileges to ...

0 min reading time
Updated April 11th, 2023 by John.Lourdu

SYNC command fails with a mismatched input error

Problem The SYNC command can be used to migrate legacy external Apache Hive tables to Unity Catalog. When you run SYNC in a Databricks notebook, it fails with a mismatched input 'schema' expecting 'MATERIALIZED' error. SYNC schema <target UC catalog>.<target UC schema> from hive_metastore.<hive schema> DRY RUN com.databricks.backen...

0 min reading time
Updated October 28th, 2022 by John.Lourdu

Error when creating a user, group, or service principal at the account level with Terraform

Problem Unity Catalog uses Databricks account identities to resolve users, service principals, and groups, and to enforce permissions. These identities can be managed using Terraform. You are trying to create users, service principals, or groups at the account level when your Terraform code fails with a set `host` property error message. 2022-10-06T...

1 min reading time
Updated November 30th, 2022 by John.Lourdu

Reading a table fails due to AAD token timeout on ADLS Gen2

Problem Access to ADLS Gen2 storage can be configured using OAuth 2.0 with an Azure service principal. You can securely access data in an Azure storage account using OAuth 2.0 with an Azure Active Directory (Azure AD) application service principal for authentication. You are trying to access external tables (tables stored outside of the root storage...

3 min reading time
Updated March 4th, 2022 by John.Lourdu

Retrieve queries owned by a disabled user

When a Databricks SQL user is removed from an organization, the queries owned by the user remain, but they are only visible to those who already have permission to access them. A Databricks SQL admin can transfer ownership to other users, as well as delete alerts, dashboards, and queries owned by the disabled user account. Clone a query A Databricks...

0 min reading time
Updated February 24th, 2023 by John.Lourdu

Unable to access Delta Sharing tables with a Python client

Problem Delta Sharing is a platform-independent open protocol that is used to securely share data with other organizations. When using an open sharing model, recipients can access shared data in a read-only format using the delta-sharing Python library. When trying to access a shared table using any Python client, you get an SSLCertVerificationError...

1 min reading time
Updated January 20th, 2023 by John.Lourdu

Overlapping paths error when querying both Hive and Unity Catalog tables

Problem You are running queries when you get an overlapping paths error message. org.apache.spark.SparkException: Your query is attempting to access overlapping paths through multiple authorization mechanisms, which is not currently supported. Cause An overlapping paths error happens when a single cell in a notebook queries both an Apache Hive table...

0 min reading time
Updated January 11th, 2023 by John.Lourdu

Permission denied error when creating external location

Problem An external location is a storage location, such as an S3 bucket, on which external tables or managed tables can be created. A user or group with permission to use an external location can access any storage path within the external location without direct access to the storage credential. Review the Manage external locations and storage cre...

0 min reading time
Updated January 20th, 2023 by John.Lourdu

Search audit logs for connections from prohibited IP addresses

IP access lists can be used to restrict access to Databricks based on known network locations. Once enabled, an IP access list requires users to log in from an allowed address. If a user attempts to log in from any IP address not on the access list, the login is denied. Review the IP access list documentation for more details. Best practices involve pe...

1 min reading time
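
The allow/deny rule described above can be sketched with Python's standard `ipaddress` module. This illustrates the general check only, not how Databricks implements IP access lists; the function name and the sample addresses (taken from documentation-reserved ranges) are hypothetical.

```python
import ipaddress


def is_login_allowed(source_ip, allowed_networks):
    """Return True if source_ip falls inside any allowed address or CIDR range."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        # strict=False accepts entries such as "203.0.113.7/24" with host bits set.
        addr in ipaddress.ip_network(net, strict=False)
        for net in allowed_networks
    )


# Hypothetical access list, for illustration only.
access_list = ["203.0.113.0/24", "192.0.2.10"]
print(is_login_allowed("203.0.113.7", access_list))
print(is_login_allowed("198.51.100.1", access_list))
```

The same function could be applied to source addresses extracted from audit log records to flag connections that originate outside the allowed ranges.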
Updated May 10th, 2023 by John.Lourdu

Unable to access Unity Catalog views

Problem A user is trying to access a view in Unity Catalog when it fails with a Table '<view-name>' does not have sufficient privilege to execute error message. Error in SQL statement: AnalysisException: Table '<view-name>' does not have sufficient privilege to execute. Cause The owner of the view does not have sufficient privileges on t...

0 min reading time
Updated May 9th, 2022 by John.Lourdu

Failed to create process error with Databricks CLI in Windows

Problem While trying to access the Databricks CLI (AWS | Azure | GCP) in Windows, you get a failed to create process error message. Cause This can happen: If multiple instances of the Databricks CLI are installed on the system. If the Python path on your Windows system includes a space. Info There is a known issue in pip which causes pip installed s...

0 min reading time