Updated March 4th, 2022 by pavan.kumarchalamcharla

S3 connection fails with "No role specified and no roles available"

Problem: You are using Databricks Utilities (dbutils) to access an S3 bucket, but it fails with a "No role specified and no roles available" error. You have confirmed that the instance profile associated with the cluster has the permissions needed to access the S3 bucket. Unable to load AWS credentials from any provider in the chain: [com.databricks.bac...

Updated May 11th, 2022 by pavan.kumarchalamcharla

Install PyGraphViz

The PyGraphViz Python library is used to plot causal inference networks. If you try to install PyGraphViz as a standard library, the installation fails with dependency errors. PyGraphViz has the following dependencies: python3-dev, graphviz, libgraphviz-dev, and pkg-config. Install via notebook: install the dependencies with apt-get. %sh sudo apt-get install -y python3-de...
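As a sketch of an init-script route (an assumption; the excerpt only shows the notebook route), the helper below writes a cluster-scoped init script that installs the same four system packages and then PyGraphViz. The function name `write_pygraphviz_init_script`, the target path you pass in, and the use of `/databricks/python/bin/pip` as the cluster's pip are assumptions, not taken from the article.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an init script that installs PyGraphViz.
# Assumption: /databricks/python/bin/pip is the pip for the cluster's Python.
write_pygraphviz_init_script() {
  local target="$1"
  mkdir -p "$(dirname "$target")"
  # Quoted heredoc: the script body is written verbatim, nothing expands now.
  cat > "$target" <<'EOF'
#!/bin/bash
set -euo pipefail
# System packages PyGraphViz needs to build against Graphviz
apt-get update
apt-get install -y python3-dev graphviz libgraphviz-dev pkg-config
# Build and install PyGraphViz into the cluster's Python environment
/databricks/python/bin/pip install pygraphviz
EOF
  chmod +x "$target"
}
```

For example, in a `%sh` cell you could define the function and call `write_pygraphviz_init_script /dbfs/<path>/install-pygraphviz.sh`, then attach that file as a cluster-scoped init script. Init scripts run as root, so the generated script omits `sudo`.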

Updated May 31st, 2022 by pavan.kumarchalamcharla

Revoke all user privileges

When user permissions are explicitly granted for individual tables and views, the selected user can access those tables and views even if they don’t have permission to access the underlying database. To revoke a user’s access, use the REVOKE command. However, the REVOKE command is explicit and strictly scoped to the ob...
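Because REVOKE only affects the object it names, removing a user's direct grants means issuing one statement per table or view. The hypothetical helper `revoke_statements` below only prints those statements for a principal and a list of objects you supply; it does not discover grants or run anything. The `ALL PRIVILEGES` form is an assumption you should check against the table access control model your workspace uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one REVOKE statement per object.
# $1 is the principal; remaining args are objects such as "TABLE db.t1".
revoke_statements() {
  local principal="$1"
  shift
  local object
  for object in "$@"; do
    # Backticks are literal inside single quotes; SQL needs them around
    # principals that contain characters such as @.
    printf 'REVOKE ALL PRIVILEGES ON %s FROM `%s`;\n' "$object" "$principal"
  done
}
```

For example, `revoke_statements user@example.com 'TABLE sales.orders' 'VIEW sales.orders_v'` (example names) prints two statements that you could review and run in a SQL cell.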

Updated February 29th, 2024 by pavan.kumarchalamcharla

OpenSSL SSL_connect: SSL_ERROR_SYSCALL error

Problem: You are trying to install third-party libraries via an init script. The init script attempts to download the libraries using curl or wget, but the download fails with an SSL error message: curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to <hostname>:443 Cause: The OpenSSL SSL_connect: SSL_ERROR_SYSCALL error means that ...
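To see which stage of the download fails, you could run the same request verbosely from a `%sh` cell on the cluster. The hypothetical helper `check_tls_download` below does that and maps curl's documented exit codes to a hint: 6 (DNS resolution), 7 (TCP connection), 28 (timeout), and 35 (SSL connect error, the code shown in the error message). It is a diagnostic sketch only; it does not fix the underlying network or proxy issue.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helper: repeat an init-script download verbosely
# and translate curl's exit code into a hint about where it failed.
check_tls_download() {
  local url="$1"
  local rc=0
  # -v traces DNS, TCP connect, and the TLS handshake on stderr.
  # The timeouts keep a blocked route from hanging indefinitely.
  curl -v --connect-timeout 10 --max-time 60 -o /dev/null "$url" || rc=$?
  case "$rc" in
    0)  echo "curl exit 0: download succeeded" ;;
    6)  echo "curl exit 6: could not resolve host (check DNS)" ;;
    7)  echo "curl exit 7: could not connect (host or port unreachable)" ;;
    28) echo "curl exit 28: timed out (traffic may be dropped by a firewall or proxy)" ;;
    35) echo "curl exit 35: SSL connect error (TLS handshake failed)" ;;
    *)  echo "curl exit $rc: see the EXIT CODES section of the curl man page" ;;
  esac
  return "$rc"
}
```

For example, `check_tls_download https://<hostname>/<file>` prints the hint on stdout while the `-v` handshake trace goes to stderr.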

Updated May 16th, 2022 by pavan.kumarchalamcharla

Enable s3cmd for notebooks

s3cmd is a command-line client that lets you manage Amazon S3 buckets and objects from any machine. s3cmd is not installed on Databricks clusters by default; you must install it via a cluster-scoped init script before you can use it. Info: The sample init script stores the path to a secret in an environment variable. You should store secrets in this fashion...
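The article's own init script is not reproduced here, so the following is a sketch under stated assumptions. It writes an init script that installs s3cmd with apt-get and builds `/root/.s3cfg` from two environment variables, assuming `%sh` commands run as root so s3cmd reads that file. The variable names `S3_ACCESS_KEY` and `S3_SECRET_KEY`, the function name, and the config location are hypothetical. The idea is to set those variables in the cluster's environment variable configuration to secret references such as `S3_SECRET_KEY={{secrets/<scope>/<key>}}`, so the key never appears in the script itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an init script that installs s3cmd and builds
# its config from S3_ACCESS_KEY / S3_SECRET_KEY (assumed variable names).
write_s3cmd_init_script() {
  local target="$1"
  mkdir -p "$(dirname "$target")"
  # Outer heredoc is quoted: the variables stay literal in the written file
  # and are expanded only when the init script runs on the node.
  cat > "$target" <<'EOF'
#!/bin/bash
set -euo pipefail
apt-get update
apt-get install -y s3cmd
# Inner heredoc is unquoted, so the environment variables expand at run time.
cat > /root/.s3cfg <<CFG
[default]
access_key = ${S3_ACCESS_KEY}
secret_key = ${S3_SECRET_KEY}
CFG
chmod 600 /root/.s3cfg
EOF
  chmod +x "$target"
}
```

Because the generated script uses `set -u`, it fails at cluster start if either variable is unset, rather than silently writing an empty key.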

Updated April 10th, 2023 by pavan.kumarchalamcharla

Use tcpdump to create pcap files

If you want to analyze the network traffic between nodes on a specific cluster, you can install tcpdump on the cluster and use it to dump network packet details to pcap files. The pcap files can then be downloaded to a local machine for analysis. Create the tcpdump init script: Run this sample script in a notebook on the cluster to create the ini...
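A minimal sketch of such an init script, assuming the capture should start in the background on every node and write to local disk: the helper below writes it to the path you pass. The function name, the `/tmp/pcaps` directory, and the rotation limits are hypothetical choices, not the article's script. `-Z root` keeps tcpdump from switching to an unprivileged user before it opens its output files, which can otherwise fail in a root-owned directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an init script that installs tcpdump and starts
# a rotating background capture on each node.
write_tcpdump_init_script() {
  local target="$1"
  mkdir -p "$(dirname "$target")"
  # Quoted heredoc: $(hostname) is evaluated on each node at run time.
  cat > "$target" <<'EOF'
#!/bin/bash
set -euo pipefail
apt-get update
apt-get install -y tcpdump
mkdir -p /tmp/pcaps
# -i any: all interfaces; -Z root: do not drop privileges;
# -C 100: start a new file every ~100 MB (units of 1,000,000 bytes);
# -W 10: keep at most 10 files, then overwrite from the first.
nohup tcpdump -i any -Z root -C 100 -W 10 -w "/tmp/pcaps/$(hostname).pcap" >/dev/null 2>&1 &
EOF
  chmod +x "$target"
}
```

With `-C`, tcpdump appends a number to the `-w` file name for each new file. Once the capture is done, you could copy `/tmp/pcaps` to a location you can download from.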
