Updated May 16th, 2022 by pavan.kumarchalamcharla

Enable s3cmd for notebooks

s3cmd is a command-line client that lets you perform AWS S3 operations from any machine. s3cmd is not installed on Databricks clusters by default; you must install it with a cluster-scoped init script before it can be used. Note: The sample init script stores the path to a secret in an environment variable. You should store secrets in this fashion...
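A minimal sketch of the approach, assuming pip is available on the cluster nodes: a notebook cell writes the init script to a local path (on Databricks you would typically write it to DBFS or a workspace location instead), and the script installs s3cmd at cluster start. The script path, the environment variable name, and the credentials path are all placeholders, not values from the article.

```shell
# Write the init script to a local path (hypothetical; on Databricks you
# would normally write it to DBFS or a workspace file instead).
mkdir -p /tmp/init-scripts
cat > /tmp/init-scripts/install-s3cmd.sh <<'EOF'
#!/bin/bash
# Install the s3cmd client on every node at cluster start.
pip install s3cmd

# Export the *path* to a secret, not the secret value itself
# (placeholder variable name and path).
export S3CMD_CREDENTIALS_PATH=/dbfs/FileStore/s3cmd/credentials
EOF
chmod +x /tmp/init-scripts/install-s3cmd.sh
```

After attaching the script as a cluster-scoped init script and restarting the cluster, s3cmd is available from `%sh` notebook cells.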

Updated May 11th, 2022 by pavan.kumarchalamcharla

Install PyGraphViz

The PyGraphViz Python library is used to plot causal inference networks. If you try to install PyGraphViz as a standard library, it fails due to dependency errors. PyGraphViz has the following dependencies: python3-dev, graphviz, libgraphviz-dev, and pkg-config. Install via notebook: install the dependencies with apt-get. %sh sudo apt-get install -y python3-de...
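The full install sequence can be sketched as follows, using the dependency list above. Writing it to a script under /tmp is an illustration device only; on a cluster you would run the same commands directly from a `%sh` notebook cell.

```shell
# Collect the install sequence in a script (illustrative path; run the
# same commands from a %sh notebook cell on the cluster).
cat > /tmp/install-pygraphviz.sh <<'EOF'
#!/bin/bash
# System packages PyGraphViz depends on (from the dependency list above).
sudo apt-get update
sudo apt-get install -y python3-dev graphviz libgraphviz-dev pkg-config

# With the Graphviz headers and pkg-config in place, the pip build succeeds.
pip install pygraphviz
EOF
chmod +x /tmp/install-pygraphviz.sh
```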

Updated May 31st, 2022 by pavan.kumarchalamcharla

Revoke all user privileges

When user permissions are explicitly granted for individual tables and views, that user can access those tables and views even if they don’t have permission to access the underlying database. If you want to revoke a user’s access, you can do so with the REVOKE command. However, the REVOKE command is explicit and strictly scoped to the ob...

Updated March 4th, 2022 by pavan.kumarchalamcharla

S3 connection fails with "No role specified and no roles available"

Problem You are using Databricks Utilities (dbutils) to access an S3 bucket, but it fails with a "No role specified and no roles available" error. You have confirmed that the instance profile associated with the cluster has the permissions needed to access the S3 bucket. Unable to load AWS credentials from any provider in the chain: [com.databricks.bac...

Updated May 16th, 2022 by pavan.kumarchalamcharla

Item was too large to export

Problem You are trying to export notebooks using the workspace UI and are getting an error message: "This item was too large to export. Try exporting smaller or fewer items." Cause The notebook files are larger than 10 MB. Solution The simplest solution is to limit the size of the notebook or folder that you are trying to download to 10 MB or ...

Updated July 20th, 2022 by pavan.kumarchalamcharla

Use tcpdump to create pcap files

If you want to analyze the network traffic between nodes on a specific cluster, you can install tcpdump on the cluster and use it to dump the network packet details to pcap files. The pcap files can then be downloaded to a local machine for analysis. Create the tcpdump init script Run this sample script in a notebook on the cluster to create the ini...
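The "create the init script" step can be sketched like this, assuming an apt-based cluster image. The script path, the capture interface, the output directory, and the one-hour rotation interval are all assumptions, not values from the article.

```shell
# Write an init script that installs tcpdump and captures traffic to
# rotating pcap files (hypothetical paths and capture settings).
mkdir -p /tmp/init-scripts
cat > /tmp/init-scripts/tcpdump.sh <<'EOF'
#!/bin/bash
apt-get update && apt-get install -y tcpdump
mkdir -p /tmp/pcap
# Capture on all interfaces, rotating to a new timestamped pcap file
# every hour (-G 3600 with strftime placeholders in the -w filename).
nohup tcpdump -i any -w /tmp/pcap/trace-%Y%m%d-%H%M%S.pcap -G 3600 >/dev/null 2>&1 &
EOF
chmod +x /tmp/init-scripts/tcpdump.sh
```

The pcap files written under the capture directory can then be copied off the cluster and opened locally in a tool such as Wireshark.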
