Access S3 with temporary session credentials
You can use IAM session tokens with Hadoop config support to access S3 storage in Databricks Runtime 8.3 and above.
Info
You cannot mount the S3 path as a DBFS mount when using session credentials. You must use the S3A URI.
Extract the session credentials from your cluster
Extract the session credentials from your cluster. You will need the Instance...
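As a hedged sketch of the Hadoop config approach (the property names are standard Hadoop S3A options; the credential values and bucket name are placeholders you would replace with values extracted from your cluster):

%python
# Placeholders: substitute the temporary credentials extracted from the
# instance metadata service on your cluster.
access_key = "<temporary-access-key-id>"
secret_key = "<temporary-secret-key>"
session_token = "<temporary-session-token>"

# Standard Hadoop S3A settings for temporary (session) credentials.
sc._jsc.hadoopConfiguration().set(
    "fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.session.token", session_token)

# Read with the S3A URI; a DBFS mount does not work with session credentials.
display(dbutils.fs.ls("s3a://<your-bucket>/"))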
Cannot use IAM roles with table ACL
Problem
You want to use IAM roles when table ACLs are enabled, but you get an error saying credentials cannot be located.
NoCredentialsError: Unable to locate credentials
Cause
When a table ACL is enabled, access to the EC2 instance metadata service is blocked. This is a security measure that prevents users from obtaining IAM access credentials.
Solution...
Enable s3cmd for notebooks
s3cmd is a client library that allows you to perform all AWS S3 operations from any machine. s3cmd is not installed on Databricks clusters by default. You must install it via a cluster-scoped init script before it can be used.
Info
The sample init script stores the path to a secret in an environment variable. You should store secrets in this fashion...
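As a hedged sketch of the pattern (the script path and installation command below are assumptions for illustration, not the article's exact script), you can write the init script to DBFS from a notebook and then point the cluster's init script setting at it:

%python
# Hypothetical script location; reference it in the cluster's init script settings.
dbutils.fs.put("dbfs:/databricks/init-scripts/install-s3cmd.sh", """#!/bin/bash
# Install s3cmd on each node at cluster startup.
pip install s3cmd
""", True)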
How to check if a Spark property is modifiable in a notebook
Problem
You can tune applications by setting various configurations. Some configurations must be set at the cluster level, whereas others are set inside notebooks or applications.
Solution
To check if a particular Spark configuration can be set in a notebook, run the following command in a notebook cell:
%scala
spark.conf.isModifiable("spark.databrick...
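The same check works from Python; as an illustration (the property names below are just well-known examples, not ones named by the article):

%python
# Returns True if the property can be set at runtime in this session,
# False if it must be set at the cluster level.
print(spark.conf.isModifiable("spark.sql.shuffle.partitions"))  # typically True
print(spark.conf.isModifiable("spark.executor.memory"))         # typically False

A result of true means you can change the property with spark.conf.set() in the notebook; false means it has to go into the cluster's Spark config.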
JSON reader parses values as null
Problem
You are attempting to read a JSON file. You know the file has data in it, but the Apache Spark JSON reader is returning a null value.
Example code
You can use this example code to reproduce the problem.
Create a test JSON file in DBFS.
%python
dbutils.fs.rm("dbfs:/tmp/json/parse_test.txt")
dbutils.fs.put("dbfs:/tmp/json/parse_test.txt", """ {...
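A frequent cause of this symptom is a pretty-printed, multi-line JSON document being read with Spark's default line-delimited parser. As a hedged sketch (assuming the test file created above), enabling the multiLine option often resolves the null values:

%python
# With the default line-delimited parser, a multi-line JSON document can
# come back as a single row of nulls or a _corrupt_record column.
df = spark.read.option("multiLine", "true").json("dbfs:/tmp/json/parse_test.txt")
df.show(truncate=False)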
Common errors in notebooks
There are some common issues that occur when using notebooks. This section outlines some of the frequently asked questions and best practices that you should follow.
Spark job fails with java.lang.NoClassDefFoundError
Sometimes you may come across an error like:
%scala
java.lang.NoClassDefFoundError: Could not initialize class line.....$read$
This c...
display() does not show microseconds correctly
Problem
You want to display a timestamp value with microsecond precision, but when you use display() it does not show the value past milliseconds. For example, this Apache Spark SQL display() command:
%python
display(spark.sql("select cast('2021-08-10T09:08:56.740436' as timestamp) as test"))
Returns a truncated value:
2021-08-10T09:08:56.740+0000
Cause...
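One possible workaround (a sketch, not necessarily the article's solution): cast the timestamp to a string before rendering it, so the microsecond digits survive the trip to the output:

%python
# Casting the timestamp to a string preserves the microsecond digits;
# show() prints it without display()'s millisecond rendering.
df = spark.sql(
    "select cast(cast('2021-08-10T09:08:56.740436' as timestamp) as string) as test"
)
df.show(truncate=False)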
Error: Received command c on object id p0
Problem
You have imported Python libraries, but when you try to execute Python code in a notebook you get a repeating message as output.
INFO:py4j.java_gateway:Received command c on object id p0
INFO:py4j.java_gateway:Received command c on object id p0
INFO:py4j.java_gateway:Received command c on object id p0
INFO:py4j.java_gateway:Received command ...
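A common remedy, offered here as a hedged sketch: an imported library has lowered the logging threshold so py4j gateway chatter reaches the output; raising the py4j logger's level silences it without affecting other loggers:

%python
import logging

# Silence the py4j gateway's INFO-level messages.
logging.getLogger("py4j").setLevel(logging.ERROR)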
Failure when accessing or mounting storage
Problem
You are trying to access an existing mount point, or create a new mount point, and it fails with an error message.
Invalid Mount Exception: The backend could not get tokens for path /mnt.
Cause
The root mount path (/mnt) is also mounted to a storage location. You can verify that something is mounted to the root path by listing all mount point...
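To inspect what is mounted where, you can list every mount point from a notebook (dbutils.fs.mounts() is the standard API; the loop is just one way to scan the output):

%python
# Print all mounts; an entry mounted at the root path itself indicates
# the problem described above.
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)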
Item was too large to export
Problem
You are trying to export notebooks using the workspace UI and are getting an error message.
This item was too large to export. Try exporting smaller or fewer items.
Cause
The notebook files are larger than 10 MB in size.
Solution
The simplest solution is to limit the size of the notebook or folder that you are trying to download to 10 MB or ...
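If shrinking the notebook is not practical, one hedged alternative is to export items individually with the Workspace API instead of the UI (the host, token, and notebook path below are placeholders):

%python
import requests

# Placeholders: substitute your workspace URL, a personal access token,
# and the path of the notebook to export.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

resp = requests.get(
    f"{host}/api/2.0/workspace/export",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/Users/someone@example.com/my-notebook", "format": "SOURCE"},
)
resp.raise_for_status()
print(resp.json()["content"])  # base64-encoded notebook source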
Access notebooks owned by a deleted user
When you remove a user (AWS | Azure) from Databricks, a special backup folder is created in the workspace. This backup folder contains all of the deleted user’s content. Backup folders appear in the workspace as <deleted username>-backup-#.
Info
Only an admin user can access a backup folder.
To access a backup folder:
Log into Databricks as an...
Notebook autosave fails due to file size limits
Problem
Notebook autosaving fails with the following error message:
Failed to save revision: Notebook size exceeds limit. This is most commonly caused by cells with large results. Remove some cells or split the notebook.
Cause
The maximum notebook size allowed for autosaving is 8 MB.
Solution
First, check the size of your notebook file using your br...
How to send email or SMS messages from Databricks notebooks
You may need to send a notification to a set of recipients from a Databricks notebook. For example, you may want to send email based on matching business rules or based on a command’s success or failure. This article describes two approaches to sending email or SMS messages from a notebook. Both examples use Python notebooks:
Send email or SMS messa...
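As one hedged illustration of the email approach (the SMTP host, credentials, and addresses are placeholders, and this is a minimal stdlib sketch rather than the article's exact code):

%python
import smtplib
from email.message import EmailMessage

# All connection details below are placeholders for illustration.
msg = EmailMessage()
msg["Subject"] = "Job status notification"
msg["From"] = "notebooks@example.com"
msg["To"] = "team@example.com"
msg.set_content("The nightly job completed successfully.")

with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()
    server.login("notebooks@example.com", "<smtp-password>")
    server.send_message(msg)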
Cannot run notebook commands after canceling streaming cell
Problem
After you cancel a running streaming cell in a notebook attached to a Databricks Runtime 5.0 cluster, you cannot run any subsequent commands in the notebook. The commands are left in the “waiting to run” state, and you must clear the notebook’s state or detach and reattach the cluster before you can successfully run commands on the notebook....
Troubleshooting unresponsive Python notebooks or canceled commands
This article provides an overview of troubleshooting steps you can take if a notebook is unresponsive or cancels commands.
Check metastore connectivity
Problem
Simple commands in newly attached notebooks fail, but succeed in notebooks that were attached to the same cluster earlier.
Troubleshooting steps
Check metastore connectivity. The inability to...
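A quick connectivity check, as a hedged sketch: run a simple catalog query from the affected notebook; if it hangs or errors, the metastore is a likely culprit:

%python
# A lightweight query that must round-trip to the metastore.
spark.sql("show databases").show()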
Update job permissions for multiple users
When you are running jobs, you might want to update user permissions for multiple users. You can do this by using the Databricks job permissions API (AWS | Azure | GCP) and a bit of Python code.
Instructions
Copy the example code into a notebook.
Enter the <job-id> (or multiple job ids) into the array arr[].
Enter your payload{}.
In this examp...
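As a hedged sketch of the pattern (the workspace URL, token, job ids, user name, and permission level are placeholders; PATCH adds entries to the existing permissions, while PUT would replace them):

%python
import requests

# Placeholders: substitute your workspace URL and a personal access token.
host = "https://<your-workspace>.cloud.databricks.com"
token = "<personal-access-token>"

arr = ["<job-id-1>", "<job-id-2>"]  # job ids to update
payload = {
    "access_control_list": [
        {"user_name": "user@example.com", "permission_level": "CAN_MANAGE_RUN"}
    ]
}

for job_id in arr:
    # PATCH adds the listed entries without removing existing permissions.
    resp = requests.patch(
        f"{host}/api/2.0/permissions/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    resp.raise_for_status()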