Databricks Knowledge Base


Notebooks (AWS)

These articles can help you with your Databricks notebooks.

16 Articles in this category


Access S3 with temporary session credentials

You can use IAM session tokens with Hadoop config support to access S3 storage in Databricks Runtime 8.3 and above. Info You cannot mount the S3 path as a DBFS mount when using session credentials. You must use the S3A URI. Extract the session credentials from your cluster. You will need the Instance...
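
Although the excerpt is truncated, the general pattern can be sketched in a Python cell. The property names below are standard Hadoop S3A configuration keys; the credential values and bucket path are placeholders you would replace with values extracted from your cluster, not values from the article:

```python
# Sketch: configure S3A with temporary session credentials.
# ACCESS_KEY, SECRET_KEY, and SESSION_TOKEN are placeholders -- extract the
# real values from your cluster's instance profile as the article describes.
hconf = sc._jsc.hadoopConfiguration()
hconf.set(
    "fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
)
hconf.set("fs.s3a.access.key", ACCESS_KEY)
hconf.set("fs.s3a.secret.key", SECRET_KEY)
hconf.set("fs.s3a.session.token", SESSION_TOKEN)

# Read with the S3A URI directly -- a DBFS mount will not work here.
df = spark.read.csv("s3a://my-bucket/path/to/data.csv")  # hypothetical bucket
```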

Last updated: May 16th, 2022 by Gobinath.Viswanathan

Cannot use IAM roles with table ACL

Problem You want to use IAM roles when table ACLs are enabled, but you get an error saying credentials cannot be located. NoCredentialsError: Unable to locate credentials Cause When a table ACL is enabled, access to the EC2 instance metadata service is blocked. This is a security measure that prevents users from obtaining IAM access credentials. Sol...

Last updated: May 16th, 2022 by Adam Pavlacka

Enable s3cmd for notebooks

s3cmd is a client library that allows you to perform all AWS S3 operations from any machine. s3cmd is not installed on Databricks clusters by default. You must install it via a cluster-scoped init script before it can be used. Info The sample init script stores the path to a secret in an environment variable. You should store secrets in this fashion...

Last updated: May 16th, 2022 by pavan.kumarchalamcharla

How to check if a spark property is modifiable in a notebook

Problem You can tune applications by setting various configurations. Some configurations must be set at the cluster level, whereas some are set inside notebooks or applications. Solution To check if a particular Spark configuration can be set in a notebook, run the following command in a notebook cell: %scala spark.conf.isModifiable("spark.databrick...
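
A minimal sketch of that check in a Python cell (the property names here are illustrative, not taken from the article):

```python
# spark.conf.isModifiable returns True if the property can be changed from a
# running session, and False if it must be set at the cluster level.
print(spark.conf.isModifiable("spark.sql.shuffle.partitions"))  # typically True
print(spark.conf.isModifiable("spark.executor.memory"))         # typically False
```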

Last updated: May 16th, 2022 by Adam Pavlacka

JSON reader parses values as null

Problem You are attempting to read a JSON file. You know the file has data in it, but the Apache Spark JSON reader is returning a null value. Example code You can use this example code to reproduce the problem. Create a test JSON file in DBFS. %python dbutils.fs.rm("dbfs:/tmp/json/parse_test.txt") dbutils.fs.put("dbfs:/tmp/json/parse_test.txt", """ {...
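
A common cause is multi-line JSON: by default the Spark JSON reader expects one complete JSON object per line, so a pretty-printed document comes back as nulls. A sketch of the usual fix, reusing the test path from the excerpt:

```python
# Default reader: expects one JSON object per line; a pretty-printed
# (multi-line) file is parsed as null values.
df_bad = spark.read.json("dbfs:/tmp/json/parse_test.txt")

# The multiLine option lets the reader parse a JSON document spanning lines.
df_ok = (
    spark.read
    .option("multiLine", "true")
    .json("dbfs:/tmp/json/parse_test.txt")
)
df_ok.show()
```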

Last updated: May 16th, 2022 by saritha.shivakumar

Common errors in notebooks

There are some common issues that occur when using notebooks. This section outlines some of the frequently asked questions and best practices that you should follow. Spark job fails with java.lang.NoClassDefFoundError Sometimes you may come across an error like: %scala java.lang.NoClassDefFoundError: Could not initialize class line.....$read$ This c...

Last updated: May 16th, 2022 by Adam Pavlacka

display() does not show microseconds correctly

Problem You want to display a timestamp value with microsecond precision, but when you use display() it does not show the value past milliseconds. For example, this Apache Spark SQL display() command: %sql display(spark.sql("select cast('2021-08-10T09:08:56.740436' as timestamp) as test")) Returns a truncated value: 2021-08-10T09:08:56.740+0000 Caus...
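
One common workaround, sketched below, is to cast the timestamp to a string before calling display(), since the string cast preserves the full microsecond precision (column names are illustrative):

```python
from pyspark.sql.functions import col

df = spark.sql("select cast('2021-08-10T09:08:56.740436' as timestamp) as test")
# Casting to string retains the microseconds that display() would truncate.
display(df.withColumn("test_str", col("test").cast("string")))
```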

Last updated: May 16th, 2022 by harikrishnan.kunhumveettil

Error: Received command c on object id p0

Problem You have imported Python libraries, but when you try to execute Python code in a notebook you get a repeating message as output. INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command ...
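
The repeating output comes from the py4j gateway logging every command at INFO level. A minimal sketch of the usual suppression, using only the standard logging module:

```python
import logging

# The py4j gateway logs each command at INFO level; raising the logger's
# threshold suppresses the repeating "Received command" messages.
logging.getLogger("py4j").setLevel(logging.ERROR)
```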

Last updated: May 16th, 2022 by sandeep.chandran

Failure when accessing or mounting storage

Problem You are trying to access an existing mount point, or create a new mount point, and it fails with an error message. Invalid Mount Exception:The backend could not get tokens for path /mnt. Cause The root mount path (/mnt) is also mounted to a storage location. You can verify that something is mounted to the root path by listing all mount point...

Last updated: May 16th, 2022 by kiran.bharathi

Item was too large to export

Problem You are trying to export notebooks using the workspace UI and are getting an error message. This item was too large to export. Try exporting smaller or fewer items. Cause The notebook files are larger than 10 MB in size. Solution The simplest solution is to limit the size of the notebook or folder that you are trying to download to 10 MB or ...

Last updated: May 16th, 2022 by pavan.kumarchalamcharla

Access notebooks owned by a deleted user

When you remove a user (AWS | Azure) from Databricks, a special backup folder is created in the workspace. This backup folder contains all of the deleted user’s content. Backup folders appear in the workspace as <deleted username>-backup-#. Info Only an admin user can access a backup folder. To access a backup folder: Log into Databricks as an...

Last updated: May 16th, 2022 by John.Lourdu

Notebook autosave fails due to file size limits

Problem Notebook autosaving fails with the following error message: Failed to save revision: Notebook size exceeds limit. This is most commonly caused by cells with large results. Remove some cells or split the notebook. Cause The maximum notebook size allowed for autosaving is 8 MB. Solution First, check the size of your notebook file using your br...

Last updated: May 16th, 2022 by Adam Pavlacka

How to send email or SMS messages from Databricks notebooks

You may need to send a notification to a set of recipients from a Databricks notebook. For example, you may want to send email based on matching business rules or based on a command’s success or failure. This article describes two approaches to sending email or SMS messages from a notebook. Both examples use Python notebooks: Send email or SMS messa...
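
As a minimal sketch of the email approach using only the standard library -- the SMTP host, port, sender, and recipient below are placeholders, not values from the article:

```python
import smtplib
from email.mime.text import MIMEText

def build_alert(subject: str, body: str, sender: str, recipient: str) -> MIMEText:
    """Build a plain-text email message for a notebook notification."""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_alert(msg: MIMEText, host: str = "smtp.example.com", port: int = 587) -> None:
    """Send via an SMTP relay (host/port are hypothetical placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.send_message(msg)

msg = build_alert(
    "Job succeeded",
    "The nightly ETL run completed.",
    "noreply@example.com",
    "oncall@example.com",
)
```

A notebook would call `send_alert(msg)` from a success/failure branch of the business rule being checked.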

Last updated: May 17th, 2022 by Adam Pavlacka

Cannot run notebook commands after canceling streaming cell

Problem After you cancel a running streaming cell in a notebook attached to a Databricks Runtime 5.0 cluster, you cannot run any subsequent commands in the notebook. The commands are left in the “waiting to run” state, and you must clear the notebook’s state or detach and reattach the cluster before you can successfully run commands on the notebook....

Last updated: May 17th, 2022 by Adam Pavlacka

Troubleshooting unresponsive Python notebooks or canceled commands

This article provides an overview of troubleshooting steps you can take if a notebook is unresponsive or cancels commands. Check metastore connectivity Problem Simple commands in newly-attached notebooks fail, but succeed in notebooks that were attached to the same cluster earlier. Troubleshooting steps Check metastore connectivity. The inability to...

Last updated: May 17th, 2022 by Adam Pavlacka

Update job permissions for multiple users

When you are running jobs, you might want to update user permissions for multiple users. You can do this by using the Databricks job permissions API (AWS | Azure | GCP) and a bit of Python code. Instructions Copy the example code into a notebook. Enter the <job-id> (or multiple job ids) into the array arr[]. Enter your payload{}. In this examp...
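
The loop described above can be sketched with the standard library as follows. The workspace URL, token, job IDs, user names, and permission level are all placeholders, and the endpoint shown is the Jobs permissions REST path as I understand it -- verify against the linked API docs before use:

```python
import json
import urllib.request

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder token

arr = ["1001", "1002"]  # hypothetical job IDs to update

def build_payload(users, permission_level="CAN_MANAGE_RUN"):
    """Build the access_control_list body for the permissions API."""
    return {
        "access_control_list": [
            {"user_name": u, "permission_level": permission_level} for u in users
        ]
    }

payload = build_payload(["alice@example.com", "bob@example.com"])

def update_job(job_id):
    """PATCH the permissions of one job; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{HOST}/api/2.0/permissions/jobs/{job_id}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

In a notebook you would then loop `for job_id in arr: update_job(job_id)`.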

Last updated: May 17th, 2022 by Atanu.Sarkar


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.

