How to check if a Spark property is modifiable in a notebook
Problem You can tune applications by setting various configurations. Some configurations must be set at the cluster level, whereas some are set inside notebooks or applications. Solution To check if a particular Spark configuration can be set in a notebook, run the following command in a notebook cell: %scala spark.conf.isModifiable("spark.databrick...
JSON reader parses values as null
Problem You are attempting to read a JSON file. You know the file has data in it, but the Apache Spark JSON reader is returning a null value. Example code You can use this example code to reproduce the problem. Create a test JSON file in DBFS. %python dbutils.fs.rm("dbfs:/tmp/json/parse_test.txt") dbutils.fs.put("dbfs:/tmp/json/parse_test.txt", """ {...
Common errors in notebooks
There are some common issues that occur when using notebooks. This section outlines some of the frequently asked questions and best practices that you should follow. Spark job fails with java.lang.NoClassDefFoundError Sometimes you may come across an error like: %scala java.lang.NoClassDefFoundError: Could not initialize class line.....$read$ This c...
display() does not show microseconds correctly
Problem You want to display a timestamp value with microsecond precision, but when you use display() it does not show the value past milliseconds. For example, this Apache Spark SQL display() command: %sql display(spark.sql("select cast('2021-08-10T09:08:56.740436' as timestamp) as test")) Returns a truncated value: 2021-08-10T09:08:56.740+0000 Caus...
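The excerpt above is cut off before the cause and solution, so the following is only a plain-Python illustration of the symptom, not the article's fix. It parses the same timestamp literal and formats it twice: once truncated to milliseconds, and once as a string that keeps all six fractional digits. The helper names are hypothetical.

```python
from datetime import datetime

# Same literal as in the excerpt above.
ts = datetime.fromisoformat("2021-08-10T09:08:56.740436")


def to_millis_string(value):
    """Format with the fractional seconds truncated to three digits."""
    return value.strftime("%Y-%m-%dT%H:%M:%S.") + f"{value.microsecond // 1000:03d}"


def to_micros_string(value):
    """Format with all six fractional digits preserved."""
    return value.strftime("%Y-%m-%dT%H:%M:%S.%f")


print(to_millis_string(ts))
print(to_micros_string(ts))
```

Rendering the value as a string is one way to keep every digit visible regardless of how a viewer formats timestamp types.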
Error: Received command c on object id p0
Problem You have imported Python libraries, but when you try to execute Python code in a notebook you get a repeating message as output. INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command c on object id p0 INFO:py4j.java_gateway:Received command ...
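The excerpt is truncated before the cause and solution. A common way to silence INFO messages like these is to raise the level of the `py4j` logger with the standard `logging` module, sketched below. The helper name `quiet_py4j` is hypothetical, and the sketch assumes no other code has set an explicit level on the `py4j.java_gateway` child logger.

```python
import logging


def quiet_py4j(level=logging.ERROR):
    """Raise the level of the parent "py4j" logger.

    Child loggers such as "py4j.java_gateway" inherit this level
    as long as they have no explicit level of their own.
    """
    logger = logging.getLogger("py4j")
    logger.setLevel(level)
    return logger


quiet_py4j()
```

Setting the level on the parent logger rather than on `py4j.java_gateway` alone also covers other py4j submodules that log at INFO.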
Failure when accessing or mounting storage
Problem You are trying to access an existing mount point, or create a new mount point, and it fails with an error message. Invalid Mount Exception:The backend could not get tokens for path /mnt. Cause The root mount path (/mnt) is also mounted to a storage location. You can verify that something is mounted to the root path by listing all mount point...
Item was too large to export
Problem You are trying to export notebooks using the workspace UI and are getting an error message. This item was too large to export. Try exporting smaller or fewer items. Cause The notebook files are larger than 10 MB in size. Solution The simplest solution is to limit the size of the notebook or folder that you are trying to download to 10 MB or ...
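The error message suggests exporting smaller or fewer items. As a rough planning aid, the sketch below groups items into batches whose combined size stays within the 10 MB limit. It assumes you already know each item's size in bytes, that an export's size is roughly the sum of its items, and that 1 MB means 1,048,576 bytes. None of these come from the article, and the function name is hypothetical.

```python
EXPORT_LIMIT_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def plan_export_batches(sizes, limit=EXPORT_LIMIT_BYTES):
    """Greedily group (name, size_in_bytes) pairs into batches within `limit`.

    Returns (batches, too_large), where `too_large` lists items that
    exceed the limit on their own and must be reduced before export.
    """
    batches, too_large = [], []
    current, current_total = [], 0
    for name, size in sizes:
        if size > limit:
            too_large.append(name)
            continue
        # Start a new batch when adding this item would exceed the limit.
        if current and current_total + size > limit:
            batches.append(current)
            current, current_total = [], 0
        current.append(name)
        current_total += size
    if current:
        batches.append(current)
    return batches, too_large
```

The greedy pass keeps items in their original order. It does not try to minimize the number of batches.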
Access notebooks owned by a deleted user
When you remove a user (AWS | Azure) from Databricks, a special backup folder is created in the workspace. This backup folder contains all of the deleted user’s content. Backup folders appear in the workspace as <deleted username>-backup-#. Info Only an admin user can access a backup folder. To access a backup folder: Log in to Databricks as an...
Notebook autosave fails due to file size limits
Problem Notebook autosaving fails with the following error message: Failed to save revision: Notebook size exceeds limit. This is most commonly caused by cells with large results. Remove some cells or split the notebook. Cause The maximum notebook size allowed for autosaving is 8 MB. Solution First, check the size of your notebook file using your br...
Cannot run notebook commands after canceling streaming cell
Problem After you cancel a running streaming cell in a notebook attached to a Databricks Runtime 5.0 cluster, you cannot run any subsequent commands in the notebook. The commands are left in the “waiting to run” state, and you must clear the notebook’s state or detach and reattach the cluster before you can successfully run commands on the notebook....
Troubleshooting unresponsive Python notebooks or canceled commands
This article provides an overview of troubleshooting steps you can take if a notebook is unresponsive or cancels commands. Check metastore connectivity Problem Simple commands in newly attached notebooks fail, but succeed in notebooks that were attached to the same cluster earlier. Troubleshooting steps Check metastore connectivity. The inability to...
Update job permissions for multiple users
When you are running jobs, you might want to update user permissions for multiple users. You can do this by using the Databricks job permissions API (AWS | Azure | GCP) and a bit of Python code. Instructions Copy the example code into a notebook. Enter the <job-id> (or multiple job IDs) into the array arr[]. Enter your payload{}. In this examp...
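The article's example code is not included in this excerpt. The sketch below shows one way the loop over `arr` and `payload` might look, using only the standard library. It builds the requests without sending them. The endpoint path, the `PATCH` method, and the payload fields reflect my understanding of the Permissions API and are assumptions here. The host, token, job IDs, and user name are placeholders.

```python
import json
import urllib.request

# Placeholder values; replace with your own job IDs and access entries.
arr = [101, 202]
payload = {
    "access_control_list": [
        {"user_name": "someone@example.com", "permission_level": "CAN_MANAGE"}
    ]
}


def build_permission_requests(host, token, job_ids, body):
    """Build one PATCH request per job ID without sending anything."""
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    requests_out = []
    for job_id in job_ids:
        # Assumed endpoint shape for job permissions.
        url = f"{host}/api/2.0/permissions/jobs/{job_id}"
        requests_out.append(
            urllib.request.Request(url, data=data, headers=headers, method="PATCH")
        )
    return requests_out
```

Each request could then be passed to `urllib.request.urlopen`. Keeping construction separate from sending makes it easy to inspect the URLs and body first.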
Generate browser HAR files
When troubleshooting UI issues, it is sometimes necessary to obtain additional information about the network requests that are generated in your browser. If this is needed, our support team will ask you to generate a HAR file. This article describes how to generate a HAR file with each of the major web browsers. Warning HAR files contain sensitive d...