Cannot read Databricks objects stored in the DBFS root directory
Problem: An Access Denied error is returned when you attempt to read Databricks objects stored in the DBFS root directory in blob storage from outside a Databricks cluster. Cause: This is normal behavior for the DBFS root directory. Databricks stores objects like libraries and other temporary system files in the DBFS root directory. Databricks is the only...
How to specify the DBFS path
When working with Databricks, you will sometimes have to access the Databricks File System (DBFS). Files on DBFS are accessed with standard filesystem commands; however, the syntax varies depending on the language or tool used. For example, take the following DBFS path: dbfs:/mnt/test_folder/test_folder1/ Apache Spark: Under Spark, you should spec...
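As a rough illustration of how one location can be spelled differently depending on the tool, the sketch below converts between the dbfs:/ form shown above and a local-file form. It assumes the common convention that DBFS is exposed to local file APIs under a /dbfs mount on cluster nodes; the helper names to_local_path and to_dbfs_uri are hypothetical and not part of any Databricks API.

```python
# Hypothetical helpers: not a Databricks API, just string conversion.
DBFS_SCHEME = "dbfs:"   # URI-style prefix, as in dbfs:/mnt/...
FUSE_MOUNT = "/dbfs"    # Assumed local mount point for local file APIs


def to_local_path(dbfs_uri: str) -> str:
    """Convert dbfs:/some/path to /dbfs/some/path."""
    if not dbfs_uri.startswith(DBFS_SCHEME + "/"):
        raise ValueError(f"not a dbfs:/ path: {dbfs_uri!r}")
    # Drop the scheme, keep the leading slash, prepend the mount point.
    return FUSE_MOUNT + dbfs_uri[len(DBFS_SCHEME):]


def to_dbfs_uri(local_path: str) -> str:
    """Convert /dbfs/some/path to dbfs:/some/path."""
    if local_path != FUSE_MOUNT and not local_path.startswith(FUSE_MOUNT + "/"):
        raise ValueError(f"not under {FUSE_MOUNT}: {local_path!r}")
    # An exact match on the mount point maps to the DBFS root.
    return DBFS_SCHEME + (local_path[len(FUSE_MOUNT):] or "/")


if __name__ == "__main__":
    print(to_local_path("dbfs:/mnt/test_folder/test_folder1/"))
    print(to_dbfs_uri("/dbfs/mnt/test_folder/test_folder1/"))
```

The conversion is purely textual, so it does not check that the path exists or that the mount is available in a given environment.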
Parallelize filesystem operations
When you need to speed up copy and move operations, parallelizing them is usually a good option. You can use Apache Spark to parallelize operations on executors. On Databricks you can use DBUtils APIs; however, these API calls are meant for use on driver nodes and shouldn’t be used in Spark jobs running on executors. In this article, we are going to...
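To make the general idea of parallelizing copies concrete, here is a minimal sketch that uses a thread pool on a single machine. This is a swapped-in technique for illustration, not the Spark executor approach the article goes on to describe, and it uses only the Python standard library rather than DBUtils. The function name parallel_copy and the max_workers default are assumptions.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_dir: str, dst_dir: str, max_workers: int = 8) -> list[str]:
    """Copy every file under src_dir to dst_dir using a pool of threads.

    Hypothetical helper: runs on one machine (for example a driver node),
    not on Spark executors.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    # List the files up front so each worker handles exactly one file.
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(path: Path) -> str:
        target = dst / path.relative_to(src)
        # exist_ok=True keeps concurrent directory creation safe.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        return str(target)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map re-raises any worker exception when results are consumed.
        return sorted(pool.map(copy_one, files))
```

Threads suit this workload because file copies are I/O-bound. Whether the same function works against DBFS depends on local file access being available in the environment, which this sketch does not verify.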