How to specify the DBFS path

When working with Databricks, you will sometimes have to access the Databricks File System (DBFS).

Files on DBFS are accessed with standard filesystem commands; however, the path syntax varies depending on the language or tool used.

For example, take the following DBFS path:

dbfs:/mnt/test_folder/test_folder1/

Apache Spark

When using Spark, specify the full DBFS path inside the read command.

spark.read.parquet("dbfs:/mnt/test_folder/test_folder1/file.parquet")

DBUtils

When using DBUtils, specify the full DBFS path, just as you would in Spark commands. Only the syntax around the path changes: the %fs magic command, Python, and Scala each format the call differently.

%fs magic command

%fs
ls dbfs:/mnt/test_folder/test_folder1/

Python

dbutils.fs.ls('dbfs:/mnt/test_folder/test_folder1/')

Scala

dbutils.fs.ls("dbfs:/mnt/test_folder/test_folder1/")

Note

Specifying dbfs: is not required when using DBUtils or Spark commands. The path dbfs:/mnt/test_folder/test_folder1/ is equivalent to /mnt/test_folder/test_folder1/.
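Because the dbfs: prefix is optional for Spark and DBUtils, the same location can arrive in either spelling. The following minimal Python sketch strips the optional prefix so both spellings compare equal. The function name normalize_dbfs_path is hypothetical (it is not a Databricks API), and the code assumes absolute, single-slash paths like the ones in this article.

```python
DBFS_SCHEME = "dbfs:"


def normalize_dbfs_path(path: str) -> str:
    """Drop an optional leading 'dbfs:' scheme from a DBFS path.

    normalize_dbfs_path is a hypothetical helper for illustration only.
    It assumes absolute, single-slash paths such as dbfs:/mnt/... or /mnt/...
    """
    if path.startswith(DBFS_SCHEME):
        path = path[len(DBFS_SCHEME):]
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {path!r}")
    return path


# Both spellings of the example path refer to the same location.
print(normalize_dbfs_path("dbfs:/mnt/test_folder/test_folder1/"))
print(normalize_dbfs_path("/mnt/test_folder/test_folder1/"))
```

Normalizing to the prefix-free form is an arbitrary choice; since Spark and DBUtils accept both, the prefixed form would work equally well as the canonical one.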

Shell commands and local file APIs

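Shell commands and local file APIs (such as Python's os module or Scala's java.io.File) do not recognize the dbfs: path. Instead, DBFS is mounted on the cluster at /dbfs, so DBFS folders and the files within them are accessed like any other folder on the local file system, by prefixing the path with /dbfs.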
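The mapping between the two forms is mechanical: drop the optional dbfs: scheme and prefix what remains with /dbfs. Below is a minimal Python sketch of that mapping in both directions. The function names to_local_path and to_dbfs_uri are hypothetical, the code only manipulates strings, and it assumes the /dbfs mount is available wherever the resulting local path is used.

```python
def to_local_path(dbfs_path: str) -> str:
    """Map a Spark/DBUtils DBFS path to the /dbfs form used by local file APIs.

    Hypothetical helper. Accepts dbfs:/mnt/... or /mnt/... and returns
    /dbfs/mnt/...; it does not check that the path exists.
    """
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    if not dbfs_path.startswith("/"):
        raise ValueError(f"expected an absolute DBFS path, got {dbfs_path!r}")
    return "/dbfs" + dbfs_path


def to_dbfs_uri(local_path: str) -> str:
    """Map a /dbfs/... local path back to the dbfs:/... form used by Spark and DBUtils.

    Hypothetical helper; the inverse of to_local_path for paths under /dbfs/.
    """
    prefix = "/dbfs/"
    if not local_path.startswith(prefix):
        raise ValueError(f"expected a path under /dbfs/, got {local_path!r}")
    return "dbfs:/" + local_path[len(prefix):]


print(to_local_path("dbfs:/mnt/test_folder/test_folder1/"))
# /dbfs/mnt/test_folder/test_folder1/
print(to_dbfs_uri("/dbfs/mnt/test_folder/test_folder1/file_name.txt"))
# dbfs:/mnt/test_folder/test_folder1/file_name.txt
```

Keeping the conversion in one place avoids hand-editing paths when the same location is read with Spark and then inspected with a shell command or os call.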

Bash

ls /dbfs/mnt/test_folder/test_folder1/
cat /dbfs/mnt/test_folder/test_folder1/file_name.txt

Python

import os
os.listdir('/dbfs/mnt/test_folder/test_folder1/')
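os.listdir only returns the immediate entries of a single folder. To walk an entire DBFS directory tree through the /dbfs mount, the standard library's os.walk can be used instead. This is a minimal sketch; the function name list_files_recursive is hypothetical, and the example call assumes it runs on a cluster where the mount from this article exists.

```python
import os
from typing import List


def list_files_recursive(root: str) -> List[str]:
    """Return the sorted full paths of every file under root.

    list_files_recursive is a hypothetical helper, not a Databricks API.
    It uses only the standard library, so it works on any local directory,
    including DBFS folders exposed under /dbfs on a cluster.
    """
    # os.walk silently yields nothing for a missing directory, so fail loudly.
    if not os.path.isdir(root):
        raise FileNotFoundError(root)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


# Example call on a Databricks cluster (assumes the mount exists):
# list_files_recursive("/dbfs/mnt/test_folder/test_folder1/")
```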

Scala

import java.io.File
val directory = new File("/dbfs/mnt/test_folder/test_folder1/")
directory.listFiles