When working with Databricks you will sometimes have to access the Databricks File System (DBFS).
Accessing files on DBFS is done with standard filesystem commands; however, the syntax varies depending on the language or tool used.
For example, take the following DBFS path:
dbfs:/mnt/test_folder/test_folder1/
Apache Spark
Under Spark, you should specify the full path inside the Spark read command.
spark.read.parquet("dbfs:/mnt/test_folder/test_folder1/file.parquet")
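The same convention applies when writing data back to DBFS. The following sketch is only illustrative; the output folder name is hypothetical:

# read using the full dbfs:/ path, then write back to a hypothetical output folder
df = spark.read.parquet("dbfs:/mnt/test_folder/test_folder1/file.parquet")
df.write.mode("overwrite").parquet("dbfs:/mnt/test_folder/output_folder/")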
DBUtils
When using DBUtils, the full DBFS path should be used, just as in Spark commands. The language-specific formatting around the DBFS path differs depending on the language used.
Bash
%fs ls dbfs:/mnt/test_folder/test_folder1/
Python
%python
dbutils.fs.ls('dbfs:/mnt/test_folder/test_folder1/')
Scala
%scala
dbutils.fs.ls("dbfs:/mnt/test_folder/test_folder1/")
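As a rough sketch of how the listing can be used from Python, dbutils.fs.ls returns a list of FileInfo objects whose name, path, and size fields can be inspected directly (this only runs inside a Databricks notebook, where dbutils is available):

%python
# list the folder using the full dbfs:/ path and print each entry's name and size
for file_info in dbutils.fs.ls('dbfs:/mnt/test_folder/test_folder1/'):
    print(file_info.name, file_info.size)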
Shell commands
Shell commands do not recognize the DBFS path. Instead, DBFS and the files within it are accessed through the local /dbfs mount point, with the same syntax as any other folder on the file system.
Bash
ls /dbfs/mnt/test_folder/test_folder1/
cat /dbfs/mnt/test_folder/test_folder1/file_name.txt
Python
import os
os.listdir('/dbfs/mnt/test_folder/test_folder1/')
Scala
import java.io.File
val directory = new File("/dbfs/mnt/test_folder/test_folder1/")
directory.listFiles
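Because the /dbfs mount behaves like a regular local folder, ordinary file APIs work against it as well. The following Python sketch reads the text file from the Bash example above with the built-in open function (the file name is assumed, as above):

# open the file through the local /dbfs mount point, just like any local file
with open('/dbfs/mnt/test_folder/test_folder1/file_name.txt') as f:
    print(f.read())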