FileReadException on DBFS mounted filesystem

Use dbutils.fs.refreshMounts() to refresh mount points before referencing a DBFS path in your Spark job.

Written by Gobinath Viswanathan

Last published at: April 11th, 2023

Problem

Your Apache Spark jobs are failing with a FileReadException error when attempting to read files on DBFS (Databricks File System) mounted paths.

org.apache.spark.SparkException: Job aborted due to stage failure: Task x in stage y failed n times, most recent failure: Lost task 0.3 in stage 141.0 (TID 770) (x.y.z.z executor 0): com.databricks.sql.io.FileReadException: Error while reading file dbfs:/mnt/Cloudfolder/folder1/silver_table/part-00000-twerrx-abcd-4538-ae46-87041a4fxxxx-c000.snappy.parquet

Cause

A FileReadException error can occur when a job dynamically handles mounts. When a series of mount and unmount operations occurs for the same path in the same workspace, the driver and the executors can end up resolving the same mount point to different paths, depending on when the cluster was started and when the mount was initialized.
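For illustration, a pattern like the following can trigger the inconsistency. The source URI, mount point, and secret scope names are hypothetical placeholders, not taken from the error above.

# Hypothetical example of the problematic pattern: remounting the same
# mount point mid-job. Executors that cached the old mapping may still
# resolve /mnt/Cloudfolder to the previous source.
dbutils.fs.unmount("/mnt/Cloudfolder")
dbutils.fs.mount(
    source = "wasbs://container@account.blob.core.windows.net/folder1",  # placeholder URI
    mount_point = "/mnt/Cloudfolder",
    extra_configs = {"<conf-key>": dbutils.secrets.get(scope="<scope>", key="<key>")}
)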

Solution

To prevent the error, include the dbutils.fs.refreshMounts() command in your Spark job before you reference a DBFS path.

The dbutils.fs.refreshMounts() command refreshes the mount points in the current workspace. This ensures that the executors and driver have a current and consistent view of the mount, regardless of when the cluster was started and/or the mount was initialized.
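As a minimal sketch, run the refresh immediately before the read. The table path below is taken from the error message above and is purely illustrative.

# Refresh mount metadata so the driver and executors share a consistent view.
dbutils.fs.refreshMounts()

# Now it is safe to reference the mounted path (illustrative path).
df = spark.read.parquet("dbfs:/mnt/Cloudfolder/folder1/silver_table")
df.count()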

For more information, please review the dbutils.fs.refreshMounts() documentation (AWS | Azure | GCP).


Info

If possible, reference the direct path of the storage URI, rather than a mount, when running streaming jobs. If you must use mounts, avoid repeated mounting and unmounting.
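For example, reading by direct storage URI might look like the following sketch, assuming an Azure Data Lake Storage account; the container, account, and folder names are placeholders.

# Hypothetical direct-path read; no mount point involved, so there is no
# mount metadata for the driver and executors to disagree about.
path = "abfss://container@account.dfs.core.windows.net/folder1/silver_table"
df = spark.read.parquet(path)  # batch read; for streaming, use spark.readStream with an explicit schema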