FileReadException when reading a Delta table

A FileReadException error occurs when you attempt to read from a Delta table whose underlying data has been deleted, or whose storage blob was unmounted during a write.

Written by Adam Pavlacka

Last published at: February 23rd, 2023

Problem

You attempt to read a Delta table from mounted storage and get a FileReadException error.

FileReadException: Error while reading file abfss:REDACTED@REDACTED.dfs.core.windows.net/REDACTED/REDACTED/REDACTED/REDACTED/PARTITION=REDACTED/part-00042-0725ec45-5c32-412a-ab27-5bc88c058773.c000.snappy.parquet. A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement. For more information, see https://docs.microsoft.com/azure/databricks/delta/delta-intro#frequently-asked-questions
 Caused by: FileNotFoundException: Operation failed: 'The specified path does not exist.', 404, HEAD, https:// REDACTED.dfs.core.windows.net/ REDACTED/ REDACTED/REDACTED/REDACTED/PARTITION=REDACTED/part-00042-0725ec45-5c32-412a-ab27-5bc88c058773.c000.snappy.parquet?upn=false&action=getStatus&timeout=90

Cause

FileReadException errors occur when the underlying data does not exist. The most common cause is manual deletion.

If the underlying data was not manually deleted, the mount point for the storage blob was removed and recreated while the cluster was writing to the Delta table.

Delta Lake does not fail a table write if the location is removed while the data write is ongoing. Instead, a new folder is created in the default storage account of the workspace, with the same path as the removed mount. Data continues to be written in that location.

If the mount is recreated before the write operation is finished, and the Delta transaction logs are made available again, Delta updates the transaction logs and the write is considered successful. When this happens, data files written to the default storage account while the mount was deleted are not accessible, as the path currently references the mounted storage account location.
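
If you suspect this scenario, a quick check is to confirm where the mount point currently resolves and whether the file named in the error message still exists at that path. The following is only a minimal sketch; the mount point and file path are placeholders you substitute from your own error message.

%python

# Placeholders: substitute the mount point and the file path from your FileReadException message.
mount_point = "/mnt/<mount-containing-table>"
missing_file = "/mnt/<path-to-table>/PARTITION=<value>/part-00042-<uuid>.c000.snappy.parquet"

# Confirm which storage location the mount point currently resolves to.
for m in dbutils.fs.mounts():
    if m.mountPoint == mount_point:
        print(m.mountPoint, "->", m.source)

# dbutils.fs.ls raises an exception if the file referenced in the transaction log is gone.
try:
    dbutils.fs.ls(missing_file)
    print("File exists at the mounted location.")
except Exception as e:
    print("File not found at the mounted location:", e)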

Info

You can use diagnostic logging to verify that a mount was removed. Query the DatabricksDBFS table in your diagnostic logs for mount and unmount events.

For example:

DatabricksDBFS
| where ActionName == "unmount" or ActionName == "mount"

Solution

You can restore the missing data in one of two ways.

  • Repair the Delta table and add the missing data back with a custom job.
  1. Use FSCK to repair the table. FSCK REPAIR TABLE removes entries for files that can no longer be found from the Delta transaction log.
    %sql
    
    FSCK REPAIR TABLE <table-name>
  2. Rewrite the missing data with a custom job (see the sketch after this list). This option is a good choice if you can re-run the last job without risking duplicate data.
  • Manually recover the missing files.
  1. Verify that there are no active jobs reading or writing to the mounted storage account that contains the Delta table.
  2. Unmount the mount path. This allows you to access the /mnt/<path-to-table> directory in the default storage account.
    %python
    
    dbutils.fs.unmount("/mnt/<mount-containing-table>")
  3. Use dbutils.fs.mv to move the files located in the table path to a temporary location.
    %python
    
    dbutils.fs.mv("/mnt/<path-to-table>", "/tmp/tempLocation/", True))
  4. Recreate the mount point.
    %python
    
    # configs holds the authentication settings for the storage account; see the documentation linked below.
    dbutils.fs.mount(source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/", mount_point = "/mnt/<mount-containing-table>", extra_configs = configs)
    Review the Access Azure Data Lake Storage Gen2 and blob storage documentation for more information.
  5. Move the files from the temporary location to the updated Delta table path.
    %python
    
    dbutils.fs.mv("/tmp/tempLocation", "/mnt/<path-to-table>", True))

If any jobs are reading or writing to the mount point when you attempt a manual recovery, you may cause the issue to recur. Verify that the mount is not in use before attempting a manual repair.

Best practices

  • Instruct users to get approval before unmounting a storage location.
  • If you must unmount a storage location, verify there are no jobs running on the cluster.
  • Use dbutils.fs.updateMount to update information about the mount; do not unmount and remount it (see the sketch after this list).
  • Use diagnostic logging to identify any possible unmount issues.
  • Run production jobs only on job clusters. Job clusters are not affected by unmount commands issued while the job is running, unless the job itself runs the dbutils.fs.refreshMounts command.
  • When running jobs on interactive clusters, add a verification step at the end of the job (such as a count) to check for missing data files (see the sketch after this list). If any files are missing, an error is triggered immediately.
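
For the updateMount recommendation above, a minimal sketch is shown below. It assumes the same placeholder container, storage account, and configs dictionary used in the mount example earlier in this article.

%python

# Update the existing mount in place rather than unmounting and remounting.
# <container-name>, <storage-account-name>, and configs are placeholders, as in the earlier mount example.
dbutils.fs.updateMount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-containing-table>",
  extra_configs = configs)

# Clusters that are already running can pick up mount changes by refreshing their mount cache.
dbutils.fs.refreshMounts()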
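
For the verification step on interactive clusters, a minimal sketch is shown below, assuming the placeholder table path used earlier in this article. Reading and counting the table at the end of the job surfaces missing data files in the job itself rather than in a later downstream read.

%python

# Read the table and count rows at the end of the job; if referenced files are missing, this step fails.
row_count = spark.read.format("delta").load("/mnt/<path-to-table>").count()
print(f"Verified table is readable: {row_count} rows.")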