Problem
When your job attempts to read or write intermediary files, such as Excel files, in the Databricks File System (DBFS), you encounter a java.io.FileNotFoundException error. The error message may include a path containing the workspace ID, such as /00000000000000/<file-store>/<your-project>/<your-spreadsheet>.xlsx.
Cause
The job is attempting to access a file that has been deleted or is in the process of being deleted.
Incorrect or expired storage account authentication can also cause a similar file access issue. In such cases, the error message includes an unauthorized error.
Solution
Cache the DataFrame before performing write operations. Calling an action such as show() after cache() materializes the data in memory, so the subsequent write no longer depends on re-reading the source file. Add the following line before the write operation.
df.cache().show()
Also, ensure that you are using a version of the com.crealytics.spark.excel library that is compatible with your Databricks Runtime version. For Databricks Runtime 13.3, use the following Maven coordinate.
com.crealytics:spark-excel_2.12:3.4.1_0.20.4
For Maven coordinates matching other Databricks Runtime versions, refer to the com.crealytics listings on Maven Central.
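Libraries are usually attached through the cluster UI, but as a sketch, the same coordinate can be declared in a cluster or job library specification (Jobs API format; the surrounding job definition is omitted and hypothetical):

```json
{
  "libraries": [
    {
      "maven": {
        "coordinates": "com.crealytics:spark-excel_2.12:3.4.1_0.20.4"
      }
    }
  ]
}
```

Pinning the coordinate in the job definition keeps the library version tied to the runtime version you validated it against.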
Note
If possible, use CSV format for intermediary storage instead of Excel, because Apache Spark supports CSV natively and does not require a third-party library.