Delta Lake write jobs sometimes fail with the following exception:
java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.putIfAbsent(path: Path, content: InputStream). DBFS v1 doesn't support transactional writes from multiple clusters. Please upgrade to DBFS v2. Or you can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'. If this is disabled, writes to a single table must originate from a single cluster. Please check https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions-faq for more details.
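If writes to the table only ever come from a single cluster, the workaround named in the error message can be applied as a Spark configuration. This is a sketch using the property name taken verbatim from the exception text above; `spark` is assumed to be the active SparkSession:

```scala
// Disable Delta Lake multi-cluster writes, as suggested by the exception.
// With this setting, all writes to a given table must originate from one cluster.
spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")
```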
Delta Lake multi-cluster writes are only supported with the DBFS v2 client. Databricks clusters use DBFS v2 by default, and all sparkSession objects use the DBFS v2 client. However, if the application uses the FileSystem API and calls FileSystem.close(), the file system client falls back to the default value, which is v1. In this case, Delta Lake multi-cluster write operations fail.
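As a sketch, the failure mode described above typically arises from application code like the following (hypothetical example; `path` is assumed to be a DBFS path string):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Problematic pattern: looking up the cached FileSystem client with an
// empty Configuration and then closing it. After close(), subsequent
// lookups can re-initialize the client with default settings, which
// falls back to the DBFS v1 delegate and breaks multi-cluster writes.
val fs = FileSystem.get(new java.net.URI(path), new Configuration())
// ... application I/O ...
fs.close() // closes the shared, cached client -- avoid this
```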
The following log trace shows that the file system object fell back to the default v1 client:
<date> <time> INFO DBFS: Initialized DBFS with DBFSV1 as the delegate.
There are two approaches to prevent this:

- Do not call FileSystem.close() inside the application code.
- If it is necessary to call the close() API, first instantiate a new FileSystem client object with a configuration object from the current Apache Spark session, instead of an empty configuration object:
val fileSystem = FileSystem.get(new java.net.URI(path), sparkSession.sessionState.newHadoopConf())
Alternatively, this code sample achieves the same goal:
val fileSystem = FileSystem.get(new java.net.URI(path), sc.hadoopConfiguration)