Delta Lake write jobs sometimes fail with the following exception:
java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.putIfAbsent(path: Path, content: InputStream). DBFS v1 doesn't support transactional writes from multiple clusters. Please upgrade to DBFS v2. Or you can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'. If this is disabled, writes to a single table must originate from a single cluster.
Delta Lake multi-cluster writes are only supported with DBFS v2. Databricks clusters use DBFS v2 by default, and all sparkSession objects use DBFS v2.
However, if the application uses the FileSystem API and calls FileSystem.close(), the file system client falls back to the default value, which is v1. In this case, Delta Lake multi-cluster write operations fail.
The following log trace shows that the file system object fell back to the default v1 version.
<date> <time> INFO DBFS: Initialized DBFS with DBFSV1 as the delegate.
There are two approaches to prevent this:
1. Never call FileSystem.close() inside the application code.
2. If it is necessary to call the close() API, first instantiate a new FileSystem client object with a configuration object from the current Apache Spark session, instead of an empty configuration object:
import org.apache.hadoop.fs.FileSystem
val fileSystem = FileSystem.get(new java.net.URI(path), sparkSession.sessionState.newHadoopConf())
Alternatively, this code sample achieves the same goal:
val fileSystem = FileSystem.get(new java.net.URI(path), sc.hadoopConfiguration)
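A related technique, not part of the guidance above, is to avoid closing the shared client at all. FileSystem.get returns an instance from Hadoop's FileSystem cache, so closing it affects every caller that received the same cached object. Hadoop's FileSystem.newInstance instead returns a private, uncached client that can be closed independently. The sketch below assumes hadoop-common is on the classpath; the object and method names are hypothetical, and it has not been verified that this avoids the DBFS v1 fallback on Databricks. There, conf would presumably be sparkSession.sessionState.newHadoopConf() rather than an empty Configuration.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: lists the entry names under `dir` with a private client.
object SafeFileSystemUsage {
  def listNames(dir: String, conf: Configuration): Seq[String] = {
    // newInstance bypasses the FileSystem cache, so closing `fs` below does not
    // close the shared instance that FileSystem.get hands out for this URI.
    val fs = FileSystem.newInstance(new URI(dir), conf)
    try {
      fs.listStatus(new Path(dir)).map(_.getPath.getName).toSeq.sorted
    } finally {
      // Closes only this private instance.
      fs.close()
    }
  }
}
```

Usage (the path is a placeholder): SafeFileSystemUsage.listNames("file:///tmp/", new Configuration()).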