Delta Lake write jobs sometimes fail with the following exception:
java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.putIfAbsent(path: Path, content: InputStream). DBFS v1 doesn't support transactional writes from multiple clusters. Please upgrade to DBFS v2. Or you can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'. If this is disabled, writes to a single table must originate from a single cluster.
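The exception message offers a fallback: turn off multi-cluster writes. The sketch below does that for the current session. It assumes the flag named in the message can be changed at session level with spark.conf.set, which this article does not confirm; it may need to go in the cluster's Spark configuration instead. With the flag off, writes to a given table must come from a single cluster.

```scala
import org.apache.spark.sql.SparkSession

// Reuse the active session (in a Databricks notebook this is the predefined `spark`).
val spark = SparkSession.builder().getOrCreate()

// Assumption: the flag from the exception message can be set per session.
// When disabled, writes to a single Delta table must originate from a single cluster.
spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")
```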
Delta Lake multi-cluster writes are only supported with DBFS v2. Databricks clusters use DBFS v2 by default, and all SparkSession objects use DBFS v2.
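However, if the application uses the Hadoop FileSystem API and calls FileSystem.close(), the file system client falls back to the default DBFS version, which is v1: close() shuts down the shared client, and a client created afterwards from an empty configuration object only sees default settings. In this case, Delta Lake multi-cluster write operations fail.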
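A sketch of the failing pattern follows, for illustration only; running it on a cluster that performs Delta writes could trigger the very failure described here. Hadoop usually caches the client returned by FileSystem.get per scheme, authority, and user, so application code and Spark typically share one object. close() removes that object from the cache, and the next FileSystem.get creates and caches a new client from whatever configuration that call passes. The path is a hypothetical placeholder.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val path = "dbfs:/tmp/example-table" // hypothetical location

// Usually the shared, cached client that Spark and Delta Lake also use.
val fs = FileSystem.get(new URI(path), spark.sessionState.newHadoopConf())

// Anti-pattern: closing it evicts the shared client from Hadoop's FileSystem cache.
fs.close()

// The replacement is built from an empty Configuration, so it only sees default
// settings (DBFS v1, per this article) and is cached for later callers.
val fallback = FileSystem.get(new URI(path), new Configuration())
```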
The following log entry shows that the file system object fell back to the default, DBFS v1:
<date> <time> INFO DBFS: Initialized DBFS with DBFSV1 as the delegate.
There are two approaches to prevent this:
- Never call FileSystem.close() inside the application code. If you must call the close() API, make sure any new FileSystem client object you instantiate uses a configuration object from the current Apache Spark session instead of an empty configuration object:
%scala
val fileSystem = org.apache.hadoop.fs.FileSystem.get(new java.net.URI(path), sparkSession.sessionState.newHadoopConf())
- Alternatively, build the client from the SparkContext's Hadoop configuration, which achieves the same goal:
%scala
val fileSystem = org.apache.hadoop.fs.FileSystem.get(new java.net.URI(path), sc.hadoopConfiguration)
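If application code genuinely needs a client it can close, one option not covered by this article is Hadoop's FileSystem.newInstance, which always returns a new client object rather than the shared cached one. The sketch below assumes that closing such a private instance leaves the DBFS v2 client used by Delta Lake untouched; verify this on your Databricks Runtime before relying on it. The path is a hypothetical placeholder.

```scala
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val path = "dbfs:/tmp/example-table" // hypothetical location

// Take the configuration from the Spark session, as recommended above.
val hadoopConf = spark.sessionState.newHadoopConf()

// The shared, cached client: leave it open.
val shared = FileSystem.get(new URI(path), hadoopConf)

// newInstance returns a separate client, so closing it does not remove
// the shared instance from Hadoop's FileSystem cache.
val privateFs = FileSystem.newInstance(new URI(path), hadoopConf)
try {
  println(s"exists: ${privateFs.exists(new Path(path))}")
} finally {
  privateFs.close() // closes only the private instance
}
```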