Delta Lake write jobs sometimes fail with the following exception:
java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.putIfAbsent(path: Path, content: InputStream). DBFS v1 doesn't support transactional writes from multiple clusters. Please upgrade to DBFS v2. Or you can disable multi-cluster writes by setting 'spark.databricks.delta.multiClusterWrites.enabled' to 'false'. If this is disabled, writes to a single table must originate from a single cluster. Please check https://docs.databricks.com/delta/delta-intro.html#frequently-asked-questions-faq for more details.
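If writes to the table only ever come from a single cluster, the workaround named in the error message can be applied as a Spark configuration. This is a sketch using the property name taken verbatim from the exception text above; `spark` is assumed to be the active SparkSession:

```scala
// Disable Delta Lake multi-cluster writes, as suggested by the exception.
// With this setting, all writes to a given table must originate from one cluster.
spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")
```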
Delta Lake multi-cluster writes are only supported with the DBFS v2 client. Databricks clusters use DBFS v2 by default, and all sparkSession objects use the DBFS v2 client. However, if the application uses the FileSystem API and calls FileSystem.close(), the file system client falls back to the default value, which is v1. In this case, Delta Lake multi-cluster write operations fail.
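As a sketch, the failure mode described above typically arises from application code like the following (hypothetical example; `path` is assumed to be a DBFS path string):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Problematic pattern: looking up the cached FileSystem client with an
// empty Configuration and then closing it. After close(), subsequent
// lookups can re-initialize the client with default settings, which
// falls back to the DBFS v1 delegate and breaks multi-cluster writes.
val fs = FileSystem.get(new java.net.URI(path), new Configuration())
// ... application I/O ...
fs.close() // closes the shared, cached client -- avoid this
```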
The following log trace shows that the file system object fell back to the default v1 client:
<date> <time> INFO DBFS: Initialized DBFS with DBFSV1 as the delegate.
There are two approaches to prevent this:

- Do not call FileSystem.close() inside the application code.
- If it is necessary to call the close() API, first instantiate a new FileSystem client object with a configuration object from the current Apache Spark session, instead of an empty configuration object:
val fileSystem = FileSystem.get(new java.net.URI(path), sparkSession.sessionState.newHadoopConf())
Alternatively, this code sample achieves the same goal:
val fileSystem = FileSystem.get(new java.net.URI(path), sc.hadoopConfiguration)