Problem
When you try to execute multiple concurrent INSERT or MERGE operations to append data to a table, the INSERT operation fails with the following error.
ConcurrentAppendException Files were added to the root of the table by a concurrent update. Please try the operation again.
Conflicting commit:
{"timestamp":xxxxx,"userId":"xxxxx","userName":"xxxxx","operation":"WRITE","operationParameters":{"mode":Append,"statsOnLoad":false,"partitionBy":[]},"job":{"jobId":"xxxxx","jobName":"xxxxx","jobRunId":"xxxxx","runId":"xxxxx","jobOwnerId":"xxxxx","triggerType":"manual"},"notebook":{"notebookId":"xxxxx"},"clusterId":"xxxxx","readVersion":xxxxx,"isolationLevel":"WriteSerializable","isBlindAppend":false,"operationMetrics":{"numFiles":"xx","numOutputRows":"xx","numOutputBytes":"xxxxx"},"tags":{"restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/13.3.x-aarch64-scala2.12","txnId":"xxxxx"}
Cause
When multiple operations try to write to the same data at once, they can conflict with each other. ConcurrentAppendException is raised when a concurrent operation adds files to the same partition (or anywhere in an unpartitioned table) that your operation reads, so only one of the conflicting commits can succeed.
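For example, two jobs that run the same MERGE at the same time can race this way, because each MERGE reads the target table before it writes. A minimal PySpark sketch of the conflict-prone pattern (the table and column names here are hypothetical):

```python
# Two jobs running this same MERGE concurrently can each see the other's
# newly added files as a conflicting append, and one of them fails with
# ConcurrentAppendException. Table and column names are illustrative.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "events")   # assumes a notebook's `spark` session
updates = spark.read.table("staging_events")   # staged rows to merge in

(target.alias("t")
    .merge(updates.alias("u"), "t.event_id = u.event_id")  # no partition predicate,
    .whenMatchedUpdateAll()                                # so the whole table is read
    .whenNotMatchedInsertAll()                             # and any concurrent append
    .execute())                                            # becomes a conflict
```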
Solution
- Trace back the concurrent operation by following the Conflicting commit details in the error message.
- Ensure that the isolation level is set appropriately for your use case.
- When you expect concurrent inserts, modify the relevant queries to set blindAppend to true (see the sketch after this list).
- If you use other operations such as UPDATE, DELETE, MERGE INTO, or OPTIMIZE, consult the documentation for expected write conflicts.
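As one way to make an insert a blind append, the sketch below uses an append-mode write that never reads the target table; Delta records such commits with isBlindAppend set to true, and they do not conflict with other concurrent appends. The table names are assumptions:

```python
# A blind append: an append-mode write that does not read the target table.
# Delta logs the commit with isBlindAppend=true, so it does not conflict
# with other concurrent appends. Table names are illustrative.
df = spark.read.table("staging_events")

(df.write
   .format("delta")
   .mode("append")          # append only; nothing is read from the target,
   .saveAsTable("events"))  # so concurrent appends do not collide
```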
For more information, review the Isolation levels and write conflicts on Databricks (AWS | Azure | GCP) documentation.
Note
If you don’t expect conflicts to occur often, set up retries so that the operation is attempted again when a concurrent append causes it to fail, as in the sketch below.
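A minimal retry sketch, assuming the delta.exceptions module from the delta-spark Python bindings is available; the function name, table and column names, and retry limits are illustrative:

```python
import time

from delta.exceptions import ConcurrentAppendException
from delta.tables import DeltaTable

MAX_RETRIES = 3  # illustrative limit

def merge_with_retry(spark, updates, table_name):
    """Re-run the MERGE when a concurrent append wins the commit race."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            target = DeltaTable.forName(spark, table_name)
            (target.alias("t")
                .merge(updates.alias("u"), "t.event_id = u.event_id")
                .whenMatchedUpdateAll()
                .whenNotMatchedInsertAll()
                .execute())
            return
        except ConcurrentAppendException:
            if attempt == MAX_RETRIES:
                raise                     # give up after the last attempt
            time.sleep(2 ** attempt)      # simple exponential backoff before retrying
```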