INSERT operation fails while trying to execute multiple concurrent INSERT or MERGE operations to append data

Make sure the isolation level is set correctly, or refactor your workload to remove the conflicts.

Written by caio.cominato

Last published at: December 12th, 2024

Problem

When trying to execute multiple concurrent INSERT or MERGE operations to append data to a table, the INSERT operation fails with a ConcurrentAppendException.


ConcurrentAppendException Files were added to the root of the table by a concurrent update. Please try the operation again.
Conflicting commit:
{"timestamp":xxxxx,"userId":"xxxxx","userName":"xxxxx","operation":"WRITE","operationParameters":{"mode":Append,"statsOnLoad":false,"partitionBy":[]},"job":{"jobId":"xxxxx","jobName":"xxxxx","jobRunId":"xxxxx","runId":"xxxxx","jobOwnerId":"xxxxx","triggerType":"manual"},"notebook":{"notebookId":"xxxxx"},"clusterId":"xxxxx","readVersion":xxxxx,"isolationLevel":"WriteSerializable","isBlindAppend":false,"operationMetrics":{"numFiles":"xx","numOutputRows":"xx","numOutputBytes":"xxxxx"},"tags":{"restoresDeletedRows":"false"},"engineInfo":"Databricks-Runtime/13.3.x-aarch64-scala2.12","txnId":"xxxxx"}


Cause

Delta Lake uses optimistic concurrency control: each transaction reads a table version, writes new data files, and then attempts to commit. When multiple operations read and modify the same data at once, the transaction that commits later detects the files added by the concurrent update and fails with a ConcurrentAppendException rather than risk an inconsistent result.


Solution

  1. Trace back the conflicting operation by following the Conflicting commit details (for example, the jobId, notebookId, and txnId values) in the error message.
  2. Ensure that the isolation level is set appropriately for your use case, as shown in the sketch after this list.
  3. When you expect concurrent inserts, modify the relevant queries so they commit as blind appends (appends that do not read the table, recorded with isBlindAppend set to true), also shown below.
  4. If you use other operations such as UPDATE, DELETE, MERGE INTO, or OPTIMIZE, consult the documentation for the write conflicts to expect; a MERGE INTO sketch follows below.
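
The following sketch illustrates steps 2 and 3. It assumes a Databricks notebook where spark is the active session; the table name my_catalog.my_schema.events and the DataFrame df are placeholders, not values from the error message above.

# Step 2: set the isolation level explicitly. WriteSerializable (the default)
# lets blind appends commit without conflicting with one another.
spark.sql("""
    ALTER TABLE my_catalog.my_schema.events
    SET TBLPROPERTIES ('delta.isolationLevel' = 'WriteSerializable')
""")

# Step 3: a pure append that does not read the table is recorded with
# isBlindAppend=true in the commit and does not conflict with other appends.
df.write.format("delta").mode("append").saveAsTable("my_catalog.my_schema.events")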

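For MERGE INTO specifically (step 4), conflicts can often be avoided by making the data separation explicit in the merge condition, so Delta Lake knows concurrent operations on other partitions cannot overlap. A sketch, assuming a table partitioned by date and a placeholder source DataFrame named updates:

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "my_catalog.my_schema.events")

# Constraining both sides of the condition to a single partition tells Delta
# Lake this MERGE cannot conflict with concurrent writes to other dates.
(
    target.alias("t")
    .merge(
        updates.alias("s"),
        "s.user_id = t.user_id AND t.date = '2024-01-01' AND s.date = '2024-01-01'",
    )
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
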

For more information, review the Isolation levels and write conflicts on Databricks (AWS, Azure, GCP) documentation.


Note

If you don’t expect conflicts to occur often, set up automatic retries so that the occasional concurrent append succeeds on a later attempt, as in the sketch below.
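
A minimal retry sketch, assuming the delta-spark Python bindings (which expose ConcurrentAppendException) and a caller-supplied write function. On some Databricks Runtime versions the error may surface as a Py4J exception instead, so adjust the except clause accordingly.

import random
import time

from delta.exceptions import ConcurrentAppendException


def append_with_retry(write_batch, max_attempts=5):
    """Retry a Delta append when a concurrent commit wins the race."""
    for attempt in range(1, max_attempts + 1):
        try:
            write_batch()  # e.g. df.write.format("delta").mode("append").saveAsTable(...)
            return
        except ConcurrentAppendException:
            if attempt == max_attempts:
                raise
            # Back off with jitter so competing writers do not collide again.
            time.sleep(min(2 ** attempt, 30) + random.random())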