[CONCURRENT_QUERY] Error on Auto Loader job

Configure the Auto Loader job to run in continuous mode instead of available now mode.

Written by Guilherme Leite

Last published at: January 31st, 2025

Problem

While running an Auto Loader job that is triggered by file arrival in a watched storage location, you encounter a concurrent query error.

org.apache.spark.SparkConcurrentModificationException: [CONCURRENT_QUERY] Another instance of this query [id: <QUERY_ID>] was just started by a concurrent session [existing runId: <RUN_ID> new runId: <NEW_RUN_ID>].

Cause

Multiple instances of the same streaming query are running concurrently on the same cluster. 

When an Auto Loader job is configured to run in available now mode with file triggering, a new job instance is triggered for every new file in the storage location.

If a new file arrives while the previous job instance is still running, the new instance tries to start the same streaming query that is already active. The two instances conflict, resulting in a [CONCURRENT_QUERY] error.

Solution

Configure the Auto Loader job to run in continuous mode instead of available now mode. This ensures that the job processes new files as they arrive, without triggering a new job instance for each file.
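As a rough illustration of this change, the sketch below starts a single long-running Auto Loader query in PySpark instead of one available now query per file. It assumes that "continuous mode" here means a job that stays running and polls for new files. The function names, the one-minute processing interval, and the job settings fragment (modeled on the continuous trigger field of the Databricks Jobs API) are assumptions for illustration, not part of the original article.

```python
from typing import Any, Dict


def auto_loader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Build the Auto Loader reader options for the stream (hypothetical helper)."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def start_long_running_stream(
    spark: Any,
    source_path: str,
    target_table: str,
    checkpoint_path: str,
    options: Dict[str, str],
):
    """Start one query that keeps running and picks up new files as they arrive."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        # Assumed interval; replaces .trigger(availableNow=True).
        .trigger(processingTime="1 minute")
        .toTable(target_table)
    )


# Assumed job-level settings fragment: keep a single run active continuously
# instead of starting a new run for each file arrival.
CONTINUOUS_JOB_SETTINGS = {"continuous": {"pause_status": "UNPAUSED"}}
```

Because only one query instance exists for the checkpoint location, a newly arrived file is handled by the next micro-batch of the running query rather than by a second job instance.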

If your situation only allows the job to run when new files arrive, implement backpressure handling in your streaming job, such as rate limiting, windowing, or batching, so that the job can keep up with the rate of data ingestion.
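One possible form of rate limiting is to cap how much data each micro-batch reads, using the Auto Loader options cloudFiles.maxFilesPerTrigger and cloudFiles.maxBytesPerTrigger. The sketch below is a minimal example under that assumption. The helper names and the limit values (500 files, 1g) are hypothetical and should be tuned to your workload.

```python
from typing import Any, Dict


def rate_limited_options(
    file_format: str,
    schema_location: str,
    max_files: int = 500,   # hypothetical limit, tune for your workload
    max_bytes: str = "1g",  # hypothetical limit, tune for your workload
) -> Dict[str, str]:
    """Build Auto Loader options that cap the size of each micro-batch."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.maxFilesPerTrigger": str(max_files),
        "cloudFiles.maxBytesPerTrigger": max_bytes,
    }


def run_available_now(
    spark: Any,
    source_path: str,
    target_table: str,
    checkpoint_path: str,
    options: Dict[str, str],
):
    """Process everything currently available in bounded batches, then stop."""
    query = (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)
        .toTable(target_table)
    )
    # Block until the backlog is drained so the job run ends cleanly.
    query.awaitTermination()
    return query
```

Smaller batches help each run finish in a more predictable time, but they do not by themselves prevent two runs from overlapping, so this is best combined with a job configuration that avoids concurrent runs.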