Auto Loader (file notification mode) fails to identify new files from the cloud queue service

Ensure that messages in the cloud queue are of the expected format.

Written by brock.baurer

Last published at: March 12th, 2025

Problem

When using Auto Loader in file notification mode, Auto Loader does not ingest your files to the target location as expected. This can happen even when the referenced cloud queue service (AWS SQS, Azure Queue Storage, or Google Pub/Sub) contains messages with valid object storage URIs that you expect Auto Loader to ingest. You see WARN messages in the log4j output.

The following is an example from AWS SQS. 

WARN S3Event: Ignoring unexpected message received in SQS queue

 

Example configuration

# Python: the parentheses let the chained .option() calls span multiple lines.
(
  spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "json")
  .option("cloudFiles.useNotifications", True)
  .option("cloudFiles.queueUrl", "https://<cloud-queue-url>")
  .option("<other-cloudFiles-options>", "<other-values>")
  # ...
)

 

Cause

Messages in your cloud queue service do not conform to the format Auto Loader expects. Auto Loader ignores such messages and logs a warning, so the files they reference are not ingested. On AWS, for example, the warning is WARN S3Event: Ignoring unexpected message received in SQS queue.

 

Solution

Ensure that your cloud queue messages conform to the expected format of the service consuming them.

  • On AWS, Auto Loader expects messages in the SQS queue to follow the format of S3 ObjectCreated events.
  • On Azure, Auto Loader expects messages that relate to FlushWithClose events.
  • On GCP, Auto Loader expects messages that relate to OBJECT_FINALIZE events.
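
The checks above can be approximated with a small sanity test on a message you have pulled from the queue. The following is a minimal sketch, not Auto Loader's actual validation logic. The function names are hypothetical, and the field names are assumptions taken from the publicly documented notification schemas: Records, eventSource, and eventName for S3 event notifications (optionally wrapped in an SNS envelope under Message), eventType and data.api for Azure Event Grid storage events, and the eventType attribute for Google Cloud Storage Pub/Sub notifications. Each function expects an already decoded message.

```python
import json


def is_s3_object_created(body: str) -> bool:
    """Return True if an SQS message body looks like an S3 ObjectCreated notification."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return False
    if not isinstance(payload, dict):
        return False
    # Assumption: notifications routed through SNS carry the S3 event
    # as a JSON string under the "Message" key.
    if isinstance(payload.get("Message"), str):
        try:
            payload = json.loads(payload["Message"])
        except ValueError:
            return False
        if not isinstance(payload, dict):
            return False
    records = payload.get("Records")
    if not isinstance(records, list) or not records:
        return False
    return all(
        isinstance(r, dict)
        and r.get("eventSource") == "aws:s3"
        and str(r.get("eventName", "")).startswith("ObjectCreated:")
        and isinstance(r.get("s3"), dict)
        for r in records
    )


def is_azure_flush_with_close(event: dict) -> bool:
    """Return True if a decoded Event Grid event reports a FlushWithClose blob creation."""
    data = event.get("data")
    return (
        event.get("eventType") == "Microsoft.Storage.BlobCreated"
        and isinstance(data, dict)
        and data.get("api") == "FlushWithClose"
    )


def is_gcs_object_finalize(attributes: dict) -> bool:
    """Return True if Pub/Sub message attributes describe an OBJECT_FINALIZE event."""
    return attributes.get("eventType") == "OBJECT_FINALIZE"


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    good = json.dumps({
        "Records": [{
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "data/file.json"}},
        }]
    })
    # A hand-written message with a valid URI but the wrong shape.
    bad = json.dumps({"path": "s3://my-bucket/data/file.json"})
    print(is_s3_object_created(good))  # True
    print(is_s3_object_created(bad))   # False
```

The second sample message illustrates the situation described in the Problem section: it contains a valid object storage URI, but it is not shaped like an event notification, so a check of this kind rejects it.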

 

If the messages conform to the expected format, the Auto Loader stream should then progress as expected. For further information on Auto Loader file notification mode, refer to the What is Auto Loader file notification mode? (AWS | Azure | GCP) documentation.