Problem
When using Auto Loader in file notification mode, Auto Loader does not ingest your files to the target location as expected. This can happen even when the referenced cloud queue service (AWS SQS, Azure Queue Storage, or Google Pub/Sub) contains messages with valid object storage URIs that you expect Auto Loader to ingest. You receive WARN messages in log4j.
The following is an example from AWS SQS.
WARN S3Event: Ignoring unexpected message received in SQS queue
Example configuration
(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", True)
    .option("cloudFiles.queueUrl", "https://<cloud-queue-url>")
    .option("<other-cloudFiles-options>", "<other-values>")
    # ...
)
Cause
Messages in your cloud queue service do not conform to the expected format. Messages that don't comply with the format Auto Loader expects produce warnings such as WARN S3Event: Ignoring unexpected message received in SQS queue on AWS.
Solution
Ensure that your cloud queue messages conform to the expected format of the service consuming them.
- When using Auto Loader on AWS, the message format should comply with the ObjectCreated events in the AWS SQS queue.
- Auto Loader on Azure expects messages that relate to the FlushWithClose events.
- Auto Loader on GCP expects messages that relate to the OBJECT_FINALIZE event.
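For the AWS case, a quick way to inspect a suspect message is to check whether its body parses as an S3 event notification containing ObjectCreated records. The following is a minimal sketch only, not Auto Loader's own validation logic. The function name s3_object_created_keys is hypothetical, the field names (Records, eventName, s3.object.key) follow the standard S3 event notification structure, and the handling of an SNS envelope (a JSON string under "Message") is an assumption that applies only if your queue is fed through SNS.

```python
import json
from typing import List


def s3_object_created_keys(body: str) -> List[str]:
    """Return object keys of ObjectCreated records found in an SQS message body.

    An empty list means the body does not look like an S3 ObjectCreated
    notification. Illustrative check only; hypothetical helper.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []

    # Assumption: when S3 publishes to SNS and SNS forwards to SQS, the S3
    # event is a JSON string stored under "Message" in the SNS envelope.
    if "Records" not in payload and isinstance(payload.get("Message"), str):
        try:
            payload = json.loads(payload["Message"])
        except json.JSONDecodeError:
            return []
        if not isinstance(payload, dict):
            return []

    records = payload.get("Records")
    if not isinstance(records, list):
        return []

    keys = []
    for record in records:
        if not isinstance(record, dict):
            continue
        # S3 event names look like "ObjectCreated:Put" or "ObjectRemoved:Delete".
        if not str(record.get("eventName", "")).startswith("ObjectCreated:"):
            continue
        s3_part = record.get("s3")
        obj = s3_part.get("object") if isinstance(s3_part, dict) else None
        key = obj.get("key") if isinstance(obj, dict) else None
        if key:
            keys.append(key)
    return keys
```

A body such as the s3:TestEvent message that S3 sends when a notification is first configured has no Records list, so this check returns an empty list for it. Messages like that are one plausible source of the warning shown above.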
If the messages conform to the expected format, the Auto Loader stream should then progress as expected. For further information on Auto Loader File Notification mode, refer to the What is Auto Loader file notification mode? (AWS | Azure | GCP) documentation.