Problem
Auto Loader jobs fail with an error message indicating that a metadata file, which contains important default options for the stream, is missing from the checkpoint directory.
`com.databricks.sql.cloudfiles.errors.CloudFilesIllegalStateException: The metadata file in the streaming source checkpoint directory is missing.`
Cause
The metadata file that the Auto Loader job relies on is in a checkpoint directory with a lifecycle policy enabled. When the lifecycle policy deletes the metadata file upon expiry, the Auto Loader job fails.
Solution
If a backup is available, recover the missing metadata file from it. Otherwise, create a new one using the following Apache Spark configuration, which enables the Auto Loader job to proceed and generate a new metadata file.
spark.conf.set("spark.databricks.cloudFiles.missingMetadataFile.writeNew", "true")
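As a rough illustration of where this setting might sit in a notebook, the sketch below sets the configuration and then restarts an Auto Loader stream. The function name, its parameters, the JSON source format, and the use of the checkpoint path as the schema location are all assumptions made for the example; only the configuration key comes from this article. It also assumes the setting should be applied before the stream starts.

```python
def restart_auto_loader(spark, source_path, checkpoint_path, target_table):
    """Hypothetical helper: restart an Auto Loader stream after the
    metadata file has gone missing from the checkpoint directory."""
    # Configuration key from this article; assumed to be set before the stream starts.
    spark.conf.set(
        "spark.databricks.cloudFiles.missingMetadataFile.writeNew", "true"
    )

    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")  # assumed source file format
        .option("cloudFiles.schemaLocation", checkpoint_path)  # assumed location
        .load(source_path)
    )

    # Reuse the existing checkpoint location so the stream resumes from it.
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

In a Databricks notebook, `spark` is the session that is already defined, so a call might look like `restart_auto_loader(spark, "s3://<bucket>/input/", "s3://<bucket>/checkpoints/job/", "my_table")`, with the paths and table name replaced by your own.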
After recovering or recreating the metadata file, confirm that the checkpoint location is not in a bucket with a lifecycle policy enabled. If it is, move the checkpoint location to a bucket without a lifecycle policy and restart the stream.
Databricks also recommends regularly reviewing and updating S3 bucket lifecycle policies to ensure they don't interfere with critical files.
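One possible way to review a bucket's lifecycle rules is to check which enabled expiration rules overlap the checkpoint prefix. The sketch below is a minimal example, not an official Databricks tool: the function names and sample rules are hypothetical, and the rule dictionaries are assumed to follow the shape returned by the boto3 `get_bucket_lifecycle_configuration` call. It only compares prefixes, so rules scoped by tags are treated conservatively as if they could apply.

```python
def rule_prefix(rule):
    """Return the key prefix a rule is scoped to ('' means the whole bucket)."""
    flt = rule.get("Filter") or {}
    if "Prefix" in flt:
        return flt["Prefix"]
    if "Prefix" in flt.get("And", {}):
        return flt["And"]["Prefix"]
    # Older rules may carry a top-level Prefix instead of a Filter.
    return rule.get("Prefix", "")


def rules_affecting_checkpoint(rules, checkpoint_prefix):
    """List IDs of enabled expiration rules whose prefix overlaps the checkpoint."""
    affected = []
    for rule in rules:
        if rule.get("Status") != "Enabled" or "Expiration" not in rule:
            continue
        prefix = rule_prefix(rule)
        # Overlap: the rule covers the checkpoint, or targets a path inside it.
        if checkpoint_prefix.startswith(prefix) or prefix.startswith(checkpoint_prefix):
            affected.append(rule.get("ID", "<unnamed>"))
    return affected


# Hypothetical sample rules for illustration only.
sample_rules = [
    {"ID": "expire-tmp", "Status": "Enabled",
     "Filter": {"Prefix": "tmp/"}, "Expiration": {"Days": 7}},
    {"ID": "expire-all", "Status": "Enabled",
     "Filter": {"Prefix": ""}, "Expiration": {"Days": 30}},
    {"ID": "old-disabled", "Status": "Disabled",
     "Filter": {"Prefix": "checkpoints/"}, "Expiration": {"Days": 1}},
]

print(rules_affecting_checkpoint(sample_rules, "checkpoints/autoloader/"))
```

To run this against a real bucket, the rules could be fetched with `boto3.client("s3").get_bucket_lifecycle_configuration(Bucket="<bucket>")["Rules"]`. That call raises an error when the bucket has no lifecycle configuration, which in this context means there is nothing to review.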