Problem
You have an Autoloader job configured in Directory listing mode, and it fails with a URISyntaxException error:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: [masked_uri]
Cause
The error message indicates that a file path could not be converted into a valid URI (Uniform Resource Identifier). This problem occurs when the source folder contains files whose names include a colon (:). In Directory listing mode, Autoloader relies on the Hadoop library for file listing, and filenames containing colons violate that library's naming limitations.
For more information on these limitations, please review the Hadoop documentation.
The Apache community has acknowledged this issue in HDFS-14762.
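The wording of the error message makes more sense with a simplified model of the parsing involved. The sketch below is a Python model, not Hadoop's actual code. It assumes that, as in Hadoop's Path(String) constructor, the text before the first colon is treated as a URI scheme when no slash appears before it. It also assumes that a scheme followed by a path that does not start with "/" is rejected, which is the check java.net.URI applies and the source of the "Relative path in absolute URI" reason. A listed file name contains no slash, so any colon in it triggers this rejection. The function names hadoop_style_split and check_child_name are hypothetical.

```python
from typing import Optional, Tuple


def hadoop_style_split(path_string: str) -> Tuple[Optional[str], str]:
    """Simplified model (an assumption, not Hadoop code) of how Hadoop's
    Path(String) splits off a URI scheme: the text before the first colon
    is taken as the scheme if no slash appears earlier in the string."""
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path_string[:colon], path_string[colon + 1:]
    return None, path_string


def check_child_name(name: str) -> None:
    """Raise ValueError with the same reason text java.net.URI uses when a
    scheme is present but the remaining path is relative."""
    scheme, path = hadoop_style_split(name)
    if scheme is not None and path and not path.startswith("/"):
        raise ValueError(f"Relative path in absolute URI: {scheme}:{path}")


# Compare a colon-free file name with one that embeds a timestamp.
for sample in ["events_2023-01-01.json", "events_2023-01-01T10:00:00.json"]:
    try:
        check_child_name(sample)
        print(f"OK     {sample}")
    except ValueError as err:
        print(f"FAILS  {sample}: {err}")
```

In this model, the second name is split into the "scheme" events_2023-01-01T10 and the relative path 00:00.json. That split is why the reported URI looks like an ordinary file name even though the message talks about schemes and absolute URIs.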
Solution
You have a few options to work around this naming limitation. Choose the most appropriate resolution based on your specific use case and requirements.
- Avoid filenames with colons: Ensure that filenames within the source path do not contain colons to comply with the Hadoop library naming constraints.
- Switch to File notification mode: Use File notification mode instead of Directory listing mode by setting option("cloudFiles.useNotifications", "true") on readStream.
- Disable incremental listing: If Directory listing mode is required, disable incremental listing by setting option("cloudFiles.useIncrementalListing", "false") on readStream. Note that this may degrade read performance, because Autoloader then performs a full directory listing instead of an incremental one.
- Clear the checkpoint (temporary mitigation): If the issue is infrequent, clearing the checkpoint may provide temporary relief. Consider this a last resort, as it may result in duplicate processing in stateful streaming queries.
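To apply the first option, you need to find the files whose names contain colons before they reach the source path. The following is a minimal sketch that assumes the files can be enumerated as paths relative to the source folder, for example in a local staging directory before upload. For cloud storage, you would pass in names from your storage client instead. The helpers walk_local and rename_plan are hypothetical, and replacing ":" with "-" is only one possible convention. The script is a dry run: it prints proposed renames and does not modify any files.

```python
import os
import sys
from typing import Dict, Iterable, List


def walk_local(root: str) -> List[str]:
    """List every file under a local directory as a path relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, filename), root))
    return found


def rename_plan(paths: Iterable[str], replacement: str = "-") -> Dict[str, str]:
    """Map every path containing a colon to a colon-free suggestion.

    Raises ValueError if a suggestion would overwrite an existing path or
    another suggestion, so nothing is silently clobbered.
    """
    paths = list(paths)  # iterated twice below
    existing = set(paths)
    plan: Dict[str, str] = {}
    for path in paths:
        if ":" not in path:
            continue
        target = path.replace(":", replacement)
        if target in existing or target in plan.values():
            raise ValueError(f"renaming {path!r} would collide with {target!r}")
        plan[path] = target
    return plan


if __name__ == "__main__":
    # Dry run only: print the proposed renames and leave the files untouched.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for old, new in sorted(rename_plan(walk_local(root)).items()):
        print(f"{old} -> {new}")
```

The collision check prevents a name such as a:b.json from being mapped onto an existing a-b.json. Renamed files have new paths, so Autoloader may treat them as new files if their originals were already ingested.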