Autoloader job fails with a URISyntaxException error due to invalid characters in filenames

When using Directory listing mode, do not process files whose names contain colons.

Written by harikrishnan.kunhumveettil

Last published at: January 19th, 2024

Problem

You have an Autoloader job configured in Directory listing mode, and it fails with a URISyntaxException error.

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: [masked_uri]

Cause

The error message indicates a problem with a URI (Uniform Resource Identifier) that the Autoloader job is trying to process. It occurs when the source folder contains files whose names include colons (":"). In Directory listing mode, Autoloader relies on the Hadoop library for file listing, and filenames with colons violate that library's naming limitations.

For more information on these limitations, please review the Hadoop documentation.

The Apache community has acknowledged this issue in HDFS-14762.
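
To check whether a source path is affected, you can scan a listing of its file paths for names that contain a colon. The following is a minimal sketch, not part of Autoloader itself: the function name filenames_with_colons and the sample paths are hypothetical, and it assumes you already have the paths as plain strings (for example, from your storage provider's listing tool). It inspects only the final path component, so the colon in a URI scheme is ignored.

```python
def filenames_with_colons(paths):
    """Return the paths whose final component (the filename) contains a colon."""
    flagged = []
    for path in paths:
        # Take the text after the last "/" so that the colon in a URI scheme
        # such as "s3://" or "abfss://" is not counted.
        filename = path.rstrip("/").rsplit("/", 1)[-1]
        if ":" in filename:
            flagged.append(path)
    return flagged


# Hypothetical sample listing; the second name embeds a timestamp with colons.
sample_paths = [
    "s3://example-bucket/landing/events_20240119.json",
    "s3://example-bucket/landing/events_2024-01-19T10:00:00.json",
]
print(filenames_with_colons(sample_paths))
```

Any path this returns is a candidate for renaming before Autoloader lists the folder.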

Solution

You have a few options to work around this naming limitation. Choose the most appropriate resolution based on your specific use case and requirements.

  • Avoid filenames with colons: Ensure that filenames within the source path do not contain colons to comply with the Hadoop library naming constraints.
  • Switch to File notification mode: Transition from Directory listing mode to File notification mode in Autoloader.
  • Disable incremental listing: If Directory listing mode is required, disable incremental listing by setting option("cloudFiles.useIncrementalListing", "false") on readStream. Note that this may degrade read performance.
  • Clear the checkpoint (temporary mitigation): If the issue is infrequent, clearing the checkpoint may provide temporary relief. This should be considered a last resort, as it may result in duplicate processing in stateful streaming queries.
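
The two configuration-based options above can be sketched as reader options. This is an illustrative sketch under stated assumptions, not a definitive setup: the helper names autoloader_options and read_stream and the mode labels are hypothetical, cloudFiles.useNotifications is assumed to be the option that turns on File notification mode, and schema settings, cloud permissions, and the write side of the stream are omitted.

```python
def autoloader_options(file_format, mode):
    """Build Auto Loader reader options for one of the two workarounds.

    mode is a hypothetical label used only in this sketch:
      "notification"                -> switch to File notification mode
      "listing_without_incremental" -> stay in Directory listing mode, but
                                       disable incremental listing
    """
    options = {"cloudFiles.format": file_format}
    if mode == "notification":
        # Assumed option name for enabling File notification mode.
        options["cloudFiles.useNotifications"] = "true"
    elif mode == "listing_without_incremental":
        # Option named in this article; may degrade read performance.
        options["cloudFiles.useIncrementalListing"] = "false"
    else:
        raise ValueError(f"unknown mode: {mode}")
    return options


def read_stream(spark, source_path, options):
    """Apply the options to a streaming read (requires a Spark session)."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
```

For example, read_stream(spark, source_path, autoloader_options("json", "listing_without_incremental")) would keep Directory listing mode with incremental listing turned off, where spark and source_path come from your own job.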
     