Problem: Part number must be between 1 and 10000, inclusive

Problem

When you copy a large file from the local file system to an S3-backed DBFS path, the following exception can occur:

Amazon.S3.AmazonS3Exception: Part number must be an integer between 1 and 10000, inclusive

Cause

This is an Amazon S3 limit on multipart uploads: a single object can be uploaded in at most 10,000 parts, numbered 1 through 10,000. The copy is performed as a multipart upload, so if the file is larger than 10,000 times the configured part size, the upload runs out of part numbers and fails with this error.
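
The arithmetic behind the limit can be made concrete with a short Python sketch (not from the original article; the 100 MB part size matches the value used in the Solution below):

    # Sketch: the largest object a multipart upload can hold at a given part size.
    S3_MAX_PARTS = 10000            # S3 hard limit on parts per multipart upload
    part_size_bytes = 104857600     # 100 MB, the value used in the Solution below

    max_object_bytes = S3_MAX_PARTS * part_size_bytes
    print(f"Largest file at this part size: {max_object_bytes / 1024**3:.1f} GiB")
    # With 100 MB parts, files larger than ~976 GiB fail with the part-number error.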

Solution

To prevent this exception, increase the size of each part so that the file fits within 10,000 parts. Set the spark.hadoop.fs.s3a.multipart.size property (a value in bytes) at the cluster level or the notebook level. The examples below use 104857600 bytes (100 MB); an end-to-end notebook sketch follows the list.

  • Cluster level: add the property to the cluster's Spark configuration. You must restart the cluster after setting this property.

    spark.hadoop.fs.s3a.multipart.size 104857600
    
  • Notebook level: set the property at runtime with spark.conf.set.

    spark.conf.set("spark.hadoop.fs.s3a.multipart.size", "104857600")
    

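For a notebook-level end-to-end run, a minimal sketch is shown below. The source and destination paths are hypothetical placeholders; spark and dbutils are the objects Databricks provides in a notebook.

    # Minimal notebook sketch (Python). Paths are hypothetical placeholders.
    # Raise the multipart size to 100 MB before copying the large file.
    spark.conf.set("spark.hadoop.fs.s3a.multipart.size", "104857600")

    # Copy from the driver's local file system to the S3-backed DBFS location.
    dbutils.fs.cp("file:/tmp/large_dataset.parquet", "dbfs:/data/large_dataset.parquet")
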
Note

If the error still occurs, increase the multipart size further. The part size must be at least the total file size divided by 10,000 (see the sizing sketch below).
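
To size the property instead of guessing, you can compute the smallest part size that keeps the upload within 10,000 parts. A minimal sketch of that calculation, using a hypothetical 2 TiB file:

    import math

    S3_MAX_PARTS = 10000
    file_size_bytes = 2 * 1024**4               # hypothetical 2 TiB file

    # Smallest part size that keeps the upload within 10,000 parts.
    min_part_bytes = math.ceil(file_size_bytes / S3_MAX_PARTS)
    print(f"Set spark.hadoop.fs.s3a.multipart.size to at least {min_part_bytes} bytes")
    # For a 2 TiB file this is roughly 220 MB, so 104857600 (100 MB) is not enough.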