Problem
When you copy a large file from the local file system to DBFS on S3, the following exception can occur:
Amazon.S3.AmazonS3Exception: Part number must be an integer between 1 and 10000, inclusive
Cause
This is an Amazon S3 limit on multipart uploads: an object can be uploaded in at most 10,000 parts, numbered 1 through 10000 inclusive. The S3A client splits the upload into parts of fs.s3a.multipart.size bytes, so a file that needs more than 10,000 parts at the configured part size fails with this exception. For example, with a 10 MB part size, any file larger than about 100 GB (10 MB x 10,000 parts) cannot be uploaded.
Solution
To prevent this exception, increase the part size so the file fits within 10,000 parts. Set the following property at the cluster level or the notebook level. A value of 104857600 (100 MB) allows uploads of roughly 1 TB.
- Cluster Level (Spark config): add the line below to the cluster's Spark configuration. You must restart the cluster after setting this property.
spark.hadoop.fs.s3a.multipart.size 104857600
- Notebook Level (Python):
spark.conf.set("spark.hadoop.fs.s3a.multipart.size", "104857600")