You observe a job failure with the exception:
com.amazonaws.SdkClientException: Unable to complete multi-part upload. Individual part upload failed : Unable to execute HTTP request: Timeout waiting for connection from pool
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
...
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1190)
This error originates in the Amazon SDK internal implementation of multi-part upload, which takes all of the multi-part upload requests and submits them as Futures to a thread pool.
There is no back pressure control here: all of the parts are submitted in parallel, so the only limit on the actual parallelism of execution is the size of the thread pool itself. In this case, the thread pool is a BlockingThreadPoolExecutorService, a class internal to S3A that queues requests rather than rejecting them once the pool has reached its maximum thread capacity.
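The submission pattern described above can be sketched in a few lines of Python. This is only an illustration, not the SDK's or S3A's actual code: the function name peak_parallelism, the upload_part stand-in, and the sleep that replaces the network transfer are all hypothetical. It submits every part at once and records how many run at the same time, showing that the pool size is the only cap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait


def peak_parallelism(num_parts: int, pool_size: int) -> int:
    """Submit every part up front and return the peak number running at once."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def upload_part(part_number: int) -> None:
        # Hypothetical stand-in for uploading one part.
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # pretend this is the network transfer
        with lock:
            active -= 1

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # No back pressure: all parts are handed to the pool immediately.
        # Parts beyond pool_size wait in the executor's queue.
        futures = [pool.submit(upload_part, n) for n in range(num_parts)]
        wait(futures)
    return peak


if __name__ == "__main__":
    print(peak_parallelism(num_parts=40, pool_size=8))
```

In this sketch the excess parts are queued rather than rejected, which loosely mirrors the queueing behavior the article attributes to BlockingThreadPoolExecutorService.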
There are two parallelism limits here:
- The size of the thread pool used by S3A
- The size of the HTTPClient connection pool inside AmazonS3Client
If the S3A thread pool is larger than the HTTPClient connection pool, then threads can become starved while waiting to get a connection from the pool. We could see this happening if hundreds of running commands end up thrashing.
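The interaction between the two limits can be reproduced with a small simulation. This is a sketch under stated assumptions, not the real S3A or HTTPClient code: a semaphore stands in for the connection pool, and the names simulate_uploads, connection_timeout_s, and transfer_time_s are hypothetical. When worker threads outnumber connections, the surplus threads wait for a connection and eventually time out, which is the analogue of ConnectionPoolTimeoutException.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_uploads(num_parts: int,
                     thread_pool_size: int,
                     connection_pool_size: int,
                     connection_timeout_s: float = 0.05,
                     transfer_time_s: float = 0.2) -> int:
    """Return how many parts timed out waiting for a connection."""
    # The semaphore models a fixed-size connection pool.
    connections = threading.BoundedSemaphore(connection_pool_size)

    def upload_part(part_number: int) -> bool:
        # Try to lease a connection; give up after the timeout.
        if not connections.acquire(timeout=connection_timeout_s):
            return False  # analogous to a connection pool timeout
        try:
            time.sleep(transfer_time_s)  # pretend to transfer the part
            return True
        finally:
            connections.release()

    with ThreadPoolExecutor(max_workers=thread_pool_size) as pool:
        results = list(pool.map(upload_part, range(num_parts)))
    return results.count(False)


if __name__ == "__main__":
    # Connections cover every thread: no part should time out.
    print(simulate_uploads(16, thread_pool_size=8, connection_pool_size=8))
    # More threads than connections: some parts time out.
    print(simulate_uploads(16, thread_pool_size=8, connection_pool_size=2))
```

The timings are deliberately tiny so the example finishes quickly. Only the ratio between the two pool sizes matters for the effect.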
You can tune the sizes of both the S3A thread pool and the HTTPClient connection pool. One plausible approach would be to reduce the size of the S3A thread pool so that it is smaller than the HTTPClient pool. However, this isn’t without risk: HADOOP-13826 reports that sizing the thread pool too small can cause deadlocks during multi-part upload, and there is a related bug on the AWS Java SDK itself that references it: issues/939. Given this, we don’t recommend reducing the thread pool size. Instead, we recommend that you increase the HTTPClient pool size so that it at least matches the number of threads in the S3A pool (currently 256). The HTTPClient connection pool is ultimately configured by fs.s3a.connection.maximum, whose default is currently hardcoded to 200.
To solve the problem, set the following Spark configuration properties. The properties will be applied to all jobs running in the cluster:
spark.hadoop.fs.s3a.multipart.threshold 2097152000
spark.hadoop.fs.s3a.multipart.size 104857600
spark.hadoop.fs.s3a.connection.maximum 500
spark.hadoop.fs.s3a.connection.timeout 600000
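As a quick sanity check on these values, the following Python sketch converts them to more readable units and confirms that the connection pool is at least as large as the S3A thread pool. It assumes that the two multipart settings are expressed in bytes and the timeout in milliseconds. The 256 figure is the thread pool size quoted in this article, and the helper name summarize is hypothetical.

```python
MIB = 1024 * 1024

# Values copied from the recommended configuration.
spark_conf = {
    "spark.hadoop.fs.s3a.multipart.threshold": 2097152000,
    "spark.hadoop.fs.s3a.multipart.size": 104857600,
    "spark.hadoop.fs.s3a.connection.maximum": 500,
    "spark.hadoop.fs.s3a.connection.timeout": 600000,
}

# Thread pool size quoted in this article; adjust if your runtime differs.
S3A_THREAD_POOL_SIZE = 256


def summarize(conf: dict, thread_pool_size: int) -> dict:
    """Convert the raw settings to readable units (assumes bytes and ms)."""
    return {
        "threshold_mib": conf["spark.hadoop.fs.s3a.multipart.threshold"] / MIB,
        "part_size_mib": conf["spark.hadoop.fs.s3a.multipart.size"] / MIB,
        "timeout_s": conf["spark.hadoop.fs.s3a.connection.timeout"] / 1000,
        # Every upload thread should be able to hold a connection at once.
        "connections_cover_threads":
            conf["spark.hadoop.fs.s3a.connection.maximum"] >= thread_pool_size,
    }


if __name__ == "__main__":
    for key, value in summarize(spark_conf, S3A_THREAD_POOL_SIZE).items():
        print(f"{key}: {value}")
```

Under those unit assumptions, the threshold works out to 2000 MiB, the part size to 100 MiB, and the timeout to 600 seconds, with 500 connections comfortably covering 256 threads.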