Apache Spark job fails with a Connection pool shut down error
Problem
A Spark job fails with the error message java.lang.IllegalStateException: Connection pool shut down when attempting to write data to a Delta table on S3.
Cause
Spark jobs that write to S3 are limited to a maximum number of simultaneous connections. The java.lang.IllegalStateException: Connection pool shut down error occurs when this connection pool is exhausted.
Solution
The size of the client connection pool is controlled by the fs.s3a.connection.maximum value, which defines the maximum number of simultaneous connections to S3. It defaults to 200. You can increase the pool size by setting a higher value in the Spark configuration properties.
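As a minimal sketch, one way to apply the setting in a self-managed PySpark job is at session startup; the spark.hadoop. prefix forwards the option to the Hadoop S3A client. The application name, the value 352, and the bucket path below are placeholders, and on Databricks the session is created for you, so the equivalent line normally goes in the cluster's Spark configuration instead.

from pyspark.sql import SparkSession

# Minimal sketch of a self-managed PySpark session. On Databricks the session
# already exists, so the pool-size setting would instead go in the cluster's
# Spark configuration as: spark.hadoop.fs.s3a.connection.maximum 352
spark = (
    SparkSession.builder
    .appName("delta-s3-write")
    # Forwarded to the Hadoop S3A client as fs.s3a.connection.maximum.
    .config("spark.hadoop.fs.s3a.connection.maximum", "352")
    # Delta Lake support (built in on Databricks; requires the delta-spark package elsewhere).
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical write that previously exhausted the connection pool (placeholder bucket and path).
spark.range(1_000_000).write.format("delta").mode("append").save("s3a://my-bucket/events")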
Databricks recommends setting the maximum number of connections to a multiple of the total number of cores in your cluster. For example, on a 32-core cluster, try setting fs.s3a.connection.maximum to 320 or 352.
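If you prefer to derive the value from the cluster size rather than hard-coding it, a rough sketch is shown below. It assumes spark.sparkContext.defaultParallelism approximates the total number of executor cores, which is only an estimate, and it simply prints the value to place in the cluster's Spark configuration.

# Rough sizing helper mirroring the 32-core -> 320/352 example above.
total_cores = spark.sparkContext.defaultParallelism   # approximate total executor cores
pool_size = max(200, total_cores * 11)                # stay at or above the 200 default
print(f"Set spark.hadoop.fs.s3a.connection.maximum to {pool_size} in the cluster Spark config")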
Once the maximum number of connections is set high enough, the java.lang.IllegalStateException: Connection pool shut down error no longer occurs.