Apache Spark job fails with a Connection pool shut down error

Problem

A Spark job fails with the error message java.lang.IllegalStateException: Connection pool shut down when attempting to write data into a Delta table on S3.

Cause

Spark jobs that write to S3 are limited to a maximum number of simultaneous connections, set by the size of the S3 client connection pool. The java.lang.IllegalStateException: Connection pool shut down error occurs when this connection pool is exhausted.

Solution

The size of the client connection pool is set by fs.s3a.connection.maximum, which defines the maximum number of simultaneous connections to S3. The default value is 200. To increase the size of the client connection pool, set a higher value in the cluster's Spark configuration properties. Hadoop properties such as this one are passed through Spark with the spark.hadoop. prefix, so the key becomes spark.hadoop.fs.s3a.connection.maximum.
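The following is a minimal sketch of how the property could be built and rendered as a cluster Spark config entry. It assumes the setting is supplied through Spark's spark.hadoop. prefix and that the cluster Spark config accepts one "key value" pair per line. The helper names s3a_connection_conf and as_cluster_spark_config are hypothetical and exist only for illustration.

```python
from typing import Dict

# Assumption: Hadoop settings reach the S3A filesystem through Spark's
# "spark.hadoop." prefix, which copies them into the Hadoop configuration.
HADOOP_PREFIX = "spark.hadoop."
S3A_MAX_CONNECTIONS_KEY = "fs.s3a.connection.maximum"
DEFAULT_MAX_CONNECTIONS = 200  # default value stated in this article


def s3a_connection_conf(max_connections: int) -> Dict[str, str]:
    """Hypothetical helper: return the Spark property that enlarges the S3A pool."""
    # Only a value above the default actually increases the pool size.
    if max_connections <= DEFAULT_MAX_CONNECTIONS:
        raise ValueError(
            f"max_connections ({max_connections}) must exceed the "
            f"default of {DEFAULT_MAX_CONNECTIONS} to enlarge the pool"
        )
    return {HADOOP_PREFIX + S3A_MAX_CONNECTIONS_KEY: str(max_connections)}


def as_cluster_spark_config(conf: Dict[str, str]) -> str:
    """Hypothetical helper: render properties as 'key value' lines, one per property."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(as_cluster_spark_config(s3a_connection_conf(320)))
    # prints: spark.hadoop.fs.s3a.connection.maximum 320
```

The check against the default of 200 is a design choice for this sketch: the goal is to increase the pool, so a value at or below the default is rejected rather than silently applied.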

Databricks recommends setting the maximum number of connections to a multiple of the total number of cores in your cluster. For example, on a 32-core cluster, try setting fs.s3a.connection.maximum to 320 or 352.
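The example values are 10 and 11 times the core count (32 × 10 = 320, 32 × 11 = 352). The following minimal sketch shows that arithmetic. The function name recommended_connection_maximum is hypothetical, and total_cores is assumed to be the total number of cores in the cluster as described above.

```python
def recommended_connection_maximum(total_cores: int, multiplier: int) -> int:
    """Hypothetical helper: size the S3A pool as a whole-number multiple of cluster cores."""
    if total_cores < 1 or multiplier < 1:
        raise ValueError("total_cores and multiplier must both be positive")
    return total_cores * multiplier


if __name__ == "__main__":
    cores = 32
    for multiplier in (10, 11):
        print(multiplier, recommended_connection_maximum(cores, multiplier))
    # prints:
    # 10 320
    # 11 352
```

On a small cluster the product can fall below the default of 200 (for example, 8 cores × 10 = 80). In that case, the value would shrink the pool rather than enlarge it, so compare the result against the default before applying it.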

Once the maximum number of connections is set high enough, the java.lang.IllegalStateException: Connection pool shut down error should no longer occur.