S3 connection reset error

Apache Spark job fails with S3 connection reset error.

Written by arjun.kaimaparambilrajan

Last published at: March 15th, 2022

Problem

Your Apache Spark job fails when attempting an S3 operation.

The error message Caused by: java.net.SocketException: Connection reset appears in the stack trace.

Example stack trace from an S3 read operation:

Caused by: javax.net.ssl.SSLException: Connection reset; Request ID: XXXXX, Extended Request ID: XXXXX, Cloud Provider: AWS, Instance ID: i-XXXXXXXX
    at sun.security.ssl.Alert.createSSLException(Alert.java:127)
    at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
    ...
    at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:833)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    ...
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
    at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
    ...
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:467)
    at sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:450)
    at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243)
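
Any S3 read exercises the connector code paths shown in the trace. As a minimal illustration, a basic Parquet read like the following can surface the error (the bucket and path are placeholders, and the ambient spark session of a Databricks notebook is assumed):

    # Placeholder bucket/path; count() forces the read, which is where
    # the connection reset surfaces under the old connector.
    df = spark.read.parquet("s3a://my-bucket/path/to/data/")
    df.count()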

Cause

The old version of the Hadoop S3 connector does not retry on SocketTimeoutException or SSLException errors. These exceptions can occur when there is a client-side timeout or a server-side timeout, respectively.
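
For illustration, the sketch below shows the kind of bounded retry the newer connector performs internally when a read hits a transient failure. The read_fn callable and the retry parameters are hypothetical, and this is not the connector's actual code; upgrading (see Solution below) is the real fix.

    import time

    def read_with_retry(read_fn, max_attempts=4, backoff_s=1.0):
        """Retry read_fn on transient network errors, roughly as the
        newer Hadoop S3 connector does internally. Illustration only."""
        for attempt in range(1, max_attempts + 1):
            try:
                return read_fn()
            except Exception as e:
                # The old connector gave up immediately on errors like
                # java.net.SocketException: Connection reset.
                transient = "Connection reset" in str(e)
                if not transient or attempt == max_attempts:
                    raise
                time.sleep(backoff_s * attempt)  # simple linear backoff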

Solution

This issue is resolved in a newer version of the Hadoop S3 connector, which retries on these exceptions. Databricks Runtime 7.3 LTS and above use the new connector.

  • If you are using Databricks Runtime 7.3 LTS or above, ensure that these settings DO NOT exist in the cluster’s Spark configuration (a verification sketch follows this list):
    spark.hadoop.fs.s3.impl com.databricks.s3a.S3AFileSystem
    spark.hadoop.fs.s3n.impl com.databricks.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.impl com.databricks.s3a.S3AFileSystem
  • If you are using Databricks Runtime 7.0 - 7.2, upgrade to Databricks Runtime 7.3 LTS or above.
  • If you are using Databricks Runtime 6.4 or below, contact support for assistance.
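
As a quick verification, the notebook sketch below prints the runtime version and flags any of the legacy overrides listed above. The DATABRICKS_RUNTIME_VERSION environment variable and the internal _jsc handle are assumptions about the notebook environment, so treat this as an illustration rather than an official check:

    import os

    # Runtime version string, e.g. "7.3" (assumes the standard env var).
    print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))

    # spark.hadoop.* settings land in the Hadoop configuration without
    # the "spark.hadoop." prefix; _jsc is an internal PySpark handle.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    for key in ("fs.s3.impl", "fs.s3n.impl", "fs.s3a.impl"):
        if hadoop_conf.get(key) == "com.databricks.s3a.S3AFileSystem":
            print("Legacy override present; remove spark.hadoop." + key)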