Problem
Your Apache Spark job fails when attempting an S3 operation.
The error message Caused by: java.net.SocketException: Connection reset appears in the stack trace.
Example stack trace from an S3 read operation:
Caused by: javax.net.ssl.SSLException: Connection reset; Request ID: XXXXX, Extended Request ID: XXXXX, Cloud Provider: AWS, Instance ID: i-XXXXXXXX
  at sun.security.ssl.Alert.createSSLException(Alert.java:127)
  at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
  ...
  at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:833)
  at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
  ...
  at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
  at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
  at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
  at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:90)
  ...
Caused by: java.net.SocketException: Connection reset
  at java.net.SocketInputStream.read(SocketInputStream.java:210)
  at java.net.SocketInputStream.read(SocketInputStream.java:141)
  at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:467)
  at sun.security.ssl.SSLSocketInputRecord.readFully(SSLSocketInputRecord.java:450)
  at sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:243)
Cause
The older version of the Hadoop S3 connector does not retry on SocketTimeoutException or SSLException errors. These exceptions can occur when there is a client-side timeout or a server-side timeout, respectively.
Solution
This issue has been resolved in a new version of the Hadoop S3 connector. Databricks Runtime 7.3 LTS and above use the new connector.
- If you are using Databricks Runtime 7.3 LTS or above, ensure that these settings DO NOT exist in the cluster’s Spark configuration:
spark.hadoop.fs.s3.impl com.databricks.s3a.S3AFileSystem
spark.hadoop.fs.s3n.impl com.databricks.s3a.S3AFileSystem
spark.hadoop.fs.s3a.impl com.databricks.s3a.S3AFileSystem
- If you are using Databricks Runtime 7.0 through 7.2, upgrade to Databricks Runtime 7.3 LTS or above.
- If you are using Databricks Runtime 6.4 or below, contact support for assistance.
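For Databricks Runtime 7.3 LTS and above, one way to confirm that none of the three settings listed above is present is to scan the cluster's Spark configuration for them. The following is a minimal sketch, not an official Databricks tool: the function name find_legacy_s3_overrides and the sample configuration are hypothetical, and it assumes the configuration is available as (key, value) pairs.

```python
# Connector class named in the three settings that must not be present.
LEGACY_IMPL = "com.databricks.s3a.S3AFileSystem"

# Spark configuration keys that must not point at that class.
S3_IMPL_KEYS = (
    "spark.hadoop.fs.s3.impl",
    "spark.hadoop.fs.s3n.impl",
    "spark.hadoop.fs.s3a.impl",
)


def find_legacy_s3_overrides(conf_pairs):
    """Return the sorted keys from S3_IMPL_KEYS whose value is LEGACY_IMPL.

    conf_pairs is any iterable of (key, value) string pairs.
    (Hypothetical helper, written for illustration only.)
    """
    conf = dict(conf_pairs)
    return sorted(
        key for key in S3_IMPL_KEYS
        if conf.get(key, "").strip() == LEGACY_IMPL
    )


# Hand-written sample configuration (hypothetical values).
sample_conf = [
    ("spark.app.name", "example"),
    ("spark.hadoop.fs.s3a.impl", "com.databricks.s3a.S3AFileSystem"),
]
print(find_legacy_s3_overrides(sample_conf))  # ['spark.hadoop.fs.s3a.impl']
```

In a notebook, the same function could be called with spark.sparkContext.getConf().getAll(), which returns the configuration as a list of (key, value) tuples. This assumes the cluster-level settings are visible through that call. If the returned list is not empty, remove the reported keys from the cluster's Spark configuration and restart the cluster.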