Problem
You have a job that reads from and writes to a SQL endpoint over a JDBC connection.
The SQL warehouse fails to execute the job and you get a java.net.SocketTimeoutException: Read timed out error message.
2022/02/04 17:36:15 - TI_stg_trade.0 - Caused by: com.simba.spark.jdbc42.internal.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
2022/02/04 17:36:15 - TI_stg_trade.0 -     at com.simba.spark.hivecommon.api.TETHttpClient.flushUsingHttpClient(Unknown Source)
2022/02/04 17:36:15 - TI_stg_trade.0 -     at com.simba.spark.hivecommon.api.TETHttpClient.flush(Unknown Source)
2022/02/04 17:36:15 - TI_stg_trade.0 -     at com.simba.spark.jdbc42.internal.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
2022/02/04 17:36:15 - TI_stg_trade.0 -     at com.simba.spark.jdbc42.internal.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
Cause
Each incoming request requires a thread for the duration of the request. If the number of simultaneous requests exceeds the number of available threads, no thread is free to send a response before the client's read timeout expires, and the client reports a timeout. This is most common during long-running queries.
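The exception itself is plain JDK socket behavior: the client's read times out before the server replies. This minimal sketch (not the Simba driver's internals, just an illustrative local server that accepts a connection and never responds) reproduces the same java.net.SocketTimeoutException seen in the stack trace:

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Returns true if the read times out, simulating a server whose
    // threads are all busy so no response arrives within the timeout.
    static boolean simulateReadTimeout() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread holder = new Thread(() -> {
                try (Socket ignored = server.accept()) {
                    Thread.sleep(5_000); // hold the connection, never reply
                } catch (Exception e) { /* demo only */ }
            });
            holder.setDaemon(true);
            holder.start();

            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                client.setSoTimeout(200); // read timeout in milliseconds
                InputStream in = client.getInputStream();
                in.read(); // blocks until data arrives or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true; // same exception class as in the job's stack trace
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + simulateReadTimeout());
    }
}
```

Raising the timeout gives the server more time to free a thread and respond, which is what the solution below does at the JDBC level.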
Solution
Increase the SocketTimeout value in the JDBC connection URL.
In this example, the SocketTimeout is set to 300 seconds:
jdbc:spark://<server-hostname>:443;HttpPath=<http-path>;TransportMode=http;SSL=1;SocketTimeout=300[;property=value[;property=value]]
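Because driver properties in this URL format are semicolon-separated key=value pairs, the timeout can be appended to an existing URL string before opening the connection. This is a hedged sketch: the helper withSocketTimeout and the base-URL value are illustrative, and opening the actual connection requires the Simba Spark driver on the classpath:

```java
public class JdbcUrlTimeout {
    // Hypothetical helper: appends a SocketTimeout property (in seconds)
    // to a JDBC URL whose properties are semicolon-separated.
    static String withSocketTimeout(String url, int seconds) {
        return url + ";SocketTimeout=" + seconds;
    }

    public static void main(String[] args) {
        String base = "jdbc:spark://<server-hostname>:443;HttpPath=<http-path>"
                + ";TransportMode=http;SSL=1";
        String url = withSocketTimeout(base, 300);
        System.out.println(url);
        // With the driver installed, this URL would then be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```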
For more information, review the Building the connection URL for the legacy Spark driver (AWS | Azure | GCP) documentation.