You are trying to access a table on a remote HDFS location or an object store that you do not have permission to access. The SELECT command should fail, and it does, but it does not fail quickly. It can take up to ten minutes, sometimes more, to return a ConnectTimeoutException error message.
The error message you eventually receive is: "Error in SQL statement: ConnectTimeoutException: Call From 1006-163012-faded894-10-133-241-86/127.0.1.1 to analytics.aws.healthverity.com:8020 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=analytics.aws.healthverity.com/10.24.12.199:8020]; For more details see: SocketTimeout - HADOOP2 - Apache Software Foundation"
Everything is working as designed. However, the default Apache Hadoop values for connection timeout and retry are high, which is why the connection does not fail quickly.
ipc.client.connect.timeout 20000
ipc.client.connect.max.retries.on.timeouts 45
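To see why these defaults lead to such a long wait, the two values can be multiplied together. The sketch below is only a rough estimate: it assumes each retry waits for the full timeout, and it ignores the initial attempt and any other overhead. The function name `worst_case_wait_seconds` is hypothetical and used only for illustration.

```python
def worst_case_wait_seconds(timeout_ms: int, max_retries: int) -> float:
    """Rough estimate of the time spent before the connection gives up.

    Assumes every retry waits for the full timeout (an approximation).
    """
    return timeout_ms * max_retries / 1000


# Hadoop defaults quoted in this article
default_wait = worst_case_wait_seconds(20000, 45)

# Reduced values recommended later in this article
reduced_wait = worst_case_wait_seconds(5000, 3)

print(f"defaults: {default_wait:.0f} s (about {default_wait / 60:.0f} min)")
print(f"reduced:  {reduced_wait:.0f} s")
```

Under these assumptions, the defaults account for roughly 15 minutes of waiting, while the reduced values fail after about 15 seconds.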
Review the complete list of Hadoop common core-default.xml values.
Review the SocketTimeout documentation for more details.
You can resolve the issue by reducing the values for connection timeout and retry.
- The ipc.client.connect.timeout value is in milliseconds.
- The ipc.client.connect.max.retries.on.timeouts value is the number of times to retry before failing.
Set these values in your cluster's Spark config (AWS | Azure).
If you are not sure what values to use, Databricks recommends the following:
ipc.client.connect.timeout 5000
ipc.client.connect.max.retries.on.timeouts 3
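Spark generally forwards properties that carry the `spark.hadoop.` prefix to the underlying Hadoop configuration. This article does not say whether the prefix is needed in the cluster's Spark config, so treat the prefix as an assumption and verify it for your environment. A minimal sketch that renders the recommended values as one `key value` pair per line (the helper name `to_spark_config_lines` is hypothetical):

```python
# Recommended values from this article
RECOMMENDED = {
    "ipc.client.connect.timeout": 5000,
    "ipc.client.connect.max.retries.on.timeouts": 3,
}


def to_spark_config_lines(hadoop_props, prefix="spark.hadoop."):
    """Render Hadoop properties as 'key value' Spark config lines.

    The 'spark.hadoop.' prefix is an assumption; pass prefix="" to omit it.
    """
    return [f"{prefix}{key} {value}" for key, value in hadoop_props.items()]


spark_config_lines = to_spark_config_lines(RECOMMENDED)
print("\n".join(spark_config_lines))
```

Passing `prefix=""` produces the unprefixed form shown above.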