You are trying to run
MSCK REPAIR TABLE <table-name> commands for the same table in parallel and are getting
java.net.SocketTimeoutException: Read timed out or out of memory error messages.
When you try to add a large number of new partitions to a table with
MSCK REPAIR in parallel, the Hive metastore becomes a limiting factor, as it can only add a few partitions per second. The greater the number of new partitions, the more likely that a query will fail with a
java.net.SocketTimeoutException: Read timed out error or an out of memory error message.
You should not attempt to run multiple
MSCK REPAIR TABLE <table-name> commands in parallel.
Databricks uses multiple threads for a single
MSCK REPAIR by default, which splits
createPartitions() into batches. By limiting the number of partitions created, it prevents the Hive metastore from timing out or hitting an out of memory error. It also gathers the fast stats (number of files and the total size of files) in parallel, which avoids the bottleneck of listing the metastore files sequentially. This is controlled by
spark.sql.gatherFastStats, which is enabled by default.