Problem
You get a backend connection error when using RStudio Server.

```
Error in Sys.setenv(EXISTING_SPARKR_BACKEND_PORT = system(paste0("wget -qO - 'http://localhost:6061/?type=\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRBackend\"' --post-data='{\"@class\":\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRBackend\", \"guid\": \"", : wrong length for argument
```
If you view the cluster driver and worker logs (AWS | Azure | GCP), you see a message about exceeding the maximum number of RBackends.
```
21/08/09 15:02:26 INFO RDriverLocal: 312. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
21/08/09 15:03:55 INFO RDriverLocal: 313. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
21/08/09 15:04:06 INFO RDriverLocal: 314. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
21/08/09 15:13:42 INFO RDriverLocal: 315. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
```
Cause
Databricks clusters allow a maximum of 200 RBackends per driver by default. Each active R session holds an RBackend, so once the limit is reached, new connection attempts fail with the error shown above.
Solution
You can use an init script to increase the soft limit of RBackends available for use.
This sample code creates an init script that sets a limit of 400 RBackends on the cluster.
```scala
%scala

val initScriptContent = s"""
  |#!/bin/bash
  |cat > /databricks/common/conf/rbackend_limit.conf << EOL
  |{
  |  databricks.daemon.driver.maxNumRBackendsPerDriver = 400
  |}
  |EOL
  """.stripMargin

dbutils.fs.put("dbfs:/databricks/<init-script-folder>/set_rbackend.sh", initScriptContent, true)
```
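The init script body itself is plain bash. The sketch below runs the same logic standalone, writing the config file to a temporary directory (instead of /databricks/common/conf, which only exists on a Databricks driver) so you can inspect the result before deploying it:

```shell
#!/bin/bash
# Sketch of what the init script does at cluster start, using a temp
# directory in place of /databricks/common/conf for local inspection.
conf_dir="$(mktemp -d)"
cat > "${conf_dir}/rbackend_limit.conf" << EOL
{
  databricks.daemon.driver.maxNumRBackendsPerDriver = 400
}
EOL
cat "${conf_dir}/rbackend_limit.conf"
```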
Install the newly created init script as a cluster-scoped init script (AWS | Azure | GCP).
You will need the full path to the location of the script (dbfs:/databricks/<init-script-folder>/set_rbackend.sh).
Restart the cluster after you have installed the init script.
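If you manage the cluster through the Clusters API or the Databricks CLI instead of the UI, the init script is referenced in the cluster spec. A sketch of the relevant fragment (field layout assumed from the cluster-scoped init script documentation, using the path created above):

```json
{
  "init_scripts": [
    {
      "dbfs": {
        "destination": "dbfs:/databricks/<init-script-folder>/set_rbackend.sh"
      }
    }
  ]
}
```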
Validate solution
You can confirm that the changes were successful by running this sample code in a notebook.
```r
%r

library(magrittr)

SparkR:::callJStatic(
  "com.databricks.backend.daemon.driver.RDriverLocal",
  "getDriver",
  get(DB_GUID_, envir = .GlobalEnv)) %>%
  SparkR:::callJMethod("conf") %>%
  SparkR:::callJMethod("maxNumRBackendsPerDriver")
```
This code returns the RBackend limit currently configured on the cluster.
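You can also check the value the init script wrote by reading the config file directly; on the cluster it lives at /databricks/common/conf/rbackend_limit.conf. A standalone sketch, with a sample file standing in for the real one:

```shell
#!/bin/bash
# Sketch: parse the configured limit out of a rbackend_limit.conf file.
# A sample temp file stands in for /databricks/common/conf/rbackend_limit.conf.
conf_file="$(mktemp)"
cat > "${conf_file}" << EOL
{
  databricks.daemon.driver.maxNumRBackendsPerDriver = 400
}
EOL
limit="$(grep -oE '[0-9]+' "${conf_file}")"
echo "Configured RBackend limit: ${limit}"
```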
Best practices
Ensure that you log out of RStudio when you are finished using it. Logging out terminates the R session and cleans up its RBackend.
If the RStudio Server process is killed, or the R session terminates unexpectedly, this cleanup step may not happen and the RBackend remains allocated.
Databricks Runtime 9.0 and above automatically cleans up idle RBackend sessions.