RStudio Server backend connection error

Problem

You get a backend connection error similar to the following when using RStudio Server.

Error in Sys.setenv(EXISTING_SPARKR_BACKEND_PORT = system(paste0("wget -qO - 'http://localhost:6061/?type=\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRBackend\"' --post-data='{\"@class\":\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRBackend\", \"guid\": \"", :
wrong length for argument

If you view the cluster driver and worker logs, you see a message about exceeding the maximum number of RBackends.

21/08/09 15:02:26 INFO RDriverLocal: 312. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
21/08/09 15:03:55 INFO RDriverLocal: 313. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
21/08/09 15:04:06 INFO RDriverLocal: 314. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
21/08/09 15:13:42 INFO RDriverLocal: 315. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
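If you have a driver log saved as a file, a quick way to confirm the problem is to search it for this message. The following is a minimal sketch, not an official procedure: the temporary file stands in for a downloaded log (its name and location are assumptions), and it is filled with two of the log lines shown above so the example runs on its own.

```shell
# Stand-in for a downloaded driver log (hypothetical file), using lines from the output above.
log_file="$(mktemp)"
cat > "$log_file" << 'EOF'
21/08/09 15:02:26 INFO RDriverLocal: 312. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
21/08/09 15:03:55 INFO RDriverLocal: 313. RDriverLocal.3f6d80d6-70c4-4101-b50f-2530df112ea2: Exceeded maximum number of RBackends limit: 200
EOF

# Count how many times the limit was reported as exceeded.
hits="$(grep -c 'Exceeded maximum number of RBackends limit' "$log_file")"

# Read the limit value from the most recent matching line.
limit="$(grep 'Exceeded maximum number of RBackends limit' "$log_file" | tail -n 1 | sed 's/.*limit: //')"

echo "hits=$hits limit=$limit"
rm -f "$log_file"
```

For the two sample lines this prints hits=2 limit=200. To check a real log, point log_file at your own file instead of the temporary one.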

Cause

Databricks clusters are configured with a limit of 200 RBackends by default.

Once this limit is reached, a new RBackend cannot be started and you get the backend connection error.

Solution

You can use an init script to increase the soft limit of RBackends available for use.

This sample code creates an init script that sets a limit of 400 RBackends on the cluster.

%scala
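// Start the string with the shebang so that it is the first line of the script.
val initScriptContent = """#!/bin/bash
 |cat > /databricks/common/conf/rbackend_limit.conf << EOL
 |{
 | databricks.daemon.driver.maxNumRBackendsPerDriver = 400
 |}
 |EOL
 |""".stripMargin

// Write the script to DBFS, overwriting any existing file at this path.
dbutils.fs.put("dbfs:/databricks/<init-script-folder>/set_rbackend.sh", initScriptContent, true)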

Note

The sample code sets the RBackends limit to 400. You can adjust this number as needed. You should not exceed 500 RBackends.
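If you adjust the limit often, one option is to keep the number in a variable and refuse values above 500. The following is a hypothetical variant of the init script body, offered as a sketch only. The names RBACKEND_LIMIT and CONF_DIR are assumptions made for illustration, and by default it writes to a temporary directory so that it can be tried safely outside a cluster.

```shell
#!/bin/bash
# Hypothetical variant of the init script body; variable names are illustrative.
RBACKEND_LIMIT=400                      # adjust as needed, up to 500

# Dry run by default. On a cluster, the real script writes to /databricks/common/conf.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Guard against values above the recommended maximum of 500.
if [ "$RBACKEND_LIMIT" -gt 500 ]; then
  echo "Refusing to set an RBackends limit above 500" >&2
  exit 1
fi

# The unquoted heredoc delimiter lets the shell expand $RBACKEND_LIMIT.
cat > "$CONF_DIR/rbackend_limit.conf" << EOL
{
 databricks.daemon.driver.maxNumRBackendsPerDriver = $RBACKEND_LIMIT
}
EOL

cat "$CONF_DIR/rbackend_limit.conf"
```

With RBACKEND_LIMIT=400 the generated file has the same content as the file written by the sample init script above.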

Install the newly created init script as a cluster-scoped init script.

You will need the full path to the location of the script (dbfs:/databricks/<init-script-folder>/set_rbackend.sh).

Restart the cluster after you have installed the init script.
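After the restart, you can check whether the init script wrote its configuration file on the driver. This is a minimal sketch that assumes you run it in a %sh notebook cell, which executes on the driver node. The path is the one used by the sample init script above.

```shell
# Path written by the sample init script.
conf_file="/databricks/common/conf/rbackend_limit.conf"

if [ -f "$conf_file" ]; then
  # Show the configured limit.
  cat "$conf_file"
  status="present"
else
  echo "Not found: $conf_file (the init script may not have run)"
  status="missing"
fi
```

If the file is missing, check that the init script path in the cluster configuration matches the location you wrote the script to.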

Validate solution

You can confirm that the changes were successful by running this sample code in a notebook.

%r
library(magrittr)
# Look up the driver for this notebook and read its configured RBackends limit.
SparkR:::callJStatic(
  "com.databricks.backend.daemon.driver.RDriverLocal",
  "getDriver",
  get("DB_GUID_", envir = .GlobalEnv)) %>%
  SparkR:::callJMethod("conf") %>%
  SparkR:::callJMethod("maxNumRBackendsPerDriver")

When run, this code returns the current RBackends limit on the cluster. If the init script was applied, the value matches the limit you set (400 in the sample code).

Best practices

Ensure that you log out of RStudio when you are finished using it. This terminates the R session and cleans up the RBackend.

If RStudio Server is killed, or the R session terminates unexpectedly, the cleanup step may not happen.

Databricks Runtime 9.0 and above automatically cleans up idle RBackend sessions.