R commands fail on custom Docker cluster

R version 4.2.0 changed the way Renviron.site is initialized, so you must set an environment variable when using custom Docker clusters.

Written by Atanu.Sarkar

Last published at: January 20th, 2023

Problem

You are trying to run R notebooks on a custom Docker cluster (AWS | Azure), but they immediately fail.

When you try to execute an R notebook, it returns an error saying the notebook was cancelled.

When you review the cluster driver and worker logs (AWS | Azure), you see the error there is no package called 'Rserve'.

Tue Aug 30 16:24:34 UTC 2022 Starting R processing from BASH
Tue Aug 30 16:24:34 UTC 2022 R script: /local_disk0/tmp/_rServeScript.r6851825576782071270resource.r
Tue Aug 30 16:24:34 UTC 2022 Port number: 1108
Tue Aug 30 16:24:34 UTC 2022 cgroup: None
2022-08-30 16:24:34 R process started with pid 1462
Error in loadNamespace(x) : there is no package called 'Rserve'
Calls: loadNamespace -> withRestarts -> withOneRestart -> doWithOneRestart
Execution halted.


When you check for the Python libraries, they are all present. 

When you check the R version in a notebook, it returns the version information so you know R is installed.

%sh

R --version
R version 4.2.0 (2022-04-22) -- "Vigorous Calisthenics" 
Copyright (C) 2022 The R Foundation for Statistical Computing 
Platform: x86_64-pc-linux-gnu (64-bit) 
  
R is free software and comes with ABSOLUTELY NO WARRANTY. 
You are welcome to redistribute it under the terms of the 
GNU General Public License versions 2 or 3. 
For more information about these matters see 
https://www.gnu.org/licenses/.

Cause

Databricks Runtimes use R version 4.1.3 by default. If you start a standard cluster from the Compute menu in the workspace and check the version, it returns R version 4.1.3.

When you build a custom cluster with Docker, it is possible to use a different R version. In the example used here, we see that the custom Docker cluster is running R version 4.2.0.

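R version 4.2.0 changed the way Renviron.site is initialized, which implicitly changes the behavior of the --vanilla option. As a result, the R process started on the cluster fails to load the Rserve package, even though R itself is installed.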
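One way to observe this on your own cluster is to compare whether R can locate the Rserve package with and without the --vanilla option. The following %sh cell body is a minimal sketch, not an official diagnostic: the helper name check_rserve is hypothetical, and it assumes that Rscript is on the PATH and that the failing R process is started with --vanilla, which this article implies but does not state.

```shell
# Hypothetical helper: reports whether R can find the Rserve package
# when started normally and when started with --vanilla.
check_rserve() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH"
    return 1
  fi
  for flags in "" "--vanilla"; do
    # system.file() returns an empty string when the package is not in
    # any visible library, so nzchar() prints TRUE only if it is found.
    result=$(Rscript $flags -e 'cat(nzchar(system.file(package = "Rserve")))' 2>/dev/null)
    echo "Rscript ${flags:-(default)}: Rserve found = ${result:-unknown}"
  done
}

# Do not fail the cell if Rscript is missing.
check_rserve || true
```

If the two lines report different results, the library path that contains Rserve is likely being dropped when R starts with --vanilla.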

Solution

If you want to use R version 4.2.0 on a custom Docker cluster with Databricks Runtime 11.3 and below, you must set the DATABRICKS_ENABLE_RPROFILE=true environment variable (AWS | Azure) on the cluster.
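After you add the variable to the cluster configuration and restart the cluster, you can confirm that it is visible from a notebook. The following %sh cell body is a minimal sketch: the helper name rprofile_flag_status is hypothetical, and it assumes that environment variables set on the cluster are inherited by shell commands run from the notebook.

```shell
# Hypothetical helper: reports whether the variable named in this
# article is set to the value true in the current environment.
rprofile_flag_status() {
  if [ "${DATABRICKS_ENABLE_RPROFILE:-}" = "true" ]; then
    echo "enabled"
  else
    echo "not enabled (current value: '${DATABRICKS_ENABLE_RPROFILE:-}')"
  fi
}

rprofile_flag_status
```

If the cell reports that the variable is not enabled, check that the variable was saved in the cluster configuration and that the cluster was restarted afterward.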

If you want to use R version 4.2.0 on a custom Docker cluster with Databricks Runtime 12.0 and above, you can use R session customization (AWS | Azure) to set DATABRICKS_ENABLE_RPROFILE=true in the .Rprofile file.

For more information on installing R, please review the Install RStudio Server Open Source Edition (AWS | Azure) documentation.