Change version of R (r-base)
These instructions describe how to install a different version of R (r-base) on a cluster. The default r-base version that ships with each Databricks Runtime version is listed in the System environment section of that version's release notes (AWS | Azure | GCP). List available r-base-core versions To list the versions of r-base-co...
Fix the version of R packages
When you use the install.packages() function to install CRAN packages, you cannot specify the version of the package, because the expectation is that you will install the latest version of the package and it should be compatible with the latest version of its dependencies. If you have an outdated dependency installed, it will be updated as well. Som...
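As a rough illustration of why version drift matters, the sketch below compares installed package versions against a set of pinned versions using only base R. The `pinned` vector and the `check_pin` helper are hypothetical names introduced for this example. They are not part of the article, and the article's own method for pinning versions is not shown in this excerpt.

```r
# Hypothetical pins: package name -> required version string.
# "stats" is pinned to whatever is installed so that it matches;
# "utils" is pinned to a version that will not match.
pinned <- c(
  stats = as.character(utils::packageVersion("stats")),
  utils = "0.0.1"
)

# Return TRUE when the installed version of `pkg` equals `wanted`.
check_pin <- function(pkg, wanted) {
  installed <- as.character(utils::packageVersion(pkg))
  installed == wanted
}

# Named logical vector: one entry per pinned package.
matches <- mapply(check_pin, names(pinned), pinned)
print(matches)
```

A check like this only detects drift. It does not install or downgrade anything.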
How to parallelize R code with gapply
Parallelization of R code is difficult, because R code runs on the driver and R data.frames are not distributed. Often, there is existing R code that is run locally and that is converted to run on Apache Spark. In other cases, some SparkR functions used for advanced statistical analysis and machine learning techniques may not support distributed com...
How to parallelize R code with spark.lapply
Parallelization of R code is difficult, because R code runs on the driver and R data.frames are not distributed. Often, there is existing R code that is run locally and that is converted to run on Apache Spark. In other cases, some SparkR functions used for advanced statistical analysis and machine learning techniques may not support distributed com...
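One way to prepare existing local R code for `spark.lapply` is to express the work as a function applied to each element of a list. The sketch below runs that shape locally with base `lapply`. The `fit_one` function and the `degrees` vector are hypothetical examples, and the `spark.lapply` call appears only in a comment because it assumes an active SparkR session on a cluster.

```r
# Hypothetical per-element task: fit a polynomial regression on mtcars
# for one degree and return its residual sum of squares.
fit_one <- function(degree) {
  model <- lm(mpg ~ poly(hp, degree), data = datasets::mtcars)
  sum(residuals(model)^2)
}

degrees <- 1:3

# Local, sequential run on the driver.
local_results <- lapply(degrees, fit_one)

# On a cluster with an active SparkR session, the same function could be
# distributed across workers instead (not run here):
#   library(SparkR)
#   distributed_results <- spark.lapply(degrees, fit_one)
```

Because `fit_one` depends only on its argument and on data available on every node, switching from `lapply` to `spark.lapply` should not require changes to the function itself.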
How to persist and share code in RStudio
Problem Unlike a Databricks notebook that has version control built in, code developed in RStudio is lost when the high concurrency cluster hosting RStudio is shut down. Solution To persist and share code in RStudio, do one of the following: From RStudio, save the code to a folder on DBFS which is accessible from both Databricks notebooks and RStudi...
Install rJava and RJDBC libraries
This article explains how to install rJava and RJDBC libraries. Problem When you install rJava and RJDBC libraries with the following command in a notebook cell: %r install.packages(c("rJava", "RJDBC")) You observe the following error: ERROR: configuration failed for package 'rJava' Cause The rJava and RJDBC packages check for Java dependencies and ...
Rendering an R markdown file containing sparklyr code fails
Problem After you install and configure RStudio in the Databricks environment, when you launch RStudio and click the Knit button to knit a Markdown file that contains code to initialize a sparklyr context, rendering fails with the following error: failed to start sparklyr backend:object 'DATABRICKS_GUID' not found Calls: <Anonymous>… tryCatch ...
Resolving package or namespace loading error
This article explains how to resolve a package or namespace loading error. Problem When you install and load some libraries in a notebook cell, like: %r library(BreakoutDetection) You may get a package or namespace error: Loading required package: BreakoutDetection: Error : package or namespace load failed for ‘BreakoutDetection’ in loadNamespace(i,...
RStudio server backend connection error
Problem You get a backend connection error when using RStudio server. Error in Sys.setenv(EXISTING_SPARKR_BACKEND_PORT = system(paste0("wget -qO - 'http://localhost:6061/?type=\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRBackend\"' --post-data='{\"@class\":\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRB...
Verify R packages installed via init script
When you configure R packages to install via an init script, it is possible for a package install to fail if dependencies are not installed. You can use R commands in a notebook to check that all of the packages were installed correctly. Info This article requires you to provide a list of packages to check against. List installed packages Make a ...
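A minimal sketch of such a check, using only base R, is shown below. The `expected_packages` vector is a hypothetical stand-in for the list you would supply, and the deliberately fake package name is there only to show the failure path. The article's own commands are truncated in this excerpt and may differ.

```r
# Hypothetical list of packages the init script was expected to install.
expected_packages <- c("stats", "utils", "notARealPackage123")

# installed.packages() returns a matrix with one row per installed package;
# the row names are the package names.
installed <- rownames(installed.packages())

# Packages from the expected list that are not present.
missing_packages <- setdiff(expected_packages, installed)

if (length(missing_packages) == 0) {
  message("All expected packages are installed.")
} else {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
}
```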