Databricks Knowledge Base


R with Apache Spark (AWS)

These articles can help you to use R with Apache Spark.

10 Articles in this category



Change version of R (r-base)

These instructions describe how to install a different version of R (r-base) on a cluster. You can check the default r-base version installed with each Databricks Runtime version in the System environment section of each Databricks Runtime release note (AWS | Azure | GCP). List available r-base-core versions: To list the versions of r-base-co...
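As a quick complement to the release notes, the R version actually running on a cluster can be printed from a notebook with base R alone. A minimal sketch (no Databricks-specific APIs assumed):

```r
# Print the version string of the R interpreter bundled with the cluster;
# the exact value depends on the Databricks Runtime version in use
print(R.version.string)

# Show which R binary is resolved on the driver's PATH
print(Sys.which("R"))
```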

Last updated: May 20th, 2022 by Adam Pavlacka

Fix the version of R packages

When you use the install.packages() function to install CRAN packages, you cannot specify the version of the package, because the expectation is that you will install the latest version of the package and it should be compatible with the latest version of its dependencies. If you have an outdated dependency installed, it will be updated as well. Som...
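One common workaround for pinning a CRAN package to an exact version is `remotes::install_version()`, which pulls the requested release from the CRAN archive. A minimal sketch; the package name and version below are illustrative placeholders, not taken from the article, and `remotes` itself must already be installed:

```r
# Assumes: install.packages("remotes") has already been run
library(remotes)

# Install an exact version from the CRAN archive instead of the latest release.
# "data.table" and "1.14.0" are example values only.
install_version("data.table", version = "1.14.0",
                repos = "https://cran.r-project.org")
```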

Last updated: May 20th, 2022 by Adam Pavlacka

How to parallelize R code with gapply

Parallelizing R code is difficult because R code runs on the driver and R data.frames are not distributed. Often, existing R code that runs locally is converted to run on Apache Spark. In other cases, some SparkR functions used for advanced statistical analysis and machine learning techniques may not support distributed com...
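The shape of a `gapply` call can be sketched as follows. This assumes a SparkR session is already running (as it is on a Databricks cluster); the grouping column and aggregate are illustrative:

```r
library(SparkR)

# Distribute a local data.frame as a SparkDataFrame
df <- createDataFrame(mtcars)

# For each distinct value of "cyl", the matching rows arrive on a worker
# as a local R data.frame; the function returns one summary row per group.
result <- gapply(
  df,
  "cyl",
  function(key, x) {
    data.frame(cyl = key[[1]], avg_mpg = mean(x$mpg))
  },
  structType(structField("cyl", "double"),
             structField("avg_mpg", "double"))
)

head(collect(result))
```

The returned schema must be declared up front (the `structType` argument) because Spark cannot infer it from arbitrary R code.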

Last updated: May 20th, 2022 by Adam Pavlacka

How to parallelize R code with spark.lapply

Parallelizing R code is difficult because R code runs on the driver and R data.frames are not distributed. Often, existing R code that runs locally is converted to run on Apache Spark. In other cases, some SparkR functions used for advanced statistical analysis and machine learning techniques may not support distributed com...
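A minimal sketch of `spark.lapply`, which behaves like base R's `lapply` but fans the elements out across executors. It assumes a SparkR session is already running, and each input and result must fit in memory on a single machine:

```r
library(SparkR)

# Each list element is shipped to an executor, the function is applied
# there, and the results are collected back to the driver as a list.
squares <- spark.lapply(as.list(1:8), function(n) n * n)

unlist(squares)
```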

Last updated: May 20th, 2022 by Adam Pavlacka

How to persist and share code in RStudio

Problem Unlike a Databricks notebook, which has version control built in, code developed in RStudio is lost when the high concurrency cluster hosting RStudio is shut down. Solution To persist and share code in RStudio, do one of the following: From RStudio, save the code to a folder on DBFS which is accessible from both Databricks notebooks and RStudi...

Last updated: May 20th, 2022 by Adam Pavlacka

Install rJava and RJDBC libraries

This article explains how to install the rJava and RJDBC libraries. Problem When you install the rJava and RJDBC libraries with the following command in a notebook cell: %r install.packages(c("rJava", "RJDBC")) You observe the following error: ERROR: configuration failed for package 'rJava' Cause The rJava and RJDBC packages check for Java dependencies and ...

Last updated: May 20th, 2022 by Adam Pavlacka

Rendering an R markdown file containing sparklyr code fails

Problem After you install and configure RStudio in the Databricks environment, when you launch RStudio and click the Knit button to knit a Markdown file that contains code to initialize a sparklyr context, rendering fails with the following error: failed to start sparklyr backend:object 'DATABRICKS_GUID' not found Calls: <Anonymous>… tryCatch ...

Last updated: May 20th, 2022 by Adam Pavlacka

Resolving package or namespace loading error

This article explains how to resolve a package or namespace loading error. Problem When you install and load some libraries in a notebook cell, like: %r library(BreakoutDetection) You may get a package or namespace error: Loading required package: BreakoutDetection: Error : package or namespace load failed for ‘BreakoutDetection’ in loadNamespace(i,...

Last updated: May 20th, 2022 by Adam Pavlacka

RStudio server backend connection error

Problem You get a backend connection error when using RStudio server. Error in Sys.setenv(EXISTING_SPARKR_BACKEND_PORT = system(paste0("wget -qO - 'http://localhost:6061/?type=\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRBackend\"' --post-data='{\"@class\":\"com.databricks.backend.common.rpc.DriverMessages$StartRStudioSparkRB...

Last updated: May 20th, 2022 by arvind.ravish

Verify R packages installed via init script

When you configure R packages to install via an init script, it is possible for a package install to fail if dependencies are not installed. You can use the R commands in a notebook to check that all of the packages correctly installed. Info This article does require you to provide a list of packages to check against. List installed packages Make a ...
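The check described above can be sketched with base R alone: compare a required list against what `installed.packages()` reports. The package names below are illustrative placeholders for the list you supply:

```r
# Packages the init script was expected to install (example values only)
required <- c("dplyr", "ggplot2", "sparklyr")

# Row names of installed.packages() are the installed package names
installed <- rownames(installed.packages())
missing <- setdiff(required, installed)

if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```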

Last updated: May 20th, 2022 by kavya.parag


© Databricks 2022. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
