R
Introduction
R is a free programming language for statistical computing and graphics. R code can be executed in the Integrated Development Environment (IDE) RStudio, from the command line interface (CLI), or by running scripts.
Availability
Multiple installations of R are available on all our systems. You can use them by loading an R module. To load the latest version:
module load R
To select a specific version:
module load R/<version>
The sections below describe the two kinds of modules you can use: R bundles and base R.
R bundles
Most R jobs of significant size require many packages. R bundles address this by providing not only a base R installation but also many (hundreds, if not thousands, of) R packages. This reduces the need to install packages yourself (see below for directions on how to do so).
Several versions of these bundles, each including a large set of packages and libraries, are available:
R-bundle-CRAN/2023.12-foss-2023a
R-bundle-CRAN/2024.11-foss-2024a
R-bundle-Bioconductor/3.18-foss-2023a-R-4.3.2
R-bundle-Bioconductor/3.20-foss-2024a-R-4.4.2
To load a bundle from the command line, you would use modules, e.g.:
module load R-bundle-CRAN/2024.11-foss-2024a
Base R
Several minimal versions of the base R installation are available but these provide very few add-on packages:
R/4.3.2-gfbf-2023a
R/4.4.2-gfbf-2024a
To load a base R version from the command line, you would use modules, e.g.:
module load R/4.4.2-gfbf-2024a
Execution modes
Just like any other software, R code can be executed in two modes:
Interactive mode
Batch mode
Interactive mode
Running interactively (e.g. in RStudio or the CLI) can be great for code development with small examples. (Larger computations should be submitted as batch jobs, via a traditional job submission script.)
To run R from the command line, we need to load the software. In an interactive job on TinkerCliffs, this would look like so:
[user@tinkercliffs2 ~]$ interact -A <your slurm account> -p normal_q
srun: job 2920622 queued and waiting for resources
srun: job 2920622 has been allocated resources
[user@tc008 ~]$ module load R-bundle-CRAN/2024.11-foss-2024a
[user@tc008 ~]$ R
...
>
Batch mode
Alternatively, you can run R code from the command line in batch mode. This generally requires two scripts:
An R script with the actual R code you need to run
A shell script for submission to the job scheduler
An example R script called hello.R might print a message containing the host name to standard output:
cat("Hello from", Sys.info()[["nodename"]], "\n")
And an example bash shell script called hello_R.sh might request just a single core on a single node:
#!/bin/bash
#SBATCH -A <youraccount>
#SBATCH -p normal_q
module load R-bundle-CRAN/2024.11-foss-2024a
Rscript hello.R
With these files, we can submit the job from the login node:
[user@tinkercliffs2 ~]$ sbatch hello_R.sh
Submitted batch job 2920595
Once your job is finished, anything that was printed to standard output will be in a file called slurm-2920595.out.
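As a minimal sketch of the batch workflow above, the two files can also be written from a single shell session. The explicit --nodes, --ntasks, --cpus-per-task and --time lines are standard Slurm options added here as an assumption, to make the one-core request visible instead of relying on defaults; the ten-minute limit is illustrative only. Replace <youraccount> with your own Slurm account before submitting.

```shell
#!/bin/bash
# Write the example R script. Quoting 'EOF' stops the shell from expanding
# anything inside the here-document.
cat > hello.R <<'EOF'
# Print a greeting that includes the name of the node running the job
cat("Hello from", Sys.info()[["nodename"]], "\n")
EOF

# Write the submission script with an explicit one-node, one-core request.
# The resource and time values below are illustrative assumptions.
cat > hello_R.sh <<'EOF'
#!/bin/bash
#SBATCH -A <youraccount>
#SBATCH -p normal_q
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
module load R-bundle-CRAN/2024.11-foss-2024a
Rscript hello.R
EOF

echo "Wrote hello.R and hello_R.sh; submit with: sbatch hello_R.sh"
```

Keeping the resource request explicit makes it easier to scale the same script up later by changing only the #SBATCH lines.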
R through Open OnDemand (OOD)
OOD is normally used in interactive mode: it is most commonly used as an alternative to a terminal window for entering commands at the command prompt on a compute node.
RStudio is also available through Open OnDemand, providing an interactive, graphical interface for R that runs on cluster compute nodes with access to dedicated CPU and memory resources.
Installing packages
First, try loading an R bundle module (see above) and check whether the bundle provides all of the R packages that you require; if it does, you do not have to install packages yourself.
At this point, we assume that you want to install packages yourself. The most straightforward approach is to use interactive mode for this process.
You want to issue the three commands as specified in the example above, namely:
interact
module load
R
At this point, you are on a compute node and you are in the R interpreter.
By default, the library directories that R searches are determined by the location of the R installation and by the value of the environment variable R_LIBS_USER, which should be the path to where your packages are installed. This value is created and set for you upon launching RStudio from OOD. It is also set for you as part of your environment setup when accessing a cluster via the command line (e.g., using ssh).
This setup helps you organize your environment and ensures that packages are loaded with the version of R you are using. For example, if you are on TinkerCliffs, then:
> .libPaths()
[1] "/home/user/R/tinkercliffs-rome/4.4.2"
[2] "/apps/arch/software/R/4.4.2-gfbf-2024a/lib64/R/library"
If your library paths look similar, packages will be installed under your home directory. To install packages, do:
> install.packages("package of interest")
At first, you might want to install one package at a time: after issuing the command above, you will be prompted to select a mirror site for the download, and not all mirror sites have all packages. For example, recently, a mirror site in Tennessee did not have a particular package but one in Ohio did; you may need to try more than one.
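The mirror prompt can be avoided by naming a repository explicitly through the repos argument of install.packages(), which is useful when installing from a script. The sketch below is one hedged way to do that from the shell. The helper name install_r_packages, the DRY_RUN switch and the package names are hypothetical, and https://cloud.r-project.org is just one CRAN mirror chosen as an assumption. By default the helper only prints the R expression it would run.

```shell
#!/bin/bash
# Assumption: any CRAN mirror URL can be used here.
CRAN_MIRROR="https://cloud.r-project.org"

# Build (and optionally run) an install.packages() call with a fixed mirror.
# Hypothetical helper; arguments are R package names.
install_r_packages() {
    if [ "$#" -eq 0 ]; then
        echo "usage: install_r_packages PKG [PKG...]" >&2
        return 1
    fi
    local pkgs="" p
    for p in "$@"; do
        # Join the names as 'a', 'b', 'c' for use inside R's c()
        pkgs="${pkgs:+$pkgs, }'$p'"
    done
    local expr="install.packages(c($pkgs), repos = '$CRAN_MIRROR')"
    echo "$expr"
    # Only call R when DRY_RUN=0 is set explicitly
    if [ "${DRY_RUN:-1}" = "0" ]; then
        Rscript -e "$expr"
    fi
}

# Dry run with placeholder package names: prints the expression only
install_r_packages package1 package2 package3
```

To perform the installation, load an R module first and then call the helper with DRY_RUN=0 set in the environment.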
Sometimes, you may run into problems installing common packages (e.g. ggplot2) because CRAN does not offer versions of some dependencies that are compatible with your specific R version. One solution is to install versions of these dependencies from archived source files. For example, two common dependencies are Matrix and MASS:
packageurl1 <- 'https://cran.r-project.org/src/contrib/Archive/Matrix/Matrix_1.6-5.tar.gz'
install.packages(packageurl1, repos=NULL, type="source")
packageurl2 <- 'https://cran.r-project.org/src/contrib/Archive/MASS/MASS_7.3-59.tar.gz'
install.packages(packageurl2, repos=NULL, type="source")
Also, after you build up experience with a particular mirror site, you might want to download and install many packages with one command, like so for three packages (you are still inside the R interpreter):
> install.packages( c("package1", "package2", "package3") )
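If an installation fails because the library is not writable, one thing worth checking is the personal library directory named by R_LIBS_USER, since R only includes library directories in .libPaths() that already exist. The following is a minimal sketch of such a check, run from the shell before starting R. The function name ensure_r_user_lib is hypothetical, and the check assumes R_LIBS_USER holds a single directory.

```shell
#!/bin/bash
# Make sure the personal R library directory exists and is writable.
# Hypothetical helper; assumes R_LIBS_USER names exactly one directory.
ensure_r_user_lib() {
    local dir="${R_LIBS_USER:-}"
    if [ -z "$dir" ]; then
        echo "R_LIBS_USER is not set" >&2
        return 1
    fi
    # Create the directory if it is missing; R ignores nonexistent paths
    mkdir -p "$dir" || return 1
    if [ ! -w "$dir" ]; then
        echo "$dir is not writable" >&2
        return 1
    fi
    echo "personal R library: $dir"
}

ensure_r_user_lib || echo "check R_LIBS_USER before installing packages"
```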
Additional details
More generally, to see all of your installed R packages for all clusters and compute node types, go to the R directory under your $HOME directory.
[username@owl2 R]$ pwd
/home/username/R
[username@owl2 R]$ ls
falcon-L40s owl-genoa owl-milan tinkercliffs-rome
This user has R packages installed for falcon, owl, and tinkercliffs clusters. For example, on the falcon cluster, the user has installed R packages to use with L40S GPU-accelerator compute nodes. Inside of each directory are the various versions of R that you have installed packages for, and then under each version are the packages themselves. This is the directory structure used for packages with R.
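The directory structure described above can be summarized with a short shell loop. This is a sketch under the assumption that every package occupies exactly one subdirectory of <root>/<cluster-nodetype>/<R version>/; the function name list_r_libs is hypothetical.

```shell
#!/bin/bash
# Count user-installed R packages per cluster/node type and R version.
# Assumed layout: <root>/<cluster-nodetype>/<R version>/<package>/
list_r_libs() {
    local root="${1:-$HOME/R}"
    local version_dir count
    for version_dir in "$root"/*/*/; do
        # Skip the literal pattern when the glob matches nothing
        [ -d "$version_dir" ] || continue
        version_dir="${version_dir%/}"
        # Each installed package is one directory directly below the version
        count=$(find "$version_dir" -mindepth 1 -maxdepth 1 -type d | wc -l)
        printf '%s: %s package(s)\n' "${version_dir#"$root"/}" "$((count))"
    done
}

list_r_libs "$HOME/R"
```

Each output line names one cluster/version combination followed by the number of packages installed for it, which can help spot libraries left over from R versions you no longer use.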