Running R on CREATE

R is available on CREATE HPC through modules. R/4.2 is the current default version.

Tip

To avoid memory limitations on the login nodes, request an interactive session to complete your installation process. R and its dependencies can require more memory than is allocated by default. Please see the documentation on how to request more resources when using CREATE HPC.

How to use R via Modules

You can use the R modules within CREATE to run R interactively within the terminal and install R packages:

module load r/4.3.0-gcc-13.2.0-withx-rmath-standalone-python3+-chk-version
R
>install.packages("BiocManager")

When you install a package, not every software dependency is loaded by the default R module.

If you install a package such as lme4, the installation may run into a missing dependency and give a warning like this:

------------------ CMAKE NOT FOUND --------------------

CMake was not found on the PATH. Please install CMake:

 - sudo yum install cmake          (Fedora/CentOS; inside a terminal)
 - sudo apt install cmake          (Debian/Ubuntu; inside a terminal).
 - sudo pacman -S cmake            (Arch Linux; inside a terminal).
 - sudo brew install cmake         (MacOS; inside a terminal with Homebrew)
 - sudo port install cmake         (MacOS; inside a terminal with MacPorts)

Alternatively install CMake from: <https://cmake.org/>

-------------------------------------------------------

A lot of software is already installed on CREATE and is available through modules. The module avail command helps you find these dependencies.

For the lme4 example:

$ module avail cmake

------------------------------------------------- /software/spackages_v0_21_prod/modules/linux-ubuntu22.04-zen2 ---------------------------------------------
   cmake/3.6.1-gcc-13.2.0    cmake/3.27.7-gcc-11.4.0    cmake/3.27.7-gcc-13.2.0-curl-7.88.1    cmake/3.27.7-gcc-13.2.0 (D)

  Where:
   D:  Default Module

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

$ module load cmake/3.27.7-gcc-13.2.0-curl-7.88.1
$ R
>install.packages("lme4")

If you get an error similar to the CMake not found message above, check:

  1. Whether the module is available

  2. If it is, whether it is loaded

If you have searched using module avail and still cannot find the dependency among the modules, raise a software request with support@er.kcl.ac.uk.
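The two checks above can be sketched as a small shell helper. This is only an illustration: check_dependency is a hypothetical function name, not a CREATE tool, and it assumes that loading a module puts the tool's executable on your PATH.

```shell
# Hypothetical helper: report whether a build tool is on the PATH.
check_dependency() {
  tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $(command -v "$tool")"
    return 0
  fi
  # Not on the PATH: the module is either not loaded or not installed.
  echo "missing: $tool (try: module avail $tool)"
  return 1
}
```

For example, running check_dependency cmake before starting R shows whether the cmake module still needs to be loaded.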

Using pkg-config

To install certain R packages on the HPC, you need to set some additional environment variables. You can do this using pkg-config, which is available through the module pkgconf/1.9.5-gcc-13.2.0. For example, the following instructions can be used to install devtools:

$ module load r/4.3.0-gcc-13.2.0-withx-rmath-standalone-python3+-chk-version
$ export PKG_CONFIG_ALLOW_SYSTEM_CFLAGS=1
$ export PKG_CONFIG_ALLOW_SYSTEM_LIBS=1
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pkg-config --variable=libdir libtiff-4`
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:`pkg-config --variable=libdir libjpeg`
$ R
>install.packages("devtools")
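The LD_LIBRARY_PATH exports above append a directory each time they run, and leave a leading colon if the variable was previously empty. A minimal sketch of a more defensive variant follows; append_ld_path is a hypothetical helper name introduced here for illustration, not part of pkg-config or the CREATE modules.

```shell
# Hypothetical helper: append a directory to LD_LIBRARY_PATH
# only if it is non-empty and not already listed.
append_ld_path() {
  dir="$1"
  [ -n "$dir" ] || return 0
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) ;;  # already present, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
  esac
  export LD_LIBRARY_PATH
}

# Example use, mirroring the devtools instructions (requires pkg-config):
#   append_ld_path "$(pkg-config --variable=libdir libtiff-4)"
#   append_ld_path "$(pkg-config --variable=libdir libjpeg)"
```

Calling the helper twice with the same directory leaves LD_LIBRARY_PATH unchanged, which keeps the variable short if you source the same setup file repeatedly.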

Setting a new package-library location

By default, R uses your home directory as the library path for all new package installs. To avoid using too much of the storage space available in your home directory, which may result in quota limits being reached, you can create and set a new library location in your personal scratch directory, as shown here:

mkdir -p /scratch/users/k1234567/software/R/4.2

Now use R to install the packages interactively:

$ module load r/4.3.0-gcc-13.2.0-withx-rmath-standalone-python3+-chk-version
$ R
>.libPaths("/scratch/users/k1234567/software/R/4.2")
>install.packages("tidyr")

Make sure the new library path is included in your R scripts or interactive sessions:

Example batch script:

#!/bin/bash -l
#SBATCH --job-name=ops-r
#SBATCH --output=/scratch/users/%u/%j.out
#SBATCH --partition=cpu

module load r/4.3.0-gcc-13.2.0-withx-rmath-standalone-python3+-chk-version

Rscript library-test.R

With an example R script (library-test.R) using the new library location:

#!/usr/bin/env Rscript

.libPaths("/scratch/users/k1234567/software/R/4.2")

library(tidyr)
packageVersion("tidyr")
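As an alternative to calling .libPaths() in every script, R also reads the R_LIBS_USER environment variable at start-up, and ignores it if the directory does not exist. A minimal sketch, assuming a hypothetical helper called make_r_lib and the same scratch layout as the example above:

```shell
# Hypothetical helper: create a versioned R library directory and print its path.
make_r_lib() {
  base="$1"      # e.g. /scratch/users/k1234567
  version="$2"   # e.g. 4.3
  lib="$base/software/R/$version"
  mkdir -p "$lib" && printf '%s\n' "$lib"
}

# Example use in a batch script, before calling Rscript:
#   R_LIBS_USER="$(make_r_lib /scratch/users/k1234567 4.3)"
#   export R_LIBS_USER
#   Rscript library-test.R
```

Keeping the R version in the path is a design choice rather than a requirement. It avoids mixing packages built for different R versions in one library.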

Using Singularity to set up R

Singularity is a containerisation tool, similar to Docker, which can be used to run R on CREATE HPC. The following command can be used to download the latest version of R available from Docker Hub:

singularity pull docker://r-base

Tags can also be used to download more specific versions, for example docker://r-base:3.6.1. The Rocker Project is also a good source of containerised R environments, enabling both reproducible builds and complete dependency solutions:

singularity pull docker://rocker/r-ver:4.3.0

This should create a Singularity image file called: r-ver_4.3.0.sif. The next example downloads R 4.2.2 with all the requirements for tidyverse and more.

singularity pull docker://rocker/tidyverse:4.2.2
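The image file names used on this page (r-ver_4.3.0.sif, tidyverse_4.2.2.sif) follow a name_tag.sif pattern. The sketch below derives the expected file name from a docker:// reference. sif_name is a hypothetical helper, and the fallback to a latest tag for untagged references is an assumption, so check the actual file with ls after pulling.

```shell
# Hypothetical helper: predict the .sif file name for a docker:// reference.
sif_name() {
  ref="${1#docker://}"   # strip the URI scheme
  name="${ref##*/}"      # keep the last path component, e.g. r-ver:4.3.0
  case "$name" in
    *:*) tag="${name##*:}"; name="${name%%:*}" ;;
    *)   tag="latest" ;;  # assumption: untagged pulls are named with "latest"
  esac
  printf '%s_%s.sif\n' "$name" "$tag"
}

# Example use:
#   sif_name docker://rocker/tidyverse:4.2.2
```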

Interactive R containers

Use the following commands to run an interactive containerised R version:

singularity shell tidyverse_4.2.2.sif
Singularity> R

R version 4.2.2 (2022-10-31) -- "Innocent and Trusting"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(tidyverse)

Running containerised RScripts

Singularity containers can also be used to run your R scripts. For example, an R script called test.R:

#!/usr/bin/env Rscript

print("Hello World")

This can be executed with your container through sbatch:

#!/bin/bash -l
#SBATCH --job-name=ops-r-container
#SBATCH --partition=cpu

singularity exec r-ver_4.3.0.sif Rscript test.R

Binding container paths for R

By default, your host home directory (/users/k1234567/R/...) is mounted and accessible inside your R container, whereas /scratch paths on the HPC are not. All packages you install in your container are therefore written to your home directory, which is the default library location. To avoid using too much of your home directory storage, you can bind a different HPC location for your containerised R to install to:

singularity shell --bind /scratch/users/k1234567/software:/my-libraries r-ver_4.3.0.sif
Singularity> ls /my-libraries

The above binds your host scratch path /scratch/users/k1234567/software to a new example container path, /my-libraries. Any changes made under /my-libraries in your container are written to /scratch/users/k1234567/software on the host. Within the container, however, only /my-libraries is accessible; the path /scratch/users/k1234567/software does not exist there.

Whilst in the same container shell session as above, you can use .libPaths() to set the alternative library location for your containerised R package installations:

Singularity> R
> .libPaths("/my-libraries")
> install.packages("ggplot2")
...

It is also possible to specify and bind multiple paths for your container. In the following example, the container path /my-data corresponds to your HPC host path /scratch/users/k1234567/data, in addition to the /my-libraries example above:

singularity shell \
  --bind /scratch/users/k1234567/data:/my-data,/scratch/users/k1234567/software:/my-libraries \
  r-ver_4.3.0.sif
Singularity> R
>read.csv("/my-data/test.csv")

Such container paths should also be used in your Rscripts.
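When several paths are bound, the comma-separated --bind value can become long and easy to mistype. A minimal sketch of one way to assemble it follows; bind_spec is a hypothetical helper name, and the paths are the example ones used above.

```shell
# Hypothetical helper: join host:container pairs into one --bind value.
bind_spec() {
  spec=""
  for pair in "$@"; do
    spec="${spec:+$spec,}$pair"   # add a comma only between pairs
  done
  printf '%s\n' "$spec"
}

# Example use in a batch script:
#   binds="$(bind_spec /scratch/users/k1234567/data:/my-data \
#                      /scratch/users/k1234567/software:/my-libraries)"
#   singularity exec --bind "$binds" r-ver_4.3.0.sif Rscript test.R
```

Inside test.R, the script would then refer to /my-data and /my-libraries rather than the host scratch paths.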

Setting up R with Anaconda

You can also use the Anaconda package manager to use R.

To set up the conda environment in a non-standard location, and so save space in your home directory, run these commands:

module load anaconda3/2022.10-gcc-13.2.0
conda create --prefix /scratch/users/k1234567/<r_environment_name> -c conda-forge r-base=4.2.2

Note that if no version is specified, conda installs the latest available version of r-base, which in this case is R 4.3.1.

This will set up the Anaconda environment with some of the more popular R packages. Make sure to include the r- prefix when installing, updating or removing R packages, e.g. conda install r-tidyverse.

Anaconda does not have every package in the CRAN repository, but r-base and r-essentials provide many of the popular packages that R users need.
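The r- prefix rule can be sketched as a small helper that turns a CRAN package name into the name to pass to conda. conda_r_name is a hypothetical function, and the lower-casing step is an assumption based on the usual conda-forge naming, so confirm the result with conda search before installing.

```shell
# Hypothetical helper: map a CRAN package name to its likely conda name.
conda_r_name() {
  # Assumption: conda package names are the lower-cased CRAN name with an r- prefix.
  printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Example use:
#   conda install -c conda-forge "$(conda_r_name tidyverse)"
```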