PyTorch

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.

Installation

Tip

To avoid memory limitations on the login nodes, request an interactive session to complete your installation. PyTorch and its dependencies can require more memory than is allocated by default. Please see the documentation on how to request more resources when using CREATE HPC.

Using conda environments

Conda is an excellent framework for running repeatable analyses. To avoid taking up storage space in the default conda location in your home directory, the following examples assume you have created a non-standard conda package cache location. This is not a requirement, however, and the standard method will work just as well.

To install PyTorch in a conda environment:

$ module load anaconda3/2022.10-gcc-10.3.0
$ cd /scratch/users/k1234567/conda
$ conda create --prefix ./torch-env -c pytorch -c conda-forge python=3.9.12 \
  pytorch=2.0.1 cudatoolkit=11.7 torchvision torchaudio
$ conda activate /scratch/users/k1234567/conda/torch-env
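
Once the environment is active, a quick check along the following lines can confirm which build was installed. This is a minimal sketch rather than part of the official procedure; the helper name `describe_torch` is hypothetical, and the function simply reports that PyTorch is missing if it cannot be imported.

```python
import importlib.util


def describe_torch():
    """Summarise the PyTorch build visible to the current interpreter."""
    # Hypothetical helper: report a missing install instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

On a node without a GPU, such as a login node, `cuda_available` is expected to be `False` even when a CUDA-enabled build is installed.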

Using virtualenv

Using Anaconda environments is the recommended way to install PyTorch, though you can still use pip and virtualenv to install and run it. First make sure you are on a GPU node, then run:

$ module load cudnn/8.2.4.15-11.4-gcc-10.3.0
$ module load cuda/11.4.4-gcc-10.3.0
$ virtualenv pytorch-venv
$ source pytorch-venv/bin/activate
$ pip install torch torchvision
$ python
>>> import torch
>>> print("Number of GPUs active: ", torch.cuda.device_count())

Installing from source

PyTorch can be installed from source alongside packages from the recommended Anaconda distribution. This option allows you to make your own changes and even install the absolute latest version; more information is available on the PyTorch source website. It is also possible to check out a particular release and complete your installation from there:

$ module load anaconda3/2022.10-gcc-10.3.0
$ cd /scratch/users/k1234567/conda
$ conda create --prefix ./torch-src-install-env -c conda-forge -c pytorch \
  python=3.9.12 astunparse numpy ninja pyyaml setuptools cmake cffi \
  typing_extensions future six requests dataclasses nccl mkl mkl-include \
  magma-cuda117 cudatoolkit=11.7
$ conda activate /scratch/users/k1234567/conda/torch-src-install-env
$ module load cuda/11.4.4-gcc-10.3.0
$ module load cudnn/8.2.4.15-11.4-gcc-10.3.0
$ cd /your/project/software/location/
$ git clone https://github.com/pytorch/pytorch.git
$ cd pytorch
$ git checkout tags/v2.0.1
$ git submodule sync
$ git submodule update --init --recursive
$ python setup.py install

Please note: if you do not need the absolute latest version or do not plan to work with the PyTorch source code, installing PyTorch from a package manager is likely to be the simplest and most efficient option for your project.

Using Singularity

You can also use Singularity to run a Docker container of a GPU-enabled PyTorch. The only prerequisite is to be on a GPU node.

singularity pull docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime

To test whether you have access to the GPU through the PyTorch container, run:

$ singularity shell --nv pytorch_2.0.1-cuda11.7-cudnn8-runtime.sif
Singularity> python
>>> import torch
>>> print(torch.cuda.get_device_name())
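
Calling `torch.cuda.get_device_name()` raises an error when no GPU is visible, for example if the `--nv` flag was omitted. A slightly more defensive check, sketched below with the hypothetical helper name `list_gpu_names`, returns an empty list in that case instead.

```python
import importlib.util


def list_gpu_names():
    """Return the names of all CUDA devices PyTorch can see (empty if none)."""
    # Hypothetical helper: an empty list means no usable GPU or no PyTorch.
    if importlib.util.find_spec("torch") is None:
        return []
    import torch

    # device_count() is 0 when CUDA is unavailable, so the loop is skipped.
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    names = list_gpu_names()
    print(names if names else "No GPU visible to PyTorch")
```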

PyTorch and launching a Jupyter notebook for GPU usage

For a complete guide on how to launch Jupyter on CREATE HPC, please refer to our guide document here. To create a conda environment and install JupyterLab in it:

cd /scratch/users/k1234567/conda
conda create --prefix ./jenv python=3.9.12
conda activate /scratch/users/k1234567/conda/jenv
conda install -c conda-forge jupyterlab
conda deactivate

The Python version is pinned so that it works with the py-torch module on CREATE, which avoids having to build PyTorch in the same conda environment. The following batch script loads that module and starts JupyterLab:

#!/bin/bash -l

#SBATCH --job-name=torch-jupyter
#SBATCH --partition=gpu
#SBATCH --gres=gpu
#SBATCH --signal=USR2
#SBATCH --time=02:00:00

module load anaconda3/2022.10-gcc-10.3.0
module load py-torch/1.10.0-gcc-10.3.0-openmpi-4.1.3-python3+-chk-version

# timestamp printed in the job output
readonly DETAIL=$(python -c 'import datetime; print(datetime.datetime.now())')
# address of this node on the 10.211.4.x network
readonly IPADDRESS=$(hostname -I | tr ' ' '\n' | grep '10.211.4.')
# get unused socket per https://unix.stackexchange.com/a/132524
readonly PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
cat 1>&2 <<END
1. SSH tunnel from your workstation using the following command:

   ssh -NL 8888:${HOSTNAME}:${PORT} ${USER}@hpc.create.kcl.ac.uk

   and point your web browser to http://localhost:8888/lab?token=<add the token from the jupyter output below>

Time started: ${DETAIL}

When done using the notebook, terminate the job by
issuing the following command on the login node:

      scancel -f ${SLURM_JOB_ID}

END

source /users/${USER}/.bashrc
source activate /scratch/users/k1234567/conda/jenv
# source jvenv/bin/activate
jupyter-lab --port=${PORT} --ip=${IPADDRESS} --no-browser

printf 'notebook exited\n' 1>&2
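
The `PORT` line in the script packs a small Python program into a single shell command. Written out in full, the same technique looks roughly like the sketch below; the function name `find_free_port` is hypothetical. Binding to port 0 asks the operating system to choose an unused port, which is then read back from the socket.

```python
import socket


def find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 means "pick any free port"
        return s.getsockname()[1]  # the port number the OS assigned


if __name__ == "__main__":
    print(find_free_port())
```

The socket is closed before Jupyter starts, so another process could in principle claim the port in between. In practice the window is short, but it is worth knowing if Jupyter ever reports that the port is already in use.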

Testing GPU access with PyTorch in Jupyter

In [1]: import torch

In [2]: torch.cuda.current_device()
Out[2]: 0

In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device object at 0x7fbec69d23a0>

In [4]: torch.cuda.device_count()
Out[4]: 1

In [5]: torch.cuda.get_device_name(0)
Out[5]: 'NVIDIA A100-SXM4-40GB'
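
Beyond querying the device, it can be useful to confirm that a computation actually runs on it. The sketch below is illustrative only, and the function name `run_matmul_check` is hypothetical. It multiplies two random matrices on the GPU when one is available and falls back to the CPU otherwise.

```python
import importlib.util


def run_matmul_check(size=256):
    """Multiply two random matrices and report where the result lives."""
    # Hypothetical helper: returns None if PyTorch cannot be imported.
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    c = a @ b
    return c.device.type, tuple(c.shape)


if __name__ == "__main__":
    print(run_matmul_check())
```

A device type of `cuda` in the result indicates that the multiplication ran on the GPU.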