PyTorch
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
Installation
Tip
To avoid memory limitations on the login nodes, request an interactive session to complete your installation process.
PyTorch and its dependencies can require more memory than is allocated by default.
Please see the documentation on how to request more resources when using CREATE HPC.
Using conda environments
Conda is an excellent framework for running repeatable analyses. To avoid taking up storage space in the default conda location in your home directory, the following examples assume you have created a non-standard conda package cache location.
This is not a requirement, however, and the standard method will work just as well.
To install PyTorch in a conda environment:
| $ module load anaconda3/2023.09-0-gcc-13.2.0
$ cd /scratch/users/k1234567/conda
$ conda create --prefix ./torch-env -c pytorch -c conda-forge python=3.10.12 \
pytorch=2.0.1 cudatoolkit=11.7 torchvision torchaudio
$ conda activate /scratch/users/k1234567/conda/torch-env
|
Using virtualenv
Using Anaconda environments is the recommended way to install PyTorch, but you can still use pip and virtualenv to install and run it. First make sure you are on a GPU node, then run:
| $ module load cudnn/8.7.0.84-11.8-gcc-13.2.0
$ module load cuda/11.8.0-gcc-13.2.0
$ virtualenv pytorch-venv
$ source pytorch-venv/bin/activate
$ pip install torch torchvision numpy
$ python3
>>> import torch
>>> print("Number of GPUs active: ", torch.cuda.device_count())
|
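The interactive check above can also be kept as a small script, which is convenient for re-running inside batch jobs. The following is a minimal sketch, not part of the CREATE HPC tooling: the function name gpu_report is hypothetical, and the script assumes it is run with the Python interpreter of the environment you installed PyTorch into.

```python
import importlib.util


def gpu_report():
    """Return a dict describing the PyTorch install and the GPUs it can see."""
    # Look for the package first so the script also runs where torch is absent.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If cuda_available is reported as False on a GPU node, a CPU-only build of PyTorch or a job submitted without a GPU request are two likely causes worth checking first.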
Installing from source
PyTorch can be installed from source alongside packages from the recommended Anaconda distribution. This option allows you to make your own changes and even install the very latest version; more information is available on the PyTorch source website.
It is also possible to specify a particular preferred release and complete your installation from there:
| $ module load anaconda3/2023.09-0-gcc-13.2.0
$ cd /scratch/users/k1234567/conda
$ conda create --prefix ./torch-src-install-env -c conda-forge -c pytorch \
python=3.10.12 astunparse numpy ninja pyyaml setuptools cmake cffi \
typing_extensions future six requests dataclasses nccl mkl mkl-include \
magma-cuda117 cudatoolkit=11.7
$ conda activate /scratch/users/k1234567/conda/torch-src-install-env
$ module load cuda/11.8.0-gcc-13.2.0
$ module load cudnn/8.7.0.84-11.8-gcc-13.2.0
$ cd /your/project/software/location/
$ git clone https://github.com/pytorch/pytorch.git
$ cd pytorch
$ git checkout tags/v2.0.1
$ git submodule sync
$ git submodule update --init --recursive
$ python setup.py install
|
Please note, if you have no requirement for the absolute latest version or do not plan on interacting with the PyTorch source code, then installing PyTorch from a package manager may be the best and most efficient solution for your project requirements.
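After a source build it can be useful to confirm that Python is importing the build you just compiled rather than a packaged copy from another environment. The sketch below is one way to do that, assuming it runs inside the activated torch-src-install-env environment; the function name describe_build is hypothetical.

```python
import importlib.util


def describe_build():
    """Summarise which PyTorch build the current interpreter imports."""
    if importlib.util.find_spec("torch") is None:
        return None  # torch is not importable in this environment

    import torch

    return {
        "version": torch.__version__,
        # Commit the build was made from; compare with `git rev-parse HEAD`
        # in the checked-out pytorch directory.
        "git_commit": torch.version.git_version,
        # CUDA version the build was compiled against; None for CPU-only builds.
        "cuda": torch.version.cuda,
        # Where the imported package lives on disk.
        "location": torch.__file__,
    }


if __name__ == "__main__":
    print(describe_build())
```

If the reported location points outside the environment you built into, another PyTorch installation is probably taking precedence on the Python path.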
Using Singularity
You can also use Singularity to run a Docker container with GPU-enabled PyTorch. The only prerequisite is to be on a GPU node.
| $ singularity pull docker://pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
|
To test whether you have access to the GPU through the PyTorch container, run:
| $ singularity shell --nv pytorch_2.0.1-cuda11.7-cudnn8-runtime.sif
Singularity> python
>>> import torch
>>> print(torch.cuda.get_device_name())
|
PyTorch and launching a Jupyter notebook for GPU usage
For a complete guide on how to launch Jupyter on CREATE HPC, please refer to our guide document here.
The process requires a Python environment with the jupyterlab package installed. You can combine it with any of the installations above by installing jupyterlab into the torch-env environment, for example by running conda install -c conda-forge jupyterlab with that environment activated.
Once the environment with JupyterLab and PyTorch is set up, you can start the server with the following script:
| #!/bin/bash -l
#SBATCH --job-name=torch-jupyter
#SBATCH --partition=gpu
#SBATCH --gres=gpu
#SBATCH --signal=USR2
#SBATCH --time=02:00:00
module load anaconda3/2023.09-0-gcc-13.2.0
# record the start time and the node's address on the 10.211.4.x network
readonly DETAIL=$(python -c 'import datetime; print(datetime.datetime.now())')
readonly IPADDRESS=$(hostname -I | tr ' ' '\n' | grep '10.211.4.')
# get unused socket per https://unix.stackexchange.com/a/132524
readonly PORT=$(python -c 'import socket; s=socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
cat 1>&2 <<END
1. SSH tunnel from your workstation using the following command:
ssh -NL 8888:${HOSTNAME}:${PORT} ${USER}@hpc.create.kcl.ac.uk
and point your web browser to http://localhost:8888/lab?token=<add the token from the jupyter output below>
Time started: ${DETAIL}
When done using the notebook, terminate the job by
issuing the following command on the login node:
scancel -f ${SLURM_JOB_ID}
END
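source /users/${USER}/.bashrc
# activate the conda environment that contains PyTorch and JupyterLab
source activate /scratch/users/k1234567/conda/torch-env
# alternatively, for the virtualenv installation:
# source pytorch-venv/bin/activate
jupyter-lab --port=${PORT} --ip=${IPADDRESS} --no-browser
printf 'notebook exited\n' 1>&2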
|
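The script relies on two small tricks: asking the operating system for a free port by binding to port 0, and filtering the node's addresses down to the one on the 10.211.4.x network. The sketch below shows the same logic as standalone Python for illustration only. The function names and the sample addresses are hypothetical, and the batch script itself does not need this code.

```python
import socket


def find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))            # port 0 means "any free port"
        return s.getsockname()[1]  # the port the OS actually assigned


def pick_address(addresses, prefix="10.211.4."):
    """Return the first address starting with `prefix`, or None if none match."""
    for address in addresses:
        if address.startswith(prefix):
            return address
    return None


if __name__ == "__main__":
    print(find_free_port())
    # Hypothetical addresses, standing in for the output of `hostname -I`.
    print(pick_address(["192.168.0.7", "10.211.4.23"]))
```

Note that the port is released as soon as the socket is closed, so in principle another process could claim it before Jupyter starts; in practice the window is short. The prefix match here is literal, whereas the dots in the grep pattern of the script match any character.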
Testing GPU access with PyTorch in Jupyter
| In [1]: import torch
In [2]: torch.cuda.current_device()
Out[2]: 0
In [3]: torch.cuda.device(0)
Out[3]: <torch.cuda.device object at 0x7fbec69d23a0>
In [4]: torch.cuda.device_count()
Out[4]: 1
In [5]: torch.cuda.get_device_name(0)
Out[5]: 'NVIDIA A100-SXM4-40GB'
|
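Listing the device confirms that PyTorch can see the GPU; running a small computation confirms that it can actually use it. The following notebook cell is a minimal sketch of such a check, falling back to the CPU when no GPU is visible. The function name run_matmul_check and the default matrix size are illustrative choices, not part of any CREATE HPC setup.

```python
import importlib.util


def run_matmul_check(size=256):
    """Multiply two random matrices on the GPU if available, else on the CPU."""
    if importlib.util.find_spec("torch") is None:
        return None  # torch is not installed in this environment

    import torch

    # Prefer the GPU, but fall back to the CPU so the cell runs anywhere.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    product = a @ b  # matrix multiplication executed on `device`
    return {"device": product.device.type, "shape": tuple(product.shape)}


if __name__ == "__main__":
    print(run_matmul_check())
```

On a correctly configured GPU session the reported device should be cuda; a result of cpu means the tensors never left the host.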