Running GPU jobs

A growing amount of scientific software uses Graphics Processing Units (GPUs) for computation instead of traditional CPU cores, because GPUs outperform CPUs for certain mathematical operations. To schedule your job on a GPU, provide the --gres=gpu option in your submission script. The following example schedules a job on a GPU node and then lists the GPU card it was assigned.

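#!/bin/bash -l
# Write job output to scratch (%u = username, %j = job ID)
#SBATCH --output=/scratch/users/%u/%j.out
#SBATCH --job-name=gpu
# Request a GPU for this job
#SBATCH --gres=gpu
echo "Hello, World! From $HOSTNAME"
# List the NVIDIA devices
nvidia-debugdump -l
sleep 15
echo "Goodbye, World! From $HOSTNAME"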

k1234567@erc-hpc-login1:~$ sbatch -p gpu hellogpu.sh
Submitted batch job 12087
k1234567@erc-hpc-login1:~$ cat /scratch/users/k1234567/12087.out
Hello, World! From erc-hpc-comp-032
Found 2 NVIDIA devices
  Device ID:              0
  Device name:            NVIDIA A100-PCIE-40GB
  GPU internal ID:        1324120038830

  Device ID:              1
  Device name:            NVIDIA A100-PCIE-40GB
  GPU internal ID:        1324120040003

Goodbye, World! From erc-hpc-comp-032

Note

The maximum number of GPUs that can be requested on public_gpu is now 8.
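
To request more than one GPU, Slurm accepts a count after a colon in the --gres option. The following is a minimal sketch that asks for two GPUs and reports how many the job can see. It assumes the scheduler exports CUDA_VISIBLE_DEVICES as a comma-separated list, which depends on the cluster configuration; the job name and the count_gpus helper are illustrative only.

```shell
#!/bin/bash -l
#SBATCH --output=/scratch/users/%u/%j.out
#SBATCH --job-name=multigpu
#SBATCH --gres=gpu:2

# Hypothetical helper: count the GPUs exposed to this job.
# Assumes CUDA_VISIBLE_DEVICES holds a comma-separated list of devices.
count_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    local ids=()
    if [ -n "$devices" ]; then
        # Split the list on commas into an array
        IFS=',' read -r -a ids <<< "$devices"
    fi
    echo "${#ids[@]}"
}

echo "This job can see $(count_gpus) GPU(s) on $HOSTNAME"
```

If the reported count is lower than the number requested, check the --gres value against the limit for the partition you submitted to.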

Note

Your GPU-enabled application will most likely use the NVIDIA CUDA libraries. To load the CUDA module, add module load cuda to your job submission script.
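
As a rough illustration, a submission script might load the module and then confirm that the toolkit is usable before starting real work. This sketch assumes that the cuda module places the nvcc compiler on PATH, which is typical but not guaranteed here; the job name and the cuda_status helper are hypothetical.

```shell
#!/bin/bash -l
#SBATCH --output=/scratch/users/%u/%j.out
#SBATCH --job-name=cuda-check
#SBATCH --gres=gpu

# Load the CUDA module. The guard only matters if the script is run
# somewhere without environment modules installed.
if command -v module >/dev/null 2>&1; then
    module load cuda
fi

# Hypothetical helper: report whether the CUDA compiler is on PATH.
cuda_status() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "available"
    else
        echo "missing"
    fi
}

echo "CUDA toolkit: $(cuda_status)"
```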

Testing on the GPU

Alongside the shared public GPU queue, the interruptible_gpu partition gives access to all GPUs in CREATE. This larger pool is useful both for testing GPU scheduling and for making use of unused private resources, as detailed in our scheduler policy. Note that although jobs may be scheduled faster, they may be cancelled at any time. Because interruptible_gpu can be made up of a broad mix of GPU architectures, it may also be useful to provide the --constraint scheduler option with your job submissions:

k1234567@erc-hpc-login1:~$ srun -p interruptible_gpu --gres gpu --constraint a40 --pty /bin/bash -l
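
The same options can be placed in a batch script. The sketch below requests one GPU with the a40 feature on interruptible_gpu and then checks the model name reported by nvidia-smi. It assumes that the feature name appears in the GPU model name, which may not hold for every feature defined on the cluster; the job name and the matches_constraint helper are illustrative only.

```shell
#!/bin/bash -l
#SBATCH --output=/scratch/users/%u/%j.out
#SBATCH --job-name=gpu-test
#SBATCH --partition=interruptible_gpu
#SBATCH --gres=gpu
#SBATCH --constraint=a40

# Hypothetical helper: succeed if the GPU model name contains the
# requested feature, ignoring case (requires bash 4 or later).
matches_constraint() {
    local name="${1,,}" feature="${2,,}"
    [[ "$name" == *"$feature"* ]]
}

wanted="a40"
if command -v nvidia-smi >/dev/null 2>&1; then
    # nvidia-smi prints one model name per visible GPU
    nvidia-smi --query-gpu=name --format=csv,noheader | while read -r name; do
        if matches_constraint "$name" "$wanted"; then
            echo "OK: $name"
        else
            echo "Unexpected GPU: $name"
        fi
    done
else
    echo "nvidia-smi not found; is this a GPU node?"
fi
```

Because jobs on this partition may be cancelled at any time, it is best suited to short tests or to work that can be restarted safely.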