Scheduler policy¶
The CREATE HPC cluster is a shared system available to all staff and students at King's College London, along with staff belonging to, or funded by, the NIHR Maudsley Biomedical Research Centre (BRC).
The `cpu` and `gpu` partitions are open to all CREATE HPC users. Jobs will be allocated to nodes by the Slurm scheduler based on resources requested versus resources available.
Limits are set at:

- 48 hour max job run time on the `cpu` and `gpu` partitions
- Max 700 concurrent cores per user (~10% of CREATE total capacity)
- Max 8 concurrent A100 cards per user (~15% of CREATE total capacity)
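As a sketch, a batch script that requests resources within these limits might look like the following (the job name, task count and program path are placeholders, not CREATE-specific values):

```bash
#!/bin/bash
#SBATCH --job-name=example_job   # placeholder job name
#SBATCH --partition=cpu          # open to all CREATE HPC users
#SBATCH --ntasks=32              # well under the 700 concurrent-core cap
#SBATCH --time=48:00:00          # the 48-hour maximum run time

srun ./my_program                # placeholder executable
```

Requesting only what the job actually needs helps the scheduler place it sooner, since allocation is based on requested versus available resources.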
If the time limit prevents you from running a job, please check whether it is possible to checkpoint your work. For example, TensorFlow, PyTorch, LAMMPS, NAMD and GROMACS all have native checkpointing implementations. If it is not possible to checkpoint a specific job (and we know it is not possible to checkpoint all applications), please contact support to request access to the long partitions.
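For applications without native checkpointing, the general pattern is to persist your job's state periodically and resume from the last saved state on restart. A minimal sketch of that pattern, using a hypothetical iterative workload and a placeholder checkpoint file (not a CREATE-provided tool):

```python
import os
import pickle

CHECKPOINT = "state.pkl"  # hypothetical checkpoint path

def run(total_steps, checkpoint=CHECKPOINT):
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(checkpoint):
        with open(checkpoint, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "accumulator": 0}

    while state["step"] < total_steps:
        state["accumulator"] += state["step"]  # stand-in for real work
        state["step"] += 1
        # Persist progress periodically so a job killed at the 48-hour
        # limit can be resubmitted and continue where it left off.
        if state["step"] % 100 == 0:
            with open(checkpoint, "wb") as f:
                pickle.dump(state, f)

    # Final save so a completed run also leaves a consistent state file.
    with open(checkpoint, "wb") as f:
        pickle.dump(state, f)
    return state
```

A resubmitted job simply calls `run()` again: it reloads the saved state and skips the already-completed steps rather than starting over.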
Interruptible partitions¶
These partitions make use of otherwise unused capacity on nodes in private partitions. Jobs run at a lower priority and may therefore be cancelled if a job from the private partition's owners requires those resources.
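Because jobs on these partitions can be cancelled at any time, they pair naturally with checkpointing and Slurm's requeue support. A sketch of such a job script, where the partition name and executable are placeholders (check with support for the actual interruptible partition names):

```bash
#!/bin/bash
#SBATCH --partition=interruptible_cpu  # placeholder partition name
#SBATCH --requeue                      # put the job back in the queue if preempted

# On requeue, the application restarts and should resume
# from its most recent checkpoint rather than from scratch.
srun ./my_checkpointed_program         # placeholder executable
```

With `--requeue`, a preempted job returns to the pending queue automatically, so a checkpoint-aware application loses at most the work done since its last save.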
Private partitions¶
Where funding (e.g. a research grant or departmental budget) is available, one or more private nodes can be purchased. See the server price list for cost estimates. These nodes will be placed in a private partition limited to your group/department, and limits (e.g. max run time) will not need to be set. To explore this further, get in touch via a support ticket.