Running AlphaFold2 on CREATE

The AlphaFold dataset is publicly available to all users.

This repository was used to set up AlphaFold on CREATE: https://github.com/prehensilecode/alphafold_singularity

Setting up your virtual environment

You will first need to set up your own virtual environment with all of the AlphaFold requirements installed. For a complete guide on how to set up a conda environment on CREATE HPC, please refer to our guides.

Configuring AlphaFold

First, copy the required source scripts from the provided AlphaFold location on the HPC to a directory of your own, replacing k1234567 with your k-number:

cp -r /datasets/alphafold/alphafold-2.3.2 /scratch/users/k1234567/
cd /scratch/users/k1234567/alphafold-2.3.2

Tip

To avoid memory limitations on the login nodes, request an interactive session to complete your setup. AlphaFold and its dependencies can require more memory than is allocated by default. Please see the documentation on how to request more resources when using CREATE HPC.

Next install all of the required dependencies for running AlphaFold through the provided Singularity script:

module load anaconda3/2023.09-0-gcc-13.2.0
conda create -n alphafold-env
conda activate alphafold-env
python3 -m pip install -r singularity/requirements.txt

Although it is best to run the computation as a batch job, you can use the following to test your AlphaFold configuration:

export ALPHAFOLD_DATADIR=/datasets/alphafold/
export ALPHAFOLD_DIR=/scratch/users/<Your k-number>/alphafold-2.3.2
export output_dir=/scratch/users/<Your k-number>/
python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py --use_gpu --output_dir=$output_dir --data_dir=${ALPHAFOLD_DATADIR} --fasta_paths=${ALPHAFOLD_DIR}/T1050.fasta --max_template_date=2020-05-14 --model_preset=monomer --db_preset=reduced_dbs
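
A mistyped path is an easy way to waste a test run, so it can help to check the inputs before launching. The function below is a minimal sketch of such a check. The name check_alphafold_inputs is hypothetical and is not part of AlphaFold or the CREATE tooling; it only assumes the directory layout used in this guide.

```shell
# Hypothetical pre-flight helper; not part of AlphaFold or the CREATE tooling.
# Usage: check_alphafold_inputs <alphafold_dir> <data_dir> <fasta_file>
check_alphafold_inputs() {
    local alphafold_dir="$1"
    local data_dir="$2"
    local fasta="$3"
    local rc=0

    # The launcher script used throughout this guide
    if [ ! -f "${alphafold_dir}/singularity/run_singularity.py" ]; then
        echo "ERROR: run_singularity.py not found under ${alphafold_dir}" >&2
        rc=1
    fi

    # The shared database directory
    if [ ! -d "${data_dir}" ]; then
        echo "ERROR: data directory not found: ${data_dir}" >&2
        rc=1
    fi

    # The input must exist and begin with a FASTA header line ('>')
    if [ ! -f "${fasta}" ]; then
        echo "ERROR: FASTA file not found: ${fasta}" >&2
        rc=1
    elif ! head -n 1 "${fasta}" | grep -q '^>'; then
        echo "ERROR: ${fasta} does not start with a '>' header line" >&2
        rc=1
    fi

    return "${rc}"
}
```

For example, running check_alphafold_inputs "${ALPHAFOLD_DIR}" "${ALPHAFOLD_DATADIR}" "${ALPHAFOLD_DIR}/T1050.fasta" before the python3 command reports every problem it finds and returns a non-zero status if any check fails.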

Create a script to be submitted using sbatch

Once you have confirmed that AlphaFold runs, you can submit a batch job.

Replace any lines marked with "###HERE" with your user-specific details. The script as shown builds its paths from ${USER}, so check that these match your own directories.

The example below runs AlphaFold on the T1050.fasta file.

AlphaFold reads the FASTA file from the path given in --fasta_paths. In the example below it is:

--fasta_paths=${ALPHAFOLD_DIR}/T1050.fasta

#!/bin/bash -l
#SBATCH --partition=gpu
#SBATCH --time=1:00:00
#SBATCH --gpus=1
#SBATCH --cpus-per-gpu=16
#SBATCH --mem-per-gpu=32G

eval "$(conda shell.bash hook)"

export ALPHAFOLD_DATADIR=/datasets/alphafold/
export ALPHAFOLD_DIR=/scratch/users/${USER}/alphafold-2.3.2

# >>> singularity cache >>>
# This example script uses an alternative cache location
export SINGULARITY_CACHEDIR=/scratch/users/${USER}/cache/singularity/alphafold
export SINGULARITY_TMPDIR=$SINGULARITY_CACHEDIR/$SLURM_JOB_ID/tmp
export TMPDIR=$SINGULARITY_TMPDIR

mkdir -p $SINGULARITY_CACHEDIR
mkdir -p $SINGULARITY_TMPDIR
# <<< singularity cache <<<

output_dir=/scratch/users/${USER}/output-$SLURM_JOB_ID
mkdir -p $output_dir

echo INFO: SLURM_GPUS_ON_NODE=$SLURM_GPUS_ON_NODE
echo INFO: SLURM_JOB_GPUS=$SLURM_JOB_GPUS
echo INFO: SLURM_STEP_GPUS=$SLURM_STEP_GPUS
echo INFO: ALPHAFOLD_DIR=$ALPHAFOLD_DIR
echo INFO: ALPHAFOLD_DATADIR=$ALPHAFOLD_DATADIR
echo INFO: TMP=$TMPDIR
echo INFO: output_dir=$output_dir

conda activate alphafold-env

python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py \
    --use_gpu \
    --output_dir=$output_dir \
    --data_dir=${ALPHAFOLD_DATADIR} \
    --fasta_paths=${ALPHAFOLD_DIR}/T1050.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer \
    --db_preset=reduced_dbs

echo INFO: AlphaFold returned $?
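
Because the final echo itself succeeds, the script above exits with status 0 whatever AlphaFold returned, so Slurm may record the job as completed even when the prediction failed. One way to pass the status on is sketched below. The wrapper name run_and_report is hypothetical and is not part of the script above.

```shell
# Hypothetical wrapper: run a command, report its exit status, and pass it on.
# Usage: run_and_report <command> [arguments...]
run_and_report() {
    local rc
    "$@"
    rc=$?
    echo "INFO: command returned ${rc}"
    return "${rc}"
}

# Possible use at the end of the batch script (shown as comments only):
# run_and_report python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py ...
# exit $?
```

With this change the job's exit code reflects the AlphaFold run rather than the closing echo.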

Submit the script

k1234567@erc-hpc-login1:/scratch/users/k1234567$ sbatch ops-alphafold.sh
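
On success, sbatch prints a confirmation line of the form "Submitted batch job <id>". The sketch below shows one way to capture that ID and use it to follow the job. The helper name extract_job_id is hypothetical. The commented commands assume that job accounting is enabled and that the script does not override Slurm's default output file name.

```shell
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation line.
# Prints nothing if the text is not a confirmation line.
extract_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Possible use on the cluster (shown as comments so the helper can be sourced anywhere):
# submit_msg=$(sbatch ops-alphafold.sh)
# job_id=$(extract_job_id "${submit_msg}")
# squeue -j "${job_id}"                                        # queue state while pending or running
# sacct -j "${job_id}" --format=JobID,State,Elapsed,ExitCode   # summary once the job has finished
# tail -f "slurm-${job_id}.out"                                # default output file in the submission directory
```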