Running AlphaFold2 on CREATE

To run AlphaFold on the HPC nodes, first configure everything in an interactive job.

Once you have confirmed that everything is set up properly, submit a batch job for the actual computation.

The AlphaFold dataset is publicly available to all users.

This repository was used to set up AlphaFold on CREATE: https://github.com/prehensilecode/alphafold_singularity

You will also need to set up Anaconda.

Setting up Anaconda

ml anaconda3
conda init bash
Then log out and log back in to CREATE.
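
After logging back in, it can be worth confirming that conda is actually on your PATH before continuing. The following is a minimal sketch only; the helper name `require_cmd` is hypothetical and not part of CREATE or conda.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK: $1 found"
    else
        echo "MISSING: $1 (try 'ml anaconda3', then log out and back in)" >&2
        return 1
    fi
}

# Report on conda without aborting the shell if it is missing.
require_cmd conda || true
```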

Configuring AlphaFold

cp -r /datasets/alphafold/alphafold-2.3.2 /scratch/users/<Your k-number>/
cd /scratch/users/<Your k-number>/alphafold-2.3.2
srun -p interruptible_gpu --gres gpu:1 --cpus-per-task 16 --mem 32G --time 04:00:00 --pty /bin/bash -l
conda create -n alphafold
conda activate alphafold
python3 -m pip install -r singularity/requirements.txt
export ALPHAFOLD_DATADIR=/datasets/alphafold/
export ALPHAFOLD_DIR=/scratch/users/<Your k-number>/alphafold-2.3.2
export output_dir=/scratch/users/<Your k-number>/
python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py \
    --use_gpu \
    --output_dir=$output_dir \
    --data_dir=${ALPHAFOLD_DATADIR} \
    --fasta_paths=${ALPHAFOLD_DIR}/T1050.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer \
    --db_preset=reduced_dbs
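
The three export lines above differ only in the k-number, so they can be derived from a single value. This is a sketch under the assumption that your CREATE login name is your k-number; the function name `alphafold_env` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the export lines used above for a given k-number.
alphafold_env() {
    local knumber="$1"
    local scratch="/scratch/users/${knumber}"
    echo "export ALPHAFOLD_DATADIR=/datasets/alphafold/"
    echo "export ALPHAFOLD_DIR=${scratch}/alphafold-2.3.2"
    echo "export output_dir=${scratch}/"
}

# Assumption: your login name (id -un) is your k-number.
# Load the variables into the current shell and show one of them.
eval "$(alphafold_env "$(id -un)")"
echo "ALPHAFOLD_DIR=${ALPHAFOLD_DIR}"
```

Generating the lines from one function keeps the three paths consistent if you later move your copy of AlphaFold.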

Create a script to be submitted using sbatch

After confirming that you can run AlphaFold interactively, you can submit a batch job.

Replace the lines marked with "###HERE" with your user-specific details.

The example below will run AlphaFold on the T1050 file.

AlphaFold looks for the FASTA file at the path you specify. In the example below it is:

--fasta_paths=${ALPHAFOLD_DIR}/T1050.fasta

#!/bin/bash -l
#SBATCH --partition=gpu
#SBATCH --time=1:00:00
#SBATCH --gpus=1
#SBATCH --cpus-per-gpu=16
#SBATCH --mem-per-gpu=32G


eval "$(conda shell.bash hook)"

export ALPHAFOLD_DATADIR=/datasets/alphafold/
export ALPHAFOLD_DIR=/scratch/users/<Your k-number>/alphafold-2.3.2 ###HERE


echo INFO: SLURM_GPUS_ON_NODE=$SLURM_GPUS_ON_NODE
echo INFO: SLURM_JOB_GPUS=$SLURM_JOB_GPUS
echo INFO: SLURM_STEP_GPUS=$SLURM_STEP_GPUS
echo INFO: ALPHAFOLD_DIR=$ALPHAFOLD_DIR
echo INFO: ALPHAFOLD_DATADIR=$ALPHAFOLD_DATADIR
echo INFO: TMP=$TMP


output_dir=/scratch/users/<Your k-number>/Output-$SLURM_JOB_ID ###HERE
mkdir -p $output_dir

echo INFO: output_dir=$output_dir

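# Activate the environment created during the interactive setup
conda activate alphafold
# Use an absolute path so this works regardless of the submission directory
python3 -m pip install -r ${ALPHAFOLD_DIR}/singularity/requirements.txt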


python3 ${ALPHAFOLD_DIR}/singularity/run_singularity.py \
    --use_gpu \
    --output_dir=$output_dir \
    --data_dir=${ALPHAFOLD_DATADIR} \
    --fasta_paths=${ALPHAFOLD_DIR}/T1050.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer \
    --db_preset=reduced_dbs

echo INFO: AlphaFold returned $?
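
Because AlphaFold reads its input from the path given to --fasta_paths, a quick sanity check of that file before submitting can save a queued job from failing immediately. The following is a minimal sketch, not part of the AlphaFold scripts; the function name `check_fasta` and the demo sequence are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: basic sanity check of a FASTA file before submitting.
check_fasta() {
    local fasta="$1"
    if [ ! -r "$fasta" ]; then
        echo "ERROR: cannot read $fasta" >&2
        return 1
    fi
    # A FASTA record starts with a header line beginning with '>'.
    if ! head -n 1 "$fasta" | grep -q '^>'; then
        echo "ERROR: $fasta does not start with a '>' header line" >&2
        return 1
    fi
    echo "OK: $fasta"
}

# Demonstration on a throwaway file; the sequence is invented.
demo=$(mktemp)
printf '>demo_sequence\nMKTAYIAKQR\n' > "$demo"
check_fasta "$demo"
rm -f "$demo"
```

In the batch script, a call such as `check_fasta "${ALPHAFOLD_DIR}/T1050.fasta" || exit 1` placed before the run_singularity.py command would stop the job early if the file is missing or malformed.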

Submit the script

k1234567@erc-hpc-login1:/scratch/users/k1234567$ sbatch ops-jupyter.sh
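
On success, sbatch normally replies with a line of the form "Submitted batch job <id>". If you want to keep that ID for later status checks, it can be extracted as sketched below. The helper name `job_id_from_sbatch` is hypothetical and the confirmation line in the example is made up.

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation
# line, which normally reads "Submitted batch job <id>".
job_id_from_sbatch() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Example with a made-up confirmation line.
job_id=$(job_id_from_sbatch "Submitted batch job 1234567")
echo "job_id=${job_id}"

# On a login node you could then run (not executed here):
#   squeue -j "$job_id"    # is the job pending or running?
#   sacct -j "$job_id"     # state and exit code once it has finished
```

Unless the script sets its own --output option, Slurm's default is to write the job's output, including the INFO lines echoed by the script, to slurm-<id>.out in the directory you submitted from.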