Dataset Guide¶
Published on 30th November, 2022. Updated on 24th January, 2025.
Introduction¶
A dataset is data stored under /datasets on CREATE HPC.
The purpose of a dataset is to share commonly used data among users.
The benefits of using a dataset include:
- preventing duplication
- restricting user access to read and execute only (write access can be granted to the principal investigator, administrators and maintainers)
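To see which of these permissions apply to your own account, you can query the operating system directly. The sketch below is illustrative rather than an official CREATE tool: `describe_access` is a hypothetical helper name, and /datasets/image_net is used only because it is the ImageNet location named later in this guide.

```python
import os
from pathlib import Path


def describe_access(path):
    """Report which permissions the current user has on a path."""
    p = Path(path)
    return {
        "exists": p.exists(),
        "read": os.access(p, os.R_OK),
        "write": os.access(p, os.W_OK),
        "execute": os.access(p, os.X_OK),
    }


if __name__ == "__main__":
    # Example path taken from this guide; substitute the dataset you use.
    for name, allowed in describe_access("/datasets/image_net").items():
        print(f"{name}: {allowed}")
```

For a regular user who has been granted access, a shared dataset would be expected to report read and execute as allowed and write as denied.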
Datasets that are available on CREATE HPC¶
The following datasets are available on request within CREATE:
- Answerals
- Bioresource
- BSTOP
- commonmind
- NYGC and Target ALS
- ProjectMineDF2
- ROSMAP
- SEA-AD
- TCGA
- TREM2
- UK Biobank
- UK DRI MAP
These are the datasets publicly available to all users within CREATE:
- image_net
If you need access to any of the restricted datasets listed above, please apply on the e-research portal groups page or, if no "Apply" button is visible for the group, email one of the named contacts to request access.
Answerals¶
Role | Email | Name |
---|---|---|
Principal Investigator | alfredo.iacoangeli@kcl.ac.uk | Alfredo Iacoangeli |
Bioresource¶
Role | Email | Name |
---|---|---|
Principal Investigator | gerome.breen@kcl.ac.uk | Gerome Breen |
Administrator | jonathan.coleman@kcl.ac.uk | Jonathan Coleman |
Administrator | sang_hyuck.lee@kcl.ac.uk | Sang-Hyuck Lee |
Administrator | rujia.1.wang@kcl.ac.uk | Rujia Wang |
BSTOP¶
Role | Email | Name |
---|---|---|
Principal Investigator | michael.simpson@kcl.ac.uk | Michael Simpson |
Administrator | nick.dand@kcl.ac.uk | Nick Dand |
commonmind¶
Role | Email | Name |
---|---|---|
Principal Investigator | TBA |
image_net¶
Role | Email | Name |
---|---|---|
Principal Investigator | TBA |
NYGC and Target ALS¶
Role | Email | Name |
---|---|---|
Principal Investigator | alfredo.iacoangeli@kcl.ac.uk | Alfredo Iacoangeli |
ProjectMineDF2¶
Role | Email | Name |
---|---|---|
Principal Investigator | alfredo.iacoangeli@kcl.ac.uk | Alfredo Iacoangeli |
Administrator | aminah.2.ali@kcl.ac.uk | Aminah Ali |
ROSMAP¶
Role | Email | Name |
---|---|---|
Principal Investigator | jernej.ule@kcl.ac.uk | Jernej Ule |
Administrator | charlotte.capitanchik@kcl.ac.uk | Charlotte Capitanchik |
Administrator | silvia.hnatova@kcl.ac.uk | Silvia Hnatova |
SEA-AD¶
Role | Email | Name |
---|---|---|
Principal Investigator | jernej.ule@kcl.ac.uk | Jernej Ule |
Administrator | charlotte.capitanchik@kcl.ac.uk | Charlotte Capitanchik |
Administrator | silvia.hnatova@kcl.ac.uk | Silvia Hnatova |
TCGA¶
Role | Email | Name |
---|---|---|
Principal Investigator | francesca.ciccarelli@kcl.ac.uk | Francesca Ciccarelli |
TREM2¶
Role | Email | Name |
---|---|---|
Principal Investigator | alfredo.iacoangeli@kcl.ac.uk | Alfredo Iacoangeli |
Principal Investigator | angela.k.hodges@kcl.ac.uk | Angela Hodges |
Principal Investigator | richard.j.dobson@kcl.ac.uk | Richard Dobson |
UK Biobank¶
Role | Email | Name |
---|---|---|
Principal Investigator | gerome.breen@kcl.ac.uk | Gerome Breen |
Administrator | jonathan.coleman@kcl.ac.uk | Jonathan Coleman |
Administrator | alexandra.a.gillett@kcl.ac.uk | Alexandra Gillett |
UK DRI MAP¶
Role | Email | Name |
---|---|---|
Principal Investigator | jernej.ule@kcl.ac.uk | Jernej Ule |
Administrator | charlotte.capitanchik@kcl.ac.uk | Charlotte Capitanchik |
Administrator | silvia.hnatova@kcl.ac.uk | Silvia Hnatova |
Working with Datasets: An ImageNet Example¶
ImageNet is an image database organised according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds and thousands of images.
Loading the data set¶
Tip
The ImageNet dataset, available at /datasets/image_net, is very large.
If you wish to work with a smaller dataset, you can download the Tiny ImageNet subset; otherwise, you can use the whole dataset and skip this step.
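One possible way to fetch Tiny ImageNet is sketched below in Python. It is an illustration under stated assumptions, not the official procedure: the download URL is the commonly used Stanford CS231n location and should be checked before use, and `fetch_tiny_imagenet` and `extract_archive` are hypothetical helper names.

```python
import sys
import urllib.request
import zipfile
from pathlib import Path

# Assumed download location for Tiny ImageNet; verify it is still current.
TINY_IMAGENET_URL = "http://cs231n.stanford.edu/tiny-imagenet-200.zip"


def extract_archive(archive, dest):
    """Unpack a zip archive into dest and return the destination path."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def fetch_tiny_imagenet(dest, url=TINY_IMAGENET_URL):
    """Download the archive into dest (skipped if present) and unpack it."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return extract_archive(archive, dest)


if __name__ == "__main__":
    # Nothing is downloaded unless a destination directory is given.
    if len(sys.argv) == 2:
        print(fetch_tiny_imagenet(sys.argv[1]))
    else:
        print("usage: python fetch_tiny_imagenet.py DEST_DIR")
```

Choose a destination in your own storage area, since /datasets is read-only for regular users.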
Running a Python script¶
Note
Before proceeding, ensure you have PyTorch installed. Please refer to our PyTorch documentation for installation instructions.
Either run your own Python script or adapt a basic script that trains a ResNet model on the dataset using PyTorch.
Submitting the training job¶
To run the training on the cluster, use a SLURM batch script.
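As a rough illustration, the Python helper below renders a minimal batch script as text so it can be written to a file. Every value in it is an assumption to adapt: the partition name, resource requests, time limit, virtual environment path and script name are placeholders, and `render_batch_script` and `write_batch_script` are hypothetical helpers. The `#SBATCH` options themselves (`--partition`, `--gres`, `--cpus-per-task`, `--mem`, `--time`, `--output`) are standard SLURM directives.

```python
from pathlib import Path

# Placeholder values throughout; adjust them to your own allocation.
BATCH_TEMPLATE = """#!/bin/bash
#SBATCH --job-name={job_name}
#SBATCH --partition={partition}
#SBATCH --gres=gpu:{gpus}
#SBATCH --cpus-per-task={cpus}
#SBATCH --mem={memory}
#SBATCH --time={time_limit}
#SBATCH --output=%x-%j.out

source {venv}/bin/activate
python {script}
"""


def render_batch_script(script, venv, job_name="resnet-train", partition="gpu",
                        gpus=1, cpus=4, memory="16G", time_limit="02:00:00"):
    """Return the text of a SLURM batch script that runs a Python script."""
    return BATCH_TEMPLATE.format(
        script=script, venv=venv, job_name=job_name, partition=partition,
        gpus=gpus, cpus=cpus, memory=memory, time_limit=time_limit,
    )


def write_batch_script(path, **options):
    """Render the batch script and save it to path."""
    path = Path(path)
    path.write_text(render_batch_script(**options))
    return path


if __name__ == "__main__":
    # Hypothetical script name and virtual environment location.
    print(render_batch_script(script="train.py", venv="$HOME/venvs/torch"))
```

The `%x` and `%j` patterns in the output file name are expanded by SLURM to the job name and job ID, so each run writes its own log file.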
Now run the bash script.
Tip
Ensure you have activated your virtual environment and are on a compute node.
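If the script is submitted through SLURM, the usual command is `sbatch` followed by the script path. The sketch below wraps that call in Python and extracts the job ID from the confirmation line that `sbatch` prints; the file name train_job.sh and both function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job N' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def submit(script_path):
    """Submit a batch script with sbatch and return the numeric job ID."""
    result = subprocess.run(
        ["sbatch", str(script_path)],
        check=True, capture_output=True, text=True,
    )
    return parse_job_id(result.stdout)


if __name__ == "__main__":
    if shutil.which("sbatch") is None:
        print("sbatch not found; run this where SLURM is available.")
    else:
        # Hypothetical file name for the batch script.
        print(submit("train_job.sh"))
```

The returned job ID can then be passed to standard SLURM tools such as `squeue -j` to check on the job.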