Storage
The following storage is available on the cluster:
Name | Mount point | Current capacity | Mounts on | Purpose |
---|---|---|---|---|
Home folders | /users | 39 TiB | login, compute | Software, code, configuration and other basic files, files not directly used in scheduled jobs; small amounts of data where I/O speed is not critical |
Scratch | /scratch | 1.4 PiB | login, compute | Files used directly in or created by scheduled jobs, large amounts of data and/or where low latency and/or high bandwidth access is important |
Attention
Please see the terms of use for guidance on the types of data it is or is not appropriate to store on the system.
Important
Please note that, although there is some degree of resilience provided at the hardware level, /scratch is not backed up. Make sure that you always have appropriate backups of the data stored there.
Home folders
Your home directory should be used to store data, such as software, code, configurations, etc., in situations where I/O speed is not critical.
By default, it is accessible only by the owner (the account holder). It is provisioned automatically when the account is created and can be accessed via the path /users/<user id>, e.g. /users/k1234567.
Scratch
The Ceph file system provides fast, high-performance storage with built-in resilience under the /scratch path hierarchy.
This should be used to store data that is actively produced or consumed by computations, especially where low latency and/or high bandwidth access is required. The different types of scratch are listed below.
Personal scratch
Personal scratch space is accessible only by the owner (the account holder) and is provisioned automatically when the account is created. It can be accessed via the path /scratch/users/<user id>, e.g. /scratch/users/k1234567.
Project scratch
Scratch allocations beyond the personal defaults will be allocated as projects instead of groups. Project scratch will be accessible by all members of a project and is not part of an individual's allocation. A project can also belong to a single individual.
Projects will be accessed via the path /scratch/prj/<project name>, e.g. /scratch/prj/foo.
Project scratch shares will be allocated based on the project registration data.
Group scratch
Group scratch is mainly provided for groups that have made the transition from Rosalind, although it can still be requested for specific use cases.
Group scratch is accessible by the members of the group that owns the share. It has its own quota allocation, which does not count towards the individual members’ own allocations. It is accessed via the path /scratch/groups/<group>, e.g. /scratch/groups/biocore.
Info
When requesting a group share, please provide, in addition to the project registration data, information about why a group is more appropriate than a project.
Datasets
A dataset share is a special type of group share designated to host datasets. By default, it is read-only and publicly accessible within the cluster, with the dataset owner(s) having write access.
A dataset share can be accessed via the /scratch/datasets/<dataset id> path, e.g. /scratch/datasets/ukbiobank.
Dataset shares are not provisioned automatically and have to be explicitly requested.
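The path conventions described in the sections above can be summarised in a short Python sketch. The prefixes are taken from this page; the dictionary keys and the function name `share_path` are hypothetical labels chosen for illustration and are not part of any cluster tooling.

```python
from pathlib import PurePosixPath

# Path prefixes as documented on this page. The keys are hypothetical
# labels used only in this sketch.
SHARE_ROOTS = {
    "home": PurePosixPath("/users"),
    "personal_scratch": PurePosixPath("/scratch/users"),
    "project": PurePosixPath("/scratch/prj"),
    "group": PurePosixPath("/scratch/groups"),
    "dataset": PurePosixPath("/scratch/datasets"),
}


def share_path(kind: str, name: str) -> PurePosixPath:
    """Return the expected location of a share, e.g. ("project", "foo")."""
    if kind not in SHARE_ROOTS:
        raise ValueError(f"unknown share type: {kind!r}")
    # Accept only a single path component so the result stays under the prefix.
    if not name or "/" in name or name in (".", ".."):
        raise ValueError("name must be a single, non-empty path component")
    return SHARE_ROOTS[kind] / name


print(share_path("home", "k1234567"))       # /users/k1234567
print(share_path("dataset", "ukbiobank"))   # /scratch/datasets/ukbiobank
```

The sketch only builds the path string; it does not check that the share exists or that you have access to it.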
Info
When requesting dataset shares, please provide the following information:
- Name of the dataset.
- Use case and the relevant information about the dataset (estimated size, external source, etc.).
- Data owner(s): Individuals that will be the primary point of contact for the data, and will be responsible for its management and access control. Data owners will automatically have write access.
- Lifetime: The amount of time after which the share can be expired.
Quotas
User accounts are currently allocated 50 GiB for /users and 200 GiB for /scratch/users.
Projects are allocated 1 TiB of storage by default. Additional storage attracts a modest charge of £50/TB p.a.
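As a rough illustration of the charging arithmetic, the Python sketch below estimates the yearly cost of a project allocation. It rests on assumptions that this page does not confirm: the default allocation (given in TiB) and the rate (given per TB) are treated as the same unit, and the charge is assumed to scale linearly with the extra storage. The function and parameter names are hypothetical.

```python
def annual_storage_charge(requested_tb: float,
                          included_tb: float = 1.0,
                          rate_gbp_per_tb: float = 50.0) -> float:
    """Estimate the yearly charge in GBP for a project allocation.

    Assumes a linear charge on storage beyond the included default;
    the actual billing rules may differ.
    """
    extra_tb = max(0.0, requested_tb - included_tb)
    return extra_tb * rate_gbp_per_tb


# 5 TB requested, 1 TB included: 4 extra TB at £50 each.
print(annual_storage_charge(5))  # 200.0
```

Under these assumptions a project that stays within the default allocation is not charged.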
Storage space is a finite and shared resource on CREATE; disk quotas are needed for various reasons:
- There is a limited amount of disk space that must be shared between many people.
- Sometimes processes can go out of control and produce huge amounts of data. If a disk fills up, no more data can be saved and people will lose work; for instance, someone who has been working with an editor may not be able to save their changes.
For these reasons and others, it is necessary to manage storage usage on the system. Disk quotas are an equitable way of doing this.
Info
Project, group and dataset share quotas are set and adjusted on-demand. Individual quota allocations may deviate from the above defaults.
Tip
Avoid going over your quota in your home directory (/users/<username>); it can cause problems logging in.
When you log in, your quota and the amount used will be displayed. To check your latest quota usage, the following commands are available: ceph_quota or rds_quota.
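The quota commands above are the authoritative source for usage figures. To find out which directory is consuming space, a minimal Python sketch such as the following can help. It sums apparent file sizes under a directory and compares the total with the default quotas quoted on this page. The function names are hypothetical, and the total may not match exactly what the file system accounts against your quota.

```python
import os
import stat
import sys

GIB = 1024 ** 3
# Default per-user quotas quoted on this page; individual quotas may differ.
HOME_QUOTA_BYTES = 50 * GIB
SCRATCH_QUOTA_BYTES = 200 * GIB


def directory_usage_bytes(root):
    """Sum the apparent sizes of regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, filename))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def quota_fraction(used_bytes, quota_bytes):
    """Return usage as a fraction of the quota (1.0 means the quota is full)."""
    return used_bytes / quota_bytes


if __name__ == "__main__":
    # Directory to inspect: first command-line argument, else the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    used = directory_usage_bytes(target)
    print(f"{target}: {used / GIB:.2f} GiB used, "
          f"{quota_fraction(used, HOME_QUOTA_BYTES):.1%} of a 50 GiB quota")
```

Walking a large directory tree generates many metadata requests, so it is better to run this on a specific subdirectory than on an entire share.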