Storage¶

The following storage is available on the cluster:

Name	Mount point	Current capacity	Mounts on	Purpose
Home folders	`/users`	39 TiB	login, compute	Software, code, configuration and other basic files, files not directly used in scheduled jobs; small amounts of data where I/O speed is not critical
Scratch	`/scratch`	1.4 PiB	login, compute	Files used directly in or created by scheduled jobs; large amounts of data; data where low-latency and/or high-bandwidth access is important

Attention

If you think your project will use sensitive data or personally identifiable information (PII), please see the documentation on Trusted Research Environments.

Important

Please note that, although some resilience is provided at the hardware level, /scratch is not backed up. Make sure that you always keep appropriate backups of any data stored there.
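As an illustration of keeping a copy of scratch data elsewhere, the following is a minimal sketch using only the Python standard library. The function name `back_up_directory` and both paths in the usage note are hypothetical; where your backups should live, and which tool is most appropriate for large transfers, are not specified on this page.

```python
import shutil
from pathlib import Path


def back_up_directory(source: Path, backup_root: Path) -> Path:
    """Copy the directory `source` into `backup_root` and return the copy's path.

    Hypothetical helper: the backup location is an assumption, not a
    location provided by the cluster.
    """
    destination = backup_root / source.name
    # dirs_exist_ok=True (Python 3.8+) lets a repeated run overwrite files in
    # an existing copy. Files deleted from `source` are NOT removed from the copy.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return destination
```

For example, `back_up_directory(Path("/scratch/users/k1234567/results"), Path("/path/to/backup"))` would copy the `results` directory to `/path/to/backup/results`, assuming both locations exist and are writable by you.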

Home folders¶

Your home directory should be used to store files such as software, code and configuration, in situations where I/O speed is not critical. By default, it is accessible only by its owner (the account holder). It is provisioned automatically when the account is created and can be accessed via the path /users/<user id>, e.g. /users/k1234567.

Scratch¶

The Ceph file system provides fast, high-performance storage with built-in resilience under the /scratch path hierarchy. It should be used to store data that is actively produced or consumed by computations, especially where low-latency and/or high-bandwidth access is required. The different types of scratch are listed below.

Personal scratch¶

Personal scratch space is accessible only by its owner (the account holder) and is provisioned automatically when the account is created. It can be accessed via the path /scratch/users/<user id>, e.g. /scratch/users/k1234567.

Project scratch¶

Scratch allocations beyond the personal defaults are allocated as projects rather than groups. Project scratch is accessible by all members of a project and is not part of any individual's allocation. A project can also belong to a single individual.

Project scratch is accessed via the path /scratch/prj/<project name>, e.g. /scratch/prj/foo.

Project scratch shares will be allocated based on the project registration data.

Group scratch¶

Group scratch is mainly provided for groups that have made the transition from Rosalind, although it can still be requested for specific use cases.

A group share is accessible by the members of the group that owns it. It has its own quota allocation, which does not count towards the individual members' own allocations. It is accessed via the path /scratch/groups/<group>, e.g. /scratch/groups/biocore.

Info

When requesting a group share, please provide, in addition to the project registration data, an explanation of why a group is more appropriate than a project.

Datasets¶

A dataset share is a special type of group share designated to host datasets. By default, it is readable by all users of the cluster but read-only, with write access granted only to the dataset owner(s). It can be accessed via the path /scratch/datasets/<dataset id>, e.g. /scratch/datasets/ukbiobank.

Dataset shares are not provisioned automatically and have to be explicitly requested.

Info

When requesting dataset shares, please provide the following information:

Name of the dataset.
Use case and relevant information about the dataset (estimated size, external source, etc.).
Data owner(s): The individuals who will be the primary point of contact for the data and will be responsible for its management and access control. Data owners automatically have write access.
Lifetime: The amount of time after which the share can be expired.
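The path conventions for the four kinds of scratch share described above can be summarised in a short sketch. The function `scratch_path` and the share-type keys (`personal`, `project`, `group`, `dataset`) are hypothetical names introduced for illustration; only the directory layout itself comes from this page.

```python
from pathlib import PurePosixPath

SCRATCH_ROOT = PurePosixPath("/scratch")

# Sub-directory of /scratch used by each kind of share, as documented above.
# The dictionary keys are illustrative labels, not official names.
SHARE_DIRECTORIES = {
    "personal": "users",
    "project": "prj",
    "group": "groups",
    "dataset": "datasets",
}


def scratch_path(share_type: str, name: str) -> PurePosixPath:
    """Return the scratch path for a share of the given type and name."""
    if share_type not in SHARE_DIRECTORIES:
        raise ValueError(f"unknown share type: {share_type!r}")
    return SCRATCH_ROOT / SHARE_DIRECTORIES[share_type] / name
```

With the example names used on this page, `scratch_path("project", "foo")` gives `/scratch/prj/foo` and `scratch_path("dataset", "ukbiobank")` gives `/scratch/datasets/ukbiobank`.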

Quotas¶

User accounts are currently allocated 50 GiB for /users and 200 GiB for /scratch/users.

Projects are allocated 1 TiB of storage by default. Additional storage attracts a modest charge of £50/TB per year.
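To give a rough feel for the charge, here is a small sketch of the arithmetic. It rests on several assumptions that this page does not confirm: that the charge scales linearly with the amount above the default, that the default 1 TiB and the charged TB are treated as the same unit, and that no pro-rata or rounding rules apply. The function name `annual_storage_charge` is hypothetical.

```python
def annual_storage_charge(
    requested_tb: float,
    default_tb: float = 1.0,
    rate_per_tb: float = 50.0,
) -> float:
    """Estimate the yearly charge in pounds for a project's storage.

    Assumption: only storage above the default allocation is charged,
    at a flat rate per TB per year.
    """
    additional_tb = max(0.0, requested_tb - default_tb)
    return additional_tb * rate_per_tb
```

Under these assumptions, a project requesting 5 TB in total would have 4 TB of additional storage, giving an estimated charge of 4 × £50 = £200 per year. Treat this only as an estimate and confirm the actual figure when requesting storage.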

Storage space is a finite and shared resource on CREATE; disk quotas are needed for various reasons:

There is a limited amount of disk space that must be shared between many people.
Processes can sometimes run out of control and produce huge amounts of data. If a disk fills up, no more data can be saved and people will lose work; for instance, someone working in an editor may be unable to save their changes.

For these reasons and others, it is necessary to manage storage usage on the system. Disk quotas are an equitable way of doing this.

Info

Project, group and dataset share quotas are set and adjusted on demand. Individual quota allocations may deviate from the above defaults.

Tip

Avoid going over your quota in your home directory (/users/<user id>), as this can cause problems logging in. When you log in, your quota and the amount used are displayed. To check your latest quota usage, use the ceph_quota or rds_quota command.
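If you want a rough, independent estimate of how much space a directory occupies, the sketch below sums file sizes with the Python standard library and compares the total against the default 50 GiB home quota quoted above. The function names are hypothetical, the result may differ from what the quota system reports, and walking a large directory tree can be slow; ceph_quota and rds_quota remain the authoritative sources.

```python
import os
from pathlib import Path

GIB = 1024 ** 3
HOME_QUOTA_BYTES = 50 * GIB  # default /users quota; your allocation may differ


def directory_usage_bytes(root: Path) -> int:
    """Sum the sizes of all files below `root`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            try:
                # lstat reports the size of a symlink itself, not its target.
                total += (Path(dirpath) / filename).lstat().st_size
            except OSError:
                # Skip files that vanish or cannot be read during the walk.
                continue
    return total


def usage_fraction(used_bytes: int, quota_bytes: int = HOME_QUOTA_BYTES) -> float:
    """Return the fraction of the quota that is in use (1.0 means full)."""
    return used_bytes / quota_bytes
```

For example, `usage_fraction(directory_usage_bytes(Path.home()))` returns a value close to 1.0 when your home directory is nearly full, assuming the default quota applies to your account.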