Transferring files to (and from) CREATE
The scp, rsync and sftp methods laid out below are for transferring limited datasets (up to ~500GB) to (and from) CREATE HPC.
Datasets over 500GB in size can be pulled from sources using an interactive session (srun) or a job submitted using sbatch. See the final section on this page for further details and an alternative.
Using scp
One of the easiest and quickest ways to transfer files is using scp (secure copy).
It is strongly recommended that you use a ~/.ssh/config file as outlined in the relevant sections for macOS and Linux or Windows on the Accessing CREATE page.
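If you do not already have one, a minimal entry might look like the following sketch (k1234567 is a placeholder username; the hostname matches the SFTP example further down this page):

```
# Minimal sketch; replace k1234567 with your own username.
Host create
    HostName hpc.create.kcl.ac.uk
    User k1234567
```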
Using a config file in the same format as the example given means that copying a file called hello_world.sh from your local machine to your home directory on CREATE is as simple as running a single command from the directory containing the file.
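The commands below are illustrative sketches; they assume the create host alias from your config file: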
Linux and macOS:
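```bash
# Sketch: copy hello_world.sh to your CREATE home directory,
# using the "create" alias from ~/.ssh/config.
scp hello_world.sh create:~/
```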
Windows:
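```bash
# Sketch: the same copy from PowerShell or Command Prompt with the built-in
# OpenSSH client; on Windows the config file lives at %USERPROFILE%\.ssh\config.
scp hello_world.sh create:~/
```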
To copy a file called hello_er.out from your scratch users area on CREATE to the directory your local shell is open in, the form is similar.
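Again these are sketches; /scratch/users/k1234567 is a placeholder for your own scratch path: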
Linux and macOS:
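```bash
# Sketch: copy hello_er.out from CREATE scratch into the current
# local directory ("."); the remote path is a placeholder.
scp create:/scratch/users/k1234567/hello_er.out .
```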
Windows:
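```bash
# Sketch: the same download from a Windows shell with OpenSSH installed.
scp create:/scratch/users/k1234567/hello_er.out .
```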
Take note

scp has been reactivated on the login nodes; you no longer need to use erc-hpc-dm1 in order to use scp.
rsync
For more advanced copy and file transfer capabilities, rsync is available and well documented here.
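As an illustrative sketch (assuming the create alias and a placeholder scratch path), a recursive upload might look like:

```bash
# Recursively copy a local directory to CREATE scratch, preserving
# permissions and timestamps (-a) and showing per-file progress.
# "create" is the ~/.ssh/config alias; the remote path is a placeholder.
rsync -av --progress my_dataset/ create:/scratch/users/k1234567/my_dataset/
```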
SFTP
It is strongly recommended that you use a ~/.ssh/config file as outlined in the relevant sections for macOS and Linux or Windows on the Accessing CREATE page.
SFTP ensures that the data you send is encrypted.
If you have created a config file as specified above, you can open an SFTP connection with:
$ sftp create
otherwise use:
$ sftp k1234567@hpc.create.kcl.ac.uk
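Once connected, put uploads files and get downloads them; a brief illustrative session (paths are placeholders):

```
sftp> put hello_world.sh
sftp> get /scratch/users/k1234567/hello_er.out
sftp> bye
```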
To use the FileZilla GUI client, see here.
Larger file transfers
For larger file transfers, an interactive session, a script submitted using sbatch, or erc-hpc-dm1 can be used.
When making large file transfers, immediate sequential file operations (such as mv or rename) should be avoided. The file system is fast; however, the nodes can be faster and can cause issues. For example, if a node tries to rename a file that has not yet been replicated to all stores, Ceph may continue trying to replicate the original file.
Note
You will have to approve your login through the MFA page on the portal.
Note

RDS is not accessible from the HPC compute nodes, which includes sbatch and srun job allocations.
If your data transfer is expected to take a long time, it is advisable to run the transfer inside a screen or tmux session so that you can reconnect to your ssh session if you happen to lose connection.
To connect to erc-hpc-dm1, you can use your existing ssh key saved in the portal for the HPC, or set up a new ssh key.
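A sketch of setting up and using a new key (the key filename and the full erc-hpc-dm1 hostname are assumptions; register the public key through the portal as usual):

```bash
# Generate a new ed25519 key pair (the filename is a placeholder).
ssh-keygen -t ed25519 -f ~/.ssh/create_dm1
# Print the public key so it can be added via the portal.
cat ~/.ssh/create_dm1.pub
# Connect once the key is registered; the full hostname is an assumption.
ssh -i ~/.ssh/create_dm1 k1234567@erc-hpc-dm1.create.kcl.ac.uk
```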
Below is an example of using rsync to copy a directory called data from an example RDS project to a scratch HPC project space on CREATE.
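A sketch of such a command, run on erc-hpc-dm1 where both file systems are mounted; both project paths are placeholders:

```bash
# Copy the RDS directory "data" into a scratch project space.
# Both paths are placeholders; substitute your own project paths.
rsync -av /rds/prj-example/data /scratch/prj/example/
```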
An ssh config alias can also be set up; add an entry like the following to the file ~/.ssh/config.
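A minimal sketch of such an entry (the HostName and IdentityFile values are assumptions; adjust them to your setup):

```
# Sketch only: HostName and IdentityFile are assumptions.
Host create-dm1
    HostName erc-hpc-dm1.create.kcl.ac.uk
    User k1234567
    IdentityFile ~/.ssh/create_dm1
```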
Using this config file will shorten the access command for erc-hpc-dm1 to:

$ ssh create-dm1