...
Tip | ||
---|---|---|
| ||
Remember that Pawsey systems have different file systems, so it is important that users put their files in the correct location. In general, users' personal folders are in:
Refer to File Systems for HPC Filesystems for more information. |
A second recommendation: when transferring a large number of files, use tar to package them into a single file first.
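The packaging step can be sketched as follows. This is a minimal illustration rather than a Pawsey-provided tool: the helper names `pack_dir` and `unpack_archive` are hypothetical, and the archive is left uncompressed on the assumption that compression is not needed.

```shell
#!/bin/bash
# Bundle a directory holding many small files into one archive before copying it.
# pack_dir and unpack_archive are hypothetical helper names used for illustration.

# pack_dir SRC_DIR ARCHIVE: store SRC_DIR (under its base name) in ARCHIVE.
pack_dir() {
    local src_dir=$1 archive=$2
    tar -cf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# unpack_archive ARCHIVE DEST_DIR: extract ARCHIVE underneath DEST_DIR.
unpack_archive() {
    local archive=$1 dest_dir=$2
    mkdir -p "$dest_dir"
    tar -xf "$archive" -C "$dest_dir"
}
```

For example, `pack_dir ./results results.tar` would produce a single file to transfer, and `unpack_archive results.tar ./restored` would recreate the `results` directory at the destination.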
...
Note that the "--export=NONE" option (which basically erases all environment variables set prior to the submission of the script) is essential for SLURM submissions between machines. It ensures that a consistent login environment is used on the target machine, rather than one inherited from the submitting machine.
Passphrase-less secure transfers
...
Transferring data
...
Using a protocol based on SSH to transfer data allows us to protect information and ensure its integrity. However, setting up a proper environment configuration can be tricky; if not done right, security risks arise. This is especially true when one wants to automate copy operations, for example through a SLURM job on data mover nodes.
In such a scenario, a public key-based authentication method is recommended because the ssh client, running on a Pawsey supercomputer node, only needs the private key to connect to a third-party system's ssh server, which in turn holds the corresponding public key used to perform a secure handshake. The private key, however, must not be protected by a passphrase; otherwise human input is required. There are several issues to address in this situation.
A user must generate a key-pair specifically for this purpose, i.e. data transfers from and to Pawsey systems (see Logging in with SSH keys). Let's call this key-pair COPYPAIR. Do not repurpose an existing key-pair used to log in to Pawsey or other systems (which, by the way, should use a passphrase). This isolates any unauthorised access resulting from a compromised key-pair.
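Generating such a dedicated key-pair might look like the sketch below. The function name `make_copypair`, the key comment and the choice of the ed25519 key type are assumptions made for illustration; the linked Logging in with SSH keys page remains the authoritative procedure.

```shell
#!/bin/bash
# make_copypair KEY_PATH: create a passphrase-less key-pair at KEY_PATH and KEY_PATH.pub.
# The function name, comment string and key type are illustrative assumptions.
make_copypair() {
    local key_path=$1
    # -N "" sets an empty passphrase so that automated jobs need no human input.
    ssh-keygen -q -t ed25519 -N "" -C "pawsey-data-transfer" -f "$key_path"
}
```

A call such as `make_copypair ~/.ssh/COPYPAIR` would write the private key to `~/.ssh/COPYPAIR` and the public key to `~/.ssh/COPYPAIR.pub`. Because the private key has no passphrase, it should be kept readable only by its owner.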
The ssh server on the third-party system should be configured so that COPYPAIR's public key cannot authorise connections that originate from anywhere other than Pawsey's data mover machines. This is a powerful capability that protects the third-party server from unauthorised use of COPYPAIR from outside the Pawsey network. To enable this feature, users need to edit the COPYPAIR.pub public key file and prepend the following string:
Code Block |
---|
from="hpc-data*.pawsey.org.au",no-port-forwarding,no-pty |
followed by a space and then the original key. Note that the options themselves are separated by commas, with no spaces between them.
Here is an example:
Code Block |
---|
from="hpc-data*.pawsey.org.au",no-port-forwarding,no-pty ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDhGk1QdMVDVao1j9eclHPPhniU5x6rHYBhJp88DJZrEiDM3Kt70+gHvo/fCGaHmOMWQX0hjqLs5uin42VGUW7w3y0FrIBB/hZJro+JKXJzhUJFpTE/wR08CK8DI4c3GrxjrCqNRkd3ff4AOUIgS7VFGcmagg9aAj6iSas1ibvAMLMZuXkVyPcNcKhB+J38atc3u5/zuRqU9QgKGQvTQgLL7lx4CrsHGKd8bPzjdEVDaCoeD1KBdRq/S+am2wvaPwN5wqqgs6hVU83VvZggIBkGRLBbGEeMmnzu8dkG1osqE4S3RCmFVQ8MG9tiOiP0MN/jx/DpckP++NnuamJWcD/Z comment |
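The prepending step can also be scripted instead of edited by hand. The sketch below is only an illustration: `restrict_pubkey` and `PAWSEY_KEY_OPTS` are hypothetical names. It writes the options in the comma-separated form that the OpenSSH authorized_keys format expects, since unquoted spaces are not permitted inside the options field.

```shell
#!/bin/bash
# Options that limit where the key may be used from and what a session may do.
PAWSEY_KEY_OPTS='from="hpc-data*.pawsey.org.au",no-port-forwarding,no-pty'

# restrict_pubkey PUBKEY_FILE: print the key line with the options prepended.
# A line that already starts with from= is printed unchanged.
restrict_pubkey() {
    local pubkey_file=$1 line
    # Read the first line even when the file lacks a trailing newline.
    IFS= read -r line < "$pubkey_file" || [ -n "$line" ]
    case $line in
        from=*) printf '%s\n' "$line" ;;
        *)      printf '%s %s\n' "$PAWSEY_KEY_OPTS" "$line" ;;
    esac
}
```

The printed line is what would be added to the authorized_keys file on the third-party system, for instance by running `restrict_pubkey COPYPAIR.pub` and appending its output there.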
Transferring data between group and scratch
...
dcp, or distributed copy, is a tool which uses MPI to copy directories and large files in an efficient manner. This tool can be used only on data mover nodes (hpc-data.pawsey.org.au).
It is available as a module on data mover nodes. Users may use it directly from the command line when on a data mover node, or submit a job to copyq.
Code Block |
---|
module load mpifileutils |
mpirun -np 4 dcp -p SOURCE DESTINATION |
SOURCE can be either a file or a directory. The option -p preserves file attributes, e.g. permissions, group association and ownership of the files.
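For the copyq route, a batch script could look roughly like the sketch below, which writes the script to a file and checks its syntax without submitting it. The file name `dcp_copy.slurm`, the time limit, the task count (chosen to match `-np 4`) and the placeholder paths are all assumptions; accounting or other site-specific directives may also be required.

```shell
#!/bin/bash
# Write a hypothetical copyq job script; paths and limits are placeholders.
cat > dcp_copy.slurm <<'EOF'
#!/bin/bash -l
#SBATCH --partition=copyq
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE

# Placeholder paths: replace with the real source and destination locations.
SOURCE=/path/to/source
DESTINATION=/path/to/destination

module load mpifileutils
mpirun -np 4 dcp -p "$SOURCE" "$DESTINATION"
EOF

# Check the generated script for shell syntax errors without running it.
bash -n dcp_copy.slurm
```

The script would then be submitted with `sbatch dcp_copy.slurm`.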
Code Block |
---|
mpirun -np 4 dcp -f SOURCE DESTINATION |
Option -f deletes any files in the DESTINATION directory if an error occurs during the operation.
Here is a sample output of a 100GB file copied from group space to scratch. (The file was striped over 4 OSTs.)
...