...

Tip
Copy your files into the correct location

Remember that Pawsey systems have several file systems, each intended for a different purpose, so it is important that users put their files in the correct location. In general, a user's personal folders in:

  • /home should be reserved for operating-system-related matters
  • /group should be used for installing and developing tools and for persistent data storage
  • /scratch and /astro should be used for running your cases. All transient data (input, intermediate and output) should be kept there only temporarily. Results should be analysed immediately and deleted afterwards. If files need to be kept, they should be transferred to your institution's storage and removed from /scratch or /astro. Data that will be used persistently can instead be stored in /group or HSM (if applicable).

 Refer to File Systems for HPC Filesystems for more information.

A second recommendation: when transferring a large number of files, use tar to package them into a single file first.
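As a minimal sketch of the tar recommendation, the commands below package a directory of small files into one archive and list its contents. The directory and file names are hypothetical and are created in a temporary location purely for illustration.

```shell
# Hypothetical working area with a few small result files (illustration only)
WORKDIR=$(mktemp -d)
mkdir -p "$WORKDIR/results"
for i in 1 2 3; do
    echo "sample $i" > "$WORKDIR/results/out_$i.dat"
done

# Package the whole directory into a single archive before transferring it
tar -cf "$WORKDIR/results.tar" -C "$WORKDIR" results

# List the archive contents to confirm that every file was captured
tar -tf "$WORKDIR/results.tar"
```

The single results.tar file can then be transferred in place of the individual files and unpacked at the destination with tar -xf.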

...

Note that "--export=NONE" (which discards all environment variables set before the script was submitted) is essential for SLURM submissions between machines. It ensures that the job uses a consistent login environment from the target machine, and not one inherited from the submitting machine.

Passphrase-less secure transfers

...

Transferring data

...

Using a protocol based on SSH protects information and ensures its integrity. However, setting up the environment correctly can be tricky, and mistakes introduce security risks. This is especially true when automating copy operations, for example through a SLURM job on the data mover nodes. In such a scenario, public key authentication is recommended: the ssh client running on a Pawsey supercomputer node needs only the private key to connect to the ssh server of a third-party system, which holds the corresponding public key and uses it to perform a secure handshake. The private key, however, must not be protected by a passphrase, otherwise human input would be required. This situation raises several issues that need to be addressed.

A user must generate a key pair specifically for this purpose, i.e. data transfers to and from Pawsey systems (see Logging in with SSH keys). Let's call this key pair COPYPAIR. Do not repurpose an existing key pair used to log in to Pawsey or other systems (which, by the way, should be protected by a passphrase). Keeping the key pairs separate limits the unauthorised access that is possible if COPYPAIR is compromised.
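A dedicated, passphrase-less key pair could be generated along the following lines. This is only a sketch: the key type, key size, comment string and the temporary directory are assumptions made for illustration (in practice the key would normally live under ~/.ssh).

```shell
# Illustration only: a temporary directory stands in for ~/.ssh
KEYDIR=$(mktemp -d)

# -N "" sets an empty passphrase; -f names the output files (COPYPAIR and COPYPAIR.pub);
# -C attaches a comment (hypothetical here) that helps identify the key later
ssh-keygen -q -t rsa -b 4096 -N "" -C "pawsey-data-transfer" -f "$KEYDIR/COPYPAIR"

# The private key must be readable only by its owner
chmod 600 "$KEYDIR/COPYPAIR"

ls -l "$KEYDIR"
```

Because the private key has no passphrase, anyone who can read the file can use it, which is why it should be used for nothing other than these transfers.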

The ssh server on the third-party system should be configured to refuse COPYPAIR's public key for connections that do not originate from Pawsey's data mover nodes. This is a powerful capability that protects the third-party server from unauthorised use of COPYPAIR from outside the Pawsey network. To enable it, users need to edit the COPYPAIR.pub public key file and prepend the following string:

Code Block
from="hpc-data*.pawsey.org.au",no-port-forwarding,no-pty

followed by a space and then the original key. The options must be separated by commas, with no spaces between them.

Here is an example:

Code Block
from="hpc-data*.pawsey.org.au",no-port-forwarding,no-pty ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDhGk1QdMVDVao1j9eclHPPhniU5x6rHYBhJp88DJZrEiDM3Kt70+gHvo/fCGaHmOMWQX0hjqLs5uin42VGUW7w3y0FrIBB/hZJro+JKXJzhUJFpTE/wR08CK8DI4c3GrxjrCqNRkd3ff4AOUIgS7VFGcmagg9aAj6iSas1ibvAMLMZuXkVyPcNcKhB+J38atc3u5/zuRqU9QgKGQvTQgLL7lx4CrsHGKd8bPzjdEVDaCoeD1KBdRq/S+am2wvaPwN5wqqgs6hVU83VvZggIBkGRLBbGEeMmnzu8dkG1osqE4S3RCmFVQ8MG9tiOiP0MN/jx/DpckP++NnuamJWcD/Z comment
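Rather than editing the key file by hand, the restriction can be prepended with a short script. The sketch below assumes the comma-separated option syntax used by OpenSSH's authorized_keys format; the key material is a truncated placeholder, and the output file name COPYPAIR_restricted.pub is hypothetical.

```shell
# Illustration only: a temporary directory and a placeholder key stand in for the real COPYPAIR.pub
KEYDIR=$(mktemp -d)
echo "ssh-rsa AAAAB3NzaPLACEHOLDER comment" > "$KEYDIR/COPYPAIR.pub"

# Restrictions: accept the key only from the data mover nodes, with no port forwarding and no terminal
OPTIONS='from="hpc-data*.pawsey.org.au",no-port-forwarding,no-pty'

# Write the options, a single space, then the original key on one line
printf '%s %s\n' "$OPTIONS" "$(cat "$KEYDIR/COPYPAIR.pub")" > "$KEYDIR/COPYPAIR_restricted.pub"

cat "$KEYDIR/COPYPAIR_restricted.pub"
```

The resulting line is what would be added to the authorized_keys file of the account on the third-party server.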


Transferring data between group and scratch

...

dcp, or distributed copy, is a tool that uses MPI to copy directories and large files efficiently. This tool can be used only on the data mover nodes (hpc-data.pawsey.org.au).

It is available as a module on the data mover nodes. Users may run it directly from the command line when logged in to a data mover node, or submit a job to copyq.

 


Code Block
module load mpifileutils
mpirun -np 4 dcp -p SOURCE DESTINATION

SOURCE can be either a file or a directory. The option -p preserves file attributes, e.g. permissions, group association and ownership.
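For the batch route, the same commands could be wrapped in a job script for copyq, roughly as sketched below. The time limit, task count and the /group and /scratch paths are placeholders rather than recommended values, and whether copyq accepts this resource request is an assumption to check against the current system limits.

```shell
# Write a hypothetical batch script to a temporary file (all values are placeholders)
JOBSCRIPT=$(mktemp)
cat > "$JOBSCRIPT" <<'EOF'
#!/bin/bash -l
#SBATCH --partition=copyq
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --export=NONE

module load mpifileutils

# Placeholder paths: replace with real /group and /scratch locations
SOURCE=/group/your_project/your_user/dataset
DESTINATION=/scratch/your_project/your_user/dataset

# Copy with 4 MPI tasks, preserving file attributes
mpirun -np 4 dcp -p "$SOURCE" "$DESTINATION"
EOF

# Show the script; it would be submitted with: sbatch "$JOBSCRIPT"
cat "$JOBSCRIPT"
```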

 


Code Block
mpirun -np 4 dcp -f SOURCE DESTINATION

Option -f deletes any files in the DESTINATION directory if an error occurs during the operation.


Here is a sample output for a 100 GB file copied from group space to scratch. (The file was striped over 4 OSTs.)

...