When planning a supercomputing project, it is important to design the whole data workflow so that it complies with the recommendations and policies for the filesystems available at the Pawsey Supercomputing Centre. For that reason, the pages listed below should be studied carefully:
The /scratch filesystem should be used for running simulations. It is tuned to deliver high bandwidth for data access. Data sets should not be kept on /scratch after a simulation completes: unnecessary data should be removed, and important data should be copied to a more appropriate place, e.g. the /group filesystem, local institutional storage or long-term storage. This process can be automated using the following features of the queueing system:
- job chaining - submit the next job from within a batch job, at the start or the end of that job,
- multi-cluster operations - the queueing system can direct commands to other clusters instead of, or in addition to, the local cluster on which the command is invoked.
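Both features come down to the options passed to `sbatch`. The following Python sketch assembles such command lines without running them. It assumes the standard Slurm options `--dependency`, `--clusters` and `--partition`; the helper name `build_sbatch_command`, the script names and the job ID are hypothetical and serve only as illustration.

```python
from typing import List, Optional


def build_sbatch_command(
    script: str,
    cluster: Optional[str] = None,
    partition: Optional[str] = None,
    after_ok_job_id: Optional[str] = None,
) -> List[str]:
    """Assemble an sbatch command line as a list of arguments (hypothetical helper)."""
    command = ["sbatch"]
    if cluster is not None:
        # Multi-cluster operation: send the job to another cluster.
        command.append(f"--clusters={cluster}")
    if partition is not None:
        command.append(f"--partition={partition}")
    if after_ok_job_id is not None:
        # Job chaining: the new job starts only after the given job
        # has finished successfully.
        command.append(f"--dependency=afterok:{after_ok_job_id}")
    command.append(script)
    return command


# Chained job on the local cluster (12345 is a placeholder job ID).
print(build_sbatch_command("next_step.sh", after_ok_job_id="12345"))
# Job targeted at another cluster and partition (names assumed for illustration).
print(build_sbatch_command("copy_results.sh", cluster="zeus", partition="copyq"))
```

The dependency is mainly useful when the next job is submitted at the start of the current one; a job submitted as the last step of a batch script may not need it.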
A classic, yet still very useful, example is data staging, which is presented here.
This example shows how to handle a simple data workflow within a single SLURM job. The workflow consists of:
- running a simulation on Magnus, which produces results,
- copying the results to local storage using the copy queue (copyq) on Zeus.
Both steps can be handled by submitting a single job on Magnus which, as its last step, submits a data copying job to the copyq queue on Zeus.
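The control flow of such a job can be sketched in Python as follows. This is a minimal illustration, not the actual Pawsey job script: the function name `run_then_stage` is hypothetical, the choice to skip the copy job when the simulation fails is an assumption, and the demonstration uses harmless stand-in commands so that it runs without a queueing system.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_then_stage(
    simulation_cmd: Sequence[str],
    submit_cmd: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run the simulation, then submit the copy job only if it succeeded."""
    result = run(list(simulation_cmd))
    if result.returncode != 0:
        # Assumed policy: leave the results in place for inspection
        # and do not submit the copy job.
        return False
    submitted = run(list(submit_cmd))
    return submitted.returncode == 0


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere.
    simulation = [sys.executable, "-c", "print('simulation finished')"]
    submit_copy = [sys.executable, "-c", "print('copy job submitted')"]
    # On the real system these might look like (names are assumptions):
    #   simulation = ["srun", "./my_simulation"]
    #   submit_copy = ["sbatch", "--clusters=zeus", "--partition=copyq", "copy_results.py"]
    print(run_then_stage(simulation, submit_copy))
```

Passing the runner as a parameter keeps the logic testable without access to a cluster.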
The copyq job executes a Python script, which should include all the commands needed to back up or copy the data to local institutional storage.
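What such a script contains depends on how the institutional storage is reached. The sketch below assumes the simplest case, in which the destination is visible as an ordinary directory; transfers to a remote host would need a tool such as `scp` or `rsync` instead. The function name `copy_results` and all paths are hypothetical, and the demonstration uses temporary directories rather than real filesystems.

```python
import shutil
import tempfile
from pathlib import Path


def copy_results(source_dir: Path, destination_dir: Path) -> int:
    """Copy every file under source_dir to destination_dir, keeping the layout.

    Returns the number of files copied.
    """
    copied = 0
    # sorted() builds the full file list before any copying starts.
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        target = destination_dir / source_file.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target)  # copies content and metadata
        # Basic sanity check: the sizes of source and copy must match.
        if target.stat().st_size != source_file.stat().st_size:
            raise IOError(f"Size mismatch after copying {source_file}")
        copied += 1
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        results = Path(tmp) / "results"      # stands in for a /scratch directory
        backup = Path(tmp) / "backup"        # stands in for institutional storage
        (results / "run1").mkdir(parents=True)
        (results / "run1" / "output.dat").write_text("example data")
        print(copy_results(results, backup), "file(s) copied")
```

Removing the source files after a verified copy is deliberately left out of the sketch, since deletion policy is a decision for the project.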