To help us provide you with the most relevant content, a survey link has been added to the newsletter.
You are very welcome to send contributions and suggestions for the Technical Newsletter via the survey or by emailing email@example.com.
An optimised configuration for the workflow engine Nextflow at Pawsey
Author: Marco De La Pierre
Nextflow is a modern tool for automating data analysis pipelines. It promotes reproducibility, portability and sharing of pipelines, and comes with native integration with job schedulers, orchestrators and container engines. Although born in the realm of bioinformatics, it is largely domain-agnostic and has been used effectively in other data-intensive domains such as radio astronomy.
Last November Pawsey hosted a dedicated week-long workshop and hackathon, featuring the lead developers Paolo Di Tommaso and Evan Floden. Among the outcomes of this event was a template configuration file for Nextflow pipelines, optimised for running on Pawsey systems.
Key aspects of this configuration include:
- pipeline resumption,
- usage of filesystems,
- interface with Singularity containers,
- interface with the Slurm scheduler.
The full template can be found on this new user documentation page: Nextflow.
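To give a flavour of the aspects listed above, here is an illustrative fragment of a Nextflow configuration file. This is only a sketch: the queue name, module and cache path below are assumptions, not the official Pawsey settings, which are given in the full template on the documentation page.

```groovy
// Illustrative fragment only -- values are assumptions, not the official template.

// Allow interrupted pipelines to resume from cached results
resume = true

process {
    // Hand tasks to the Slurm scheduler rather than running them locally
    executor = 'slurm'
    queue    = 'workq'          // hypothetical partition name
    // Make Singularity available to each task
    module   = 'singularity'
}

singularity {
    enabled  = true
    // Cache container images on a shared filesystem (path hypothetical)
    cacheDir = "$MYGROUP/.nextflow_singularity"
}
```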
Passphrase-less secure data transfers
Author: Cristian Di Pietrantonio
We encourage the use of data mover nodes to transfer files to and from Pawsey systems. Automating such transfers with jobs is allowed, but we require you to take extra steps to ensure it is done securely. For this reason, the page Transferring files has been updated with a paragraph, entitled "Passphrase-less secure transfers", describing the mandatory way of implementing secure, non-interactive transfers through SSH-based clients such as scp and rsync.
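For illustration only, such a setup might look like the following sketch; the key name, remote hostname and paths here are assumptions, and the documentation page describes the actual mandatory procedure:

```shell
# Sketch of a passphrase-less transfer setup (hostnames and paths are
# assumptions -- follow the "Transferring files" page for the real procedure).

# 1. Generate a dedicated key with an empty passphrase, used ONLY for transfers
ssh-keygen -t ecdsa -b 521 -N '' -f ~/.ssh/transfer_key

# 2. On the remote side, limit what this key may do by prefixing its line in
#    ~/.ssh/authorized_keys with the "restrict" option (OpenSSH 7.2+), e.g.:
#    restrict <contents of transfer_key.pub>

# 3. Transfer non-interactively through a data mover node (hostname hypothetical)
rsync -av -e "ssh -i ~/.ssh/transfer_key" ./results/ \
    username@data-mover.pawsey.org.au:/scratch/project/username/results/
```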
And while we are at it, we advise our users:
- to use a passphrase to protect their private key,
- to use ECDSA with a key size of 521 bits. RSA is still fine with a key of at least 2048 bits (preferably 4096), but it may not remain completely safe in the future (source: https://www.ssh.com/ssh/keygen/).
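The advice above translates into the following ssh-keygen invocations; the output file names are just the defaults and can be changed:

```shell
# Recommended: ECDSA with a 521-bit key. Omitting -N means ssh-keygen will
# prompt interactively for a passphrase, which we advise you to set.
ssh-keygen -t ecdsa -b 521 -f ~/.ssh/id_ecdsa

# Still acceptable: RSA with at least 2048 bits, preferably 4096
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
```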
Easy, parallelised data transfer from IRDS to Zeus, Magnus and Nimbus
Author: Sarah Beecroft
Transferring large volumes of data can be tricky, especially if the process is not automated. Using wget, we have been able to script and parallelise the transfer of data from UWA IRDS to our systems, and this approach may be applicable to other university-specific data storage systems. The guides How to transfer data from IRDS at UWA to Zeus and Magnus and Transfer Your Data to Nimbus are now available.
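As a sketch of the scripted, parallel approach, the pattern below fetches every URL in a list file with several concurrent wget processes. The file names and degree of parallelism are assumptions; the guides above cover the IRDS-specific URLs and credentials.

```shell
#!/bin/bash
# Parallelised download sketch (file names and -P value are assumptions).
# urls.txt holds one download URL per line.
mkdir -p downloads
# xargs feeds one URL at a time to wget, keeping up to 4 transfers running
xargs -a urls.txt -n 1 -P 4 wget --quiet --directory-prefix=downloads/
```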