This page documents the principles behind the configuration, maintenance and support of the scientific software stack on Pawsey systems.
Organisation of the software stack
The software stack of Setonix is organised on three levels, each one with its own base directory path:
Note that the base path for the system-wide installation contains the date tag (year and month) of the initial stack build.
Within each level, there are three key subdirectories:
- software/: software builds using Spack
- modules/: modulefiles for the Spack builds
- containers/: Singularity containers and container modules
More specifically, the software tree as installed by Spack within each level is organised according to:
- Name of the cluster and corresponding operating system.
- CPU microarchitecture: this may vary across cluster partitions, and is essential for compiler optimisations.
- Compiler name and version: essential for assessing relative performance (for example, AMD vs GNU vs Cray).
- Optional GPU suffix: to distinguish between CPU only and GPU enabled builds of packages. Distinct suffixes are to be used for HIP, OpenMP offloading, and so on.
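The order of these levels can be illustrated with a short Python sketch that assembles an install path from its components. It is hypothetical: the base path, the directory names, the separators and the choice to attach the GPU suffix to the compiler segment are all assumptions made for illustration, not the actual Setonix layout.

```python
from pathlib import PurePosixPath
from typing import Optional


def install_prefix(
    level_base: str,
    cluster_os: str,
    microarch: str,
    compiler: str,
    package: str,
    gpu_suffix: Optional[str] = None,
) -> PurePosixPath:
    """Compose a hypothetical Spack install path for one package.

    Segment order follows the policy: level base, software/, cluster and OS,
    CPU microarchitecture, compiler name and version, then the package.
    Appending the GPU suffix to the compiler segment is an assumption.
    """
    compiler_dir = f"{compiler}-{gpu_suffix}" if gpu_suffix else compiler
    return PurePosixPath(level_base, "software", cluster_os, microarch, compiler_dir, package)


# Hypothetical values; not the real Setonix base path or directory names.
cpu_build = install_prefix("/example/base", "cluster-os", "zen3", "gcc-11.2.0", "openblas-0.3.15")
gpu_build = install_prefix("/example/base", "cluster-os", "zen3", "gcc-11.2.0", "example-app-1.0", gpu_suffix="hip")
print(cpu_build)  # /example/base/software/cluster-os/zen3/gcc-11.2.0/openblas-0.3.15
print(gpu_build)  # /example/base/software/cluster-os/zen3/gcc-11.2.0-hip/example-app-1.0
```

Keeping the compiler and GPU suffix as separate path levels means CPU-only and GPU-enabled builds of the same package never collide, and builds from different compilers can be compared side by side.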
As a practical example from a Spack installation, here is the path for the library OpenBLAS 0.3.15 as initially installed on Setonix for the Zen3 architecture with the compiler GCC 11.2.0:
And here is the corresponding modulefile:
Organisation of modulefiles
Modulefiles installed by Pawsey staff are classified to improve readability of the list of available software. These are the main classes:
- containers/modules (containerised software)
- libraries (parallel, numerical, I/O)
- utilities (including pre-processing, post-processing, workflow tools, build tools)
- programming languages (compilers, debuggers, profilers, build tools)
- Python packages
Dependencies that users do not need to access directly are installed as hidden modules, to keep the list of available modules as concise as possible.
Modulefiles that come pre-installed by Cray have their own organisation; these include programming environments, optimised libraries, compilers, debuggers and profilers.
Installation tools and approaches
Users can install software using the following methods, listed in descending order of preference:
- Spack package manager. Spack is the recommended tool for installing software at the project level and user level.
- Container pulls. The available container engine is Singularity; container modules are made available by means of SHPC (Singularity Registry HPC).
- Other package managers (pip, conda) and language-specific built-in methods (such as for R).
- Manual builds (with build tools and compilers) or container builds.
For recommended strategies when installing software at the project level and user level, see the How to Install Software section.
For more information about working with containers and SHPC, see the Containers section.
The system-wide software stack is organised into categories to facilitate browsing. The categories are custom-defined so that each contains a moderate, listable number of packages.
For the full list of software that will be supported system-wide, see List of Supported Software.
HPC scientific applications and libraries
This is a defined collection of popular HPC scientific packages that are highly scalable and will be fully supported on Pawsey systems.
System-wide installations are provided, typically by means of Spack. A notable exception is OpenFOAM, for which Pawsey also provides optimised containers that mitigate I/O performance issues in workflows generating large numbers of files (in the millions).
Non-HPC applications, utilities and visualisation tools
This collection includes scientific applications with limited scalability, utilities for preprocessing/post-processing/plotting, workflow management tools, and language interpreters such as Python, R and Julia.
System-wide installation is kept to a minimum for these, using Spack wherever possible.
Use cases where Python enables supercomputing research are supported through a defined subset of optimised libraries, which are preinstalled.
A defined set of libraries is installed and maintained system-wide, both as individual modules and as a collated collection. Pawsey also provides optimised containers as an alternative, with a matching collection of libraries. The installation procedure gives precedence to methods that can directly leverage system-wide performance libraries (currently Spack).
R and other languages
Only the core R framework is installed and maintained system-wide; all package installations are left to users. This takes into consideration:
- The diversity of R packages and user needs
- The typically complex dependency trees, which are time-consuming to install
- The frequent need for
- Non-performant libraries
- The lack of gold-standard R packages for supercomputing use cases
Similar policies apply to other emerging languages in the supercomputing domain, such as Julia, as well as to Perl, Ruby and Rust.
A list of key software is maintained and installed system-wide, using Spack or pip.
The bioinformatics domain has thousands of packages, mostly either serial or multi-threaded. The Bioconda and Biocontainers projects make thousands of packages available through conda and containers, respectively. Containers (via SHPC) are the preferred installation method for bioinformatics.
Only the most popular packages are made available system-wide. Additional requests can be accommodated system-wide, provided that the corresponding Biocontainers are available; because deployment relies on containers, the list of software can grow flexibly over time.
The machine learning domain currently features a limited number of highly popular frameworks, characterised by complex installation procedures and GPU acceleration. These frameworks are available via Spack, and AMD containers are under development for some of them. When the GPU partition becomes available, system-wide installations will be provided for the few most popular frameworks. Containers will be preferred unless major shortcomings are identified with this approach.
Given the evolving status of this domain, Pawsey will regularly monitor for new software that can benefit from running on a supercomputer.
Use of software covered by a licence agreement is strictly bound by that agreement.
Pawsey does not purchase licensed software apart from compilers, profilers, debuggers and similar development tools.
Licensed software can be installed on behalf of users if it does not violate the terms of the licence. A copy of the licence must be sent to Pawsey with the request for installation or access to the software. The process depends on the software vendor; in some cases we forward your request to the vendor for approval. If a licence server is required (such as FlexLM), you will need to make appropriate arrangements with your own organisation. This will likely involve firewall changes in that organisation. Contact the Pawsey helpdesk to obtain the relevant network information and for help with testing. Pawsey will not host licences on its own servers.
Some licensed software, such as Amber, Ansys, CPMD, NAMD and VASP, is not visible to the module avail command until you have been added to the relevant permission group for that software. In most cases, adding you to the permission group is quick and straightforward, depending on the terms of the specific software licence.
Users can install licensed software themselves, provided this complies with the software licence agreement.
Levels of software service
Merit Allocations (NCMAS and Pawsey Partner Scheme)
When required software is not available system-wide, researchers need to build it themselves, following the user documentation. In some cases, and only if unsuccessful installation attempts are documented, Pawsey may be able to assist with the installation by targeting a standard, off-the-shelf build.
PaCER, strategic and uptake projects
Pawsey staff may assist in the customised installation, configuration and tuning of the software.
Supported packages and versions
The system-wide software stack is made up of applications that Pawsey staff recognised as popular or relevant to supercomputing at the time of Setonix deployment. Adding a package to the system-wide stack may be considered when it is requested by more than four projects within the Pawsey user base.
In general, "3+1" co-existing versions of software are supported, corresponding to an implicit classification into one "legacy", one "stable" and one "latest" version. When a newer "latest" version of a package is installed, the "legacy" version is removed. A fourth version may need to be installed outside maintenance (see "Frequency of version updates" below), in which case four versions may co-exist until the next maintenance.
An exception to this "3+1" rule is required for GNU compilers: additional versions may need to be installed to cover the compatibility needs of various domains and software.
Another exception applies to scientific libraries, such as Boost and HDF5, that are dependencies of many applications, which may have specific version or build option requirements. In this case, a legacy version can be dropped only if no installed applications depend on it.
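The retention rule and its two exceptions can be sketched in Python as follows. This is an illustrative model only: the class and method names, and the choice to treat only versions older than the three newest as removal candidates, are assumptions for this sketch rather than Pawsey tooling; package names and version numbers in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class InstalledPackage:
    name: str
    versions: List[str]  # installed versions, oldest first
    exempt: bool = False  # e.g. GNU compilers, which may keep extra versions
    depended_on: Set[str] = field(default_factory=set)  # versions other installed apps need

    def install_between_maintenance(self, version: str) -> None:
        """Outside maintenance, a fourth version may co-exist until the next one."""
        self.versions.append(version)

    def run_maintenance(self, new_latest: Optional[str] = None) -> List[str]:
        """Optionally add a new "latest" version, then trim back towards 3 versions.

        Only versions older than the three newest are removal candidates, and a
        candidate is kept while other installed applications depend on it.
        Returns the removed versions.
        """
        if new_latest is not None:
            self.versions.append(new_latest)
        if self.exempt:
            return []
        removed = [v for v in self.versions[:-3] if v not in self.depended_on]
        self.versions = [v for v in self.versions if v not in removed]
        return removed


# Hypothetical package names and version numbers.
pkg = InstalledPackage("examplepkg", ["1.0", "1.1", "1.2"])
print(pkg.run_maintenance("1.3"))  # ['1.0']
print(pkg.versions)                # ['1.1', '1.2', '1.3']

lib = InstalledPackage("examplelib", ["1.0", "1.1", "1.2"], depended_on={"1.0"})
print(lib.run_maintenance("1.3"))  # []  (1.0 is kept because other applications need it)
```

Modelling the dependency exception as a set of protected versions keeps the default "3+1" behaviour simple while letting shared libraries such as HDF5 retain an old version for as long as something still links against it.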
Frequency of version updates
As with OS and Cray updates, installed versions of system-wide packages are generally updated twice a year, in January and July, when a full stack rebuild is performed. Software updates will likely be more frequent in the first year of operation of Setonix.
Installed packages are monitored for new versions every month. If a new version offers performance benefits or interesting new features, it may be installed outside the two time slots above. Relevant bug-fix releases may also be installed outside this schedule, in which case the buggy version is removed at the next maintenance.
For single users and groups requiring an older package version, or a newer version outside of the six-monthly updates, a user or group installation is recommended.
Deprecation notices for legacy versions are published in at least the two preceding technical newsletters, that is, with at least two months' notice. Version update notices are provided in the technical newsletter following the updates.
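The timing arithmetic behind the January/July schedule and the two-newsletter notice period can be sketched in Python. The sketch assumes that technical newsletters are monthly (which the "two months' notice" equivalence implies) and that a deprecation takes effect at the next maintenance window; the function names and example dates are hypothetical.

```python
from typing import List, Tuple

MAINTENANCE_MONTHS = (1, 7)  # January and July


def next_maintenance(year: int, month: int) -> Tuple[int, int]:
    """Return (year, month) of the first maintenance window after the given month."""
    for m in MAINTENANCE_MONTHS:
        if m > month:
            return year, m
    return year + 1, MAINTENANCE_MONTHS[0]


def notice_newsletters(year: int, month: int, count: int = 2) -> List[Tuple[int, int]]:
    """Return the `count` monthly newsletters preceding a maintenance month, oldest first."""
    issues = []
    for _ in range(count):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        issues.append((year, month))
    return issues[::-1]


# Hypothetical example: planning a deprecation in October 2023.
window = next_maintenance(2023, 10)
print(window)                       # (2024, 1)
print(notice_newsletters(*window))  # [(2023, 11), (2023, 12)]
```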
Versions in use for compilers and other build dependencies
At the time of the six-monthly updates, the latest stable version is identified for each of the Cray tools, compilers and dependency libraries (such as MPI, GPU and numerical libraries). This becomes the default and only version to be used for building other applications until the next six-monthly update, even if newer versions of these tools are installed.
When building system-wide software, the following guidelines on compilers apply:
- End-user applications:
Only the default version of the GNU compiler is used for builds. Non-default compiler versions are used only for packages that strictly require them.
- Numerical libraries:
At present only the default version of the GNU compiler is used for builds. Library builds with multiple compilers will be considered in the future.
- Utilities that are not performance critical (such as text editors, Singularity):
Only the default version of the GNU compiler is used for builds.
No default versions
Default package versions are disabled within the module system; in other words, when loading a module the software version must always be specified, otherwise an error will occur.
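In practice, this means every module must be referenced as name/version. The Python sketch below mimics that rule as a client-side check; it is hypothetical (it is not how the module system implements the policy), and it assumes the usual convention that the last "/"-separated component of a module name is the version.

```python
from typing import Tuple


def parse_module_spec(spec: str) -> Tuple[str, str]:
    """Split a module spec into (name, version), rejecting unversioned specs.

    Mirrors the "no default versions" policy: a bare name such as "openblas"
    is an error, while "openblas/0.3.15" is accepted. The last "/"-separated
    component is treated as the version, which is an assumption of this sketch.
    """
    name, sep, version = spec.rpartition("/")
    if not sep or not name or not version:
        raise ValueError(f"module {spec!r} needs an explicit version, e.g. {spec}/<version>")
    return name, version


print(parse_module_spec("openblas/0.3.15"))  # ('openblas', '0.3.15')
try:
    parse_module_spec("openblas")
except ValueError as err:
    print(err)
```

Requiring an explicit version makes job scripts reproducible: a script keeps loading the same build even after a newer version is added to the stack.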
End of support for Magnus and Galaxy
Once Setonix Phase 1 becomes operational and migration starts, support for system-wide installations on Magnus and Galaxy will cease.