Overview of HPC Systems
The Pawsey Supercomputing Centre operates several High Performance Computing (HPC) systems:
- Magnus, a petascale Cray XC40 intended for large scalable jobs that require hundreds to thousands of CPU cores;
- Zeus, an HPE cluster for workflows that require large numbers of smaller jobs, large memory, data transfers or longer wall times;
- Galaxy, a Cray XC30 dedicated to the operational requirements of the ASKAP and MWA radio telescopes; and
- Topaz, a GPU cluster for GPU-accelerated workflows.
All of these systems physically reside at the Pawsey Supercomputing Centre and are closely integrated with other Pawsey infrastructure. This includes the RDSI Collection Development node and the Hierarchical Storage Management system, allowing a diverse range of workflows to be undertaken.
Magnus (Cray XC40)
Magnus is a Cray XC40 supercomputer, a massively parallel architecture consisting of 1,488 individual nodes that are connected by a high speed network.
It is suitable for capability workflows that require a modest number of large jobs that each require hundreds or thousands of cores.
The high speed network is the HPC-optimised Aries interconnect, which utilizes the Dragonfly network topology to enable parallel codes using the Message Passing Interface (MPI) to communicate between the compute nodes. Each compute blade has a single Aries ASIC, which provides about 72Gbits/sec for each of the 4 nodes on the blade. The Aries/Dragonfly interconnect has been designed for low latency and efficient small message transfer.
All compute nodes in Magnus have the same architecture. Each includes two Intel Xeon E5-2690 v3 (Haswell) 12-core CPUs for a total of 24 cores per node, providing a total of 35,712 cores for the entire system.
Each node has 64 GB of DDR4 memory shared between 24 cores. Each core has 32KB instruction and data caches, and one 256KB L2 cache; 12 cores (per socket, or NUMA region) share one 30 MB L3 cache. In total the system has 93 terabytes of memory.
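The per-node figures above multiply out to the quoted system totals; a quick arithmetic check (illustrative only):

```python
# Magnus system totals derived from the per-node figures above.
nodes = 1488
cores_per_node = 24       # 2 sockets x 12 cores (Haswell)
mem_per_node_gb = 64      # DDR4 per node

total_cores = nodes * cores_per_node
total_mem_gb = nodes * mem_per_node_gb

print(total_cores)            # -> 35712 cores
print(total_mem_gb / 1024)    # -> 93.0 (TiB of memory)
```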
The following is a schematic representation of the node architecture:
All of the Pawsey HPC systems, including Magnus, mount the various Pawsey HPC Filesystems over an Infiniband network, for fast parallel access to files actively used by jobs on the system.
Magnus has two login nodes; a generic hostname distributes users onto either login node via round-robin DNS. To run jobs on the compute nodes, submit them from the login nodes. Compute nodes cannot be directly accessed from the internet.
Jobs on Magnus have a maximum walltime of 24 hours on the regular work queue, and 1 hour on the debug queue.
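A Magnus job might be submitted with a script along these lines; this is a minimal sketch, assuming the standard SLURM directives used at Pawsey, with `projectcode` and `my_mpi_program` as placeholder names:

```shell
#!/bin/bash --login
# Hypothetical Magnus job script (project code and program name
# are placeholders, not real accounts or binaries).
#SBATCH --job-name=example
#SBATCH --partition=workq      # regular work queue (24 h limit)
#SBATCH --nodes=4              # 4 nodes x 24 cores = 96 MPI ranks
#SBATCH --time=02:00:00        # well under the 24 h walltime cap
#SBATCH --account=projectcode  # placeholder project code

# On the Cray XC40, MPI programs are launched on the compute
# nodes with aprun: -n total ranks, -N ranks per node.
aprun -n 96 -N 24 ./my_mpi_program
```

The script is submitted from a login node with `sbatch`; requesting whole nodes (multiples of 24 cores) matches the Cray's exclusive node allocation.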
The maximum power consumption of the system is approximately 90 kW per cabinet. Average power consumption is around 50 kW per cabinet, or 400 kW for the system.
Significant allocations on Magnus are awarded through various merit schemes.
Zeus (HPE Cluster)
Zeus is an HPE Linux cluster containing different types of CPU nodes to support different types of computational workflows.
For throughput workflows that require large numbers of jobs that each use a modest number of cores, an 80-node partition, workq, is available. Each node has two 14-core Intel Xeon E5-2680 v4 2.4 GHz (Broadwell) CPUs (28 cores per node) and 128 GB of RAM.
For long-running workflows that require several days to complete, an 8-node partition, longq, is available for computationally intensive jobs with wall times of up to 4 days. Each node has 28 cores and 128 GB of RAM.
For large-memory workflows that require more than 128 GB of RAM, a 6-node partition, highmemq, is available for memory-intensive computational jobs. Each node has 16 cores and 1 TB of RAM.
For debugging and development work, an 8-node partition, debugq, is available for code development and prototyping. Each node has 28 cores and 128 GB of RAM.
For data transfer jobs, an 8-node partition, copyq, is available for copying and transferring data. These nodes each contain two Intel Xeon E5-2650 (Sandy Bridge) 8-core CPUs and 64 GB of RAM.
This is summarised in the following table:
| Hostname | CPU | Cores Per Node | GPU | RAM | SLURM Partition(s) |
|---|---|---|---|---|---|
| hpc-data1 to hpc-data4 | Intel Xeon E5-2650 2.0 GHz | 16 | (none) | 64 GB | copyq |
| z043 to z134 | Intel Xeon E5-2680 v4 2.4 GHz | 28 | (none) | 128 GB | workq, debugq, longq |
| z135 to z140 | Intel Xeon E5-2620 v4 2.1 GHz | 16 | (none) | 1 TB | highmemq |
Jobs on Zeus have a maximum walltime of 24 hours on regular queues; exceptions are as follows: debug queues have 1 hour; longq and highmemq have 96 hours; copyq has 48 hours.
Zeus has one login node. To run jobs on the compute nodes, submit them from the login node. Compute nodes cannot be directly accessed from the internet.
All of the Pawsey HPC systems, including Zeus, mount the various Pawsey HPC Filesystems over an Infiniband network, for fast parallel access to files actively used by jobs on the system.
Unlike the Cray systems at Pawsey, nodes in the work queue of Zeus are shared. A user may request a specific number of cores or a certain amount of memory (up to the maximum available on a node), rather than being allocated an entire node regardless of whether it is fully utilised. This also means that multiple jobs by different users may be running on the same node. While these jobs are separate from each other (i.e., they cannot access memory belonging to another job on the same node), there may still be some performance variability, depending on how jobs are using memory or cores on a node.
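Because workq nodes are shared, a job script on Zeus should request only the cores and memory it actually needs. A minimal sketch, with `projectcode` and `my_program` as placeholder names:

```shell
#!/bin/bash --login
# Hypothetical Zeus job script (account and program names are
# placeholders). On shared nodes, request specific cores and
# memory rather than whole nodes.
#SBATCH --partition=workq
#SBATCH --ntasks=4             # 4 cores out of the 28 on a node
#SBATCH --mem=16G              # 16 GB out of the 128 GB on a node
#SBATCH --time=12:00:00        # within the 24 h workq limit
#SBATCH --account=projectcode  # placeholder project code

srun ./my_program
```

Keeping the request small lets the scheduler pack other users' jobs onto the remaining cores and memory of the node.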
Zeus access is provided to Magnus merit projects; modest allocations can also be applied for via the Director Share.
Galaxy (Cray XC30)
Galaxy is a Cray XC30 supercomputer. That is, it is a massively parallel architecture of many individual nodes that are connected by a high-speed network.
Galaxy is only available for radio-astronomy-focused operations. In particular, it is used to support ASKAP and MWA, which are two of the Square Kilometre Array precursor projects currently under way in the north-west of Western Australia. For ASKAP, Galaxy acts as a real-time computer, allowing direct processing of data delivered to the Pawsey Centre from the Murchison Radio Observatory.
Galaxy has 472 XC30 nodes each containing two 10-core Intel Xeon E5-2690V2 'Ivy Bridge' processors which share a total of 64 GB of main memory. The 472 CPU nodes (with a total of 9,440 cores) provide a peak performance (LINPACK) of 192.1 TeraFlops.
Each Ivy Bridge chip shares a 25MB L3 cache; each core has a 256KB L2 cache and 32KB L1 cache. Each core runs at 3.0 GHz and can support two threads in hardware ("hyper-threading"). The situation is represented schematically in the following diagram:
Galaxy has 64 nodes which each house one NVIDIA K20X "Kepler" GPU card and one 8-core Xeon E5-2670 "Sandy Bridge" chip acting as the host CPU. The GPU has 14 streaming multiprocessors (SMs) with a total of 2,688 single precision cores (192 per SM) and 896 double precision cores (64 per SM) delivering a peak performance of 3.95 TeraFlops (single precision) and 1.31 TeraFlops (double precision), respectively. Global GPU memory is 5.25 GB (GDDR5) with 64 KB constant memory and shared memory of 64 KB per SM. The CUDA compute capability is 3.5.
The CPU has 32 GB host memory; each CPU core runs at 2.6 GHz and can support two hardware threads. The CPU shares a 20 MB cache between 8 cores; each core has a 256KB L2 cache and 32KB L1 cache.
Note that the GPU nodes have a different generation of CPU than the CPU nodes (Sandy Bridge versus Ivy Bridge). They also differ in core count and main memory: 1 CPU (8 cores) and 32 GB per GPU node, versus 2 CPUs (20 cores) and 64 GB per CPU node.
Connecting all the nodes together is the HPC-optimised Cray Aries interconnect, which utilizes the Dragonfly network topology to enable parallel codes using the Message Passing Interface (MPI) to communicate between the compute nodes. Each compute blade has a single Aries ASIC, which provides about 72 Gbit/s for each of the 4 nodes on the blade. The Aries interconnect and Dragonfly topology have been designed for low latency and efficient small-message transfer.
Galaxy has a 1.3 Petabyte Lustre scratch file system, provided by a Cray Sonexion 1600 appliance storage system and connected via FDR Infiniband.
Galaxy has two login nodes. The generic hostname distributes users onto either login node via round-robin DNS. To run jobs on the compute nodes, submit jobs from the login nodes. Compute nodes cannot be accessed from the internet.
Topaz (GPU Cluster)
Topaz is a 33-node GPU Linux cluster with four partitions and the following configuration:
| Hostname | CPU | Cores Per Node | GPU | SLURM Partition(s) |
|---|---|---|---|---|
| t001 to t020 | 2 x Intel Xeon Silver 4215 2.5 GHz | 8 x 2 = 16 | 2 x NVIDIA V100 (16 GB HBM2 each) | gpuq |
| t021 to t022 | 2 x Intel Xeon Silver 4215 2.5 GHz | 8 x 2 = 16 | 2 x NVIDIA V100 (16 GB HBM2 each) | gpuq-dev |
| a081 to a089 | 2 x Intel Xeon E5-2680 v4 2.4 GHz | 14 x 2 = 28 | 4 x Tesla P100 (16 GB HBM2 each) | nvlinkq |
| a090 to a091 | 2 x Intel Xeon E5-2680 v4 2.4 GHz | 14 x 2 = 28 | 4 x Tesla P100 (16 GB HBM2 each) | nvlinkq-dev |
Nodes on the gpuq partition are configured as a shared resource, i.e. multiple users can run jobs on the same node, whereas nodes on the nvlinkq partition are configured as an exclusive resource, i.e. all GPUs on a given node are allocated to a single user only, and service units are a function of the total GPU-hours per node.
See also Queue Policies and Limits for Topaz
The four partitions (queues) on Topaz have different characteristics.
- gpuq - has 20 nodes (t001 to t020) that are available for computational jobs with a maximum wall-time of 24 hours
- gpuq-dev - has 2 nodes (t021, t022) that are available for debugging and development jobs with a maximum wall-time of 1 hour
- nvlinkq - has 9 nodes (a081 to a089) that are available for computational jobs with a maximum wall-time of 24 hours
- nvlinkq-dev - has 2 nodes (a090, a091) that are available for debugging and development jobs with a maximum wall-time of 1 hour
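A GPU job on the shared gpuq partition can request individual GPUs through SLURM's generic-resource mechanism. A minimal sketch, with `projectcode` and `my_gpu_program` as placeholder names:

```shell
#!/bin/bash --login
# Hypothetical Topaz job script (account and program names are
# placeholders). On the shared gpuq partition, request only the
# GPUs the job needs.
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:1           # 1 of the 2 V100s on a t-series node
#SBATCH --time=06:00:00        # within the 24 h gpuq limit
#SBATCH --account=projectcode  # placeholder project code

srun ./my_gpu_program
```

On the exclusive nvlinkq partition, by contrast, all four GPUs of a node are allocated to the job regardless of the request, and service units are charged accordingly.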
Access to Topaz can be requested by submitting a ticket through the User Support Portal.