Points of Contact:
- Mark O'Shea
- Ugo Varreto
- Pawsey users
- The major change is an upgrade of the Operating System on the production Crays (Magnus and Galaxy),
so that Pawsey continues to receive security updates from Cray. Cray declared our current OS
end-of-life some time ago, but we waited until the yearly maintenance to carry out the upgrade so as to
minimise the downtime of the production Crays.
The impact on users is detailed at CLE OS Upgrade Effects: PAWSEY_OS (from CLE 6.0.UP05 to CLE 6.0.UP07).
- Reduction of /home quota on the supercomputers from 10GB to 1GB. See Quota Limit on your home area.
- Slurm upgraded to 19.05.5
- The Intel compiler suite and Cray Development Toolkit will offer 2019 versions in addition to the 2017 versions.
(The 2017 versions will remain the defaults, though.)
- 2019-12-20 12:00 The Estimated End Time of this maintenance still matches the one in the
announcement email. However, some of the CBIS (CSIRO's Business and Infrastructure Service) work will
no longer go ahead, so Pawsey staff will be able to commence their work earlier than originally planned.
We therefore expect to complete the maintenance ahead of the original End Time.
- 2020-01-03 06:30 Both production Crays were taken out of service
- 2020-01-03 08:00 Work commenced on the UP07 upgrade to the production Crays
- 2020-01-03 18:08 Both production Crays exhibited errors after their firmware upgrades
Work continues to rectify this issue
- 2020-01-03 19:07 Galaxy was completely flashed: Magnus still fighting
- 2020-01-03 19:55 Fewer Magnus components failed to flash on the second attempt
Join us tomorrow for more upgrade fun!
- 2020-01-04 09:30 Magnus has now been completely flashed.
Work continues towards returning the production Crays into service
- 2020-01-04 14:20 The PrgEnv images have been built. CLE OS images are currently building.
- 2020-01-05 11:30 The CLE system within Galaxy has booted
Work continues towards booting the CLE system within Magnus and the eLogin systems
- 2020-01-05 13:00 The CLE system within Magnus has booted
Work continues towards booting the eLogin systems and completing acceptance testing
- 2020-01-05 16:30 The eLogin systems have been seen to boot
Work will continue tomorrow (Mon 6th), mainly around completing acceptance testing, along
with some hardware fixes that require our on-site Cray engineer
- 2020-01-06 12:00 Our on-site Cray engineer carried out the hardware swaps.
- 2020-01-06 18:00 Testing of the redeployed systems is ongoing.
- 2020-01-07 16:00 Galaxy is being returned to service. Any issues: email firstname.lastname@example.org
Magnus should be back soon
- 2020-01-07 16:40 Zeus and Topaz have been returned to service
- Slurm 19.05.5
- NVIDIA-based GPU nodes are running driver 440.33.01, which supports up to CUDA 10.2
- 2020-01-07 19:10 Magnus is being returned to service. However, only magnus-1
is currently available as a job submission resource while we use magnus-2 to
investigate a mount timing issue.
Any other issues: email email@example.com
- 2020-01-08 14:40 The Shifter module, whilst still present, no longer functions.
Pawsey's recommendation for containerised workflows is Singularity; see Basics of Singularity.
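
With the /home quota on the supercomputers reduced from 10GB to 1GB, it may help to check how much space a home directory currently uses. The following is a minimal sketch only, not a Pawsey tool: the function names and the QUOTA_BYTES constant are hypothetical, 1GB is assumed to mean 1 GiB, and the sum of file sizes may differ from what the filesystem's own quota accounting reports.

```python
import os

# Assumption: the 1GB quota is interpreted here as 1 GiB.
QUOTA_BYTES = 1 * 1024**3


def directory_usage(path):
    """Return the total size in bytes of files under path.

    Symbolic links are not followed; unreadable entries are skipped.
    """
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is not accessible; ignore it.
                continue
    return total


def report(path, quota=QUOTA_BYTES):
    """Return (bytes used, percentage of quota used) for path."""
    used = directory_usage(path)
    return used, 100.0 * used / quota
```

Calling `report(os.path.expanduser("~"))` returns the bytes used and the percentage of the assumed quota. The page linked at "Quota Limit on your home area" remains the authoritative source on how usage is measured.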