The Pawsey Supercomputing Centre schedules maintenance for the first Tuesday of every month, between 0800 and 2000.
Whilst not every maintenance session will require outages to services, Pawsey systems should be considered "at risk" during those times.
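For scripts that need to avoid the "at risk" period, the schedule above can be computed directly. The following is a minimal sketch rather than an official Pawsey tool: the function names are hypothetical, and it assumes the 0800 to 2000 window is expressed in the centre's local time using naive datetimes.

```python
from datetime import date, datetime, time, timedelta

TUESDAY = 1  # date.weekday() numbering: Monday is 0, Tuesday is 1


def first_tuesday(year, month):
    """Return the date of the first Tuesday of the given month."""
    first = date(year, month, 1)
    # Days to add to reach the next Tuesday (0 if the 1st is a Tuesday).
    offset = (TUESDAY - first.weekday()) % 7
    return first + timedelta(days=offset)


def maintenance_window(year, month):
    """Return (start, end) of the assumed 0800-2000 window as naive datetimes."""
    day = first_tuesday(year, month)
    return datetime.combine(day, time(8, 0)), datetime.combine(day, time(20, 0))


def in_maintenance_window(moment):
    """True if the naive local datetime falls inside that month's window."""
    start, end = maintenance_window(moment.year, moment.month)
    return start <= moment <= end
```

A job-submission wrapper could, for example, call `in_maintenance_window(datetime.now())` and warn the user before submitting work that might be interrupted.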
- Galaxy GPU decommissioning: nodes within Galaxy will be rearranged so that one cabinet contains only GPU nodes.
This will allow for power isolation, and will give a better distribution of the Aries HSN across the remaining
nodes within the cabinets.
- Removal of all modules installed system-wide on Zeus and Magnus/Galaxy which depend directly or indirectly on Python 2
- Singularity 3.6.4 will become the default version on all Pawsey systems
- Jobs that were in the queues on Magnus, Zeus and Topaz prior to the shutdown will have been purged as a result
of the new project allocations having been applied.
- 0845: Crays are powered down: maintenance work commences
- 1030: Two nodes in Galaxy and one in Magnus displayed errors as they powered up.
Our on-site Cray Engineer is investigating the issues.
- 1100: The Slurm database upgrade is progressing: this will facilitate the entry of the
project allocations for the new year
- 1540: The CLE portion of Magnus has rebooted: acceptance testing is in progress
- 1630: Topaz has passed acceptance tests
The CLE portion of Galaxy has rebooted: acceptance testing is in progress
- 1800: All systems are rebooted; however, we are investigating some issues with the
new allocations, and a couple of acceptance tests, before returning resources to service
- 1845: Galaxy returned to service
- 1845: Magnus, Topaz and Zeus are all ready to return.
New allocations for the year are being applied, so users may not be able to submit jobs
until their accounts have been allocated (will run through the night).
- 1845: Garrawarla returned to service.
- 0900: Most allocations for the new year are now activated.
All continuing projects are reactivated, and we are now working through new projects and users newly added to existing projects.