
Status: COMPLETE
Points of Contact: help@pawsey.org.au
Start Date/Time (AWST): 08:00
Estimated End Date/Time (AWST): 20:00
End Date/Time (AWST): 19:00
Responsible: Mark O'Shea
Accountable: Mark Gray
Informed: pawsey_users@
Summary: Pawsey Scheduled Maintenance
Systems/Services Affected: Magnus, Galaxy, Zeus, Topaz, Garrawarla

Description

The Pawsey Supercomputing Centre schedules maintenance for the first Tuesday of every month, between 0800 and 2000 (AWST).

Whilst not every maintenance window will require outages to services, Pawsey systems should be considered "at risk" during those times.

Changes

  • Galaxy GPU decommissioning: nodes within Galaxy will be rearranged so that one cabinet contains only GPU nodes.
    This will allow for power isolation, as well as giving a better distribution of the Aries HSN across the remaining
    nodes within the cabinets
  • Removal of all modules installed system-wide on Zeus and Magnus/Galaxy which depend directly or indirectly on Python 2
  • Singularity 3.6.4 will become the default version on all Pawsey systems
  • Jobs that were in the queues on Magnus, Zeus and Topaz prior to the shutdown will have been purged as a result
    of the new project allocations having been applied.

Updates

  • 0845: Crays are powered down: maintenance work commences
  • 1030: Two nodes in Galaxy and one in Magnus have displayed errors as they powered up.
    Our on-site Cray Engineer is investigating the issues
  • 1100: The Slurm database upgrade is progressing: this will facilitate the entry of the
    project allocations for the new year
  • 1540: The CLE portion of Magnus has rebooted: acceptance testing is in progress
  • 1630: Topaz has passed acceptance tests
    The CLE portion of Galaxy has rebooted: acceptance testing is in progress
  • 1800: All systems are rebooted; however, we are investigating some issues with the
    new allocations, and a couple of acceptance tests, before returning resources to service
  • 1845: Galaxy returned to service
  • 1845: Magnus, Topaz and Zeus are all ready to return.
    New allocations for the year are being applied, so users may not be able to submit jobs
    until their accounts have been allocated (this process will run through the night).
  • 1845: Garrawarla returned to service.
  • Wednesday 6th, 0900: Most new allocations for the new year are now activated.
    All continuing projects are reactivated, and we are now working through new projects and new users added to existing projects.

Post-Maintenance Summary: