
Status

COMPLETED

Points of Contact:

help@pawsey.org.au
Start Date/Time (AWST)

10:00

Responsible:

Mark Gray: Head of Platforms
Estimated End Date/Time (AWST)

14:00

Accountable:

Mark O'Shea: Head of Supercomputing Operations
End Date/Time (AWST)

14:25

Informed:

Mailman: galaxy_users
Summary: Loss of cooling to cabinets
Systems/Services Affected: Galaxy


Updates:

  • 09:45 Ongoing re-balancing work on the water cooling systems across the Pawsey data-centre has caused one cabinet in Galaxy to shut down
  • 09:50 On-site Cray engineer informs us that he will be shutting down Galaxy to avoid issues with other cabinets
  • 10:00 Commencement of shutdown of Galaxy 
  • 10:45 Our on-site Cray engineer is inspecting Galaxy's water cooling ahead of a return to service
  • 11:55 Our on-site Cray engineer is going to reboot Galaxy to ascertain whether it can operate
    with the water cooling to the cabinets in its current state. If it can, we'll return it to service as soon as possible; if it can't,
    we may need to have CBIS re-address the re-balancing work that was carried out this morning.
  • 12:55 Jobs have been resumed on Galaxy (jobs are at risk due to re-balancing)
  • 14:25 Jobs appear to have been running as normal.


Post-Incident Summary: