Status

COMPLETED

Start Date/Time (AWST)

 07:00

Estimated End Date/Time (AWST)
End Date/Time (AWST)

  11:40

Summary: Pawsey HSM Unscheduled Outage
Systems/Services Affected: DMF, Mediaflux, Data Portal, NGAS, CASDA


Points of Contact

  • help@pawsey.org.au


Responsible:

  • Paul Newman


Accountable:

  • Ugo Varetto


Informed:

Updates:

  • 07:30 Cluster being worked on: checking and restarting nodes.
  • 08:30 Most services up; checks in progress.
  • 09:30 All nodes powered off. Full cluster restart.
  • 11:40 All services checked. Outage resolved.
  • 11:41 Email sent to all users.


Post-Incident Summary:


  • The MDS (Metadata Server) became unresponsive, but not to the degree needed to trigger an HA failover.
  • The root cause is yet to be determined.