
Status:

COMPLETED

Points of Contact:

 help@pawsey.org.au 

Start Date/Time (AWST):

 

Responsible:

 Ashley 

Estimated End Date/Time (AWST):

 

Accountable:

 Ashley 

End Date/Time (AWST):

@ 9:50am

Informed:

Summary:

Access to the Lustre "/scratch" filesystem

Systems/Services Affected:

Magnus, Zeus and Topaz (any cluster with access to /scratch)

Description:

  • Some object storage targets (OSTs) backing /scratch were unavailable:
    • error: check 'snx11038-OST006c-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
      error: check 'snx11038-OST006d-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
      error: check 'snx11038-OST006e-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
      error: check 'snx11038-OST006f-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
  • The cause appears to be an incomplete failover of an HA storage pair at around 01:35:59 on 7 November
    • A manual failover was performed by the Cray/HPE engineer
  • Resolved as of 9:50am
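
For anyone triaging a similar event, the affected OSTs can be pulled out of error lines like those quoted above with a short script. The following is a minimal sketch, not a Pawsey tool: the function name `unavailable_osts` and the regular expression are illustrative assumptions, and the line format is inferred only from the four error lines in this notice (output of this shape is typically produced by a Lustre client health check such as `lfs check osts`, though the notice does not say which command was used).

```python
import re

# Pattern inferred from the error lines quoted in this notice (an assumption,
# not an official format specification).
ERROR_RE = re.compile(
    r"error: check '(?P<fs>[^-']+)-(?P<ost>OST[0-9a-fA-F]{4})-osc-[0-9a-f]+': "
    r"(?P<reason>.+?) \((?P<errno>\d+)\)"
)


def unavailable_osts(text):
    """Return (filesystem, OST name, OST index, errno) for each error line."""
    results = []
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            ost = match.group("ost")
            # The four characters after "OST" are the target index in hexadecimal.
            index = int(ost[3:], 16)
            results.append((match.group("fs"), ost, index, int(match.group("errno"))))
    return results


# Sample input: the four lines copied verbatim from the Description above.
SAMPLE = """\
error: check 'snx11038-OST006c-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
error: check 'snx11038-OST006d-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
error: check 'snx11038-OST006e-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
error: check 'snx11038-OST006f-osc-ffff96ed4b3a2000': Resource temporarily unavailable (11)
"""

for fs, ost, index, errno_value in unavailable_osts(SAMPLE):
    print(f"{fs} {ost} (index {index}) errno {errno_value}")
```

Run against the lines in this notice, the sketch reports four consecutive targets (OST006c to OST006f, indices 108 to 111), which is consistent with a single storage pair being affected.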

Changes:

  • HPE/Cray engineer:
    • Three drives were replaced
    • A manual failover was performed

Updates:

Further information: