
Status:

 COMPLETED   

Points of Contact:

 help@pawsey.org.au 

Start Date/Time (AWST):

08:00

Responsible:

 Ashley Chew

Estimated End Date/Time (AWST):

17:00

Accountable:

 Ashley Chew

End Date/Time (AWST):

15:00

Informed:

 garrawarla_users 

Summary:

Astro Lustre Filesystem (offline)

Systems/Services Affected:

Garrawarla, Datamovers Node (Setonix)

Description:

The astro Lustre filesystem requires a brief outage so that the UPS battery in one of its storage controllers can be replaced.

Changes:

  • Garrawarla will be taken offline so "/astro" can be unmounted
    • Backend storage systems will be taken offline
    • The UPS battery in the Astro Array1b controller will be replaced with a new one
  • Setonix
    • "/astro" will be unmounted from the data mover nodes on Setonix

Updates:

  • 8:10 am Astro Filesystem was unmounted from Garrawarla and Setonix
  • 8:30 am Shutdown of Astro Filesystem Frontend Nodes
  • 8:50 am Shutdown of Physical Storage backend to allow replacement of one UPS battery on a controller
  • 9:15 am Vendor Engineer is replacing the UPS battery
  • 9:30 am Battery has been replaced in one storage controller; bringing the storage unit back online
  • 9:50 am Physical Storage controller is back but there is an inconsistency
    • The manufacture date (MFD) of the new UPS battery is inconsistent with what the controller reports, so the controller is marking the UPS battery as needing replacement
  • 10:30 Onsite Engineer is working with Vendor Storage Specialist
    • Addressing the UPS battery MFD issue
  • 12:30 System controller now accepts the replaced part
    • Unit needs to be charged to 85% prior to use
  • 13:40 Restoring Frontend Storage nodes so Lustre can be presented
  • 14:05 Filesystem restored
    • Remounting File System on Garrawarla
  • 14:55 Garrawarla has been released for general use
    • Remounting "/astro" on the data mover nodes on Setonix is pending