Maintenance of all Pawsey systems happens on an ongoing basis. Maintenance, whether or not it requires an outage, allows Pawsey to take preventative action to mitigate hazards or risks that might affect the functionality of its systems, and to upgrade the capabilities of those systems. Maintenance typically includes software and hardware updates, routine performance checks, and replacement of faulty components.
Pawsey schedules "systems at risk times" so that our user community can plan for outages that may be required but have not yet been announced. Pawsey aims to carry out any maintenance activity that may affect our user community within those times.
The current scheduled "systems at risk time" is the first Tuesday of each month. However, maintenance outages will normally be confirmed by email, along with an estimated timeframe, during the preceding week.
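For users who want to mark upcoming "systems at risk" days in their own planning tools, the first Tuesday of a month can be computed from the weekday of the 1st. The following is a minimal sketch. The helper names `first_tuesday` and `next_at_risk_day` are hypothetical, and the sketch works with calendar dates only. The page does not give a time of day, and any actual outage, together with its timeframe, is confirmed by email.

```python
import calendar
import datetime


def first_tuesday(year: int, month: int) -> datetime.date:
    """Return the date of the first Tuesday of the given month (hypothetical helper)."""
    first = datetime.date(year, month, 1)
    # date.weekday(): Monday == 0, Tuesday == 1 (calendar.TUESDAY), ..., Sunday == 6
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return first + datetime.timedelta(days=offset)


def next_at_risk_day(today: datetime.date) -> datetime.date:
    """Return the next first-Tuesday on or after `today` (hypothetical helper).

    Assumes the at-risk day is always the first Tuesday of the month;
    actual outages are still confirmed separately by email.
    """
    candidate = first_tuesday(today.year, today.month)
    if candidate >= today:
        return candidate
    # This month's first Tuesday has passed, so roll over to the next month.
    if today.month == 12:
        return first_tuesday(today.year + 1, 1)
    return first_tuesday(today.year, today.month + 1)


if __name__ == "__main__":
    print(next_at_risk_day(datetime.date.today()))
```

The modulo-7 offset handles months that start on a Tuesday (offset 0) as well as months that start later in the week, so no loop over days is needed.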
Incidents are, by their nature, unscheduled. Furthermore, service outages can arise from incidents with infrastructure beyond the Pawsey systems themselves (e.g. power or cooling).
Maintenance and Incident wiki pages
Pawsey maintains Maintenance and Incident pages within its public-facing wiki to provide information to its user community.
Pages will contain the date of the Maintenance or Incident in ISO 8601 format, YYYY-MM-DD.
Pages are typically prefixed with either 'M-' (Maintenance) or 'I-' (Incident).
Pages may be suffixed to indicate the systems affected: '-All'; '-SC' (Supercomputing); '-Data'; '-Nimbus' (Cloud); or '-Vis' (Visualisation).
An un-suffixed Maintenance page will usually contain generic details of a wider maintenance outage.
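The naming convention above can be decoded mechanically, which may help if you track these pages in a script. The following is a minimal sketch. It assumes that a title has the exact layout `<prefix>-<YYYY-MM-DD>[-<suffix>]`, for example `M-2023-05-02-SC`. That layout is an assumption, because the page states only that titles carry a prefix, an ISO 8601 date and an optional suffix. The names `parse_title` and `PageInfo` are hypothetical.

```python
import datetime
import re
from typing import NamedTuple, Optional

# Prefix and suffix meanings as described on this page.
KINDS = {"M": "Maintenance", "I": "Incident"}
SYSTEMS = {
    "All": "All systems",
    "SC": "Supercomputing",
    "Data": "Data",
    "Nimbus": "Cloud",
    "Vis": "Visualisation",
}

# ASSUMED title layout: <prefix>-<YYYY-MM-DD>[-<suffix>], e.g. "M-2023-05-02-SC".
TITLE_RE = re.compile(
    r"^(?P<kind>[MI])-(?P<date>\d{4}-\d{2}-\d{2})(?:-(?P<suffix>[A-Za-z]+))?$"
)


class PageInfo(NamedTuple):
    kind: str
    date: datetime.date
    systems: Optional[str]  # None: un-suffixed page, usually a wider outage


def parse_title(title: str) -> Optional[PageInfo]:
    """Decode a Maintenance/Incident page title, or return None if it does not fit."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    suffix = match.group("suffix")
    if suffix is not None and suffix not in SYSTEMS:
        return None  # unknown system suffix
    try:
        # Rejects impossible dates such as 2023-02-30.
        date = datetime.date.fromisoformat(match.group("date"))
    except ValueError:
        return None
    return PageInfo(
        kind=KINDS[match.group("kind")],
        date=date,
        systems=SYSTEMS[suffix] if suffix is not None else None,
    )
```

Returning `None` instead of raising keeps the sketch tolerant of older pages that may not follow the convention exactly.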
Pawsey staff will try to provide progress updates on these pages as workloads allow. However, when an incident outage has occurred, we ask our user community for patience and understanding with regard to updates, because staff will be focused on returning the systems to service while preserving the jobs in the queues.
For a list of Scheduled Maintenances, and of recent/ongoing Maintenances or Incidents, please see the automatically generated index below.
Note that older pages can be accessed from within the tree view visible from any individual Maintenance or Incident page.
|Log||Status:||Start Date/Time (AWST):||End Date/Time (AWST):||Systems/Services Affected:||Summary:|
|Banksia (Pawsey Offline/Cool Data Storage), MWA Archive/ASVO, Pawsey Data Portal/Mediaflux||Pawsey - Scheduled Maintenance - Banksia|
|Magnus, Zeus & Topaz (any cluster that has access to /scratch)||Access to Lustre "/scratch" filesystem|
|Nebula, Topaz Remote Vis, Reservation System||Pawsey Scheduled Maintenance|
|Setonix, Magnus, Garrawarla, Galaxy, Zeus, Topaz||Pawsey Scheduled Maintenance|
|Data: Banksia, MWA Archive/ASVO, Pawsey Data Portal/Mediaflux. SCOps: Slurm backend upgrade; filesystem firmware upgrades||Pawsey Scheduled Maintenance|
|Banksia (Pawsey Offline/Cool Data Storage), MWA Archive/ASVO, Pawsey Data Portal/Mediaflux.||Pawsey - Scheduled Maintenance - Banksia|
|Any cluster system with /askapbuffer mounted, i.e. Galaxy, Zeus (copy nodes)||Lustre filesystem "askapbuffer": OST001D read-only|
|Banksia (Pawsey Offline/Cool Data Storage), MWA Archive/ASVO, Pawsey Data Portal/Mediaflux.||Banksia S3 and HTTPS services (REST API) unavailable.|
|Banksia (Pawsey Offline/Cool Data Storage), MWA Archive/ASVO, Pawsey Data Portal/Mediaflux.||Pawsey Scheduled Maintenance - Banksia|
|Data: Banksia, MWA Archive/ASVO, Pawsey Data Portal/Mediaflux.||Pawsey Scheduled Maintenance|
|Setonix||Extended outage to Setonix|