Maintenance of all Pawsey systems happens on an ongoing basis. Maintenance, whether or not it requires an outage, allows Pawsey to take preventative action against hazards or risks that might affect the functionality of those systems, and to upgrade their capabilities. Maintenance typically includes software and hardware updates, routine performance checks and replacement of faulty components.
Pawsey schedules "systems at risk times" so that our user community can plan for outages that may be required but have not yet been communicated, and aims to carry out any maintenance activity that may affect our user community within those times.
The current scheduled "systems at risk time" is the first Tuesday of each month. Maintenance outages will normally be confirmed by email, along with an estimated timeframe, the week before.
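For planning purposes, the date of the next "systems at risk time" can be worked out from the first-Tuesday rule. The following is a minimal Python sketch; the function name `first_tuesday` is hypothetical and not part of any Pawsey tooling, and the emailed confirmation remains the authoritative source for whether an outage will actually take place.

```python
import calendar
from datetime import date


def first_tuesday(year: int, month: int) -> date:
    """Return the date of the first Tuesday of the given month.

    Hypothetical helper for illustration only.
    """
    first = date(year, month, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1 (calendar.TUESDAY).
    offset = (calendar.TUESDAY - first.weekday()) % 7
    return first.replace(day=1 + offset)


if __name__ == "__main__":
    print(first_tuesday(2023, 3).isoformat())
```

The result is a plain `datetime.date`, so `isoformat()` gives the same YYYY-MM-DD form used in the Maintenance page names.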
Incidents are, by their nature, unscheduled. Service outages can also arise from incidents affecting infrastructure beyond the Pawsey systems themselves (e.g. power or cooling).
Maintenance and Incident wiki pages
Pawsey maintains Maintenance and Incident pages within its public-facing wiki to provide information to its user community.
Pages will contain the date of the Maintenance or Incident in the ISO 8601 format, YYYY-MM-DD.
Pages are typically prefixed with either 'M-' (Maintenance) or 'I-' (Incident).
Pages may be suffixed to indicate the systems affected: '-All'; '-SC' (Supercomputing); '-Data'; '-Nimbus' (Cloud); or '-Vis' (Visualisation),
whilst an un-suffixed Maintenance page will usually contain generic details of a wider maintenance outage.
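The naming convention above can be read mechanically. The following Python sketch splits a page name into its type, date and optional suffix. The names `PAGE_NAME`, `PageName` and `parse_page_name` are hypothetical and used only for illustration. Because pages are only "typically" prefixed and suffixed as described, the sketch accepts any suffix text and rejects names that do not start with 'M-' or 'I-' followed by a date.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Assumed pattern: prefix, ISO 8601 date, then an optional free-form suffix.
PAGE_NAME = re.compile(
    r"^(?P<kind>[MI])-(?P<date>\d{4}-\d{2}-\d{2})(?:-(?P<suffix>.+))?$"
)

KINDS = {"M": "Maintenance", "I": "Incident"}


class PageName(NamedTuple):
    kind: str              # "Maintenance" or "Incident"
    day: date              # date taken from the page name
    suffix: Optional[str]  # e.g. "SC", "Data", "Nimbus"; None if un-suffixed


def parse_page_name(name: str) -> PageName:
    """Split a page name such as 'M-2023-03-28-Nimbus' into its parts."""
    match = PAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised page name: {name!r}")
    return PageName(
        kind=KINDS[match["kind"]],
        day=date.fromisoformat(match["date"]),
        suffix=match["suffix"],
    )


if __name__ == "__main__":
    for example in ("M-2023-03-28-Nimbus", "I-2023-02-02-SC", "M-2023-02-07"):
        print(parse_page_name(example))
```

Using `date.fromisoformat` rather than the regular expression alone means that a name with an impossible date (for example month 13) is also rejected with a `ValueError`.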
Progress updates
Pawsey staff will try to provide progress updates on these pages as workloads allow. When incident outages occur, however, we ask our user community for patience and understanding with regard to updates, as staff will be focused on returning the systems to service while preserving all the jobs in the queues.
For a list of Scheduled Maintenances, and of recent/ongoing Maintenances or Incidents, please see the automatically generated index below.
Note that older pages can be accessed from within the tree view visible from any individual Maintenance or Incident page.
Log | Status: | Start Date/Time (AWST): | End Date/Time (AWST): | Systems/Services Affected: | Summary: |
---|---|---|---|---|---|
M-2023-03-28-Nimbus | PLANNED | 09:00 | | Nimbus | Scheduled Nimbus compute node maintenance |
M-2023-03-20-Data | PLANNED | 08:00 | | Banksia (Pawsey Offline/Cool Data Storage), MWA Archive/ASVO, Pawsey Data Portal/Mediaflux | Pawsey Tape Library Maintenance |
I-2023-03-16-SC | RESOLVED | 08:30 | 14:04 | Setonix | Setonix being handed back to HPE |
I-2023-03-09-Nimbus | RESOLVED | 14:58 | 15:35 | Nimbus | Nimbus dashboard and authentication endpoint unavailable |
M-2023-03-07-Data | COMPLETED | 08:00 | | Banksia (Pawsey Offline/Cool Data Storage), MWA Archive/ASVO, Pawsey Data Portal/Mediaflux | Pawsey - Scheduled Maintenance - Banksia |
M-2023-03-07-SC | COMPLETE | 06:00 | 13:00 | Setonix | HPE Mandated Outage |
M-2023-02-28-SC | CANCELLED | 08:00 | | Setonix | HPE Mandated Outage |
I-2023-02-23-SC | COMPLETED | 11:30 | | Setonix | Issues with the Setonix Slingshot network |
M-2023-02-21-SC | CANCELLED | 08:00 | | Setonix | HPE Mandated Outage |
M-2023-02-21-Nimbus | COMPLETED | 09:00 | 10:20 | Nimbus | Scheduled Nimbus Maintenance |
M-2023-02-07 | CANCELLED | 08:00 | | Setonix | Setonix Configuration Change |
I-2023-02-03-Data | RESOLVED | 11:41 | 17:40 | Data Portal, Mediaflux, LiveArc | Pawsey Data Portal Service is unavailable |
I-2023-02-02-SC | RESOLVED | 09:30 | 13:24 | Setonix | Networking issue affecting compute nodes |
I-2023-01-30-SC | RESOLVED | 12:00 | 17:00 | Setonix | Known issues: "Lmod has detected the following error" |
M-2023-01-13-Building | COMPLETED | 08:00 | 13:43 | All Pawsey services will be impacted by this work. Timings for recovery will depend on the progress of the underpinning building systems tasks and any maintenance to the systems which is needed: Setonix, Garrawarla, Galaxy, Topaz, Acacia (Online/Warm Data Storage), Banksia (Offline/Cool Data Storage), ASKAP Ingest, MWA Archive (ASVO), Pawsey Data Portal (Mediaflux), Nebula, Visualisation Lab, Nimbus (Cloud) | PAWSEY BUILDING POWER OUTAGE 13th - 19th January 2023 |