
Status:

 COMPLETED   

Points of Contact:

 help@pawsey.org.au 

Start Date/Time (AWST):

13:00

Responsible:

CSIRO Business & Infrastructure Services

Estimated End Date/Time (AWST):

17:00

Accountable:

 Brad Evans 

End Date/Time (AWST):


Informed:

 pawsey_users, pawsey_friends, pawsey_partners 

Summary:

PAWSEY BUILDING POWER OUTAGE

1st - 4th November 2021

Systems/Services Affected:

All Pawsey services will be impacted by this work. Recovery timings will depend on the progress of the underpinning building systems tasks and on any system maintenance that is needed. Affected systems: Magnus, Galaxy, Zeus, Topaz, Data Portal, Mediaflux, DMF, NGAS, CASDA, RDS Storage, Nimbus, LiveArc.

Description:

At Pawsey, we have been working to get Setonix, your new research supercomputer, ready and available for your 2022 allocation. As part of the Setonix commissioning, some electrical works are needed and therefore an extended power outage has been scheduled for the first week in November.
 
Building work undertaken during the past few months will also benefit from this outage, allowing us to complete several system changes required for the implementation of the Pawsey refresh technologies.

Updates:

  • November 1st
    • 13:00 All running jobs on Magnus and Galaxy have now completed
      Our on-site Cray engineer is commencing the shutdown of these resources
    • 13:45 New logins to Magnus and Galaxy login nodes have been disabled; users have been advised to log off.
    • 14:00 Magnus and Galaxy login nodes have been shut down
    • 15:00 The Cray hardware hosting the /scratch filesystem has been shut down
  • November 3rd
    • 14:00 Nimbus cloud is live.
  • November 4th
    • 12:50 Our on-site Cray engineer is commencing the restart of Magnus and Galaxy
    • 14:40 We are commencing the acceptance tests on Magnus and Galaxy
    • 15:45 We have started to release queued jobs back onto Galaxy
      eLogin node access will be resumed shortly
    • 15:55 We have started to release queued jobs back onto Magnus
      eLogin node access will be resumed shortly
    • 16:10 Galaxy has been returned to service
    • 16:15 Magnus has been returned to service


Further information:

Individual operational units within Pawsey may provide information about the work they are carrying out via their own pages.

Such pages, if created, will be linked to from here:


What is happening?

 
The extended outage will occur during the week of 1st to 4th November 2021, in line with our planned November maintenance. We will keep you up to date using the usual channels, which include our status page, social media, and emails, as systems come back on-line.  
 
Some services (listed in the table below) will be offline for periods during the nominated window and that may impact other services that rely on them. 

Why is this happening?

 
The next few years will see Pawsey deliver on the Federal and State investment in Australian HPC capacity and build on Australia’s emerging role in supercomputing and large-scale data research.
 
This new paradigm will only be achieved with your support. After this outage, Australian researchers will be one step closer to accessing Setonix.

What do I need to do?

 
Please take any action needed to work around the outage and share the information with your team so they can plan their work appropriately.

The Maintenance and Incidents web pages are located here. They will have individual technology-focussed pages covering the different technical aspects of the work, and will be updated as information comes to hand or as the changes take place.

We appreciate your support and understanding during this exciting period, and recognise that the outage might cause issues for some users. If you have any questions, please e-mail help@pawsey.org.au

Dates: Monday 1 November 2021 to Thursday 4 November 2021
Services impacted (with planned timings):

Date | Time | Impact
Monday 1 November | Starting 1300 | Preparatory shutdown of all Pawsey Services
  • 1300: HSM and associated systems powering down.
  • 1500: Nimbus powering down.
  • 1700: All Pawsey Services offline.
Tuesday 2 November | 0700 - 1300 | The entire Pawsey site will be shut down during this window while critical power works are undertaken.
Tuesday 2 November | 1300 - 1700 | Expect to begin restoration of the following services:
  • ASKAP Buffer
  • ASKAP Ingest
  • Astro Filesystem
  • Zeus, Topaz, Garrawarla Login Nodes, SMW
  • Magnus Scratch Filesystem
  • DMF Infrastructure
  • Banksia
  • Networks
  • Nimbus
  • Internal KVM Services Infrastructure
Wednesday 3 November | 0900 - 1700 |
  • 0900: Nimbus available.
  • 1300: HSM & associated systems available.
  Pawsey staff will begin to restore Magnus, Galaxy, Garrawarla.
Thursday 4 November | By 1700 | All Pawsey services are expected to be available by 5pm.