Pawsey User Forum (Perth), Pawsey Supercomputing Centre, 26th October 2018

I would like to see more status monitoring tools for the jobs that we run on Magnus/Galaxy. Understanding the state of the system, and where the pinch points are in our workflows, is essential even during regular operation (i.e. not only during development).

We have historically had a larger ecosystem of monitoring tools for our previous supercomputers Epic and Fornax, such as Ganglia, but it was not straightforward to reuse the same tools on Magnus and Galaxy.

Internally, we are using tools such as Grafana and XDMOD but it would be useful to make more information easily available to end users.

We'd like to get to a point where researchers can click on jobs in a dashboard to find out information about the job such as runtimes, memory consumption, and file I/O.

Unfortunately, significant effort is required to set up, configure, and make this information available in this manner.

Furthermore, a lot of this information is already available to users via the standard SLURM commands, although not necessarily in an easy-to-interpret format.
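As a rough illustration of making standard SLURM output easier to read, the sketch below parses pipe-delimited text of the kind produced by `sacct --parsable2 --format=JobID,Elapsed,MaxRSS,State`. The sample text, job IDs, and helper names (`parse_sacct`, `max_rss_kib`) are assumptions made for illustration and do not come from a Pawsey system.

```python
import csv
import io

# Sample text in the pipe-delimited layout that `sacct --parsable2` emits.
# The job IDs and values below are invented for illustration.
SAMPLE = """JobID|Elapsed|MaxRSS|State
1234567|01:30:00||COMPLETED
1234567.batch|01:30:00|2048K|COMPLETED
1234567.0|01:29:40|1536000K|COMPLETED
"""

def parse_sacct(text):
    """Return one dict per row of pipe-delimited sacct output."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

def max_rss_kib(rows):
    """Largest MaxRSS across job steps, in KiB (assumes a trailing 'K' unit)."""
    values = [int(r["MaxRSS"][:-1]) for r in rows if r["MaxRSS"].endswith("K")]
    return max(values) if values else 0

rows = parse_sacct(SAMPLE)
print(f"steps: {len(rows)}, peak memory: {max_rss_kib(rows) / 1024:.0f} MiB")
```

In practice the text would come from running sacct for a specific job rather than from a hard-coded string, and other unit suffixes would need handling.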

We will be starting a project next month to look at this in more detail, and the feedback that this would be useful is appreciated.

I had jobs on Zeus that were encountering issues on two of the nodes. It would be good if nodes with problems could be removed from scheduling before jobs are run on them.

In between jobs, all nodes run a health check script to try to identify nodes that have problems and take them offline for further investigation.

These health scripts have grown organically over time as we encounter new types of issues, develop tests to automatically identify them, and add them to the health check script.

However, new issues continue to occur over time as new workflows start running on the system or updated software versions change behaviour.

Please report node issues so that we can develop further tests for the health check script.
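The idea of a health check that grows as new tests are added can be sketched as a small registry of check functions. This is a hypothetical illustration only: the check names, thresholds, and structure are assumptions and do not describe the actual Pawsey health check script.

```python
import os
import shutil
import tempfile

CHECKS = []  # grows as new failure modes are identified

def health_check(func):
    """Register a function that returns None when healthy, or a reason string."""
    CHECKS.append(func)
    return func

@health_check
def tmp_is_writable():
    # Hypothetical check: can the node create a temporary file?
    try:
        with tempfile.TemporaryFile():
            return None
    except OSError as exc:
        return f"cannot write temporary file: {exc}"

@health_check
def root_has_free_space(min_free_bytes=1024):
    # Hypothetical check: is there at least a little free space on the root filesystem?
    free = shutil.disk_usage(os.sep).free
    return None if free >= min_free_bytes else f"only {free} bytes free on root filesystem"

def failed_checks():
    """Run every registered check and collect the reasons for any failures."""
    return [reason for check in CHECKS if (reason := check()) is not None]

problems = failed_checks()
print("node healthy" if not problems else f"take offline: {problems}")
```

Adding a test for a newly reported issue then amounts to writing one more decorated function.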

It would be useful if the Pawsey application portal for Magnus and Zeus was updated to allow copying from previous applications.

We intend to update the application portal in the near future.

Our focus recently has been the Origin portal for project leaders to manage their project teams.

This has been a great success, reducing the effort required by both researchers and Pawsey staff for project administration.

It would be useful if applicants could pull in relevant publication and citation information from online sources such as ORCID and ResearcherID.

This is useful feedback and we will investigate what would be involved.

Would it be possible for project leaders to be able to remove researchers from projects in Origin?

The release of Origin 1.6 will include this feature. When a researcher is removed from a project, any files they own in the project's /group or /scratch space will have their ownership transferred to the project leader.

The /scratch filesystem is still really slow at times; in one instance my group could not retrieve data for 3 days.

One contributing factor is the /scratch health checker, which used to run quickly but can now take up to 5 days to complete, putting additional load on the metadata server while it runs.

We are trialling starting it on Saturday, in the hope that it will run more efficiently over the weekend.

Ultimately the underlying issue is the huge number of files on scratch. Many groups are using workflows with millions of tiny files that are not well suited to HPC filesystems.

To address this issue, we will be implementing a quota of 1 million files per project on the /scratch filesystem at the start of next year.
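To get a feel for how close a directory tree is to such a limit, a file count can be sketched as below. The helper name `count_files` and the demo directory are assumptions for illustration. Walking a very large tree is itself a metadata-heavy operation, so a count like this should be run sparingly.

```python
import os
import tempfile

QUOTA = 1_000_000  # per-project file limit quoted above

def count_files(root):
    """Count non-directory entries below root (symlinked directories are not followed)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total

# Demo on a throwaway directory containing five small files.
with tempfile.TemporaryDirectory() as demo:
    for step in range(5):
        open(os.path.join(demo, f"step_{step}.dat"), "w").close()
    demo_count = count_files(demo)

print(f"{demo_count} files ({100 * demo_count / QUOTA:.4f}% of quota)")
```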

If you are developing scientific code, please consider using HPC-friendly standard file formats such as HDF5, rather than using individual files for each timestep or variable.

Are there still issues with OpenFOAM workflows affecting /scratch?

We are continuing to work with OpenFOAM users to ensure their workflows are configured to cause minimal load on the filesystem.

Would it be possible to have a separate filesystem for these workflows to reduce their impact on other users?

The /scratch filesystem will be replaced in the upcoming capital refresh, and the configuration of its replacement(s) will be carefully considered.

We will have processes for user consultation for the capital refresh, which are currently being determined.

I am experiencing a lot of issues with the Nimbus GPU nodes, which are dropping connectivity on a daily basis and requiring a manual reset.

We often encounter issues such as these during the early adopter periods for new facilities, and we expect the system to be stable by the end of the early adopter period.

The Nimbus team is investigating the possibility that hardware issues are causing particular GPU nodes to exhibit unstable behaviour.

Please make sure we are informed of any issues you are encountering, and our staff will continue to work to address these problems.

Will there be a call for GPU projects in 2019?

We expect to be replacing the older GPU nodes sometime in 2019, which should improve the available GPU resources.

A project to investigate the configuration of Zeus for the following year has commenced, and we expect some changes to the queues and accounting.

Ideally, we intend to reach a point where projects apply for a single allocation at Pawsey, which can then be used for the appropriate technology in processing each step of a workflow.

What is Pawsey's plan for implementing bioinformatics pipelines?

We are investigating the various ways that users can connect to our systems to submit jobs.

This includes interactive desktops via FastX, Jupyter notebooks, and workflow management tools.

As regards the latter, we will be pilot testing a number of solutions in the next few months, including but not restricted to Galaxy, Kepler, Nextflow, and Cromwell.

If you have feedback regarding the implementation of such pipelines, please get in touch via our User Support Portal.

Are there any plans for Pawsey providing scientific domain expertise?

Many of our uptake staff have studied and worked in various disciplines, and have PhD qualifications, postdoctoral experience, and modest publication track records.

However, the intention is for our staff to assist researchers with moving workflows to appropriate HPC facilities, providing advice and in some instances parallelising or accelerating software.

Pawsey emphasises the transfer of HPC knowledge from our staff to the over 1800 researchers using our services, as this is a more scalable and effective use of a limited number of staff.

Carrying out the scientific research should remain the focus and responsibility of the project groups that apply for and receive allocations on our systems, rather than Pawsey staff.

What else has been happening recently at Pawsey?

Eight new data mover nodes have recently been delivered: four will replace the current copyq nodes, and four will be available externally via the hostname.

Previously both methods of data transfer were handled by the same nodes, but we have decided to split them out as demand has increased.

For data transfer to /group and /scratch, please use the copyq partition on Zeus or the hostname.

Running file transfer programs such as rsync and scp on the login node or to or should be avoided, as this puts unnecessary load on the login nodes.

We also have three new staff who will be triaging requests in our user support portal, which we hope will improve responsiveness to service desk requests.

Pawsey User Forum (Brisbane), University of Queensland, 27th September 2018

The CLE6 upgrade on Magnus caused significant delays for my group. While we've seen the discussion of the upgrade from previous User Forums and understand why it was necessary, we are concerned that we have not been able to use all of our allocation for 2018. Will this disadvantage my group when applying for time for 2019?

There are many reasons that a project group may not fully use their allocation. In addition to major changes affecting workflows, other examples include project members moving to other jobs or taking parental leave.

Merit allocation committees do consider usage of previous allocations as one aspect of the review process, and are understanding of legitimate reasons for low utilisation.

If you are in such a situation, it is critical that a justification is included as part of the merit application.

Would it be possible to request allocations on Zeus directly as part of the merit allocation process?

It is not possible in the current merit allocation call to request time directly on Zeus.

In terms of available time, Zeus is significantly smaller than Magnus which makes it difficult to provide substantial allocations through merit processes.

Over time Zeus has grown from its original purpose as a pre/post-processing and visualisation cluster to also include large memory, long wall-time, GPU-accelerated, and throughput workflows that were less suited to Magnus.

Ideally we would like to provide our supercomputing projects with one allocation that can be used across our systems.

However, we currently run separate instances of the SLURM scheduler for Magnus, Zeus, and Galaxy. Moving to a single scheduler would require a large investment of operational staff effort, and also cause significant interruption and change to the workflows for all users.

For this reason we will be continuing to run these systems as is, but will certainly be looking to improve our configuration for the upcoming capital refresh.

In the meantime we are reviewing the configuration of Zeus, to see what can be done to improve the allocation process.

Are there plans for more GPU nodes on Zeus?

As mentioned at the previous User Forum, we are planning to expand the number of GPU nodes in Zeus, possibly sometime in the first half of 2019.

Do we really need a 24 hour limit on job wall-times?

There are several reasons we limit job times.

While there are many types of workflows on the system, one of the empowering aspects of supercomputing is to reduce the time it takes a simulation to run from weeks or months to hours or days.

Limiting the wall-time to 24 hours provides opportunities for new jobs to start running and allows a responsive queue for groups that are managing their allocations well. This is especially important for very large jobs that need to wait for a large number of nodes to become available.

Another reason is that on large scale supercomputing systems, the occurrence of individual node failures is relatively common. If that node was part of a large job that had been running for days, a significant amount of compute time would be lost.

The 24 hour limit encourages the use of appropriate checkpointing and restarting from intermediate results, which limits the amount of compute time that is at risk of loss due to node failure.
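The checkpoint and restart pattern can be sketched with a toy simulation that saves its state to a file and resumes from it on the next run. The function name `run`, the JSON state file, and the `stop_after` parameter (which stands in for hitting the wall-time limit) are all assumptions for illustration; real codes would normally use their own checkpoint format and write checkpoints far less often than every step.

```python
import json
import os
import tempfile

def run(total_steps, checkpoint_path, stop_after=None):
    """Advance a toy simulation, saving state after every step.

    stop_after simulates running out of wall-time partway through.
    """
    state = {"step": 0, "value": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)           # resume from the last checkpoint

    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break                           # pretend the job hit its time limit
        state["value"] += 0.5               # stand-in for real computation
        state["step"] += 1
        done_this_run += 1
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp, checkpoint_path)    # rename so a partial file is never read
    return state

with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "state.json")
    first = run(10, ckpt, stop_after=4)     # first job stops after 4 steps
    second = run(10, ckpt)                  # second job resumes from step 4

print(first["step"], second["step"], second["value"])  # prints: 4 10 5.0
```

With this structure, a node failure costs at most the work done since the last checkpoint rather than the whole run.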

If there are once-off cases of large jobs that require slightly more than 24 hours, contact our help desk to discuss if an exception could be made.

For smaller jobs, we have provided a long queue on Zeus that allows wall-times of up to 4 days in response to feedback via our User Forums.

Other groups have used more than their allocation, but I am waiting in the queue. How does the scheduler prioritise groups that are over their allocation?

The scheduler takes into account a number of factors when determining the priority of a job, including how much allocation has been used, the size of the job, and how long it has been waiting in the queue.

We provide a significant quality of service priority boost to projects that have allocation remaining, compared to those that have consumed more than their allocation.

This increase in priority is significantly more than the other factors, such that the scheduler should always prioritise jobs from groups that have allocations remaining.
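The effect of one dominant factor can be shown with a toy weighted sum. This is not SLURM's actual priority calculation or Pawsey's configuration: the function, the factor names, and all of the weights are hypothetical, chosen only to show how a large quality of service weight outranks the other terms.

```python
def priority(has_allocation, age_factor, size_factor,
             qos_weight=1_000_000, age_weight=1_000, size_weight=1_000):
    """Toy weighted sum of factors, each factor in the range 0.0 to 1.0."""
    qos_factor = 1.0 if has_allocation else 0.0
    return (qos_weight * qos_factor
            + age_weight * age_factor
            + size_weight * size_factor)

# A brand-new small job from a project with allocation remaining...
fresh_in_allocation = priority(True, age_factor=0.0, size_factor=0.0)
# ...versus a large, long-waiting job from a project that is over its allocation.
old_over_allocation = priority(False, age_factor=1.0, size_factor=1.0)

print(fresh_in_allocation > old_over_allocation)  # prints: True
```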

However, there are situations where there are no jobs to run from groups with remaining allocation, and it is a better outcome for other groups to be able to use time rather than the system to remain idle.

Examples include overnight and on weekends when there are fewer jobs being submitted to the queue; and also small, quick jobs that may fit into gaps left by nodes waiting for other nodes to become available ahead of launching a large parallel job.

Towards the end of the allocation quarter, the number of projects with remaining allocation diminishes and more out of allocation jobs tend to have opportunities to run.

The best strategy for making good use of an allocation is to always have jobs in the queue, from early in the allocation quarter.

More details can be found on the Understanding Your Allocation page in our user documentation.

Can I get a priority boost for an urgent deadline?

Priority boosts may be provided for once-off cases where a job needs to be completed for an urgent reason, such as a paper submission deadline.

Such requests can be submitted as a ticket via our User Support Portal; see our Extraordinary Resource Requests page in our user documentation.

Can I apply for reservations so I can run a course?

If you are providing domain-specific training on our systems, we would very much like to discuss this with you in more detail.

We can potentially provide staff to deliver complementary sessions on the use of our facilities and assist with advertising, in addition to providing modest compute resources to support such activities.

Please contact us directly or via our User Support Portal.

What should I do to prepare for a merit application?

It is strongly recommended to get access to our systems via the Director's Share scheme ahead of a merit application, and install and benchmark your workflow.

This activity allows an informed case for an allocation to be included in your merit application, rather than an estimate of what is required.

Our staff also have a lot of experience with the application process, and would be happy to provide feedback on applications to ensure no information is missing.

It is best to contact us well ahead of application deadlines, to allow time for our staff to provide feedback.

Refer to our Writing a Strong Competitive Merit Application page in our user documentation for more details.

What resources are available to support the development of HPC-ready software?

In terms of computational resources, the Director's Share scheme provides modest allocations for development activities on Zeus and Magnus.

Additionally, the Nimbus cloud can also be a useful facility for developing software workflows.

We also run calls for Pawsey Uptake Projects typically once per year, which provide Pawsey uptake staff effort to collaborate on various types of projects including development of HPC-ready software.

Is it better to develop GPU codes on Nimbus or Zeus?

This probably depends on the type of development activities that are needed.

Nimbus will suit code development as your instance is always available to connect to.

Zeus is better suited to profiling, debugging, and scaling tests of multi-node or multi-GPU codes.

I encountered an error which I reported via a ticket. The response from Pawsey staff indicated that the issue was fixed, and the ticket was closed. However, I encountered the issue again.

If this occurs, please reopen the ticket. The HPC facilities operated by our staff are incredibly complex, and it is important that we know if an issue has been misdiagnosed.

Please be understanding that sometimes issues can take some time to diagnose and fix.

This can be particularly true if the fix will require users to make changes to their scripts and workflows.

In some cases we may continue operating with workarounds for some issues, while we wait for an appropriate maintenance day or end of year shutdown to implement measures to limit the impact on the user base as a whole.

It would be useful if the copyq was accessible from Magnus.

As discussed at the start of the session, we would prefer for all of the SLURM partitions to be accessible from one cluster, but this is currently not something we will support before the end of life for Magnus.

However, it is possible to submit jobs from Magnus to Zeus, and vice versa, using the cluster flag (-M or --clusters) to sbatch.

It is also possible to automate data staging by having the last command of a job script submit the next job to the appropriate SLURM cluster.

For example, a job on Zeus that stages data could submit a job to Magnus to process the data, that could then submit a job to Zeus to perform post-processing of that data.
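The chain described above can be sketched by building the sbatch command that each stage would run as its last step. The script names are hypothetical, and the cluster names passed to `--clusters` are assumed to match the system names; the sketch only constructs and prints the commands and does not submit anything.

```python
import shlex

def sbatch_command(cluster, script, partition=None):
    """Build (but do not run) an sbatch command line for the given cluster."""
    cmd = ["sbatch", f"--clusters={cluster}"]
    if partition:
        cmd.append(f"--partition={partition}")
    cmd.append(script)
    return cmd

# Hypothetical three-stage workflow: each script would end by running
# the command for the stage that follows it.
stages = [
    sbatch_command("zeus", "stage_data.sh", partition="copyq"),
    sbatch_command("magnus", "process.sh"),
    sbatch_command("zeus", "postprocess.sh"),
]
for cmd in stages:
    print(shlex.join(cmd))
```

Submitting the next stage only at the end of the current one means it does not start waiting in the queue until its input data is actually in place.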

See our documentation page on Data workflows for more information and an example.

The 30 day purge policy on /scratch only deletes some old files, and not other older files.

The Scratch Purge Policy is based on the time of last access, rather than the creation or last modification time of the file.

The purge policy is implemented by a server that scans the /scratch filesystem and deletes files as it finds them.

Given the millions of files on /scratch, this is a time-consuming task that puts additional load on the filesystem.

If you have older files on /scratch, please delete them as this will increase the availability and performance of the filesystem for everyone.
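Because the policy keys on last access time, a file that was written long ago but read recently is kept, while a newer file that has not been touched since may be removed. A minimal sketch of listing files by access time follows; the helper name `files_not_accessed_since` and the demo files are assumptions for illustration, and the 30 day threshold is the figure from the question above.

```python
import os
import tempfile
import time

def files_not_accessed_since(root, days, now=None):
    """Return paths under root whose last-access time is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_atime < cutoff:
                stale.append(path)
    return sorted(stale)

# Demo: two files, one of which is given an access time 40 days in the past.
with tempfile.TemporaryDirectory() as demo:
    old, new = os.path.join(demo, "old.dat"), os.path.join(demo, "new.dat")
    for p in (old, new):
        open(p, "w").close()
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old, (forty_days_ago, forty_days_ago))  # sets atime and mtime
    stale_names = [os.path.basename(p) for p in files_not_accessed_since(demo, 30)]

print(stale_names)  # prints: ['old.dat']
```

As with any full directory walk, scanning a large tree adds metadata load, so it is best limited to specific project subdirectories.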

I used my allocation early in the quarter, and I am concerned my files will be deleted before my job eventually runs.

Ideally, an allocation should be used throughout the allocation quarter.

Depending on the workflow, particularly if the job inputs are small, it can be possible to check for the presence of files at the start of a job and copy them in if needed.

For such jobs, it can also be good practice to have jobs automatically set up simulation directories per job for better data management.

For jobs with relatively small input parameter files, these files can be stored in /group and copied to /scratch at the start of a job.
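The check-and-copy step at the start of a job can be sketched as follows. The function name `ensure_inputs`, the file names, and the temporary directories standing in for /group and /scratch are assumptions for illustration.

```python
import os
import shutil
import tempfile

def ensure_inputs(source_dir, work_dir, names):
    """Copy any of `names` missing from work_dir; return the names that were copied."""
    os.makedirs(work_dir, exist_ok=True)
    copied = []
    for name in names:
        target = os.path.join(work_dir, name)
        if not os.path.exists(target):
            shutil.copy2(os.path.join(source_dir, name), target)
            copied.append(name)
    return copied

# Demo: temporary directories stand in for the /group and /scratch locations.
with tempfile.TemporaryDirectory() as group, tempfile.TemporaryDirectory() as scratch:
    for name in ("params.in", "mesh.cfg"):
        with open(os.path.join(group, name), "w") as fh:
            fh.write("demo\n")
    run_dir = os.path.join(scratch, "job_0001")   # one directory per job
    first = ensure_inputs(group, run_dir, ["params.in", "mesh.cfg"])
    second = ensure_inputs(group, run_dir, ["params.in", "mesh.cfg"])

print(first, second)  # prints: ['params.in', 'mesh.cfg'] []
```

The second call copies nothing because the files are already present, so the same step is safe to run at the start of every job.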

Can we invite international collaborators to our projects, and include them in our merit applications?

In the vast majority of cases, international collaborators can be added to projects.

Use of Pawsey Project Infrastructure is conditional on complying with relevant laws and export controls.

This includes the Australian Defence Trade Controls Act, United Nations Security Council (UNSC) sanctions regimes, the Australian autonomous sanctions regimes, and US Export Controls.

If you have any questions or concerns about any of the above, please contact us via our User Support Portal.

You should include international collaborators in your applications, as this plays a part in demonstrating the impact of your work.

We are using git to distribute the source code for our research group's HPC jobs. Should we be using modules?

The advantage of modules is that project members will not have to each recompile the code.

It also leads to more consistent job scripts that can more easily be shared between project members, as they will be using the same software installations via the modules.

Some codes do require per-job compilation, for which modules are not suitable.

If you are developing such a code, consider whether configuration can be moved outside the compiled source to improve the usability of the code.

We have a build tool called Maali that we use to install software and associated system modules.

Maali uses recipes known as cygnet files to compile codes for the CPU architectures and compilers available on our systems.

For more information, refer to the Maali pages in our user support documentation.

We would like to provide a vote of thanks for the Pawsey support team; their responses have been detailed, timely and correct. We have much appreciation for the support team working hard behind the scenes!

Operating and providing support for the Pawsey facilities is at times complex and challenging.

We really appreciate the understanding of our users, and their willingness to collaborate constructively to accomplish world-class science using our facility.  

Our thanks in particular to our attendees at our user forums, as time is a precious commodity in research fields.

These feedback sessions have provided valuable insight both for our staff to improve aspects of our facilities and for our users to understand some of the reasons behind our decisions.

We particularly look forward to our next User Forum in October, which will be physically held at Pawsey and available to remote attendees online.

Pawsey User Forum (Adelaide), University of Adelaide, 4th September 2018

Having to switch between different schedulers on institutional and national HPC resources can be frustrating. However, once the initial effort of learning a new interface is complete the ongoing effort is not so significant.

We have in the past provided translation scripts, particularly when we first moved to SLURM. While these scripts can be helpful in rapidly moving a workflow to a new scheduling environment, in the longer term it tends to be more efficient to write scripts directly for the scheduler on a particular system.

It would be useful if there was a standard API that could interface with the various schedulers. However, there would still be challenges given different systems have different configurations and feature sets enabled, even if they use the same scheduler.

The differences in workflows also extend beyond scheduling. For example, the project names, filesystem paths, and the names of partitions or queues also vary between centres.

Our documentation has details on how to migrate workflows from PBS to SLURM.

As a PhD student I did not have sufficient access to compute resources which increased the length of my candidature. Unfortunately, I was not aware of the facilities available at Pawsey.

We strive to connect with researchers through various channels to ensure they are aware of our resources. This includes university DVCR offices, institutional mailing lists, road show events, research weeks, and conference booths. One of the most effective and valuable methods is word of mouth, so please mention us to colleagues that may be in need of our facilities.

To keep up to date with our events and announcements, you can sign up to our Pawsey Friends mailing list or watch our events page.

Following the recent Athena Early Adopter call, what were the results of the technology investigation?

Athena (also known as the Advanced Technology Cluster) consisted of two technologies for investigation. They were selected via an Expression of Interest process operated by the Pawsey Uptake Strategy Group, a panel of computational scientists drawn from Pawsey users and the Australian computational research community.

One of the technologies of interest was the Knights Landing generation Xeon Phi processor. While groups in the early adopter process found it easy to get up and running on this architecture, there was not sufficient effort available to optimise workflows and in most cases performance did not surpass traditional CPU-based nodes on Magnus. This architecture has since been discontinued by Intel, and is unlikely to feature in future Pawsey procurements.

The other technology of interest was the Pascal generation GPUs. While not all groups had algorithms or codes that could take advantage of GPU acceleration, those that did saw significant performance gains compared to traditional CPU-based nodes on Magnus. Pawsey has had modest GPU-enabled facilities for a number of years, and currently plans to continue supporting these workflows.

Since the end of the early adopter process, Athena has been integrated into the Zeus cluster and its nodes are available for suitable workflows. The Xeon Phi nodes are in the knlq and knlq-dev partitions, and the Pascal nodes are in the gpuq and gpuq-dev partitions. Refer to our documentation for more details.

Are there any plans for more GPUs?

Our Nimbus cloud facility has recently brought online 12 NVIDIA Volta GPUs to enable GPU processing on cloud-based workflows. These nodes are currently being tested by early adopters, and will be more widely available later in the year. Contact our help desk if you are interested in using these GPU nodes on Nimbus.

We also are planning to refresh the GPU nodes in Zeus with newer generation GPUs, and retire the Kepler nodes at some point in the future.

Some of my GPU jobs do not scale beyond a single GPU on a node, but the Pascal nodes on Zeus do not allow multiple jobs to run on each of the four GPUs.

We will investigate whether we can implement node sharing for the Pascal GPU nodes on Zeus.

Code versioning was mentioned at a previous user forum. Ideally, we would like to compile and use the same code for several years to limit the impact that dealing with issues arising from changes has on our research. However, with several weeks' notice we can rebuild if necessary.

There are a number of factors that are considered when we look at the schedule for newer versions of system modules. Newer versions of the system software stack can provide better stability and performance, and also may be needed for more recently developed user code. Relatively recent updates are also needed for timely support from the system vendors, which in turn allows us to provide a better service.

We look to limit operating system upgrades to be no more frequent than once per year. For Magnus, we are looking to update the Cray software stack approximately once every three to six months, unless there is an exceptional need. For our other system modules, we look to retire older modules once there are several versions available to reduce clutter in the module avail command and discourage users from utilising older and possibly less efficient software.

How can we provide input into Pawsey's capital refresh process?

Pawsey has recently received funding from the federal government to refresh our computational infrastructure; for more information refer to our announcement. We expect the process will include strong engagement with our user community for input, which will be critical to enable Pawsey to support the growth of Australian computational research into the future.

Pawsey User Forum (Sydney), University of Sydney, 10th July 2018

Some of our workflows use licensed software that needs to connect back to our license servers, and in some cases the licenses have limitations regarding physical distance.

Many groups make use of licensed software at Pawsey that connects back to license servers at the user's institution. This may require relevant staff at the institution to allow external connections from Pawsey systems through the firewall. It is important to understand the terms of the license, and in cases where limitations do not allow the workflow to run on Pawsey systems, it can be worth approaching the software vendor or local reseller to discuss a solution.

Pawsey has obtained licenses of various software products for code development, compilation, debugging, and profiling. However, we do not purchase licenses for domain-specific research codes.

When should a research group make use of Pawsey facilities, instead of institutional clusters and storage?

Institutional facilities are a crucial part of Australia's research infrastructure, and should be the first contact for groups as they start working with computational workflows in their research. Depending on the computational scale of research, these resources may be the most convenient and appropriate for a group.

Researchers and institutional facility staff should consider Pawsey as workflows outgrow the available institutional resources. It may be that a group requires access to a larger number of cores, a larger memory footprint, or high-performance storage. Moving larger users of institutional resources that are appropriate for Pawsey facilities may also relieve pressure on institutional facilities for other research that may be equally important but better suited to local resources.

What are the best first steps for getting access to Pawsey facilities?

Researchers that are interested in making use of Pawsey facilities should feel free to contact us via the Pawsey Service Desk. Our staff are happy to discuss the computational requirements of a group, and provide guidance through our application processes. More details are available on our documentation web pages.

For large HPC allocations, there are annual calls towards the end of the year for allocations for the following calendar year. For a successful application, it is strongly recommended that research groups first apply for a modest Director's Share allocation to install and benchmark their workflow, in order to provide relevant details in their competitive merit allocation.

What can be done to make workflows more portable between different systems, both at Pawsey and other centres?

We are currently investigating the use of containers to improve the portability of workflows. Containers provide a consistent software environment that can be used on different systems. Currently at Pawsey, Docker can be used on Nimbus and Shifter allows the use of Docker containers on Zeus and Magnus.

In addition to providing portability, with some additional effort the use of containers can also provide significant performance improvements for some codes. Specifically, this includes workflows where there is a significant number of I/O transactions, such as Python codes loading modules or OpenFOAM runs writing many small files. The use of temporary filesystems within the container significantly reduces the load on the underlying parallel file system by presenting as a single large file for the container.

As this is a new technology for the centre, we are still working on our documentation and approach to support for containers. We would appreciate further feedback from our users, as we continue to roll out this capability.

Pawsey User Forum (Melbourne), C3DIS Conference, 31st May 2018

Changes to the operating system on HPC systems, such as the recent CLE6 upgrade on Magnus, can cause significant delays to researchers due to re-compilation and debugging any errors that arise.

Major upgrades to the operating system are not decisions that are made lightly, and it is the preference of Pawsey staff to use the same base operating system for the lifetime of an HPC system and only apply relevant patches for security and system stability.

For the CLE6 upgrade in particular, the lifespan of Magnus was significantly longer than anticipated and the operating system was sufficiently old that core software packages were out of date (for example OpenSSL no longer met acceptable security standards).

The upgrade took a week to implement with significant effort from Pawsey staff, who also avoided taking leave following the upgrade to ensure there was adequate support available for users.

There are no further major operating system upgrades expected before Magnus is replaced.

Following the User Forum, we re-examined the information provided for the upgrade. While there were several informative emails sent months and days ahead of the upgrade, the maintenance page on our documentation portal was less informative than it could have been.

We will endeavour to provide more information and justification of major changes on our maintenance pages in the future.

It is important for some research codes to be compiled with the same compiler version to continue check-pointed simulations. Can older compiler versions remain available?

On systems where this aspect of the software stack is provided by Pawsey staff, we have in the past kept only the last couple of versions available to keep the total number of system modules that we are supporting from growing too large.

On systems such as Galaxy and Magnus, compiler versions are provided by the system vendor and are outside the immediate control of Pawsey staff, which is something we will be considering for future systems.

If a particular version of GCC is critical to your research workflow on Zeus, it is also possible to install it as a project module in your /group directory where it will not be removed.
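
As a rough illustration of what a project module involves, the sketch below generates the text of a Tcl modulefile for a compiler that has already been built under a project directory. It is not a description of Pawsey's own module conventions: the function name `render_modulefile`, the install prefix, and the `lib64` and `share/man` sub-directories are assumptions made for the example, and should be checked against the actual install.

```python
def render_modulefile(prefix, version):
    """Return Tcl modulefile text exposing a compiler installed under `prefix`.

    `prefix` and `version` are supplied by the caller; nothing is read
    from or written to disk here.
    """
    lines = [
        "#%Module1.0",
        f'module-whatis "GCC {version} (project-local build)"',
        # Put the project build ahead of any system-provided compiler.
        f"prepend-path PATH {prefix}/bin",
        # Assumed library and man page locations; adjust to the real layout.
        f"prepend-path LD_LIBRARY_PATH {prefix}/lib64",
        f"prepend-path MANPATH {prefix}/share/man",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project path and placeholder version, for illustration only.
    print(render_modulefile("/group/myproject/software/gcc/X.Y.Z", "X.Y.Z"), end="")
```

The generated text would typically be saved in a directory of modulefiles that is made visible with `module use` before running `module load`.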

The /group filesystem is not large enough for my data, and I would prefer to store data at Pawsey rather than my institution so it can be easily retrieved for future processing.

Project leaders of merit allocations can enable up to 10 TB of /project storage using Pawsey's Origin user portal.

Applications for more significant storage can be submitted via the data application portal.

We would like help developing our codes, is there a way to access Pawsey staff for assistance?

There are regular calls for Pawsey Uptake Projects which provide part-time staff assistance for several months, awarded to competitively selected projects.

Based on experience over many of these calls, the success of these projects is dependent on participation and transfer of knowledge to staff within the project group.

Longer term projects are best supported by a code developer working as part of a project team.

It is difficult to get funding for code development positions which produce code rather than publications, and there are no long-term career paths in research for such work.

This issue seems to be acknowledged widely across a number of disciplines and institutions.

The Federal Government's Research Infrastructure Investment Plan recognises the importance of a skilled workforce and highlights the need to work with research organisations and universities to continue to develop other trained technicians and career pathways for researchers.

The Pawsey researcher presentations at Pawsey Roadshows and Fridays have been of interest, but I haven't been able to attend in person. Could these events be streamed?

This is certainly something we could investigate. For this kind of content, it may be best to record the presentations so users can view them at a time that is convenient.

We are also in the process of developing video content for our documentation pages, and will be looking to provide streamed developer courses in the near future.

I would be interested in more course-oriented presentations from researchers, on topics such as numerical computing techniques.

Pawsey hosts various domain-specific software courses, for example on OpenFOAM, and code development events, such as the recent GPU Hackathon.

We will investigate whether there are interested researchers that are able to provide tutorial-style presentations.

It would be helpful if Pawsey provided an annual user meeting to showcase user research.

In the past we held an annual user symposium, with presentations from our users and the summer intern students, and this is something we could look into reviving.

The use of different schedulers and naming conventions for directories and queues at Pawsey and NCI creates an additional overhead for groups using both centres.

Pawsey transitioned from PBS to SLURM to align with other large HPC centres with Cray systems, and the expertise and support we have been able to draw on as a result have led to a better and more reliable service for both Magnus and Galaxy.

We recognise this has created additional work for users coming from systems running PBS. While we had significant activities around PBS to SLURM migration when we made the switch, this has not been a focus in recent years.

With the upcoming capital refresh at both centres, we will endeavour to work with our colleagues at NCI to try to align more closely or provide tools to facilitate an easy transition.
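
To give a sense of what a transition aid could look like, here is a minimal sketch that rewrites a few common `#PBS` directives as their `#SBATCH` equivalents. It is an illustration only and not an existing Pawsey or NCI tool: the names `FLAG_MAP`, `translate_line`, and `translate_script` are made up for the example, and only simple one-to-one directives are handled. Resource requests such as node and core counts differ between sites and schedulers, so they are flagged for manual translation rather than guessed.

```python
import re

# PBS directive flags with a direct SLURM long-option equivalent.
FLAG_MAP = {
    "-N": "--job-name",
    "-q": "--partition",
    "-A": "--account",
    "-o": "--output",
    "-e": "--error",
}

# Matches a lone walltime request such as "walltime=01:30:00".
WALLTIME = re.compile(r"^walltime=(\d+:\d{2}:\d{2})$")


def translate_line(line):
    """Translate one '#PBS' directive to '#SBATCH'; return other lines unchanged."""
    stripped = line.strip()
    if not stripped.startswith("#PBS"):
        return line
    parts = stripped.split(None, 2)
    if len(parts) == 3:
        _, flag, value = parts
        if flag in FLAG_MAP:
            return f"#SBATCH {FLAG_MAP[flag]}={value}"
        if flag == "-l":
            match = WALLTIME.match(value)
            if match:
                return f"#SBATCH --time={match.group(1)}"
    # Anything else (for example node or core requests) needs a human decision.
    return f"# TODO translate manually: {stripped}"


def translate_script(text):
    """Apply translate_line to every line of a job script."""
    return "\n".join(translate_line(line) for line in text.splitlines())


if __name__ == "__main__":
    # Hypothetical job script header, for illustration only.
    example = "#!/bin/bash\n#PBS -N myjob\n#PBS -l walltime=01:30:00\n#PBS -l select=2:ncpus=24"
    print(translate_script(example))
```

Leaving unrecognised directives as visible TODO comments is a deliberate choice: a silently dropped resource request is harder to notice than a job script that clearly still needs attention.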

What support can Pawsey provide to promote my research?

Pawsey hosts various events throughout the year, and we are always looking for researchers to present work made possible by using our facilities.

We also produce various promotional materials, such as use-case handouts.

If you have reached a point in your research where you are ready to promote your results, don't hesitate to get in touch via the user support portal.

It is also worth noting the importance of providing a high quality annual project report; these feed into the Pawsey annual report and help us obtain funding for more infrastructure to support your research.

We also follow up with groups that provide standout annual reports to showcase their work through presentations and use-cases.

Manually providing publication details in applications is tedious; I would prefer to provide them via ORCID and select relevant publications from the list.

This is something that we would like to do, but it requires significant development work. We will continue to investigate being able to support this in the future.

We'd like to finish on a positive note and touch on some of the things that have been going well:

  • Magnus has been running well outside of the CLE6 transition
  • The support from Pawsey staff has been very helpful
  • The documentation and training is something Pawsey does well

With hundreds of projects simultaneously running on a range of cutting edge HPC and data-intensive systems at scale, operating the Pawsey Supercomputing Centre is no easy task. We appreciate the understanding and willingness of Pawsey users to engage with challenges as they arise.
