Australia's two national peak supercomputing centres are funded by the Australian Government and other stakeholders. Pawsey and NCI have a duty to taxpayers to maximise the returns on these significant investments. Access is determined primarily by the merit of the research to be undertaken, assessed in the context of its appropriateness for the infrastructure being used.
Allocations have a dollar value, and need to be made transparently. Thus applications are reviewed by committees of experts in their fields, with supporting advice from the relevant facility.
This document provides guidance on the merit allocation process, and how to create a competitive application. It does not supersede any rules or guidance provided in any Call for Applications.
Applications through competitive merit processes, by definition, are competitive. Most applications have merit, and most applicants are good researchers. Since supercomputers are a fixed size, and demand is often double what is available, prioritising allocations comes down to relative merit rather than absolute merit. An allocation that is lower than expected does not reflect on the absolute merit of the research or researchers.
Supercomputers generally have a lifespan of four years, and historically after four years are replaced with a larger or more powerful resource. At the start of a machine's life, it might be able to satisfy the requirements of all applicants, with capacity to spare. Over time new users come on board and the requirements of some projects grow, but the machine does not grow. Towards the end of the life of a machine, more projects mean the average allocation per project must be lower, even though each project may have larger requirements (which leads to frustration with long wait times in queues). It is easier to grow the size of allocations at the start of a machine's life, and difficult near the end.
In practice, the merit allocation meetings for all competitive merit schemes follow the same process.
- Administrative check
- Review and scoring
- Merit allocation meeting
Scoring and Ranking
Initially, applications are scored by the committee (or subset) according to the criteria for the scheme. Generally, applications for large allocations have more reviewers. Applications are matched to reviewers taking into account their areas of expertise and potential conflicts of interest. For merit allocation schemes where there is a technical component to the score, this part of the assessment and scoring is undertaken by Pawsey staff.
Applications are initially ranked by normalised (across reviewers) scores and placed into bands. Each application in a band receives a starting allocation which is a fixed proportion of the requested amount. Cutoffs between bands, as well as the initial allocation proportion for each band, vary depending on supply and demand, from year to year. This process is performed by the secretariat of the scheme, together with the Chair of the committee. As an example, for the 2018 Pawsey Partner scheme, these values are below, together with a plot of the scores and rankings:
It is important to remember that this is just a starting point in an iterative process. For example, some applications in band F may get their full request (due to a small request), some in band A might get more than the initial cap (if high scoring and not all available time was allocated in the first iteration), and some in band E may get an allocation of zero and be recommended for a cloud allocation (e.g. for serial jobs or long-running jobs). The plot of rankings shows how close most scores are, and thus how improving your score by, say, 10 points out of 100 can potentially result in a much larger allocation.
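The banding step described above can be sketched in a few lines of Python. This is only an illustration: the band cutoffs, the starting proportions, and the function name `starting_allocation` are all hypothetical, and the real values vary by scheme and by year.

```python
# Hypothetical bands, for illustration only:
# (label, minimum normalised score, starting proportion of the request).
# Real cutoffs and proportions vary by scheme and from year to year.
BANDS = [
    ("A", 85, 1.0),
    ("B", 75, 0.875),
    ("C", 65, 0.75),
    ("D", 55, 0.5),
    ("E", 45, 0.25),
    ("F", 0, 0.125),
]


def starting_allocation(score, requested):
    """Return (band, starting allocation) for a normalised score out of 100."""
    for label, min_score, proportion in BANDS:
        if score >= min_score:
            return label, requested * proportion
    raise ValueError("score must be non-negative")


# Two hypothetical applications, each requesting 1,000,000 core hours.
print(starting_allocation(62, 1_000_000))  # ('D', 500000.0)
print(starting_allocation(72, 1_000_000))  # ('C', 750000.0)
```

With these made-up bands, a ten-point improvement in score moves the application up one band and raises its starting allocation by half. The committee's later passes can still move the final figure in either direction.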
Following on from the scoring, ranking, and initial pass of allocations, the committees then go through all applications, one by one. They discuss each application and adjust the allocation based on ranking, explanations for discrepancies in past usage vs allocations, and any other points they may raise from the application, or from technical information provided by Pawsey.
Thus it is important to:
- Obtain a strong initial score
- Explain unused past allocations
- Explain any increases in the amount applied for
- Demonstrate why this particular supercomputer is required
- Justify the amount requested via technical arguments
The committee may make several passes through the list of applications, to optimise allocations within the available time under the scheme, if demand exceeds supply.
The committee makes a recommendation of allocations to the Directors of each facility (Pawsey and NCI). It is ultimately up to the facility Directors to approve the allocations.
Writing a Competitive Application
Poor scoring may not translate to poor quality of the research or researchers. Rather, poor scoring may be due to insufficient information to justify awarding taxpayer-funded infrastructure. Information provided in an application needs to support the application and be concise. Before submitting an application, it is important to revisit the assessment criteria to ensure they are addressed.
As an example of adding concise and useful information, do not mention in a project title or abstract that the application is a continuation of a previous allocation, or is to support the research group. Such statements take up space and add no value in those places. It is useful to state that the application is to continue a previous allocation, but do this in the body of the application (and also state the name of the previous project).
The following subsections are aligned to the assessment criteria for applications under the Pawsey Partner and Energy & Resources schemes. NCMAS currently does not place the same weighting on technical criteria, however it is still strongly recommended to have these aspects covered in the application, as they still factor in discussions by the allocation committee.
Project Quality and Innovation (research criteria)
Discuss how this project is unique, and in particular anything innovative about it. Examples include:
- applying an existing methodology to a new application (e.g. with data analytics)
- bringing a methodology that is common in a different field of research into your own field
- pushing computational boundaries such as system size or precision
Investigators (research criteria)
Ensure that applicants are eligible. In particular, for NCMAS, ensure that PIs and CIs are CIs on no more than one application, otherwise the applications may be rejected.
It is generally advantageous to join with other researchers into a single application, particularly if in the same research group or same department. The exception is Early Career Researchers looking to establish their own funding track record.
The research record of the applicants is an assessment criterion, contributing directly to the score. When asked for, say, the top ten publications, a larger group should be able to produce a stronger list than an individual applicant. The same applies to patents, book chapters, prizes, and funding attracted by the group.
Given that applications ranked higher generally receive a higher proportion of their requested amount, it should be obvious that a single stronger application will achieve a higher total allocation than two weaker applications for half the amount each.
It is important to only list funding that can be attributed to the investigators. For grants, the name of an investigator must be on the grants. Listing of funding that is not under the control or influence of investigators will result in an application being rejected from NCMAS. If a project is run within an ARC Centre of Excellence, but the investigators are not listed on its funding, then do not include it. If you want to include it, then the people running the Centre of Excellence should be applying on your behalf.
The Pawsey form (used for Pawsey Partner and Energy & Resources) asks for the significance of each publication. Do not write "this paper is significant". Write how it is significant, such as "this was the first simulation of the continent to use a resolution of 0.1m".
Benefit and Impact (research criteria)
Always be specific, and avoid broad statements such as "This research will benefit the field of computational chemistry" or "this research is significant". Instead, write "we are developing a new algorithm for optimising molecular modelling, which will impact drug design and drug delivery research", or "this new technology will improve efficiency of photovoltaic cells by 5% without increasing production costs".
Where possible, relate the project outcomes to science priorities of the Australian Government. http://www.science.gov.au/sciencegov/scienceandresearchpriorities/pages/default.aspx
It is important to highlight the significance of the outcomes to your own field of research, as well as to society. For the benefits to your research field, this should be specific and the language should be understandable by another researcher in the field. For benefits to society, this should be understandable by researchers not within the field.
If your project is not a traditional academic project that will impact the research field, you still need to highlight and quantify the outcomes. E.g. for an industry-sponsored project, estimate the impact on the company or their customers. You should already know this, as you have convinced the industry partner to sponsor your research.
Suitability (technical criteria)
Is the resource being used efficiently? Demonstrate the steps you are taking to optimise outcomes from a finite resource.
Are you using a compiled language such as Fortran/C/C++, or something less efficient such as Python, MATLAB, or Java? If you have a Python interface to a C/MPI library, focus your description on where most execution time is spent. Rapid prototyping for algorithm development is a sensible use of tools that are less efficient at runtime. If you do this, say so.
Are you using NumPy/SciPy instead of pure Python?
Are you using numerical libraries such as PETSc, ScaLAPACK, etc.? Or do you have a particular reason to use your own?
Have you undertaken optimisation effort, such as profiling with Allinea MAP, participating in Pawsey uptake projects, or joining international collaborations to improve the software or workflow?
Justify the Resource
The committee will consider whether you really need this resource, or if alternatives are available. If the resource being applied for has a specific feature, the committee will prioritise projects in this order: those that require the feature, then those that can use it, then those that leave it unused.
Some projects require very large jobs on a national peak supercomputer, for which they have no alternative. Projects which require a few nodes or fewer are very flexible in where they can run - a national peak supercomputer, a cluster, an institutional cluster, or even cloud. On a supercomputer, software with distributed parallelism (such as MPI) is favoured over software with intra-node parallelism such as OpenMP, which can run on cheaper infrastructure.
Some projects are data intensive, and although they do not require much compute resource, still require a national peak supercomputer due to filesystem performance or large storage space.
Some software requires specific technology, such as GPUs. These projects will be prioritised over software that would just use the CPUs which host the GPUs.
Does the resource support your workflow?
Magnus has a maximum wall time of 24 hours. What happens if your simulation requires more? If your software supports checkpoint/restart, then make it clear in the application that this will be used. With checkpoint/restart, rather than a single simulation taking 96 hours, it becomes four jobs of 24 hours each.
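As a rough illustration of splitting a long simulation into wall-time-limited jobs, the sketch below counts how many chained jobs are needed and shows a minimal checkpoint/restart loop. The function names, the JSON checkpoint format, and the "one step per hour" stand-in workload are assumptions made for this example. Real codes normally provide their own restart mechanism.

```python
import json
import math
import os
import tempfile


def jobs_needed(total_hours, max_walltime_hours=24):
    """Number of chained jobs when each job is capped at max_walltime_hours."""
    return math.ceil(total_hours / max_walltime_hours)


def run_segment(checkpoint_path, total_steps, steps_per_job):
    """Run one job's worth of steps, resuming from a checkpoint if one exists.

    Returns True once the whole simulation has finished.
    """
    state = {"step": 0, "value": 0.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)          # restart from the saved state
    last = min(state["step"] + steps_per_job, total_steps)
    while state["step"] < last:
        state["value"] += 1.0             # stand-in for one step of real work
        state["step"] += 1
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)               # checkpoint before the job ends
    return state["step"] >= total_steps


print(jobs_needed(96))  # 4

# Simulate the chain of jobs: 96 one-hour steps, 24 steps per job.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.json")
    segments = 1
    while not run_segment(path, total_steps=96, steps_per_job=24):
        segments += 1
print(segments)  # 4
```

In an application, it is usually enough to state that the software supports checkpoint/restart and how many chained jobs a typical simulation will need.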
Scalability (technical criteria)
This overlaps with the suitability criteria, but particularly addresses the question of why you need to use a national peak supercomputer, and whether you can use it efficiently.
Performance and scalability data has historically been lacking from many merit applications, so including it sets an application above the rest.
Further to this, performance data based on a typical job you will run brings confidence to the requested allocation, as opposed to the requested amount appearing to be guessed or just based on last year's usage.
Performance and Scalability Data
Include scaling information for a typical job you will run. Do not include scaling information for someone else's job, such as a test case that is supplied with the software. Do not include scaling information from another system, as that will count against you as you have failed to demonstrate that you have logged in to the target system, have the software running on the target system, and can run it at scale.
Here is a real-world example using NWChem:
[Figure: NWChem scaling results, with cost in node hours]
Mention which one of these you think is good and why. In the above, 1024 cores is the most efficient use of core hours, as well as giving a good job wall time. For consistency, this choice should be used in the calculation of how much total allocation you require.
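One possible way to turn measured timings into a scaling table is sketched below. The timings are invented placeholders, not the NWChem results referred to above, and the 90% efficiency threshold used to pick a job size is an assumption for illustration rather than a rule of any scheme.

```python
def scaling_table(timings):
    """Build scaling rows from {core count: wall time in hours} for one job."""
    base_cores = min(timings)
    base_hours = timings[base_cores]
    base_cost = base_cores * base_hours
    rows = []
    for cores in sorted(timings):
        hours = timings[cores]
        cost = cores * hours              # core hours consumed by this run
        speedup = base_hours / hours      # relative to the smallest run
        efficiency = base_cost / cost     # 1.0 means perfect scaling
        rows.append((cores, hours, cost, speedup, efficiency))
    return rows


# Placeholder timings, invented for this sketch. Replace with your own
# measurements of a typical job on the target system.
timings = {256: 8.0, 512: 4.1, 1024: 2.2, 2048: 1.6}
rows = scaling_table(timings)

for cores, hours, cost, speedup, efficiency in rows:
    print(f"{cores:5d} cores  {hours:4.1f} h  {cost:7.1f} core h  "
          f"speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")

# Assumed heuristic: the largest job that keeps at least 90% efficiency.
chosen = max(r[0] for r in rows if r[4] >= 0.9)
print(chosen)  # 1024
```

Whatever rule you use to choose a job size, state it in the application and use the same job size when calculating the total request.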
Usage (technical criteria)
Explain unused past allocations, or large increases in usage
Explanations need to be made in the context of the number of users in a project.
For a single-user project, if the user was on parental or sick leave for six months, then this should be specified in terms of number of months off. It is not sufficient to say that the user was on sick leave for a while, resulting in 10% usage. It is sufficient to say that the user was on sick leave for six months, resulting in 50% usage.
For a large research group, care needs to be taken in using one person's absence to justify a large reduction in usage.
If the project was waiting for some input data to become available, then how many months was the project delayed?
Appropriateness of Request (technical criteria)
Justifying the Amount Requested
Usage of past allocations is important, as it demonstrates that the past allocations were not too large. However, the committee knows that unnecessary simulations can be run to pad the usage figures. It is important to provide data and an explanation that justifies the amount requested.
It is important to provide benchmarking figures for typical jobs, on the machine you are applying for. That is, job size and wall time. Then estimate how many jobs will be required, for which you have past usage to extrapolate from. If you have a variety of job sizes, then provide a few data points.
If you are not going to use all cores of a node, for example if you require more memory per core, then your total request needs to include the idle cores.
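The estimate described above can be written as a short calculation. The sketch below is a minimal illustration: the function name and all the figures (ranks, ranks per node, cores per node, wall time, number of jobs) are hypothetical and should be replaced with your own benchmark data and the node size of the system you are applying for.

```python
import math


def requested_core_hours(ranks, ranks_per_node, cores_per_node,
                         walltime_hours, n_jobs):
    """Core hours to request, counting every core on each node used.

    Whole nodes are counted, so cores left idle to give each rank more
    memory are still included in the request.
    """
    nodes = math.ceil(ranks / ranks_per_node)
    return nodes * cores_per_node * walltime_hours * n_jobs


# Hypothetical job: 96 MPI ranks, placed 12 per node on assumed 24-core nodes
# to double the memory per rank, 6 hours per job, 100 jobs over the period.
print(requested_core_hours(96, 12, 24, 6, 100))  # 115200

# Counting only the busy cores would under-request by half in this case.
print(96 * 6 * 100)  # 57600
```

If you run a variety of job sizes, repeat the calculation for each size and add the results.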