All Pawsey systems are funded by the Australian Government (and hence the taxpayer) with the remit of supporting high-end computing to solve the most challenging scientific problems. It is therefore important that these machines are used efficiently. Practical control of use is exercised through allocations of time to projects, based on peer review, and through the queue system. The following policy and limits support this high-end computing mandate at the level of the queue system.
The basic role of the queue system is to ensure that jobs with a range of requirements (both in size and in time) are run as quickly as possible for individual users, while ensuring efficient utilisation of the machine as a whole. The queue limits set out below also reflect the mandate for high end computing, i.e., we try to give priority to jobs that cannot reasonably be run on smaller systems.
Appropriate Use of Queues
The naming of queues on Pawsey resources has been done to minimise any misunderstanding of the purpose of the queue. The debugq is only to be used for debugging. The copyq is only to be used for copying / transferring data. The gpuq is only to be used for GPU computing. The visq is only to be used for the visualisation of data.
In the case of the debugq, interactive jobs such as debugging with Arm Forge or compiling on compute nodes are appropriate, and it is acceptable in these cases to request the maximum walltime. Testing jobscripts and working on job dependencies are also fine, but to do this individual jobs should not need to request more than five minutes of walltime.
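A short test job of this kind might look like the following sketch of a debugq jobscript; the project code and executable name are placeholders:

```shell
#!/bin/bash -l
# Hypothetical short test job for the debugq: five minutes is enough
# to verify that the jobscript and its dependency logic are correct.
#SBATCH --partition=debugq
#SBATCH --nodes=1
#SBATCH --time=00:05:00
#SBATCH --account=project123    # placeholder project code

srun --export=all -n 24 ./my_program    # placeholder executable
```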
As stated in the Conditions of Use, Pawsey reserves the right to suspend or disable access and may do so for inappropriate use of queues.
Allocation Underuse and Overuse
Project budgets are scheduled on a quarterly (3 month) basis. Typically, this will be 1/4 of the annual allocation per quarter. Quarterly allocations WILL BE LOST if they are not utilised by the end of the current quarter. The new quarterly allocation will come into effect at the start of the following quarter. Quarters start on the first day of January, April, July and October.
"Director's Share" projects are subject to a hard limit at 100% of their original allocation after which it will not be possible to run jobs against the project. Jobs can be queued but they will not run.
Projects that overrun their quarterly allocation in any one quarter can still run, albeit with reduced priority. This is applicable to Magnus and Zeus, and each machine is treated separately. For example, a project that is within its quarterly allocation on Zeus and over its allocation on Magnus will be at normal priority on Zeus and reduced priority on Magnus.
Additional constraints are imposed on projects that have overused their quarterly allocation on Magnus and Zeus. These constraints apply across all partitions of the system. The per-user limits promote round-robin scheduling of jobs across the users who are in the low-priority modes.
| Priority mode | Eligibility (usage relative to quarterly allocation) | Max jobs running at a time, per user | Max pending jobs accruing priority as they age, per user | Equivalent days of age penalty relative to normal jobs |
|---|---|---|---|---|
| low | 100% < usage < 200% | - | 8 | 36 days |
| lowest | 200% < usage | 2 | 4 | 52 days |
High Priority Mode
Similar to the express queue feature at other centres, some jobs can be run at high priority, subject to some limitations. There is no "charging rate" for this feature, meaning it is not a multiplier on your usage. This feature is intended for short test jobs before running a large simulation, or for running short test jobs during code development. It complements and should be considered before Extraordinary Resource Requests.
On Magnus and Zeus, while project usage is within the quarterly allocation, "high" priority mode is available, giving a significant priority boost above "normal" priority jobs. High priority mode can be used for up to 5% of the quarterly allocation. Once the quarterly allocation is used ("low" or "lowest" in the above section), or more than 5% is used in high priority mode, then access to high priority mode is removed until the quarterly reset. You do not need to contact the helpdesk to use this feature. The "quality of service" QOS feature of SLURM is used:
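For example, high priority mode can be requested by adding the QOS flag to a jobscript, or on the sbatch command line (a sketch; the jobscript name is a placeholder):

```shell
#SBATCH --qos=high
```

or, equivalently:

```shell
sbatch --qos=high myjob.slurm    # placeholder jobscript name
```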
If qos=high is not available, you will receive an error such as
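the following; the exact wording varies between SLURM versions, but it is similar to:

```
sbatch: error: Batch job submission failed: Invalid qos specification
```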
Extraordinary Resource Requests
For extraordinary requests such as reservations, job walltime extensions and priority boosts, see the Extraordinary Resource Requests policy.
Magnus Queue Policy and Limits
The minimum unit of allocation and charging in the queue system is the 24-core node. Projects will be charged for all 24 cores in the node irrespective of how many cores are actually used by a given job. The total cost for a job in service units will be 24 x (number of nodes) x (wall clock time in hours). Accounting is based on actual wall clock time in seconds recorded by the queue system.
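As a worked example, a hypothetical job using 4 nodes for 12 hours of wall clock time would be charged 24 × 4 × 12 = 1152 service units, regardless of how many of each node's 24 cores it actually uses:

```shell
# Magnus charging sketch: charged per full 24-core node, per hour.
cores_per_node=24
nodes=4     # hypothetical job size
hours=12    # hypothetical wall clock time
echo $((cores_per_node * nodes * hours))    # service units charged -> 1152
```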
Job Queue Limits and Priorities
The following terms are used to describe jobs:
- The policy distinguishes three job states: "Running", "Pending", and "Held" (a held job may appear in the queue system as a job in pending state with priority zero). Jobs that are either "Running" or "Pending" are referred to collectively as "Active"; all jobs in the queue system, i.e., the "Active" jobs plus the "Held" jobs, are considered "Submitted".
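These states can be inspected with SLURM's squeue utility; "Running" and "Pending" appear in the state column as R and PD respectively:

```shell
# List your own jobs with their current state (R = Running, PD = Pending).
# A held job shows as PD; the last column gives the reason it is not running.
squeue -u $USER --format="%.10i %.10P %.12j %.4t %.10M %R"
```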
The following applies to all users on a per-user basis.
- There is a limit of 32 running jobs, and a limit of 36 active jobs, at any one time in the work partition; the total number of submitted jobs cannot exceed 512 in the queue at one time. Jobs in excess of the active limit will automatically be placed on hold until an existing running or pending job changes its status. Jobs in excess of the submitted limit will automatically be rejected.
- There is a limit of 1 running job and 4 active jobs at any one time in the debug partition. Jobs in excess of the active limit will automatically be rejected.
The limits are summarised in the following table:
| Partition | Min Nodes | Max Nodes | Max Wall Time | Max Jobs Running | Max Jobs Submitted | Max Pending Jobs Accruing Age Priority |
|---|---|---|---|---|---|---|
The underlying priority setting mechanism for jobs considers the following factors:
- The number of nodes requested by a job.
- The time requested by a job.
- Recent usage by the user.
- Recent overall usage by the user's project.
The last two points implement a Fair Share mechanism which, as its name suggests, adjusts the priority of jobs so that, over time, projects and users are moved towards their expected usage based on allocation, i.e., their fair share.
The priority mechanism may be thought of as adjusting the apparent 'age' of a job. Older jobs, which have been in the queue for longer, are more likely to run next.
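The individual factors contributing to a pending job's priority, including the age and fair-share components, can be inspected with SLURM's sprio utility:

```shell
# Show the weighted priority factors for your own pending jobs.
sprio -u $USER
```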
Galaxy Queue Policy and Limits
The remit of the machine is radio astronomy and specifically for support of the ASKAP and MWA telescopes. Operational parameters are set to meet the needs of these radio astronomy users.
For CPU nodes the unit of allocation and charging in the queue system is the 20-core node. Projects will be charged for all 20 cores in a node irrespective of how many cores are actually used by a given job. The total cost for a job in service units will be 20 x (number of nodes) x (wall clock time in hours). Accounting is based on actual wall clock time in seconds recorded by the queue system.
The GPU partition is currently charged at an equivalent rate, taking into account its 8 cores per node.
Job Queue Limits
The limits are summarised in the following table:
| Partition | Min Nodes | Max Nodes | Max Wall Time | Max Jobs Running | Max Jobs Submitted | Max Pending Jobs Accruing Age Priority |
|---|---|---|---|---|---|---|
The policy is subject to review and may be updated to meet operational needs.
Zeus Queue Policy and Limits
Zeus contains a mix of processor technologies, and as a result different partitions are charged at different rates. Since mid-2019, all compute partitions of Zeus (excluding the copyq) contribute to usage figures. Usage is measured in core hours; the concept is the same as Service Units.
| Hardware | Charging rate |
|---|---|
| Intel Broadwell CPUs | each CPU core for an hour = 1 core hour |
| NVIDIA Pascal GPUs | each GPU for an hour = 36 core hours |
| Intel Xeon Phi CPUs | each node for an hour = 25 core hours |
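As a worked example of these rates, a hypothetical job using 2 Pascal GPUs for 10 hours would accrue 2 × 36 × 10 = 720 core hours against the project's usage:

```shell
# Zeus charging sketch: each Pascal GPU counts as 36 core hours per hour.
gpu_rate=36
gpus=2      # hypothetical number of GPUs
hours=10    # hypothetical wall clock time
echo $((gpu_rate * gpus * hours))    # core hours charged -> 720
```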
Job Queue Limits
Zeus is intended for smaller pre-processing and post-processing jobs (including visualisation), and jobs otherwise not suitable for Magnus.
It has a number of specialised partitions in addition to its general-purpose workq. Queue limits have been set to reflect the priorities of each partition.
| Partition | Maximum wall time | Maximum nodes per job | Max Jobs Running | Max Jobs Submitted | Max Pending Jobs Accruing Age Priority |
|---|---|---|---|---|---|
| workq | 24 hours | 8 × 28-core nodes | 16 | 96 | - |
| debugq | 1 hour | 4 × 28-core nodes | 1 | 4 | - |
| gpuq | 24 hours | 4 × 16/20-core nodes | 4 | 96 | - |
| copyq | 48 hours | 4 × 16-core nodes | 8 | 96 | - |
| longq | 96 hours | 1 × 28-core node | 4 | 96 | - |
| highmemq | 96 hours | 4 × 16-core nodes | 2 | 96 | - |
There is more information on each of the Zeus partitions here: HPC Systems