Introduction


All Pawsey systems are funded by the Australian Government (and hence the taxpayer) with the remit of supporting high-end computing to solve the most challenging scientific problems. To this end, it is important that these machines are used efficiently. Practical control of use is exercised through peer-reviewed allocations of time to projects and through the queue system. The following policy and limits are set out to support this high-end computing mandate at the level of the queue system.

The basic role of the queue system is to ensure that jobs with a range of requirements (both in size and in time) are run as quickly as possible for individual users, while ensuring efficient utilisation of the machine as a whole. The queue limits set out below also reflect the mandate for high end computing, i.e., we try to give priority to jobs that cannot reasonably be run on smaller systems.

Appropriate Use of Queues

The queues on Pawsey resources are named to minimise any misunderstanding of their purpose. The debugq is only to be used for debugging. The copyq is only to be used for copying or transferring data. The gpuq is only to be used for GPU computing. The visq is only to be used for the visualisation of data.

In the case of the debugq, interactive jobs such as debugging with Arm Forge or compiling on compute nodes are appropriate, and it is acceptable in these cases to request the maximum walltime. Testing jobscripts and working on job dependencies are also fine, but individual jobs for these purposes should not need to request more than five minutes of walltime.
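As an illustration, a minimal jobscript for testing on the debugq might look like the following sketch, where the project code and executable name are placeholders:

#!/bin/bash -l
#SBATCH --partition=debugq
#SBATCH --account=projectcode   # placeholder: substitute your own project code
#SBATCH --nodes=1
#SBATCH --time=00:05:00         # five minutes is ample for jobscript testing

srun ./my_program               # placeholder executable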

As stated in the Conditions of Use, Pawsey reserves the right to suspend or disable access and may do so for inappropriate use of queues.

Allocation Underuse and Overuse

Project budgets are scheduled on a quarterly (3 month) basis. Typically, this will be 1/4 of the annual allocation per quarter. Quarterly allocations WILL BE LOST if they are not utilised by the end of the current quarter. The new quarterly allocation will come into effect at the start of the following quarter. Quarters start on the first day of January, April, July and October.
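Assuming the standard Pawsey accounting utility is available in your environment, usage against the current quarterly allocation can be checked from a login node, e.g.:

pawseyAccountBalance -p projectcode

where projectcode is a placeholder for your project's code.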

"Director's Share" projects are subject to a hard limit at 100% of their original allocation after which it will not be possible to run jobs against the project. Jobs can be queued but they will not run.

Projects that overrun their quarterly allocation in any one quarter can still run, albeit with reduced priority.  This is applicable to Magnus and Zeus, and each machine is treated separately.  For example, a project that is within its quarterly allocation on Zeus and over its allocation on Magnus will be at normal priority on Zeus and reduced priority on Magnus.

Additional constraints are imposed on projects that have overused their quarterly allocation on Magnus and Zeus. These apply across all partitions of the system. The per-user constraints promote round-robin scheduling of jobs across the users who are in the low-priority modes.

Priority mode | Eligibility (usage relative to quarterly allocation) | Max jobs running at a time, per user | Max pending jobs accruing age priority, per user | Age penalty relative to normal jobs
high    | usage < 100%        | - | - | -
normal  | usage < 100%        | - | - | -
low     | 100% < usage < 200% | - | 8 | 36 days
lowest  | 200% < usage        | 2 | 4 | 52 days

High Priority Mode

Similar to the express queue feature at other centres, some jobs can be run at high priority, subject to some limitations. There is no "charging rate" for this feature; it is not a multiplier on your usage. The feature is intended for short test jobs before running a large simulation, or for short test jobs during code development. It complements Extraordinary Resource Requests and should be considered before making such a request.

On Magnus, while project usage is within the quarterly allocation, "high" priority mode is available, giving a significant priority boost over "normal" priority jobs. High priority mode can be used for up to 5% of the quarterly allocation. Once the quarterly allocation is used ("low" or "lowest" in the section above), or more than 5% has been used in high priority mode, access to high priority mode is removed until the quarterly reset. You do not need to contact the helpdesk to use this feature. The "quality of service" (QOS) feature of SLURM is used:

sbatch --qos=high myjob.sh
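The QOS can equivalently be set as a directive inside the jobscript itself:

#SBATCH --qos=high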

If qos=high is not available, you will receive an error such as

sbatch: error: Batch job submission failed: Invalid qos specification


Extraordinary Resource Requests

For extraordinary requests such as reservations, job walltime extensions and priority boosts, see the Extraordinary Resource Requests policy.

Magnus Queue Policy and Limits


Charging

The minimum unit of allocation and charging in the queue system is the 24-core node. Projects will be charged for all 24 cores in the node irrespective of how many cores are actually used by a given job. The total cost for a job in service units will be 24 x (number of nodes) x (wall clock time in hours). Accounting is based on actual wall clock time in seconds recorded by the queue system.
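For example, a job that requests 4 nodes and runs for 2.5 hours is charged 24 x 4 x 2.5 = 240 service units, even if it uses only a few cores on each node.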

Job Queue Limits and Priorities

The following terms are used to describe jobs:

  • The policy distinguishes three job states: "Running", "Pending", and "Held" (a held job may appear in the queue system as a pending job with priority zero).
  • Jobs that are either "Running" or "Pending" are referred to collectively as "Active".
  • All jobs in the queue system, i.e. the sum of "Active" and "Held" jobs, are considered "Submitted".

The following applies to all users on a per-user basis.

  • There is a limit of 32 running jobs, and a limit of 36 active jobs, at any one time in the workq partition; the total number of submitted jobs cannot exceed 512 in the queue at one time. Jobs in excess of the active limit will automatically be placed on hold until an existing running or pending job changes its status. Jobs in excess of the submitted limit will automatically be rejected.
  • There is a limit of 1 running job and 4 active jobs at any one time in the debugq partition. Jobs in excess of the active limit will automatically be rejected.
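Your current jobs and their states can be listed with SLURM's squeue utility, for example:

squeue -u $USER

Held jobs appear in the pending (PD) state, as described above.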

The limits are summarised in the following table:

Partition | Min nodes | Max nodes | Max wall time | Max jobs running | Max jobs submitted | Max pending jobs accruing age priority
workq  | 1 | 1366 | 24 hours | 32 | 512 | 16
debugq | 1 | 6    | 1 hour   | 1  | 4   | -

The underlying priority setting mechanism for jobs considers the following factors:

  • The number of nodes requested by a job.
  • The time requested by a job.
  • Recent usage by the user.
  • Recent overall usage by the user's project.

The last two points represent a Fair Share mechanism which, as the name suggests, adjusts job priority so that projects and users move towards their expected usage based on allocation, i.e. their fair share, over time.

The priority mechanism may be thought of as adjusting the apparent 'age' of a job. Older jobs, which have been in the queue for longer, are more likely to run next.
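Assuming the cluster uses SLURM's multifactor priority plugin, the individual components of a pending job's priority (age, fair share, job size) can be inspected with the sprio utility:

sprio -u $USER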

Galaxy Queue Policy and Limits


Mandate

The remit of the machine is radio astronomy, specifically support of the ASKAP and MWA telescopes. Operational parameters are set to meet the needs of these radio astronomy users.

Charging

For CPU nodes the unit of allocation and charging in the queue system is the 20-core node. Projects will be charged for all 20 cores in a node irrespective of how many cores are actually used by a given job. The total cost for a job in service units will be 20 x (number of nodes) x (wall clock time in hours). Accounting is based on actual wall clock time in seconds recorded by the queue system.

The GPU partition is currently charged at an equivalent rate, taking into account its 8 cores per node.

Job Queue Limits

The limits are summarised in the following table:

Partition | Min nodes | Max nodes | Max wall time | Max jobs running | Max jobs submitted | Max pending jobs accruing age priority
workq | 1 | 472 | 12 hours | - | - | -
gpuq  | 1 | 64  | 24 hours | - | - | -

The policy is subject to review and may be updated to meet operational needs.

Zeus Queue Policy and Limits


Charging

Zeus only has Broadwell CPUs, and since mid-2019 all compute partitions of Zeus (excluding the copyq) contribute to usage figures.

Technology | Charging rate
Intel Broadwell CPUs | 1 CPU core for an hour = 1 service unit
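For example, a job that uses all 28 cores of one workq node for 10 hours is charged 28 x 10 = 280 service units.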

Job Queue Limits

Zeus is intended for smaller pre-processing and post-processing jobs (including visualisation), and jobs otherwise not suitable for Magnus.

It has a number of specialised partitions in addition to its general-purpose workq. Queue limits have been set to reflect the priorities of each partition.

Partition | Max wall time | Max nodes per job | Max jobs running | Max jobs submitted | Max pending jobs accruing age priority
workq    | 24 hours | 8x 28-core nodes | 16 | 512 | -
debugq   | 1 hour   | 4x 28-core nodes | 1  | 4   | -
copyq    | 48 hours | 4x 16-core nodes | 8  | 96  | -
longq    | 96 hours | 1x 28-core node  | 4  | 96  | -
highmemq | 96 hours | 4x 16-core nodes | 2  | 96  | -

The highmemq partition should only be used for large-memory workflows that require more than 128 GB of memory. Workflows that require less than 128 GB should run in the workq.
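As an illustrative sketch (the memory figure and script name are placeholders), a high-memory job might be submitted with:

sbatch --partition=highmemq --nodes=1 --mem=512G myjob.sh

Here --mem requests memory per node; jobs needing 128 GB or less should be submitted to the workq instead.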


More information on each of the Zeus partitions is available at HPC Systems.

Topaz Queue Policy and Limits


Charging

Topaz is a dedicated GPU-accelerated cluster. The GPU nodes on Topaz can be shared or used exclusively, so the charge is based on the number of GPUs allocated to the job.

Technology | Charging rate
Nvidia Volta GPUs  | 1 GPU for an hour = 36 service units
Nvidia Pascal GPUs | 1 GPU for an hour = 36 service units
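For example, a job allocated 2 GPUs for 5 hours is charged 2 x 36 x 5 = 360 service units. A hypothetical request for two GPUs might look like:

sbatch --partition=gpuq --gres=gpu:2 --time=05:00:00 myjob.sh

where the partition, GPU count, and walltime are illustrative only.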

Job Queue Limits

Topaz has four partitions, two for production (gpuq & nvlinkq) and two for testing and debugging (gpuq-dev & nvlinkq-dev).

Partition | Max wall time | Max nodes per job | Max jobs running | Max jobs submitted | Max pending jobs accruing age priority
gpuq        | 24 hours | 20x 16-core nodes   | 8 | 1000 | -
gpuq-dev    | 1 hour   | 2x 16-core nodes    | 1 | 4    | -
nvlinkq     | 24 hours | 4x 16/20-core nodes | 8 | 96   | -
nvlinkq-dev | 1 hour   | 4x 16/20-core nodes | 1 | 4    | -