Job arrays are for use when you need to run the same program over a number of files. In non-supercomputing environments you might use a loop or GNU Parallel; however, we can make Slurm perform the parallelisation for us with minimal effort. This has been tested on Zeus. The maximum number of jobs that can be created by a single array is 1000; however, the workq partition on Zeus is limited to a maximum of 512 jobs in the queue and 16 jobs running concurrently. Therefore, do not create an array that will spawn more than 512 jobs. Because of the concurrency limit, at most 16 jobs will run at a time, with a new job starting as an older one completes.
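As a minimal illustration, an array is requested with the --array directive in the job script header; the range below is a placeholder to adapt to your own file count, and the optional '%' suffix (a standard Slurm feature) caps how many tasks run at once.

    # Placeholder range: one array task per input file,
    # with no more than 16 tasks running at the same time.
    #SBATCH --array=1-500%16
    # Each task receives its own SLURM_ARRAY_TASK_ID, which the job
    # script can use to pick the file it should work on.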
Guided example
We will make use of two scripts: a job script that performs the task we need, and a 'helper' script that sets up the array and launches the job script. The example program we will use is ExpansionHunter DeNovo, a bioinformatic tool for detecting repeat expansions from sequencing data.
Part 1- Making your job script
A template is provided below. It prints helpful information to your log files to help you keep track of your jobs.
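A minimal sketch of such a job script is below, saved as, say, ehdn_job.sh. The module name, resource requests, file list name (files.txt) and the ExpansionHunter DeNovo command line are assumptions for illustration; check the installed module and the tool's documentation, and adjust paths and resources for your own data.

    #!/bin/bash -l
    #SBATCH --job-name=ehdn-array
    #SBATCH --partition=workq
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # Print helpful information to the log file for keeping track of jobs
    echo "Array job ID:  ${SLURM_ARRAY_JOB_ID}"
    echo "Array task ID: ${SLURM_ARRAY_TASK_ID}"
    echo "Node:          $(hostname)"
    echo "Started:       $(date)"

    # Pick the input file for this task. 'files.txt' (one file per line)
    # is an assumed list created by the helper script before submission.
    INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" files.txt)

    # Module name and ExpansionHunter DeNovo flags are illustrative only;
    # consult the tool's documentation for the exact interface.
    module load expansionhunterdenovo
    ExpansionHunterDenovo profile \
        --reads "${INPUT}" \
        --reference reference.fasta \
        --output-prefix "$(basename "${INPUT}" .bam)"

    echo "Finished:      $(date)"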
Part 2- Making your helper script
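A sketch of the helper script (named sbatch_helper.sh, matching the launch command below) might look like this: it builds the file list, counts the inputs and submits the job script as an array of matching size. The *.bam pattern and the job script name ehdn_job.sh are assumptions that should match whatever you set up in Part 1.

    #!/bin/bash -l
    #SBATCH --job-name=array-helper
    #SBATCH --partition=workq
    #SBATCH --ntasks=1
    #SBATCH --time=00:05:00

    # Build the list of input files that each array task will index into.
    ls *.bam > files.txt

    # Count the files and submit one array task per file.
    # Keep the total at or below the 512-job workq limit.
    NFILES=$(wc -l < files.txt)
    sbatch --array=1-${NFILES} ehdn_job.sh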
When you are satisfied with the scripts, launch them with "sbatch sbatch_helper.sh". If you watch your job queue with "watch squeue -u USERNAME", you should see your helper script running, and then see it spawn a number of jobs.
What if I have more than 1000 jobs?
If you are running something with a lot of jobs, another option is job packing, described here: job packing with mpibash and libcircle
Remember we are available to assist, so if you have a very complex job, please contact the helpdesk for advice.
Further info on job arrays can be found here: Example Workflows