Slurm Scheduler Interaction
Jobs are submitted to ARC clusters through a job queuing system, or scheduler. Because jobs are submitted through a queuing system, they may not run immediately; instead, a job waits until the resources it requires (CPU cores, memory, and GPUs, if any) are available. The queuing system thus keeps the compute servers from being overloaded and allocates dedicated resources to running jobs, allowing each job to run optimally once it leaves the queue. ARC uses the Slurm scheduler; descriptions of common interactions with Slurm are provided below. For a more detailed Slurm user guide, see SchedMD's online documentation and videos at https://slurm.schedmd.com/tutorials.html. If you are familiar with commands from another resource manager (e.g., Moab/PBS/Torque) and simply need to translate them to Slurm, see https://slurm.schedmd.com/rosetta.html.
Submission Script
Jobs are submitted with submission scripts that describe what resources the job requires and what the system should do once the job runs. Example submission scripts are provided in the documentation for each system and can be used as templates for getting started. Note that jobs can also be started interactively, which can be very useful during testing and debugging. The resource requests are similar to those in PBS/Torque and include:
Partition (denoted by #SBATCH -p). Indicates the partition (or queue) to which the job should be submitted. Different partitions are intended for different hardware (e.g., CPU vs. GPU) and therefore have different usage limits. The partition parameters are described in the documentation for each system.
Quality of Service (QoS) (denoted by #SBATCH --qos). QoS settings determine the job's priority and resource limits within a given partition. Each partition has a default QoS named <partitionname>_base, which is applied if no QoS is explicitly requested. In addition, there are two alternative QoS options available per partition: <partitionname>_short for shorter jobs with higher scheduling priority, and <partitionname>_long for longer jobs that may wait longer in the queue. Choosing an appropriate QoS allows users to balance queue time against resource and runtime limits. Refer to the system documentation for the specific limits associated with each QoS.
Walltime (denoted by #SBATCH -t). This is the time that you expect your job to run; so if you submit your job at 5:00pm on Wednesday and you expect it to finish at 5:00pm on Thursday, the walltime would be 24:00:00. Note that if your job exceeds the walltime estimated during submission, the scheduler will kill it, so it is important to be conservative (i.e., to err on the high side) with the walltime in your submission script. Acceptable time formats include minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds.
Hardware (multiple options, denoted by #SBATCH --gres=gpu:1, #SBATCH --mem=500G, #SBATCH --cpus-per-task=16, etc.). This is the hardware that you want to reserve for your job. The types and quantity of available hardware, how to request them, and the limits for each user are described in the documentation.
Account (denoted by #SBATCH --account=[allocation]). Indicates the allocation account to which you want to charge the job.
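For example, a submission script header combining these requests might look like the following sketch (the partition, QoS, walltime, and hardware values here are illustrative; substitute the values documented for your system):
#SBATCH -p normal_q                # Partition (queue)
#SBATCH --qos=normal_q_short       # Optional: higher-priority QoS for shorter jobs
#SBATCH -t 2:00:00                 # Walltime of 2 hours (hours:minutes:seconds)
#SBATCH --cpus-per-task=16         # 16 CPU cores per process
#SBATCH --mem=32G                  # 32 GB of memory
#SBATCH --account=[allocation]     # Allocation account to charge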
The submission script should also specify what should happen when the job runs:
Software Modules. Use module commands to load the software that your job will need to run.
Run. Finally, you need to specify what commands you want to run. This can be execution of your own program or a call to a software package.
As an example, here is a basic hello world submission script:
#!/bin/bash
#SBATCH -J hello-world # Name of the job
#SBATCH --account=personal # Account allocation
#SBATCH --partition=normal_q # Partition of the cluster
#SBATCH --nodes=1 # Number of compute nodes
#SBATCH --ntasks-per-node=1 # Number of processes
#SBATCH --cpus-per-task=1 # Number of CPU cores per process
#SBATCH --time=0-00:10:00 # Runtime limit of 10 minutes
##SBATCH --gres=gpu:1 # Uncomment to request one GPU (only valid on GPU partitions)
echo "Hello world from ..."
hostname
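A job that uses centrally installed software would also load the corresponding modules before the run step. For instance, the end of a submission script might look like this sketch (the module and script names are illustrative; use module avail to find actual module names on your system):
module load Python        # Load required software (module name is an example)
python my_script.py       # Run your program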
We recommend reviewing ARC's GitHub repository, which contains examples for running the most common software, at https://github.com/AdvancedResearchComputing/examples
Job Management
To submit your job to the queuing system, use the sbatch command. For example, if your script is in JobScript.sh, the command would be:
sbatch ./JobScript.sh
This will return a message with your job id such as:
Submitted batch job 5123
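If you are scripting around sbatch, the --parsable flag makes it print just the job ID, which is convenient to capture (a small sketch):
JOBID=$(sbatch --parsable ./JobScript.sh)    # JOBID now holds e.g. 5123
scontrol show job $JOBID                     # Inspect the job's full details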
Here 5123 is the job number. Once a job is submitted to a queue, it will wait until the requested resources are available within that queue, and will then run if eligible. Eligibility to run is influenced by the resource policies in effect for the queue. To check a job's status, use the squeue command:
squeue -j 5123
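The output lists one row per job, including its state (e.g., PD for pending, R for running); the values below are illustrative:
JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
 5123  normal_q hello-w myuser  R  0:42     1 node001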
To check the status of more than one job, or of the queues in general, use squeue. Examples include:
squeue --state=RUNNING    # View all running jobs in the queue
squeue -u $USER           # View my running and pending jobs
If your job has not started and you are unsure why, this FAQ provides some common explanations. To remove a job from the queue, or to stop a running job, use the scancel command. For job number 5123, the command would be:
scancel 5123
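scancel can also act on several jobs at once using standard Slurm filters, for example:
scancel -u $USER              # Cancel all of my jobs
scancel --name=hello-world    # Cancel my jobs with a given job name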
Output
When your job has finished running, any output written to stdout or stderr will be placed in a file in the directory from which the job was submitted. For example, for a job submitted from JobScript.sh with job ID 5123, the output would be in:
slurm-5123.out # Output and errors will be here
This behavior can be modified using the --output and --error flags. Any files that the job writes to permanent storage locations will simply remain in those locations. Files written to locations that are only available during the life of the job (e.g., TMPFS or TMPDIR) will be removed once the job is completed, so those files must be moved to a permanent location at the end of the submission script.
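As an illustration, the following sketch names the log files after the job ID using Slurm's %j filename pattern and copies job-local results back to permanent storage before the job ends (the results path is hypothetical):
#SBATCH --output=myjob-%j.out    # stdout; %j expands to the job ID
#SBATCH --error=myjob-%j.err     # stderr in a separate file
# ... commands that write results under $TMPDIR ...
cp -r $TMPDIR/results $SLURM_SUBMIT_DIR/    # SLURM_SUBMIT_DIR is the directory the job was submitted from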