Frequently Asked Questions

Why can’t I log in?

Login problems can occur for a number of reasons. If you cannot log into one of ARC’s systems, please check the following:

  1. Is your PID password expired? Try logging into onecampus.vt.edu. If you cannot log in there, then your PID password has likely expired and needs to be changed. (Contact 4Help for help with this issue.)

  2. Are you on-campus? If you are not on-campus, you will need to connect to the Virginia Tech VPN in order to access ARC’s systems.

  3. Is the hostname correct? Please check the name of the login node(s) for the system you are trying to access. For example, for login to TinkerCliffs, the hostname is not tinkercliffs.arc.vt.edu but rather tinkercliffs1.arc.vt.edu or tinkercliffs2.arc.vt.edu.

  4. Do you have an account? You must request an account on a system before you can log in.

  5. Is there a maintenance outage? ARC systems are occasionally taken offline for maintenance. Users are typically notified via email well ahead of maintenance outages.

If you have checked all of the above and are still not sure why you cannot log in, please submit a help ticket.
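
For reference, a typical terminal connection to a login node looks like the following (tinkercliffs1.arc.vt.edu is used here as an example hostname from item 3 above; replace yourpid with your VT PID):

    # Connect to a TinkerCliffs login node over SSH (requires the VT VPN if off-campus)
    ssh yourpid@tinkercliffs1.arc.vt.edu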

How much does it cost to use ARC’s systems?

ARC’s systems are free to use, within limits. This means that Virginia Tech researchers can simply request an account to get access and run. Usage beyond fairly restrictive personal limits does require an approved allocation requested by a faculty member or project principal investigator; this requires some basic information to be provided, but getting an allocation does not require monetary payment of any kind. Researchers who need access to more resources beyond what we provide for free or who would like to purchase dedicated hardware can do so through our Cost Center or Investment programs. More information on how to get started with ARC is here.

Why is my job not starting?

The squeue command shows the running and pending jobs for a user and provides the reason a pending job isn’t starting. Jobs run by other users will not be shown in the squeue output. To show information for a particular job only, use squeue -j <jobid>. Limits apply per job, per user, and per account to ensure fair utilization of resources among all users (see the Quality of Service (QoS) of each partition of the clusters). Consult the full list of Slurm job reason codes; some of the most common reasons are described below:

Reason: Priority or Resources
Meaning: These are the two most common reasons for a job being pending (PD). Priority means that one or more higher-priority jobs exist for the partition associated with the job. Resources means that the job is waiting in the queue for resources (CPUs, GPUs, and/or memory) to become available. Jobs requesting more resources are likely to sit in the queue longer.
Recommendation: No action needed. The job will start as soon as resources become available.

Reason: QOSMaxCpuPerUserLimit
Meaning: The CPU request exceeds the per-user limit for the requested QoS.
Recommendation: Reduce the total number of CPUs requested across jobs run by your user.

Reason: QOSMaxMemoryPerUser
Meaning: The memory request exceeds the per-user limit for the requested QoS.
Recommendation: Reduce the total amount of memory requested across jobs run by your user.

Reason: QOSMaxGRESPerUser
Meaning: The GRES request for GPUs exceeds the per-user limit for the requested QoS.
Recommendation: Reduce the total number of GPUs requested across jobs run by your user.

Reason: MaxCpuPerAccount
Meaning: The CPU request exceeds the per-account limit on the job’s QoS.
Recommendation: Reduce the total number of CPUs requested across jobs run in the account allocation.

Reason: MaxMemoryPerAccount
Meaning: The memory request exceeds the per-account limit on the job’s QoS.
Recommendation: Reduce the total amount of memory requested across jobs run in the account allocation.

Reason: MaxGRESPerAccount
Meaning: The GRES request for GPUs exceeds the per-account limit on the job’s QoS.
Recommendation: Reduce the total number of GPUs requested across jobs run in the account allocation.

Reason: QOSMaxWallDurationPerJobLimit
Meaning: The wall time request of the job exceeds the wall time limit of the QoS.
Recommendation: Reduce the wall time to within the QoS limits (1 day for short, 7 days for base (default), 14 days for long).

Reason: AssocGrpBillingMinutes
Meaning: The account allocation of the job has exceeded its resource limits (e.g., in the free tier).
Recommendation: Acquire additional Service Units via the Cost Center, wait for the monthly renewal of the free tier, or run in preemptible partitions.
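
To see which of these reason codes applies to your own jobs, a quick check from a login node looks like this (the job ID is a placeholder):

    # List your running and pending jobs; the pending reason appears in the NODELIST(REASON) column
    squeue -u $USER
    # Show a single job
    squeue -j 123456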

Why can’t I run on the login node?

One of the most common beginner mistakes on compute clusters is to log into the cluster and immediately start running a computation. When you log into a cluster, you land on a login node. Login nodes are individual computers that represent a very small fraction of the overall cluster and, crucially, are shared by the many users who are logged into the cluster at any given time. Basic tasks (editing files, checking jobs, perhaps making simple plots or compiling software) are fine to do on the login nodes, but running a computationally intensive task there adversely impacts other users (since the node is shared) while giving you worse performance (since you are not using the bulk of the cluster). You should therefore run computationally intensive tasks on compute nodes by submitting a job to the scheduler. See here for documentation about job submission; we also have a video tutorial that will walk you through the process in a few minutes. Users who run problematic programs on the login node can have those tasks killed without warning. Users who repeatedly violate this policy are subject to having their ARC account suspended (see Acceptable Use Policy).
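
As a minimal sketch of moving work off the login node (the allocation and partition names below are placeholders; see the system documentation and the job submission pages for real values):

    #!/bin/bash
    #SBATCH --account=yourallocation   # placeholder allocation name
    #SBATCH --partition=normal_q       # placeholder partition name
    #SBATCH --ntasks=1
    #SBATCH --time=1:00:00
    ./a.out                            # your computationally intensive program

Save this as, say, myjob.sh, submit it with sbatch myjob.sh, and monitor it with squeue -u $USER.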

When will my job start?

Adding the --start flag to squeue will provide the system’s best guess as to when the job will start, or give a reason for why the job will not start in the NODELIST(REASON) column. If no estimated start time is provided, please see Why is my job not starting? for more information.
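
For example, to see estimated start times for all of your pending jobs (the estimate is a best guess and may change as other jobs complete):

    squeue --start -u $USER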

How do I submit an interactive job?

A user can request an interactive session on a compute node (e.g., for debugging purposes), using interact, a wrapper script around srun. By default, this script will request one core (with one GPU on GPU clusters) for one hour on a default partition. An allocation must be provided:

interact -A yourallocation

The request can be customized with standard job submission flags used by srun or sbatch. Examples include:

  • Request two hours:

    interact -A yourallocation -t 2:00:00
    
  • Request two hours on the normal_q partition:

    interact -A yourallocation -t 2:00:00 -p normal_q
    
  • Request two hours on one core and one GPU on Falcon’s l40s_normal_q:

    interact -A yourallocation -t 2:00:00 -p l40s_normal_q -n 1 --gres=gpu:1
    

The flags for requesting resources may vary from system to system; please see the documentation for the system that you want to use.

Once the job has been submitted, the system may print out some information about the defaults that interact has chosen. Once the resources requested are available, you will then get a prompt on a compute node. You can issue commands on the compute node as you would on the login node or any other system. To exit the interactive session, simply type exit.

Note: As with any other job, if all resources on the requested queue are being used by running jobs at the time an interactive job is submitted, it may take some time for the interactive job to start.

Important: All CPU cores, memory, and GPUs allocated to your job will remain attached until you terminate it with exit. Idle interactive sessions consume resources that other users can’t use. Do not leave interactive sessions idle.

How can I interpret Slurm billing reports for compute usage?

Understanding ARC accounting is important to make the best use of the free tier. Every PI is allocated 1M Service Units (SUs) per month in the free tier. Jobs run on the cluster consume Service Units based on the amount and type of resources allocated and the length of the job. For example, one job requesting 4 CPU cores, 16 GB of memory, and 1 A100 GPU will consume 4 cores x 1 SU/core + 16 GB x 0.0625 SUs/GB + 1 x 100 SUs/A100 GPU = 105 SUs per hour. Users can run sacct -j <jobid> -X -o jobID,partition%20,AllocTRES%70 to see the SUs consumed by a given job:

[user@tinkercliffs1 ~]$ sacct -j 294 -X -o jobID,partition%20,AllocTRES%70
JobID                   Partition                                                              AllocTRES
------------ -------------------- ----------------------------------------------------------------------
294                 a100_normal_q            billing=105,cpu=4,gres/gpu:a100=1,gres/gpu=1,mem=16G,node=1

Users can view the TRESBillingWeights for a given partition using scontrol show partition <partition_name> | grep TRESBillingWeights

[user@tinkercliffs1 ~]$ scontrol show partition a100_normal_q | grep TRESBillingWeights
   TRESBillingWeights=CPU=1.0,Mem=0.0625G,GRES/gpu=100.0

You can also estimate the SUs consumed by a job using this calculator based on the following table.

Resource                SU per hour
1 CPU core (default)    1
1 CPU core (Owl)        1.5
1 GB RAM                0.0625
1 H200 GPU              150
1 A100 GPU              100
1 L40s GPU              75
1 A30 GPU               75
1 V100 GPU              50
1 T4 GPU                25
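
As a quick sanity check against this table, you can compute a job’s SU rate by hand or with a one-liner. For example, a 12-hour job using 8 CPU cores, 32 GB of memory, and 2 A100 GPUs would consume roughly:

    # (8 x 1 + 32 x 0.0625 + 2 x 100) SUs/hour x 12 hours = 2520 SUs
    echo "12 * (8*1 + 32*0.0625 + 2*100)" | bc -l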

How do I change a job’s stack size limit?

If your MPI code needs a larger stack size, you can set it in the command that you pass to mpirun. For example:

mpirun -bind-to-core -np $SLURM_NTASKS /bin/bash -c "ulimit -s unlimited; ./your_program"

How do I check my job’s resource usage?

The jobload command will report core and memory usage for each node of a given job. Example output is:

[user@tinkercliffs2 04/06 09:21:13 ~]$ jobload 129722
Basic job information:
     JOBID       PARTITION         NAME      ACCOUNT       USER    STATE         TIME   TIME_LIMIT  NODES NODELIST(REASON)
    129722        normal_q tinkercliffs  someaccount   someuser  RUNNING        43:43      8:00:00      2 tc[082-083]

Job is running on nodes: tc082 tc083

Node utilization is:
    node  cores   load    pct      mem     used    pct
   tc082    128  128.0  100.0  251.7GB  182.1GB   72.3
   tc083    128   47.9   37.4  251.7GB  187.2GB   74.3

This TinkerCliffs job is using all 128 cores on one node but only 48 cores on the second node. In this case, we know that the job has requested two full nodes, so some optimization may be in order to ensure that they’re using all of the requested resources. The job is, however, using 70-75% memory on both nodes, so the resource request may be intentional. If more information is required about a given node, the command scontrol show node tc083 can provide it.

How can I monitor GPU utilization during my job?

The nvidia-smi command with no other options displays this information, but it prints to standard output (the console or the job’s output file) and only once per invocation. Many options can be added to tap into the tool’s extended functionality.

Add a line like this to the batch script prior to starting a GPU workload:

nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 3 > $SLURM_JOBID.gpu.log &

The & causes the query to run in the background and keep running until the job ends or this process is manually killed. The > $SLURM_JOBID.gpu.log causes the output to be redirected to a file whose name is the numerical job id followed by .gpu.log.

The -l 3 option sets the repeating polling interval, in seconds. From the nvidia-smi manual:

-l SEC, --loop=SEC 
    Continuously report query data at the specified interval, rather than the default of just once.

For details on query options: nvidia-smi --help-query-gpu

Output from nvidia-smi run as above looks like this for a 2-GPU job (notice the different GPU identifier strings):

2021/10/29 16:36:30.047, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:33.048, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 58, 16 %, 4 %, 81251 MiB, 66511 MiB, 14740 MiB
2021/10/29 16:36:33.053, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:36.054, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 65, 98 %, 15 %, 81251 MiB, 66571 MiB, 14680 MiB
2021/10/29 16:36:36.055, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:39.055, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 67, 100 %, 36 %, 81251 MiB, 66571 MiB, 14680 MiB
2021/10/29 16:36:39.056, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:42.057, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 54, 10 %, 2 %, 81251 MiB, 66571 MiB, 14680 MiB
2021/10/29 16:36:42.058, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:45.059, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 54, 0 %, 0 %, 81251 MiB, 66571 MiB, 14680 MiB
2021/10/29 16:36:45.060, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:48.060, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 68, 100 %, 26 %, 81251 MiB, 66571 MiB, 14680 MiB
2021/10/29 16:36:48.061, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:51.062, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 52, 20 %, 3 %, 81251 MiB, 66571 MiB, 14680 MiB
2021/10/29 16:36:51.063, A100-SXM-80GB, 00000000:CB:00.0, 460.73.01, 41, 0 %, 0 %, 81251 MiB, 81248 MiB, 3 MiB
2021/10/29 16:36:54.064, A100-SXM-80GB, 00000000:07:00.0, 460.73.01, 52, 0 %, 0 %, 81251 MiB, 66571 MiB, 14680 MiB

You can monitor the utilization information in near-real time from a login node by navigating to the output directory for the job and following the output with tail -f <jobid>.gpu.log. The CSV formatting makes it easy to analyze the data or generate graphics with other tools such as Python, R, or MATLAB.

I need a software package for my research. Can you install it for me?

Yes! If you need a software package that can run on GNU/Linux, you can submit a ticket and we will install it. Licensed software may require additional steps depending on the type of license (individual, group, or university-wide). ARC staff install packages centrally via software modules to give all users easy access to the most common applications.
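
Once a package has been installed centrally, it can usually be located and loaded through the module system; for example (the package name and version below are only illustrative):

    # Search all module trees for a package
    module spider python
    # Load a specific version reported by module spider (version shown is illustrative)
    module load Python/3.10.4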

What is the best way to make sure everyone in my group has the same access to all the files in our shared directory?

The first step is to make sure the group ID (GID) of all the files and directories is consistent and matches the GID of the shared directory. The chgrp command does this, but only the owner of a file can change its GID, so each member of the group needs to find the files they own and chgrp them to the correct GID. You will also want to chmod them to ensure the correct mode. Here is a template command sequence to do that:

# Show numeric group id of current user.
# This is the GID which will be used in the next step to identify files
id -g
# Find files in the shared directory matching current
# user's GID and execute a chgrp on them
find /projects/MYGROUPNAME -gid `id -g` -exec chgrp arc.MYGROUPNAME {} \;
# Find files in the shared directory matching the current
# user's UID and execute a chmod on them so all group members have read access
find /projects/MYGROUPNAME -uid `id -u` -exec chmod g+r {} \;

Any member of the group who has files in the shared directory with their GID will need to run those commands. Group ownership of files in the shared directories is inherited for newly created files and for files transferred with rsync with the correct options. Unfortunately, scp generally does not respect the parent GID so you will need to execute those commands afterward.
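
As a sketch of a transfer that lets new files pick up the shared directory’s group (this assumes the shared directory is set up to pass its group on to newly created files, as described above):

    # -a implies preserving owner (-o) and group (-g); disable both so that
    # newly copied files take the destination directory's group instead
    rsync -av --no-o --no-g mydata/ /projects/MYGROUPNAME/mydata/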

What does a “Disk quota exceeded” error mean?

This typically means that the quota of one of your storage locations has been exceeded. You will need to reduce the space consumed in order to run jobs successfully again. You can run the command quota to get a report of your storage utilization.

Important: the report generated by quota is not in real time. Therefore, any files you delete will not reflect new free space in quota immediately.
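
To find where the space is going before deleting files, a summary like the following can help (the path is an example; point it at whichever storage location is over quota):

    # Show the size of each top-level directory under your home directory, largest last
    du -h --max-depth=1 $HOME | sort -h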

What does a “Detected 1 oom-kill event(s)” error mean?

If your job fails with an error like

slurmstepd: error: Detected 1 oom-kill event(s)

then your job triggered Linux’s Out of Memory (OOM) Killer process. This means that it tried to use more memory than allocated to the job. The seff command (Slurm job efficiency) also provides some information on resource utilization:

[user@tinkercliffs1 ~]$ seff 1447
Job ID: 1447
Cluster: tinkercliffs
User/Group: someuser/someuser
State: OUT_OF_MEMORY (exit code 0)
Nodes: 2
Cores per node: 32
CPU Utilized: 02:43:59
CPU Efficiency: 1.56% of 7-07:21:36 core-walltime
Job Wall-clock time: 02:44:24
Memory Utilized: 174.83 GB
Memory Efficiency: 49.11% of 356.00 GB

If your job requests an explicit amount of memory (e.g., --mem 16G), then increase the amount of memory allocated to the job. Recommendation: increase the memory request progressively until it is sufficient to run without error; excessive memory requests waste memory and will make your job sit in the queue longer. If your job does not request an explicit amount of memory, then Slurm assigns a default amount of memory per allocated CPU. In this case, you can either specify an explicit amount of memory (recommended) or increase the number of CPU cores to force the system to assign a larger amount of memory to your job (not recommended).
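
For example, to raise the explicit memory request in a batch script (the value shown is only a starting point; adjust it based on seff output for your jobs):

    #SBATCH --mem=32G    # total memory per node; increase gradually if the OOM error persists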

Why are basic commands like sbatch not recognized?

If basic commands like sbatch are not recognized, it is often because the default modules have been removed (e.g., via module purge). Please run module reset to restore the default $MODULEPATH.

How do I add a user to an allocation?

To add a user to an existing allocation, the PI must follow these steps:

  1. Go to ColdFront.

  2. You will see a list of your Projects. Click on the one you want to modify.

  3. Scroll down to Users and select Add Users.

  4. Under Search String enter the user’s PID (or a list of PIDs) and click Search.

  5. Scroll down, select the user whom you want to add, and click Add Selected Users to Project.

  6. The page will refresh and the user’s PID should be included in the Users table. They are now added to the project and its associated allocations.

Note: It may take up to 1 hour for the clusters to reflect changes made on ColdFront.

PIs are solely responsible for adding and removing users from their allocations. This is important for controlling access to data, especially if the data has access restrictions.

How do I attach to my process for debugging?

Debuggers like gdb make software development much more efficient. You may simply ssh to the compute node where the process is running, look up the process ID (e.g., with top or ps), and then attach to it.
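
A typical sequence looks like the following (the node name, job ID, and process ID are placeholders):

    # Find the node(s) your job is running on
    squeue -j <jobid> -o "%N"
    # Connect to that compute node
    ssh tc123
    # Look up the process ID of your program
    ps -u $USER -o pid,comm
    # Attach gdb to the running process
    gdb -p <pid>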

How can I submit a job that depends on the completion of another job?

Sometimes it may be useful to split one large computation into multiple jobs (e.g., due to queue limits) but submit those jobs all at once. Jobs can be made dependent on each other using the --dependency=afterany:job_id flag to sbatch (use afterok:job_id to start only if the preceding job completed successfully; note that plain after:job_id waits only for the preceding job to start, not to finish). Additional dependency options can be found in the documentation for sbatch. For example, here we submit three jobs, each of which depends on the preceding one:

[user@tinkercliffs2 ~]$ sbatch test.sh
Submitted batch job 126448
[user@tinkercliffs2 ~]$ sbatch --dependency=afterany:126448 test.sh
Submitted batch job 126449
[user@tinkercliffs2 ~]$ sbatch --dependency=afterany:126449 test.sh
Submitted batch job 126450

The first job starts right away, but the second doesn’t start until the first one finishes and the third job doesn’t start until the second one finishes. This allows the user to split their job up into multiple pieces, submit them all right away, and then just monitor them as they run one after the other to completion.

How can I run multiple serial tasks inside one job?

Users with serial (sequential) programs may want to package multiple serial tasks into a single job submitted to the scheduler. This can be done with third-party tools (GNU parallel is a good one) or using a loop within the job submission script. (A similar structure can be used to run multiple short, parallel tasks inside a job.) The basic structure is to loop through the number of tasks using while or for, start the task in the background using the & operator, and then use the wait command to wait for the tasks to finish:

    # Define variables
    numtasks=16
    np=1
    # Loop through numtasks tasks
    while [ $np -le $numtasks ]
    do
      # Run the task in the background with input and output depending on the variable np
      ./a.out $np > $np.out &

      # Increment task counter
      np=$((np+1))
    done

    # Wait for all of the tasks to finish
    wait

Please note that the above structure will only work within a single node. To ensure that the same program (with the same inputs) isn’t being run multiple times, users should make sure that the loop variable (np, above) is used to specify input files or parameters.
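
GNU parallel, mentioned above, can express the same pattern more compactly; a rough sketch, assuming the parallel executable is available on the compute node (e.g., via a module) and that the job requested $SLURM_NTASKS cores:

    # Run ./a.out for inputs 1..16, keeping at most $SLURM_NTASKS tasks running at once
    seq 1 16 | parallel -j $SLURM_NTASKS './a.out {} > {}.out'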

How can I run multiple short, parallel tasks inside one job?

Sometimes users have a parallel application that runs quickly, but that they need to run many times. In this case, it may be useful to package multiple parallel runs into a single job. This can be done using a loop within the job submission script. An example structure:

    # Specify the list of tasks
    tasklist="task1 task2 task3"

    # Loop through the tasks
    for tsk in $tasklist; do
      # run the task $tsk
      mpirun -np $SLURM_NTASKS ./a.out $tsk
    done

To ensure that the same program (with the same inputs) isn’t being run multiple times, users should make sure that the loop variable (tsk, above) is used to specify input files or parameters. Note that, unlike when running multiple serial tasks at once, in this case each task will not start until the previous one has finished.