ARC System Changes: 2026-01
ARC systems will be offline for maintenance from 8:00 AM on Monday, January 5, 2026 through 5:00 PM on Thursday, January 8, 2026. This outage affects all ARC general-purpose clusters, storage, and services, including:
- Tinkercliffs, Owl, and Falcon clusters
- access to these clusters’ login nodes
- /home, /projects, and /scratch data
- ARC’s Open OnDemand web interface for these clusters
- ColdFront, ARC Dashboards, and ARC Globus services
- ARC’s LLM services llm.arc.vt.edu and llm-api.arc.vt.edu
Maintenance for the CUI and Biomed clusters is performed at a different time, and those resources will remain online during this maintenance.
Most of the scheduled tasks are for regular software and system maintenance, which is essential for stability and system security but is expected to be transparent to end users. As we finalize the agenda and schedule for the outage, some additional topics may be added here if they will have a noticeable impact on cluster usage.
Updates to GPU drivers
Currently all Nvidia GPUs on ARC systems are using driver version 565.57.01, and we are targeting an update to the 580.105.18 drivers, which will enable the latest CUDA toolkits (version 13.x) and codes that rely on the latest CUDA software.
Warning
With CUDA 13+, Nvidia has dropped support for some older devices; on ARC systems this applies specifically to the V100 nodes on Falcon. For this reason, please make sure that your software environment on those nodes uses only CUDA toolkit versions 12.x and earlier.
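As a quick illustration, this constraint can be encoded in a small shell check before setting up a job's toolkit. The helper `cuda_ok_for_v100` is a name invented for this sketch, not an ARC-provided tool, and the `module load` step mentioned in the comment is an assumption about your environment:

```shell
# Hypothetical helper: is this CUDA toolkit major version usable on V100s?
# (CUDA 13+ dropped support for those devices, per the warning above.)
cuda_ok_for_v100() {
    major="${1%%.*}"        # strip everything after the first dot
    [ "$major" -le 12 ]
}

if cuda_ok_for_v100 "12.4"; then
    echo "CUDA 12.4 is fine on Falcon's V100 nodes"
    # e.g. load a 12.x CUDA module here (check `module avail CUDA` for versions)
fi
if ! cuda_ok_for_v100 "13.0"; then
    echo "CUDA 13.0 would not work on the V100 nodes"
fi
```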
Minor Reorganization of /scratch directories
Summary of changes
- /scratch/user is created automatically when user runs a job on that cluster
- the permissions on /scratch are now restricted so that users cannot create or modify that top-level directory
Details and explanation
ARC’s /scratch directories are local to each cluster and provide high-speed, scalable, temporary storage and staging areas for data. Files in the /scratch filesystem on each cluster which are older than 90 days are subject to automatic deletion. This prevents continual growth of resident data associated with “abandoned” files and directories, which would otherwise eventually fill the entire storage system. As a side effect of the automatic deletion, a person will only have a /scratch/user directory on a cluster if they have “active” data in files there.
Before the maintenance, /scratch was writable by all users, and each user would create their own /scratch/user directory in order to use this filesystem. This, however, allowed the potential for “collisions” where multiple people attempt to write to the same file at the same time, the risk of accidental deletion by others, and general clutter in the top-level directory. This is now avoided by the automatically provided per-user subdirectories, which ARC implemented for /scratch directories starting in January 2026.
Tips for managing /scratch/user
Check for existence of /scratch/user on a cluster
From the login node of the cluster, run the following command:
$ ls -l /scratch/$USER
When the directory exists, a long listing of its contents will be displayed. If the directory does not exist, the command will print the error ls: cannot access '/scratch/username': No such file or directory.
Force creation and verify existence:
Run a minimal job to force creation of your scratch directory:
$ srun --account=<Slurm account> ls -l /scratch/$USER
This will request and run a 1-CPU job, and the resulting listing verifies that the directory now exists.
As another option, any interactive app run from OnDemand will also result in the creation of the /scratch/$USER directory, because those apps also run as jobs on the clusters.
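The check-then-create steps above can be combined into a small script. The helper name `scratch_status` is invented for this sketch and is not an ARC command:

```shell
# Report whether a per-user scratch directory exists on this cluster.
# scratch_status is a helper name chosen for this sketch.
scratch_status() {
    if ls -ld "$1" >/dev/null 2>&1; then
        echo "exists"
    else
        echo "missing"
    fi
}

scratch_status "/scratch/${USER:-nobody}"
# If this prints "missing", run any small job to force creation, e.g.:
#   srun --account=<Slurm account> --ntasks=1 ls -l /scratch/$USER
```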
New Software Installations
New software modules have been added including:
| Module | Info |
|---|---|
| | see note above about support for V100 GPUs |
| | latest “free open-source software” toolchain including BLAS, MPI, and other tools |
| | minimal version of |
| | recent OpenMPI which is a key component of the two toolchains above |
| | latest vLLM |
We recommend always explicitly specifying software versions when loading modules. For example:
| Recommended | Not Recommended |
|---|---|
| module load <module>/<version> | module load <module> |
When the version is not specified, a default version will be loaded, which is usually the latest version. This can lead to conflicts when loading multiple modules in sequence or potential incompatibility with existing codes/binaries.
Increase Billing Rate for Usage of Memory
During this maintenance, the per-GB billing rate for the system memory allocated to a job will be increased according to this table:
| | 2025 and earlier | 2026 |
|---|---|---|
| 1 GB RAM | 0.0625 SU/hr | 0.125 SU/hr |
| 1 SU | 16 GB | 8 GB |
The increase is intended as a correction to more accurately reflect the proportional cost of a job’s memory allocation.
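As a worked example of the rate change (the job shape here is assumed purely for illustration): consider the memory portion of the charge for a job that holds 64 GB of RAM for 10 hours.

```shell
# Memory portion of the charge for an assumed job: 64 GB held for 10 hours.
# Old rate: 0.0625 SU/GB-hr (1 SU per 16 GB-hr); new rate: 0.125 SU/GB-hr (1 SU per 8 GB-hr).
GB=64
HOURS=10
OLD_SU=$(( GB * HOURS / 16 ))   # 640 GB-hr / 16 = 40 SU
NEW_SU=$(( GB * HOURS / 8 ))    # 640 GB-hr / 8  = 80 SU
echo "memory charge: old ${OLD_SU} SU, new ${NEW_SU} SU"
# memory charge: old 40 SU, new 80 SU
```

Doubling the per-GB rate doubles the memory component of the bill, which is why right-sizing memory requests (see below) matters more than before.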
What should you do in response?
ARC always recommends that you monitor your jobs’ performance and resource utilization so that you can “right-size” the resource requests for future similar jobs. Tools for monitoring include:
| job status | command | info |
|---|---|---|
| running | | point-in-time state of processes in the job and utilization of resources |
| completed | | Slurm’s summary report of overall CPU utilization and peak memory utilization |
| running or completed | | print URLs which can show node-level utilization details over the duration of the job |
Default memory allocations for jobs exceed most jobs’ needs. You can request customized memory allocations in several ways:
| specification | description |
|---|---|
| --mem= | memory needed on each node allocated to the job |
| --mem-per-cpu= | memory needed for each CPU core allocated to the job |
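For example, the per-core form can appear in a batch script header like the sketch below; the account name and the sizes shown are placeholders, not ARC defaults:

```shell
#!/bin/bash
#SBATCH --account=myaccount     # placeholder: substitute your own Slurm account
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=4G        # 4 tasks x 4 GB = 16 GB total for the job
# ... your job commands here ...
```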