ARC System Changes: 2026-01
ARC systems will be offline for maintenance from 8:00 AM on Monday, January 5, 2026 through 5:00 PM on Thursday, January 8, 2026. This outage affects all ARC general-purpose clusters, storage, and services, including:
- Tinkercliffs, Owl, and Falcon clusters
- access to these clusters’ login nodes
- /home, /projects, and /scratch data
- ARC’s Open OnDemand web interface for these clusters
- ColdFront, ARC Dashboards, and ARC Globus services
- ARC’s LLM services llm.arc.vt.edu and llm-api.arc.vt.edu
Maintenance for the CUI and Biomed clusters is performed at a different time, and those resources will remain online during this maintenance.
Most of the scheduled tasks are for regular software and system maintenance, which is essential for stability and system security but is expected to be transparent to end users. As we finalize the agenda and schedule for the outage, some additional topics may be added here if they will have a noticeable impact on cluster usage.
Updates to GPU drivers
Currently all Nvidia GPUs on ARC systems are using driver version 565.57.01, and we are targeting an update to the 580.105.18 drivers, which will enable the latest CUDA toolkits (version 13.x) and codes that rely on the latest CUDA software.
Warning
With CUDA 13+, Nvidia has dropped support for some older devices; on ARC systems this applies specifically to the V100 nodes on Falcon. For this reason, please make sure that your software environment on those nodes uses only CUDA toolkit versions 12.x and earlier.
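As a quick illustration, this constraint can be encoded in a small shell check before setting up a job's toolkit. The helper `cuda_ok_for_v100` is a name invented for this sketch, not an ARC-provided tool, and the `module load` step mentioned in the comment is an assumption about your environment:

```shell
# Hypothetical helper: is this CUDA toolkit major version usable on V100s?
# (CUDA 13+ dropped support for those devices, per the warning above.)
cuda_ok_for_v100() {
    major="${1%%.*}"        # strip everything after the first dot
    [ "$major" -le 12 ]
}

if cuda_ok_for_v100 "12.4"; then
    echo "CUDA 12.4 is fine on Falcon's V100 nodes"
    # e.g. load a 12.x CUDA module here (check `module avail CUDA` for versions)
fi
if ! cuda_ok_for_v100 "13.0"; then
    echo "CUDA 13.0 would not work on the V100 nodes"
fi
```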
Minor Reorganization of /scratch directories
Summary of changes
- /scratch/user is created automatically when user runs a job on that cluster
- the permissions on /scratch are now restricted so that users cannot create or modify that top-level directory
Details and explanation
ARC’s /scratch directories are local to each cluster and provide high-speed, scalable, temporary storage and staging areas for data. Files in the /scratch filesystem on each cluster which are older than 90 days are subject to automatic deletion. This prevents continual growth of resident data associated with “abandoned” files and directories, which would otherwise eventually fill the entire storage system. As a side effect of the automatic deletion, a person will only have a /scratch/user directory on a cluster if they have “active” data in files there.
Before the maintenance, /scratch was writable by all users, and each user would create their own /scratch/user directory in order to use this filesystem. This, however, allowed the potential for “collisions” where multiple people attempt to write to the same file at the same time, the risk of accidental deletion by others, and general clutter in the top-level directory. This is now avoided by the automatically provided per-user subdirectories, which ARC implemented for /scratch directories starting in January 2026.
Tips for managing /scratch/user
Check for existence of /scratch/user on a cluster
From the login node of the cluster, run the following command:
$ ls -l /scratch/$USER
When the directory exists, a long listing of its contents will be displayed. If the directory does not exist, the command will print the error ls: cannot access '/scratch/username': No such file or directory.
Force creation and verify existence:
Run a minimal job to force creation of your scratch directory:
$ srun --account=<Slurm account> ls -l /scratch/$USER
This will request and run a 1-CPU job, and the resulting listing verifies that the directory now exists.
As another option, any interactive app run from OnDemand will also result in the creation of the /scratch/$USER directory, because those apps also run as jobs on the clusters.
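The check-then-create steps above can be combined into a small script. The helper name `scratch_status` is invented for this sketch and is not an ARC command:

```shell
# Report whether a per-user scratch directory exists on this cluster.
# scratch_status is a helper name chosen for this sketch.
scratch_status() {
    if ls -ld "$1" >/dev/null 2>&1; then
        echo "exists"
    else
        echo "missing"
    fi
}

scratch_status "/scratch/${USER:-nobody}"
# If this prints "missing", run any small job to force creation, e.g.:
#   srun --account=<Slurm account> --ntasks=1 ls -l /scratch/$USER
```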
New Software Installations
New software modules have been added including:
| Module | Info |
|---|---|
| | see note above about support for V100 GPUs |
| | latest “free open-source software” toolchain including BLAS, MPI, and other tools |
| | minimal version of |
| | recent OpenMPI which is a key component of the two toolchains above |
| | latest vLLM |
We recommend always explicitly specifying software versions when loading modules. For example:
| Recommended | Not Recommended |
|---|---|
| module load <module>/<version> | module load <module> |
When the version is not specified, a default version will be loaded, which is usually the latest version. This can lead to conflicts when loading multiple modules in sequence or potential incompatibility with existing codes/binaries.
Increase Billing Rate for Usage of Memory
During this maintenance, the per-GB billing rate for the system memory allocated to a job will be increased according to this table:
| | 2025 and earlier | 2026 |
|---|---|---|
| 1 GB RAM | 0.0625 SU/hr | 0.125 SU/hr |
| 1 SU | 16 GB | 8 GB |
The increase is intended as a correction to more accurately reflect the proportional cost of a job’s memory allocation.
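As a worked example of the rate change (the job shape here is assumed purely for illustration): consider the memory portion of the charge for a job that holds 64 GB of RAM for 10 hours.

```shell
# Memory portion of the charge for an assumed job: 64 GB held for 10 hours.
# Old rate: 0.0625 SU/GB-hr (1 SU per 16 GB-hr); new rate: 0.125 SU/GB-hr (1 SU per 8 GB-hr).
GB=64
HOURS=10
OLD_SU=$(( GB * HOURS / 16 ))   # 640 GB-hr / 16 = 40 SU
NEW_SU=$(( GB * HOURS / 8 ))    # 640 GB-hr / 8  = 80 SU
echo "memory charge: old ${OLD_SU} SU, new ${NEW_SU} SU"
# memory charge: old 40 SU, new 80 SU
```

Doubling the per-GB rate doubles the memory component of the bill, which is why right-sizing memory requests (see below) matters more than before.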
What should you do in response?
ARC always recommends that you monitor your jobs’ performance and resource utilization so that you can “right-size” the resource requests for future similar jobs. Tools for monitoring include:
| job status | command | info |
|---|---|---|
| running | | point-in-time state of processes in the job and utilization of resources |
| completed | | Slurm’s summary report of overall CPU utilization and peak memory utilization |
| running or completed | | print URLs which can show node-level utilization details over the duration of the job |
Default memory allocations for jobs exceed most jobs’ needs. You can request customized memory allocations in several ways:
| specification | description |
|---|---|
| --mem= | memory needed on each node allocated to the job |
| --mem-per-cpu= | memory needed for each CPU core allocated to the job |
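For example, the per-core form can appear in a batch script header like the sketch below; the account name and the sizes shown are placeholders, not ARC defaults:

```shell
#!/bin/bash
#SBATCH --account=myaccount     # placeholder: substitute your own Slurm account
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=4G        # 4 tasks x 4 GB = 16 GB total for the job
# ... your job commands here ...
```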