Storage Overview
Below is a table of the general storage places available on ARC and descriptions of when to use each. Please also review our Data Best Practices page for more details about data clean up, data recovery, file permissions, and tips to compress your data.
| Name | Intent | Per User Maximum | Data Lifespan | Notes |
|---|---|---|---|---|
| Home | Long-term storage of user data or compiled executables | 640 GB | As long as the user account is active | Review data clean up tips. |
| Project | Long-term storage of shared group data/files | 50 TB per faculty researcher | As long as the project account is active | Additional storage available for purchase. Review data permissions. |
| Scratch | Short-term storage. Preferred place to store data during calculations (i.e. not in Home). | No size limits enforced | 90 days | More details in the Scratch and Local Scratch sections. Automatic deletion. Each cluster has its own scratch. |
| Archive | Long-term storage for infrequently-accessed files | | Length of the purchase agreement | Managed by ARC staff |
Home
Home provides long-term storage for system-specific data or files, such as installed programs or compiled executables. Home can be reached via the environment variable $HOME, so if a user wishes to navigate to their Home directory, they can simply type cd $HOME. Each user is provided a maximum of 640 GB in their Home directory (across all systems). Home directories are not allowed to exceed this limit.
Monitor your usage: a full $HOME will cause many complications:
- Running jobs will fail if they try to write to a Home directory that has reached the hard limit.
- You will not be able to log in to Open OnDemand (https://ood.arc.vt.edu).
- Many applications store configuration information in your home directory and will begin to fail in various ways if they cannot write to it.
Please refer to our Data Clean Up page for more details.
Avoid reading/writing data to/from Home in a job or using it as a working directory. Instead, stage files into a scratch location (Scratch or Local Scratch) to keep unnecessary I/O off of the Home filesystem and improve performance.
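As a rough way to see what is taking up space, standard Linux tools such as `du` can be run against your Home directory; the exact quota-reporting tools on ARC may differ, so treat this as a generic sketch:

```bash
# Show total usage of your home directory (can be slow on large trees)
du -sh "$HOME"

# List the largest top-level items (including dotfiles) to find cleanup candidates
du -sh "$HOME"/* "$HOME"/.[!.]* 2>/dev/null | sort -rh | head -n 20
```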
Project
Project provides long-term storage for files shared among a research project or group, facilitating collaboration and data exchange within the group. Each Virginia Tech faculty member can obtain group storage up to the prescribed limit at no cost by requesting a storage allocation via ColdFront. Additional storage may be purchased through the investment computing or cost center programs.
Due to the huge size of the file system (10 PB), the storage is not backed up. If you would like your data to be backed up, you may purchase backup storage via the cost center.
For more details about data permissions in the Project directory, please refer to our Data Permissions page.
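As an illustration of group-friendly permissions, the sketch below uses a hypothetical project path and group name; consult the Data Permissions page for ARC's recommended settings:

```bash
# Hypothetical path and group; substitute your own project directory and Unix group.
PROJECT_DIR=/projects/mygroup/shared_data
GROUP=mygroup

chgrp -R "$GROUP" "$PROJECT_DIR"                  # hand ownership to the project group
chmod -R g+rwX "$PROJECT_DIR"                     # group read/write; X adds execute only on dirs and executables
find "$PROJECT_DIR" -type d -exec chmod g+s {} +  # setgid so new files inherit the group
```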
Scratch (temporary) storage
Scratch is the preferred location to use when running calculations on ARC, especially if your code has to read/write a lot of files during the calculation. Note that each cluster (e.g. Tinkercliffs, Owl, and Falcon) has its own scratch file system, so data stored in scratch on Tinkercliffs will not be accessible from the Owl cluster, and so forth. See the table below for a breakdown of the types of Scratch and Local Scratch that ARC offers:
| Name | Intent | Per User Maximum | Data Lifespan | File System | Environment Variable | Available On |
|---|---|---|---|---|---|---|
| Scratch | Short-term storage. Preferred place to store data during calculations (i.e. not in Home). | No size limits enforced | 90 days | Vast | - n/a - | Login and compute nodes |
| Local Scratch | Fast, temporary storage. Auto-deleted when job ends | Size of node hard drive | Length of job | Local disk hard drives, usually spinning disk or SSD | $TMPDIR | Compute nodes |
| Memory (tmpfs) | Very fast I/O | Size of node memory allocated to job | Length of job | Memory (RAM) | $TMPFS | Compute nodes |
Scratch is a shared resource and has limited capacity, but individual use at any point in time is unlimited provided the space is available. A strict automatic deletion policy is in place: any file on /scratch will be automatically deleted once it reaches an age of 90 days.
Tips for using Scratch:
- Create a directory for yourself, e.g. `mkdir /scratch/<username>`, and stage files there for a job or set of jobs (see the example job script after this list).
- Check timestamps using `ls -l`.
- Keep the number of files and directories relatively small (i.e., fewer than 10,000). Scratch is a network-attached filesystem and incurs the same performance overhead for file operations as `/home` or `/projects`.
- Immediately copy any files you want to keep to a permanent location to avoid accidental deletion under the 90-day automatic deletion policy.
- `rsync` gives new timestamps by default. Do not use the `-t`/`--times` or `-a`/`--archive` options, which preserve source timestamps.
- `cp` gives new timestamps by default. Avoid the `-p`/`--preserve` option, which preserves source timestamps.
- `mv` preserves source timestamps by default and there is no option to override this. Use `cp` instead; this is a general best practice for inter-filesystem transfers anyway.
- `wget` preserves source timestamps by default. Override this with `wget --no-use-server-timestamps ...`.
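To illustrate how these tips fit together, here is a minimal job-script sketch that stages data into scratch and copies results back out; the program name, input paths, and resource requests are placeholders, not ARC-specific requirements:

```bash
#!/bin/bash
#SBATCH --job-name=scratch-example
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Stage inputs into this cluster's scratch (fresh timestamps restart the 90-day clock)
WORKDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cp "$HOME"/inputs/config.dat "$WORKDIR"/      # plain cp, no -p, so timestamps are new

cd "$WORKDIR"
./my_program config.dat > results.out         # hypothetical executable

# Copy anything worth keeping to a permanent location rather than relying on scratch
cp results.out "$HOME"/results/
```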
Automatic Deletion Details
As mentioned above, files and directories in /scratch will be automatically deleted based on aging policies. Here is how that works:
- The storage system runs an hourly job to identify files which have exceeded the aging policy (90 days) and adds them to the deletion queue.
- The storage system runs an automated job at 12:00am UTC (7:00PM EST) every day to process the deletion queue.
- Additionally, the storage system detects and deletes all empty directories, regardless of age.
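If you want to spot files that are getting close to the 90-day threshold, a generic `find` based on modification time (the timestamp shown by `ls -l`) can help; the exact timestamp the policy tracks is managed by the storage system, so treat this as an approximation:

```bash
# List files in your scratch area not modified in roughly the last 80 days,
# i.e. within about 10 days of the 90-day deletion threshold.
find /scratch/$USER -type f -mtime +80 -printf '%TY-%Tm-%Td  %p\n' | sort
```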
Local Scratch
Running jobs are given a workspace on the local drives of each compute node allocated to the job. The path to this space is specified in the $TMPDIR environment variable. This provides a higher-performing option for I/O, which is a bottleneck for some tasks that involve handling a large volume of data or a large number of file operations.
Note
Any files in local scratch are removed at the end of a job, so any results or files to be kept after the job ends must be copied to another location as part of the job; Scratch is a good choice for most users.
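Inside a job script, using local scratch typically looks like copying inputs in, working there, and copying results out before the job ends; the program and file names below are placeholders:

```bash
# Work on node-local disk to avoid hammering the shared filesystems
cp /scratch/$USER/big_input.dat "$TMPDIR"/
cd "$TMPDIR"
./io_heavy_program big_input.dat > output.dat   # hypothetical executable

# $TMPDIR is wiped when the job ends, so save results first
cp output.dat /scratch/$USER/
```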
Solid State Drives (SSDs)
Solid state drives do not use rotational media (spinning disks/platters) but memory-like flash storage, which gives them better performance characteristics. The environment variable $TMPSSD is set to a directory on an SSD accessible to the owner of a job when an SSD is available on the compute nodes allocated to the job.
NVMe Drives
Same idea as Local Scratch, but on NVMe media which “has been designed to capitalize on the low latency and internal parallelism of solid-state storage devices.” Running jobs are given a workspace on the local NVMe drive on each compute node if it is so equipped. The path to this space is specified in the $TMPNVME environment variable. This provides another option for users who would prefer to do I/O to local disk (such as for some kinds of big data tasks). Please note that any files in local scratch are automatically removed at the end of a job, so any results or files to be kept after the job ends must be copied to Home or Project.
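Since not every node has an SSD or NVMe drive, a job script can fall back gracefully by checking which of these variables is set; this is a generic shell pattern, not an ARC-specific feature:

```bash
# Prefer NVMe, then SSD, then ordinary local scratch, depending on what the node provides
FAST_DIR=${TMPNVME:-${TMPSSD:-$TMPDIR}}
echo "Using local workspace: $FAST_DIR"
cp /scratch/$USER/dataset.bin "$FAST_DIR"/
```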
Memory as storage
Running jobs have access to an in-memory mount on compute nodes via the $TMPFS environment variable. This should provide very fast read/write speeds for jobs doing I/O to files that fit in memory. Please note that these files are removed at the end of a job, so any results or files to be kept after the job ends must be copied to Home or Project.
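A typical pattern, sketched below with placeholder file and program names, is to stage a small dataset into $TMPFS, run against it, and copy results out before the job finishes:

```bash
# Only sensible when the data fits comfortably within the job's memory allocation
cp /scratch/$USER/small_index.db "$TMPFS"/
./query_tool --index "$TMPFS"/small_index.db > query_results.csv   # hypothetical program

# Files under $TMPFS disappear when the job ends
cp query_results.csv /scratch/$USER/
```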
Archive
If you need an additional solution for data backup or long-term storage, we offer an archive option. Archive provides users with long-term storage for data that does not need to be accessed frequently, e.g. important or historical results, and for data preservation purposes to comply with the data retention mandates of federal grants. Archive is not mounted on the clusters and is accessible only through ARC staff. Researchers can compress their datasets on the clusters and ARC staff will transfer them to the archive.
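For example, a dataset can be bundled and compressed with `tar` before contacting ARC staff; the paths below are placeholders:

```bash
# Bundle and compress a results directory ahead of an archive transfer
tar -czf results_2024.tar.gz /projects/mygroup/results_2024

# Sanity-check the archive contents before requesting the transfer
tar -tzf results_2024.tar.gz | head
```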
Archive storage may be purchased through the investment computing or cost center programs. Please reach out to us to acquire archive storage.