Common Datasets
ARC clusters provide central storage for commonly used, large, open datasets. This helps reduce infrastructure costs by eliminating some unnecessary duplication and allows researchers to reserve their storage allocations for their own data.
How to Use Common Datasets
Common datasets are stored in the /common/data/ directory, which is accessible from all the clusters. For instance, /common/data/models/ contains many Large Language Models downloaded from Hugging Face.
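To see what is currently available, the directory can be browsed like any other path on the clusters. The following is a minimal sketch in Python, assuming only the /common/data/ and /common/data/models/ paths named above. The helper name list_entries is hypothetical, and the layout beneath those paths is not documented here, so the script only lists whatever subdirectories it finds.

```python
from __future__ import annotations

from pathlib import Path

# Location stated on this page. Everything below it is discovered at run time.
COMMON_DATA = Path("/common/data")


def list_entries(root: Path = COMMON_DATA) -> list[str]:
    """Return the sorted names of the subdirectories directly under root.

    Returns an empty list when root does not exist, for example when the
    script is run on a machine outside the clusters.
    """
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Top-level dataset categories.
    for name in list_entries():
        print(name)
    # The models directory mentioned above can be inspected the same way.
    for name in list_entries(COMMON_DATA / "models"):
        print("models/" + name)
```

Pointing a job at one of the listed paths, rather than copying the data into a personal allocation, is what avoids the duplication described above.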
Requests
Please submit an ARC Helpdesk request if you know of a dataset that should be added to these locations. Please consider the following:
Does the dataset’s licensing permit sharing in this manner?
Will several VT research groups be likely to benefit from the centralized hosting?
Submit a request via 4help:
Include “Request dataset to be added to /common on ARC systems” as the subject
Provide a link or reference to the dataset
Supply a brief description of the data and its utility for your applications