Common Datasets

ARC clusters provide central storage for commonly used, large, open datasets. This helps reduce infrastructure costs by eliminating some unnecessary duplication and allows researchers to reserve their storage allocations for their own data.

How to Use Common Datasets

Common datasets are stored in the /common/data/ directory. They are accessible from all ARC clusters. For instance, /common/data/models/ contains many Large Language Models downloaded from Hugging Face.
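
To see what is currently available, you can list the directory from any cluster node. The following is a minimal Python sketch, not an official ARC tool: the helper name list_entries is hypothetical, and the only paths assumed are the /common/data/ and /common/data/models/ locations mentioned above.

```python
from pathlib import Path
from typing import List

# Shared dataset location described in this page.
COMMON_DATA = Path("/common/data")


def list_entries(root: Path = COMMON_DATA) -> List[str]:
    """Return the sorted names of the subdirectories directly under root.

    Returns an empty list if root does not exist or is not a directory
    (for example, when run on a machine that is not an ARC cluster).
    """
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Print the top-level dataset categories, then the contents of models/.
    for name in list_entries():
        print(name)
    for name in list_entries(COMMON_DATA / "models"):
        print("models/" + name)
```

Because the data is read in place, a directory found this way can be passed as a local path to whatever tool consumes it, rather than being copied into your own storage allocation.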

Requests

If you know of a dataset that should be added to these locations, please submit an ARC Helpdesk request. Before doing so, please consider the following:

  • Does the dataset’s licensing permit sharing in this manner?

  • Will several VT research groups be likely to benefit from the centralized hosting?

Submit a request via 4help:

  • Include “Request dataset to be added to /common on ARC systems” as the subject

  • Provide a link or reference to the dataset

  • Supply a brief description of the data and its utility for your applications