CUI - Restricted Data Projects
Overview
The Controlled Unclassified Information (CUI) cluster is a combined CPU and GPU system built through a partnership between ARC and the Hume Center. CUI has 8 nodes, 512 CPU cores, 7 TB of RAM, and 16 NVIDIA A100 GPUs. CUI hardware is summarized in the table below.
| Node Type | CPU | GPU | Total |
|---|---|---|---|
| Chip | - | - | - |
| Architecture | Zen 2 | Zen 2 | - |
| Slurm features | - | - | - |
| Nodes | 6 | 2 | 8 |
| GPUs | - | 8x NVIDIA A100-80G | 16 |
| Cores/Node | 64 | 64 | - |
| Memory (GB)/Node | 512 | 2,048 | - |
| Maximum Memory for Slurm (GB)/Node | 495 | 2,007 | - |
| Total Cores | 384 | 128 | 512 |
| Total Memory (GB) | 3,072 | 4,096 | 7,168 |
| Local Disk | 240 GB SSD | 240 GB SSD | - |
| Interconnect | HDR-100 IB | HDR-100 IB | - |
A VAST flash storage system with 656 TB of capacity provides network-based storage for the cluster.
Access
The CUI system is set up to host projects which require some computational scale but are subject to controlled access restrictions such as the International Traffic in Arms Regulations (ITAR). Access to the CUI system requires a technology control plan (TCP) established with the Office of Export and Secure Research Compliance (OESRC) and consultation with ARC personnel to set up access and provide instructions for use.
Networks from which CUI is accessible
The login node for the CUI system, `cui1.arc.vt.edu`, will only accept connections from secured hosts on VT networks. However, connections from the VT VPN are not allowed, since they could originate from arbitrary locations.
If you do not have access to a secured host on a VT network, then you will likely need to connect from OESRC’s COMPASS system. COMPASS is also accessible from off-campus, US locations by first connecting to OESRC’s Barracuda VPN. Access using SSH keys is not permitted.
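As a minimal sketch, a connection from a secured VT host or a COMPASS session might look like the following; `yourpid` is a placeholder for your VT username:

```bash
# Connect to the CUI login node from a secured VT host.
# Expect an interactive password login; SSH keys are not permitted.
ssh yourpid@cui1.arc.vt.edu
```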
Running jobs
The CUI cluster uses Slurm for resource management and job scheduling. One key difference from other ARC clusters is that users' jobs run under the shared Slurm account "cui" (e.g., `#SBATCH --account=cui`) instead of the personalized accounts you may use elsewhere. The CUI cluster is not subject to any billing policies for utilization of resources.
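As a sketch, a minimal batch script for the CPU partition might look like this; the resource amounts and the program are placeholders:

```bash
#!/bin/bash
#SBATCH --account=cui          # required: the shared CUI account
#SBATCH --partition=normal_q   # CPU partition (use a100_normal_q for GPUs)
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=01:00:00

# Placeholder workload; replace with your application
./my_program
```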
Maintenance Downtimes
Patching and updates to resolve security or vulnerability issues are of high importance for this cluster, so it may be subject to downtimes for maintenance and security updates with little or no advance notice.
Partitions
Users submit jobs to partitions of the cluster depending on the type of resources needed (for example, CPUs or GPUs). If a job does not specify how much memory it needs, the DefMemPerCPU parameter determines the job's memory allocation from the number of CPU cores requested (see the example after the table).
| Partition | normal_q | a100_normal_q |
|---|---|---|
| Node Type | CPU | GPU |
| Features | - | - |
| Number of Nodes | 6 | 2 |
| DefMemPerCPU (MB) | 7680 | 31872 |
| MaxMemPerCPU (MB) | 7680 | 31872 |
| DefCpuPerGPU | - | - |
| TRESBillingWeights | - | - |
| PreemptMode | OFF | OFF |
| DefaultTime | 1h | 1h |
| MaxTime | UNLIMITED | UNLIMITED |
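For instance, a job on normal_q that requests 8 cores without an explicit `--mem` request is allocated 8 x 7680 MB = 61,440 MB of memory by default. A sketch, with a placeholder workload:

```bash
#!/bin/bash
#SBATCH --account=cui
#SBATCH --partition=normal_q
#SBATCH --ntasks=8
#SBATCH --time=00:30:00
# No --mem request: Slurm applies DefMemPerCPU, so this job
# receives 8 cores x 7680 MB = 61,440 MB of memory.

srun hostname   # placeholder workload
```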
Optimization
Job performance can be greatly improved by applying appropriate optimizations. This not only reduces execution time but also makes more efficient use of shared resources for the benefit of all users.
See the tuning guides available at https://developer.amd.com and https://www.intel.com/content/www/us/en/developer/
General principles of optimization:
- Cache locality really matters - process pinning can make a big difference in performance (see the sketch following this list).
- Hybrid programming often pays off - one MPI process per L3 cache with 4 threads is often optimal (also sketched below).
- Use the appropriate `-march` flag to optimize the compiled code and the `-gencode` flag when using the NVCC compiler (see the build example after the table below).
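A hedged sketch of the first two principles on a 64-core Zen 2 node, assuming each L3 cache is shared by a 4-core CCX (16 L3 caches per node); `./my_hybrid_app` is a placeholder:

```bash
#!/bin/bash
#SBATCH --account=cui
#SBATCH --partition=normal_q
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16   # one MPI rank per L3 cache
#SBATCH --cpus-per-task=4      # 4 OpenMP threads per rank
#SBATCH --time=01:00:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PROC_BIND=close     # keep threads near their parent rank
export OMP_PLACES=cores

# Pin each rank's 4 cores together; with 4 cores per CCX on Zen 2,
# each rank should then land on a single L3 cache.
srun --cpus-per-task=${SLURM_CPUS_PER_TASK} --cpu-bind=cores ./my_hybrid_app
```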
Suggested optimization parameters:
| Node Type | CPU | GPU |
|---|---|---|
| CPU arch | Zen 2 | Zen 2 |
| Compiler flags | `-march=znver2` | `-march=znver2` |
| GPU arch | - | NVIDIA A100 |
| Compute Capability | - | 8.0 |
| NVCC flags | - | `-gencode arch=compute_80,code=sm_80` |
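For example, builds targeting these architectures might look like the following; the compiler choice and file names are placeholders:

```bash
# CPU code: target the Zen 2 microarchitecture
gcc -O3 -march=znver2 -o my_app my_app.c

# GPU code: target the A100 (compute capability 8.0)
nvcc -O3 -gencode arch=compute_80,code=sm_80 -o my_gpu_app my_gpu_app.cu
```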