ARC User Documentation
Job Scheduling and Performance Monitoring
Contents:
Slurm Overview and Quick Reference
Cluster Terminology
Cluster Inspection and Status
Jobs: Requesting resources
Slurm constraints
Job status and control
Accounting
Slurm Scheduler Interaction
Submission Script
Job Management
Output
Parallelization with Slurm and GNU parallel
GNU Parallel in a single job
R example with big data
Max number of srun steps
Python example with GPUs
References
Monitoring and Logging GPU Utilization in your job
nvidia-smi --query-gpu=…
Make a bash function
Show only non-zero utilization and log to a csv file
Performance Comparison of Scratch vs. Various ARC Filesystems
Test Results Summary
Sample fileset properties:
Table of results
Lessons to infer from these results
Tinkercliffs A100 node with NVMe drive tests
Tinkercliffs login node testing against /scratch