Managing Virtual Environments on ARC - Overview

This page provides an overview of Python environment management on ARC and explains the differences between Conda (Miniforge/Miniconda) environments and pip/venv environments.

1. Overview

ARC supports two primary methods for creating isolated Python environments:

Conda environments using Miniforge or Miniconda
Python virtual environments using pip and venv

Each method has strengths depending on workload type, complexity, and GPU requirements.

2. Conda Environments (Miniforge / Miniconda)

2.1 Description

Conda provides a full environment and package manager designed for scientific computing and machine learning.

Note: ARC recommends Miniforge because it defaults to the community-driven conda-forge channel and avoids the licensing restrictions that apply to Anaconda's default channels.

2.2 When to Use Conda

Use Conda on ARC when:

You need GPU frameworks (PyTorch, TensorFlow, JAX)
You require CUDA or cudatoolkit management inside environments
Your project contains compiled dependencies
You need reproducibility and environment stability
Your workflow involves complex dependency resolution

2.3 Internal Documentation

See the full ARC page for complete instructions:

Managing Conda Environments Using Miniconda and Miniforge

3. pip + venv Environments

3.1 Description

venv is Python’s built‑in virtual environment tool, and pip installs packages inside it.

Note: pip/venv is lightweight and ideal for small or pure‑Python workflows.
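As a rough illustration of how lightweight this is, the standard-library `venv` module can also be driven from Python instead of the usual `python -m venv` command. The sketch below is illustrative only: the directory name `demo-env` and the choice of `with_pip=False` (which keeps the example fast and offline) are assumptions, not ARC recommendations.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a bare virtual environment and return the path to its pyvenv.cfg."""
    # with_pip=False skips bootstrapping pip; a real environment would
    # normally keep pip so packages can be installed into it.
    # Symlinks are used everywhere except Windows, mirroring `python -m venv`.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir / "pyvenv.cfg"


def read_cfg(cfg_path: Path) -> dict:
    """Parse the simple `key = value` lines of pyvenv.cfg into a dict."""
    settings = {}
    for line in cfg_path.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = read_cfg(create_env(Path(tmp) / "demo-env"))
        # "home" records which base Python installation the environment uses.
        print("base Python:", cfg.get("home"))
        print("sees system packages:", cfg.get("include-system-site-packages"))
```

The `home` entry shows that a venv borrows an existing Python installation rather than shipping its own, which is one reason these environments stay small.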

3.2 When to Use `pip`/`venv`

Use pip/venv when:

You only need a small number of packages
Your workflow is CPU‑only
You want minimal overhead and fast environment creation
You prefer Python’s standard library tooling

3.3 Internal Documentation

See the full ARC page for complete instructions:

Python Virtual Environments Using pip and venv

4. Key Differences (Summary Table)

The table below summarizes the main differences between Conda (Miniforge/Miniconda) and pip/venv.
Each feature includes an explanation so users unfamiliar with dependency solvers, binary packages, or CUDA toolkits can make an informed choice.

Feature	Conda (Miniforge/Miniconda)	`pip` + `venv`
Dependency Solver	Full dependency solver that considers all requested package versions together and finds a compatible set, including non-Python libraries. This reduces version conflicts in scientific libraries with complex inter-dependencies (e.g., NumPy, SciPy, PyTorch).	Python-only resolver. Since version 20.3, `pip` resolves dependencies among the packages named in a single install command, but it does not manage non-Python libraries, and packages added in separate install commands can still leave the environment inconsistent. Users may need to pin versions manually, which can be challenging in ML/scientific stacks.
GPU / CUDA Support	Excellent GPU support. Conda packages frequently bundle their own CUDA toolkit (`cudatoolkit`), so you do not need a system-level CUDA installation; a compatible NVIDIA driver on the node is still required. This is ideal for PyTorch, TensorFlow, JAX, and GPU-intensive HPC workflows.	Limited GPU support. pip relies on prebuilt wheels built for a specific CUDA version, which must be compatible with the NVIDIA driver (and, for some packages, the CUDA libraries) available on the system. Mismatches can cause installs to fail or silently fall back to CPU-only execution.
Binary Packages	Strong support for compiled libraries written in C/C++/Fortran. conda-forge provides prebuilt binaries (e.g., SciPy, OpenCV, NumPy with MKL/OpenBLAS) that work reliably across systems without compiling.	Depends heavily on whether prebuilt wheels exist. Packages without a wheel for your platform and Python version must be compiled from source, which is slow and often fails on HPC systems without proper build tools.
Environment Size	Larger because each Conda environment includes its own Python interpreter, compiled libraries, package metadata, and sometimes bundled runtimes (like CUDA). Environments may be hundreds of MB or even several GB.	Much smaller. `pip`/`venv` environments typically only store installed Python packages and minimal metadata. This makes them ideal for simple workflows or storage-limited environments.
Creation Speed	Slower due to dependency solving and downloading compiled packages. Creating an environment can take from several seconds to minutes depending on complexity.	Very fast. Creating a venv is nearly instantaneous, and installing pure-Python packages is typically quick.
Best For	Machine learning, GPU workflows, scientific computing, large research projects, and anything requiring compiled dependencies or CUDA.	Small tools, scripts, lightweight projects, teaching workflows, or CPU-only tasks that involve few dependencies.

5. ARC‑Specific Notes

5.1 Build on the Correct Node Type

Warning: Environments are not portable between node types or partitions.
Always build your Conda or pip/venv environment on the same node type where it will be used.
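One way to catch a mismatched environment early is to record where it was built. The sketch below is a hypothetical helper, not an ARC tool: the marker file name `build-node.json` and the recorded fields are assumptions made for illustration. It stores the CPU architecture and hostname at build time and compares the architecture when the environment is used later.

```python
import json
import platform
import socket
import tempfile
from pathlib import Path

MARKER_NAME = "build-node.json"  # hypothetical file name, not an ARC convention


def node_fingerprint() -> dict:
    """Describe the node this code is running on."""
    return {"machine": platform.machine(), "hostname": socket.gethostname()}


def record_build_node(env_dir: Path) -> Path:
    """Write the current node's fingerprint into the environment directory."""
    marker = env_dir / MARKER_NAME
    marker.write_text(json.dumps(node_fingerprint(), indent=2))
    return marker


def built_for_this_architecture(env_dir: Path) -> bool:
    """Return True only if a marker exists and its architecture matches this node."""
    marker = env_dir / MARKER_NAME
    if not marker.exists():
        return False
    recorded = json.loads(marker.read_text())
    return recorded.get("machine") == platform.machine()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp)
        record_build_node(env_dir)
        print("matches this node:", built_for_this_architecture(env_dir))
```

This is a coarse check: `platform.machine()` reports values such as `x86_64`, so it cannot distinguish CPU generations or GPU models within the same architecture. The recorded hostname is kept only as a hint for tracing where the environment was built.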

5.2 Jupyter Integration

You can use either environment type in Jupyter by installing an IPython kernel from within that environment. The internal documentation pages for Conda and pip/venv provide detailed instructions on how to install kernels and make your environment available as a selectable Jupyter kernel in Open OnDemand.
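Kernel registration is typically done with the `ipykernel` package's `install` command, run by the environment's own interpreter. The sketch below only assembles and optionally runs that command; the kernel name `my-env` and its display name are placeholders, it assumes `ipykernel` is already installed in the active environment, and ARC's detailed pages may prescribe different options for Open OnDemand.

```python
import subprocess
import sys


def kernel_install_command(name: str, display_name: str) -> list:
    """Build the command that registers the active environment as a Jupyter kernel."""
    # sys.executable is the interpreter of the currently active environment,
    # so the registered kernel will start with that environment's packages.
    return [
        sys.executable, "-m", "ipykernel", "install",
        "--user",                      # install into the user's kernel directory
        "--name", name,                # internal kernel identifier
        "--display-name", display_name,  # label shown in the Jupyter launcher
    ]


def install_kernel(name: str, display_name: str) -> None:
    """Run the registration command; raises CalledProcessError on failure."""
    subprocess.run(kernel_install_command(name, display_name), check=True)


if __name__ == "__main__":
    # Placeholder names; printing the command instead of running it
    # keeps this sketch free of side effects.
    print(" ".join(kernel_install_command("my-env", "Python (my-env)")))
```

Building the command from `sys.executable` rather than a bare `python` avoids registering the wrong interpreter when several Python installations are on the path.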

5.3 Slurm Usage

Create Conda and pip/venv environments in an interactive session on the target compute node type, since environments are not portable across node architectures. Once created, an environment can be activated and used in both interactive Slurm sessions and batch jobs, as long as you load the same modules and use the same environment path as during creation.
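A job can fail in confusing ways when the intended environment was never activated. As a minimal sketch, a script run inside a Slurm job could confirm which environment it is using before doing real work. It relies on `VIRTUAL_ENV` and `CONDA_PREFIX`, which the venv and Conda activation scripts set; the function names and the expected path are hypothetical.

```python
import os
import sys


def active_environment(environ=os.environ) -> dict:
    """Collect what Python and the shell report about the active environment."""
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,                  # root of the running interpreter's environment
        "venv": environ.get("VIRTUAL_ENV"),    # set by a venv activate script
        "conda": environ.get("CONDA_PREFIX"),  # set by conda activate
    }


def check_expected_env(expected_prefix: str, environ=os.environ) -> bool:
    """Return True if any reported environment path matches the expected one."""
    info = active_environment(environ)
    expected = os.path.realpath(expected_prefix)
    candidates = (info["prefix"], info["venv"], info["conda"])
    return any(
        path is not None and os.path.realpath(path) == expected
        for path in candidates
    )


if __name__ == "__main__":
    for key, value in active_environment().items():
        print(f"{key}: {value}")
```

Calling `check_expected_env` with the path used at creation time, and exiting early when it returns `False`, makes an activation problem visible at the top of the job log rather than as a later import error.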

6. Summary

Use Conda for GPU computing, deep learning, CUDA control, and complex dependencies.
Use pip/venv for lightweight, rapid, and simple Python environments.
Both systems integrate with Jupyter and Slurm on ARC.
Always build environments on the correct node type.