Managing Virtual Environments on ARC - Overview

This page provides an overview of Python environment management on ARC and explains the differences between Conda (Miniforge/Miniconda) environments and pip/venv environments.


1. Overview

ARC supports two primary methods for creating isolated Python environments:

  • Conda environments using Miniforge or Miniconda

  • Python virtual environments using pip and venv

Each method has strengths depending on workload type, complexity, and GPU requirements.


2. Conda Environments (Miniforge / Miniconda)

2.1 Description

Conda provides a full environment and package manager designed for scientific computing and machine learning.

Note: ARC recommends Miniforge because it defaults to the community‑driven conda‑forge channel and avoids the licensing restrictions that apply to Anaconda's default channels.

2.2 When to Use Conda

Use Conda on ARC when:

  • You need GPU frameworks (PyTorch, TensorFlow, JAX)

  • You require CUDA or cudatoolkit management inside environments

  • Your project contains compiled dependencies

  • You need reproducibility and environment stability

  • Your workflow involves complex dependency resolution
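As a rough illustration of the cases above, the sketch below assembles a `conda create` command and runs it only when a `conda` executable is found on the PATH. The helper names, the environment name, and the package list are hypothetical placeholders rather than ARC recommendations; the supported procedure is described on the internal Conda page.

```python
import shutil
import subprocess


def conda_create_command(env_name, packages, channel="conda-forge"):
    """Build the argument list for `conda create` (nothing is executed here)."""
    return ["conda", "create", "--yes", "--name", env_name,
            "--channel", channel, *packages]


def create_conda_env(env_name, packages):
    """Run `conda create` if conda is available; return True on success."""
    if shutil.which("conda") is None:
        # conda is not on PATH, e.g. Miniforge has not been set up in this shell.
        return False
    cmd = conda_create_command(env_name, packages)
    return subprocess.run(cmd, check=False).returncode == 0
```

For example, `create_conda_env("gpu-env", ["python=3.11", "numpy"])` would request an environment named `gpu-env`; both the name and the version pin are illustrative only.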

2.3 Internal Documentation

See the full ARC page for complete instructions:

Managing Conda Environments Using Miniconda and Miniforge


3. pip + venv Environments

3.1 Description

venv is Python’s built‑in virtual environment tool, and pip installs packages inside it.

Note: pip/venv is lightweight and ideal for small or pure‑Python workflows.
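To make the description concrete, here is a minimal sketch that uses the standard-library `venv` module to create an environment and then calls that environment's own pip. It is the programmatic counterpart of running `python -m venv` followed by `pip install`. The function names are hypothetical, and the full ARC procedure is on the internal pip/venv page.

```python
import os
import subprocess
import venv
from pathlib import Path


def create_venv(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its Python interpreter."""
    env_dir = Path(env_dir)
    # Mirror the `python -m venv` default: symlink the interpreter except on Windows.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install(env_python, packages):
    """Install packages with the environment's own pip, not the system pip."""
    subprocess.run([str(env_python), "-m", "pip", "install", *packages], check=True)
```

Calling pip through the environment's interpreter (`<env>/bin/python -m pip`) ensures packages land inside the venv even if it has not been activated in the current shell.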

3.2 When to Use pip/venv

Use pip/venv when:

  • You only need a small number of packages

  • Your workflow is CPU‑only

  • You want minimal overhead and fast environment creation

  • You prefer Python’s standard library tooling

3.3 Internal Documentation

See the full ARC page for complete instructions:

Python Virtual Environments Using pip and venv


4. Key Differences (Summary Table)

The table below summarizes the main differences between Conda (Miniforge/Miniconda) and pip/venv.
Each feature includes an explanation so users unfamiliar with dependency solvers, binary packages, or CUDA toolkits can make an informed choice.

| Feature | Conda (Miniforge/Miniconda) | pip + venv |
|---|---|---|
| Dependency Solver | Advanced dependency solver that analyzes all required package versions, including non-Python libraries, to find a compatible set. This reduces version conflicts in scientific libraries with complex inter-dependencies (e.g., NumPy, SciPy, PyTorch). | Limited solver. pip 20.3 and later include a backtracking resolver, but it covers Python packages only and does not manage non-Python libraries such as CUDA or BLAS. Older pip versions install packages without checking compatibility between them. Users may still need to resolve version conflicts manually, which can be challenging in ML/scientific stacks. |
| GPU / CUDA Support | Excellent GPU support. Conda packages frequently bundle their own CUDA toolkit (cudatoolkit), so you do not need to install system-level CUDA (a compatible NVIDIA driver on the node is still required). This is ideal for PyTorch, TensorFlow, JAX, and GPU-intensive HPC workflows. | More limited GPU support. pip relies on prebuilt wheels built against a specific CUDA version, which must be compatible with the CUDA libraries and NVIDIA driver on the node. Mismatches can cause installs to fail, break GPU acceleration, or fall back to CPU-only execution. |
| Binary Packages | Strong support for compiled libraries written in C/C++/Fortran. Conda-forge provides prebuilt binaries (e.g., SciPy, OpenCV, NumPy with MKL/OpenBLAS) that work reliably across systems without compiling. | Depends heavily on whether prebuilt wheels exist for your platform and Python version. Packages without a matching wheel must be compiled from source, which is slow and often fails on HPC systems without the proper build tools. |
| Environment Size | Larger because each environment carries its own Python interpreter, compiled libraries, package metadata, and sometimes bundled runtimes (like CUDA). Environments may be hundreds of MB or even several GB. | Much smaller. A venv reuses the base Python interpreter and stores only the installed Python packages and minimal metadata. This makes it ideal for simple workflows or storage-limited environments. |
| Creation Speed | Slower due to dependency solving and downloading compiled packages. Creating an environment can take from several seconds to minutes depending on complexity. | Very fast. Creating a venv is nearly instantaneous, and installing pure-Python packages is typically quick. |
| Best For | Machine learning, GPU workflows, scientific computing, large research projects, and anything requiring compiled dependencies or CUDA. | Small tools, scripts, lightweight projects, teaching workflows, or CPU-only tasks that involve few dependencies. |
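Because a CUDA mismatch can leave a framework running on the CPU without an obvious error, it can help to check GPU visibility right after building an environment. The sketch below uses PyTorch purely as an example (it is one of the frameworks named above) and works the same way in a Conda or pip/venv environment. The function name `gpu_status` is hypothetical.

```python
def gpu_status():
    """Report whether PyTorch in the active environment can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False, "cuda_build": None, "cuda_available": False}
    return {
        "torch_installed": True,
        "cuda_build": torch.version.cuda,             # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # False if no usable GPU/driver
    }
```

A result with `cuda_build` set but `cuda_available` false suggests the package was built for CUDA but cannot use the node's GPU or driver. Run the check on a GPU node, since a login node without a GPU would also report false.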


5. ARC‑Specific Notes

5.1 Build on the Correct Node Type

Warning: Environments are not portable between node types or partitions.
Always build your Conda or pip/venv environment on the same node type where it will be used.
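One way to guard against using an environment on the wrong node type is to record where it was built and compare at run time. The sketch below is only an illustration: the marker file name is hypothetical, and treating the CPU architecture plus the Slurm partition name (read from `SLURM_JOB_PARTITION`, which Slurm sets inside jobs) as a stand-in for "node type" is an assumption that may not match how ARC partitions are organized.

```python
import json
import os
import platform
import sys
from pathlib import Path

MARKER_NAME = "build-node.json"  # hypothetical file name, not an ARC convention


def node_fingerprint():
    """Describe the current node: CPU architecture plus Slurm partition, if any."""
    return {
        "machine": platform.machine(),
        "partition": os.environ.get("SLURM_JOB_PARTITION"),  # None outside a job
    }


def write_build_marker(env_dir=sys.prefix):
    """Record where the environment was built; call once, right after creating it."""
    marker = Path(env_dir) / MARKER_NAME
    marker.write_text(json.dumps(node_fingerprint(), indent=2))
    return marker


def matches_build_node(env_dir=sys.prefix):
    """Return True if the current node matches the recorded build node."""
    marker = Path(env_dir) / MARKER_NAME
    if not marker.exists():
        return False  # no record, so a match cannot be confirmed
    return json.loads(marker.read_text()) == node_fingerprint()
```

Calling `write_build_marker()` from inside the new environment stores the record in the environment's own directory, so the check travels with the environment.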

5.2 Jupyter Integration

You can use either environment type in Jupyter by installing an IPython kernel from within that environment. The internal documentation pages for Conda and pip/venv provide detailed instructions on how to install kernels and make your environment available as a selectable Jupyter kernel in Open OnDemand.
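As a hedged sketch of the kernel step, the code below builds and runs the `ipykernel install` command using whichever interpreter is executing it, so it should be run from inside the target environment with `ipykernel` already installed there. The function names and the kernel name in the example are hypothetical, and whether a `--user` kernel is the right choice for Open OnDemand on ARC is an assumption; the internal pages are authoritative.

```python
import subprocess
import sys


def kernel_install_command(name, display_name):
    """Build the `ipykernel install` command for the interpreter running this code."""
    return [sys.executable, "-m", "ipykernel", "install", "--user",
            "--name", name, "--display-name", display_name]


def install_kernel(name, display_name):
    """Register the active environment as a Jupyter kernel (requires ipykernel)."""
    subprocess.run(kernel_install_command(name, display_name), check=True)
```

For example, `install_kernel("my-env", "Python (my-env)")` would register the environment under the display name "Python (my-env)" in the Jupyter kernel list.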

5.3 Slurm Usage

Conda and pip/venv environments should always be created in an interactive session on the target compute node type, since environments are not portable across node architectures. Once created, these environments can be safely activated and used in both Slurm interactive sessions and batch jobs, as long as you load the same modules and use the same environment paths as during creation.
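A batch job can confirm early that it is running from the intended environment, which catches a missing activation step before any real work starts. The following is a minimal sketch under the assumption that the job script knows the environment's path. The function names are hypothetical.

```python
import sys
from pathlib import Path


def active_environment():
    """Classify the interpreter running this code as conda, venv, or system."""
    prefix = Path(sys.prefix)
    if sys.prefix != sys.base_prefix:
        kind = "venv"    # inside a venv, sys.prefix differs from sys.base_prefix
    elif (prefix / "conda-meta").is_dir():
        kind = "conda"   # Conda environments contain a conda-meta directory
    else:
        kind = "system"
    return {"kind": kind, "prefix": str(prefix), "executable": sys.executable}


def require_environment(expected_prefix):
    """Raise if the active interpreter does not belong to expected_prefix."""
    actual = Path(sys.prefix).resolve()
    if actual != Path(expected_prefix).resolve():
        raise RuntimeError(
            f"Expected environment {expected_prefix}, but running from {actual}"
        )
```

Calling `require_environment(...)` at the top of a job's Python entry point makes a wrong or missing activation fail immediately with a clear message.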


6. Summary

  • Use Conda for GPU computing, deep learning, CUDA control, and complex dependencies.

  • Use pip/venv for lightweight, rapid, and simple Python environments.

  • Both systems integrate with Jupyter and Slurm on ARC.

  • Always build environments on the correct node type.


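This page is intended as a high‑level decision guide. Full instructions are available in the linked pages: Managing Conda Environments Using Miniconda and Miniforge, and Python Virtual Environments Using pip and venv.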