PyTorch

Introduction

PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment.

Availability

ARC provides a PyTorch module on the CPU partitions (PyTorch/2.1.2-foss-2023a) and on the GPU partitions (PyTorch/2.1.2-foss-2023a-CUDA-12.1.1). However, since PyTorch releases new versions frequently, we recommend installing it yourself via pip. To do so, first get an interactive job on a GPU node, load Miniconda, and then create a conda environment. We also recommend installing ipykernel so that you can connect your environment to Jupyter running on Open OnDemand.
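
If the module version meets your needs, you can load it directly instead of installing your own copy (the GPU-partition module is shown; use PyTorch/2.1.2-foss-2023a on the CPU partitions):

module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1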

## Example on TinkerCliffs for A100 nodes:
interact --account=<your allocation> --partition=a100_normal_q -N 1 -n 12 --gres=gpu:1
module load Miniconda3
conda create -n pytorch python=3.11   ## pin a python so pip installs into this env, not the system
source activate pytorch
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip3 install ipykernel                ## ipykernel is not on the PyTorch wheel index; install from PyPI
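
To make the new environment visible as a kernel in Jupyter on Open OnDemand, register it with ipykernel (the kernel name pytorch here is arbitrary):

python -m ipykernel install --user --name pytorch --display-name "PyTorch"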

Important: Verify that the PyTorch you installed is the GPU (CUDA) build:

$ python
import torch
print(torch.__version__)           # GPU builds report a +cuXXX suffix, e.g. 2.x.x+cu126
print(torch.version.cuda)          # None if CPU-only
print(torch.backends.cudnn.enabled)
print(torch.cuda.is_available())   # True if a compatible GPU is available and CUDA is enabled

Warning

NOTE: GPU support for AI/ML codes can offer SIGNIFICANT computational speed improvements. Simply installing the defaults as described in a package's documentation may or may not produce a build that actually uses the GPUs. Test your code on a small example before running your full dataset. You can ssh to the node your job is running on and run nvidia-smi to confirm that your code is using the GPU.
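
For example, a minimal smoke test along these lines (a sketch; the matrix size is arbitrary) should run far faster on a GPU than on CPU and will show up in nvidia-smi while it runs:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Running on {device}')

# a large matrix multiply keeps the GPU busy long enough to observe in nvidia-smi
x = torch.randn(8000, 8000, device=device)
y = x @ x
if device.type == 'cuda':
    torch.cuda.synchronize()   # wait for the GPU kernel to finish before reporting
print('result norm:', y.norm().item())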

Interaction

You can run PyTorch code from Jupyter notebooks or from the command line (interactively or via scripts). Ideally, you will prototype your code in Jupyter, which is easily accessible from Open OnDemand. If you would rather prototype from the command line, first get an interactive job as in the install notes above, then load the required software, e.g. Miniconda.
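
Once your code works, you will typically run it non-interactively. A minimal Slurm batch script might look like the sketch below (the account, time limit, and environment name are placeholders; adjust them to your allocation and workload):

#!/bin/bash
#SBATCH --account=<your allocation>
#SBATCH --partition=a100_normal_q
#SBATCH --nodes=1
#SBATCH --ntasks=12
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00

module load Miniconda3
source activate pytorch

python cifar10.py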

Quick example from the pytorch.org site

The PyTorch tutorials are excellent. For brevity, we will run through the CIFAR10 example from the PyTorch docs:
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

Here is the example Python script; you can run it line by line in an interactive Python session or all at once via python cifar10.py

## cifar10.py
## import libraries
import torch
import torchvision
import torchvision.transforms as transforms

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

batch_size = 4

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

#import matplotlib.pyplot as plt
import numpy as np

# function to show an image (the matplotlib calls are commented out so the
# script also runs in non-interactive batch jobs; uncomment the plt lines
# to display images in Jupyter)


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    #plt.imshow(np.transpose(npimg, (1, 2, 0)))
    #plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
net.to(device)   # move the model's parameters to the GPU, if one is available


import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(batch_size)))
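
The tutorial continues by reloading the saved weights and measuring accuracy on the test set; a condensed version of that step looks like this:

net = Net()
net.load_state_dict(torch.load(PATH))
net.to(device)

correct = 0
total = 0
# no gradients are needed for evaluation
with torch.no_grad():
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        # the class with the highest score is the prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the 10000 test images: {100 * correct // total} %')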