Training & Testing Neural Networks
Dataset & Dataloader

- Dataset: stores data samples and expected values
- Dataloader: groups data in batches, enables multiprocessing
dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size, shuffle=True)  # shuffle: True for training, False for testing
- Design your own Dataset:
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        # Read data & preprocess
        self.data = ...
    def __getitem__(self, index):
        # Return one sample at a time
        return self.data[index]
    def __len__(self):
        # Return the size of the dataset
        return len(self.data)
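For concreteness, a minimal runnable sketch (the ToyDataset class, its random data, and the batch size are made up for illustration):

import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):               # hypothetical dataset with random features & targets
    def __init__(self):
        self.x = torch.randn(100, 10)    # 100 samples, 10 features each
        self.y = torch.randn(100, 1)     # one target value per sample
    def __getitem__(self, index):
        return self.x[index], self.y[index]
    def __len__(self):
        return len(self.x)

dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for x, y in loader:                      # each iteration yields one batch
    print(x.shape, y.shape)              # torch.Size([16, 10]) torch.Size([16, 1])
    break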
Tensors
- dim in PyTorch == axis in NumPy
- Creating:
x = torch.tensor([[1, -1], [-1, 1]])
x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))
x = torch.zeros([2, 2])   # the argument is the shape
x = torch.ones([1, 2, 5])
- Common Operations:
- Addition: z = x + y
- Subtraction: z = x - y
- Summation: y = x.sum()
- Mean: y = x.mean()
- Power: z = x.pow(y)
- Transpose: transpose two specified dimensions
x = torch.zeros([2, 3])
x.shape               # torch.Size([2, 3])
x = x.transpose(0, 1)
x.shape               # torch.Size([3, 2])
- Squeeze: remove the specified dimension with length = 1
x = torch.zeros([1, 2, 3])
x.shape               # torch.Size([1, 2, 3])
x = x.squeeze(dim=0)
x.shape               # torch.Size([2, 3])
- Unsqueeze: expand a new dimension
x = torch.zeros([2, 3])
x.shape               # torch.Size([2, 3])
x = x.unsqueeze(dim=1)
x.shape               # torch.Size([2, 1, 3])
- Cat: concatenate multiple tensors
x = torch.zeros([2, 1, 3])
y = torch.zeros([2, 3, 3])
z = torch.zeros([2, 2, 3])
w = torch.cat([x, y, z], dim=1)
w.shape               # torch.Size([2, 6, 3])
- Data Type: torch.Tensor — PyTorch 2.0 documentation
- 32-bit floating point → torch.float → torch.FloatTensor
- 64-bit integer (signed) → torch.long → torch.LongTensor
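As a quick illustration of these dtypes (a hedged sketch; the values are arbitrary):

x = torch.tensor([1, 2, 3])        # Python integers default to torch.int64 (torch.long)
x.dtype                            # torch.int64
y = torch.tensor([1.0, 2.0])       # Python floats default to torch.float32
y.dtype                            # torch.float32
z = x.float()                      # convert to 32-bit floating point
w = y.long()                       # convert to 64-bit signed integer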
- Comparison with NumPy: torch.Tensor — PyTorch 2.0 documentation
- Similar attributes: shape, dtype
- Similar functions: reshape, squeeze, unsqueeze, view
- e.g. x.unsqueeze(1) in PyTorch corresponds to np.expand_dims(x, 1) in NumPy
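A small sketch of the round trip between NumPy and PyTorch (array values are arbitrary):

import numpy as np
import torch

a = np.array([[1, -1], [-1, 1]])
t = torch.from_numpy(a)            # NumPy array -> tensor (shares memory)
t.shape                            # torch.Size([2, 2])
b = t.numpy()                      # tensor -> NumPy array (shares memory)
np.expand_dims(a, 1).shape         # (2, 1, 2), same as t.unsqueeze(1).shape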
- Device
- Tensors & modules will be computed with CPU by default
- Use
.to('cpu' | 'cuda')to move tensors to appropriate devices. - Check if your computer has NVIDIA GPU:
torch.cuda.is_available() - Multiple GPUs: specify
cuda:0, cuda:1, cuda:2, ... - Why GPU: What is a GPU and do you need one in Deep Learning? | by Jason Dsouza | Towards Data Science
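A common pattern for selecting the device (a minimal sketch; MyModel refers to the model class defined later in these notes):

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.zeros([2, 3]).to(device)      # move a tensor to the GPU if one is available
model = MyModel().to(device)            # modules are moved the same way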
- Gradient Calculation
x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
x.grad
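Here z is the sum of the squared entries of x, so the gradient of z with respect to x is 2x:

x.grad                # tensor([[ 2.,  0.], [-2.,  2.]])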
Define Neural Network

import torch.nn as nn
class MyModel(nn.Module):
    def __init__(self):              # Initialize your model & define layers
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1)
        )
    def forward(self, x):            # Compute output of your NN
        return self.net(x)
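A quick sanity check of the shapes (a hedged sketch; the random batch of 4 samples is made up, and the input size 10 matches the first Linear layer):

model = MyModel()
x = torch.randn(4, 10)        # a batch of 4 samples with 10 features each
out = model(x)                # calls forward() under the hood
out.shape                     # torch.Size([4, 1])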
Loss Function

- Mean Squared Error (for regression tasks)
criterion = nn.MSELoss()
- Cross Entropy (for classification tasks)
criterion = nn.CrossEntropyLoss()
loss = criterion(model_output, expected_value)
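For instance, a small sketch of a classification loss (the shapes and labels are made up; nn.CrossEntropyLoss takes raw logits and integer class indices):

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 3)               # 4 samples, 3 classes (unnormalized scores)
labels = torch.tensor([0, 2, 1, 0])      # ground-truth class index for each sample
loss = criterion(logits, labels)         # scalar tensor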
Optimization

- Gradient-based optimization algorithms that adjust network parameters to reduce error. (See Adaptive Learning Rate lecture video)
- E.g. Stochastic Gradient Descent (SGD): optimizer = torch.optim.SGD(model.parameters(), lr, momentum=0)
- For every batch of data:
    - Call optimizer.zero_grad() to reset the gradients of the model parameters.
    - Call loss.backward() to backpropagate the gradients of the prediction loss.
    - Call optimizer.step() to adjust the model parameters.
Workflow

Training Setup
dataset = MyDataset(file) # read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True) # put dataset into Dataloader
model = MyModel().to(device) # construct model and move to device (cpu/cuda)
criterion = nn.MSELoss() # set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1) # set optimizer
Training Loop
for epoch in range(n_epochs):               # iterate n_epochs
    model.train()                           # set model to train mode
    for x, y in tr_set:                     # iterate through the dataloader
        optimizer.zero_grad()               # set gradients to zero
        x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
        pred = model(x)                     # forward pass (compute output)
        loss = criterion(pred, y)           # compute loss
        loss.backward()                     # compute gradients (backpropagation)
        optimizer.step()                    # update model with optimizer
Validation Loop
model.eval()                                # set model to evaluation mode
total_loss = 0
for x, y in dv_set:                         # iterate through the dataloader
    x, y = x.to(device), y.to(device)       # move data to device (cpu/cuda)
    with torch.no_grad():                   # disable gradient calculation
        pred = model(x)                     # forward pass (compute output)
        loss = criterion(pred, y)           # compute loss
    total_loss += loss.cpu().item() * len(x)        # accumulate loss
avg_loss = total_loss / len(dv_set.dataset)         # compute averaged loss
Testing Loop
model.eval()                                # set model to evaluation mode
preds = []
for x in tt_set:                            # iterate through the dataloader
    x = x.to(device)                        # move data to device (cpu/cuda)
    with torch.no_grad():                   # disable gradient calculation
        pred = model(x)                     # forward pass (compute output)
        preds.append(pred.cpu())            # collect predictions
- model.eval(): changes the behaviour of some model layers, such as dropout and batch normalization.
- with torch.no_grad(): prevents calculations from being added to the gradient computation graph; usually used to prevent accidental training on validation/testing data.
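A small illustration of what model.eval() changes, using a standalone dropout layer (a sketch; which elements are zeroed in train mode is random):

drop = nn.Dropout(p=0.5)
x = torch.ones(4)
drop.train()          # training mode: each element is zeroed with probability 0.5, survivors scaled by 1/(1-p)
drop(x)               # e.g. tensor([2., 0., 2., 0.]) (random)
drop.eval()           # evaluation mode: dropout does nothing
drop(x)               # tensor([1., 1., 1., 1.])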
Save/Load Trained Models
- Save: torch.save(model.state_dict(), path)
- Load:
ckpt = torch.load(path)
model.load_state_dict(ckpt)
More About PyTorch
- torchaudio: speech/audio processing
- torchtext: natural language processing
- torchvision: computer vision
- skorch: scikit-learn + PyTorch
Useful GitHub repositories using PyTorch
- Huggingface Transformers (transformer models: BERT, GPT, …)
- Fairseq (sequence modeling for NLP & speech)
- ESPnet (speech recognition, translation, synthesis, …)