Training & Testing Neural Networks

Dataset & Dataloader

  • Dataset: stores data samples and expected values

  • Dataloader: groups data in batches, enables multiprocessing

    dataset = MyDataset(file)
    dataloader = DataLoader(dataset, batch_size, shuffle=True) 
    # shuffle: True for training, False for testing
  • Define your own Dataset:

    from torch.utils.data import Dataset, DataLoader
    class MyDataset(Dataset):
      def __init__(self, file):  # Read data & preprocess
          self.data = ...
      def __getitem__(self, index): # Returns one sample at a time
          return self.data[index]
      def __len__(self): # Returns the size of the dataset
          return len(self.data)

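  • Iterating over the DataLoader (a minimal sketch, reusing MyDataset and file from above; the batch size of 16 is just an example): each iteration yields one batch, with the samples returned by __getitem__ stacked along a new first dimension.

    dataset = MyDataset(file)
    dataloader = DataLoader(dataset, batch_size=16, shuffle=True)
    for batch in dataloader:   # each batch stacks 16 samples from __getitem__
        print(batch.shape)     # first dimension is the batch size
        break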

Tensors

  • dim in PyTorch == axis in NumPy
  • Creating:
    • x = torch.tensor([[1, -1], [-1, 1]])
    • x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))
    • x = torch.zeros([2, 2]) # argument is the desired shape
    • x = torch.ones([1, 2, 5])
  • Common Operations:
    • Addition: z = x + y

    • Subtraction: z = x - y

    • Summation: y = x.sum()

    • Mean: y = x.mean()

    • Power: z = x.pow(y)

    • Transpose: Transpose two specified dimensions.

      x = torch.zeros([2, 3])
      x.shape # torch.Size([2, 3])
      x = x.transpose(0, 1)
      x.shape # torch.Size([3, 2])
    • Squeeze: remove the specified dimension with length = 1

      x = torch.zeros([1, 2, 3])
      x.shape # torch.Size([1, 2, 3])
      x = x.squeeze(dim=0)
      x.shape # torch.Size([2, 3])
    • Unsqueeze: insert a new dimension with length = 1

      x = torch.zeros([2, 3])
      x.shape # torch.Size([2, 3])
      x = x.unsqueeze(dim=1)
      x.shape # torch.Size([2, 1, 3])
    • cat: concatenate multiple tensors

      x = torch.zeros([2, 1, 3])
      y = torch.zeros([2, 3, 3])
      z = torch.zeros([2, 2, 3])
      w = torch.cat([x, y, z], dim = 1)
      w.shape # torch.Size([2, 6, 3])
  • Data Type: torch.Tensor — PyTorch 2.0 documentation
    • 32-bit floating point: torch.float / torch.FloatTensor
    • 64-bit integer (signed): torch.long / torch.LongTensor
  • Comparison with NumPy: torch.Tensor — PyTorch 2.0 documentation
    • Similar attributes: shape, dtype
    • Similar functions: reshape, squeeze, unsqueeze, view
      • x.unsqueeze(1) is equivalent to np.expand_dims(x, 1)
  • Device: tensors and models are computed on the CPU by default; use .to('cuda') to move them to a GPU and .to('cpu') to move them back.
  • Gradient Calculation
    • x = torch.tensor([[1., 0.], [-1., 1.]],requires_grad=True)
    • z = x.pow(2).sum()
    • z.backward()
    • x.grad
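    • Here z = x.pow(2).sum(), so dz/dx = 2x and x.grad is tensor([[ 2.,  0.], [-2.,  2.]])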

Define Neural Network

import torch.nn as nn

class MyModel(nn.Module):
	def __init__(self): # Initialize your model & define layers
		super(MyModel, self).__init__()
		self.net = nn.Sequential(
			nn.Linear(10, 32),
			nn.Sigmoid(),
			nn.Linear(32, 1)
		)

	def forward(self, x): # Compute output of your NN
		return self.net(x)
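
A quick sanity check of the model (a minimal sketch; the batch size of 4 and the random input are made up, but the feature width must match the first nn.Linear(10, 32) layer):

import torch

model = MyModel()
x = torch.randn(4, 10) # batch of 4 samples, 10 features each (matches nn.Linear(10, 32))
y = model(x)           # forward pass; y.shape is torch.Size([4, 1])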

Loss Function

  • Mean Squared Error (for regression tasks)
    • criterion = nn.MSELoss()
  • Cross Entropy (for classification tasks)
    • criterion = nn.CrossEntropyLoss()
  • loss = criterion(model_output, expected_value)
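  • Input shapes differ between the two (a minimal sketch with made-up tensors): nn.MSELoss compares two tensors of the same shape, while nn.CrossEntropyLoss expects raw logits of shape (batch, n_classes) and integer class labels of shape (batch,).

    import torch
    import torch.nn as nn

    mse = nn.MSELoss()
    loss = mse(torch.randn(4, 1), torch.randn(4, 1))          # prediction and target: same shape

    ce = nn.CrossEntropyLoss()
    loss = ce(torch.randn(4, 3), torch.tensor([0, 2, 1, 0]))  # logits (4, 3), labels (4,)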

Optimization

  • Gradient-based optimization algorithms that adjust network parameters to reduce error. (See Adaptive Learning Rate lecture video)
  • E.g. Stochastic Gradient Descent (SGD): optimizer = torch.optim.SGD(model.parameters(), lr, momentum = 0)
  • For every batch of data
    • Call optimizer.zero_grad() to reset gradients of model parameters.
    • Call loss.backward() to backpropagate gradients of prediction loss.
    • Call optimizer.step() to adjust model parameters.
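  • A minimal sketch of these three calls, using a single made-up scalar parameter and plain SGD (lr = 0.1 is arbitrary); the loop drives x toward the minimum of x².

    import torch

    x = torch.tensor([5.0], requires_grad=True)           # made-up parameter to optimize
    optimizer = torch.optim.SGD([x], lr=0.1, momentum=0)
    for step in range(100):
        optimizer.zero_grad()  # reset gradients of the parameters
        loss = x.pow(2).sum()  # loss = x^2, minimized at x = 0
        loss.backward()        # compute gradient d(loss)/dx = 2x
        optimizer.step()       # x <- x - lr * grad
    print(x)                   # close to 0 after 100 steps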

Workflow

Training Setup

dataset = MyDataset(file) # read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True) # put dataset into Dataloader
model = MyModel().to(device) # construct model and move to device (cpu/cuda)
criterion = nn.MSELoss() # set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1) # set optimizer

Training Loop

for epoch in range(n_epochs): # iterate n_epochs
	model.train() # set model to train mode
	for x, y in tr_set: # iterate through the dataloader
		optimizer.zero_grad() # set gradient to zero
		x, y = x.to(device), y.to(device) # move data to device (cpu/cuda)
		pred = model(x) # forward pass (compute output)
		loss = criterion(pred, y) # compute loss
		loss.backward() # compute gradient (backpropagation)
		optimizer.step() # update model with optimizer

Validation Loop

model.eval() # set model to evaluation mode
total_loss = 0
for x, y in dv_set: # iterate through the dataloader
	x, y = x.to(device), y.to(device) # move data to device (cpu/cuda)
	with torch.no_grad(): # disable gradient calculation
		pred = model(x) # forward pass (compute output)
		loss = criterion(pred, y) # compute loss
	total_loss += loss.cpu().item() * len(x) # accumulate loss
avg_loss = total_loss / len(dv_set.dataset) # compute average loss (after the loop)

Testing Loop

model.eval() # set model to evaluation mode
preds = []
for x in tt_set: # iterate through the dataloader
	x = x.to(device) # move data to device (cpu/cuda)
	with torch.no_grad(): # disable gradient calculation
		pred = model(x) # forward pass (compute output)
		preds.append(pred.cpu()) # collect prediction
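  • After the loop, the collected per-batch predictions can be merged into a single tensor with, e.g., preds = torch.cat(preds, dim=0).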
  • model.eval()

    Changes behaviour of some model layers, such as dropout and batch normalization.

  • with torch.no_grad()

    Prevents calculations from being added into gradient computation graph. Usually used to prevent accidental training on validation/testing data.

Save/Load Trained Models

  • Save: torch.save(model.state_dict(), path)
  • Load:

    ckpt = torch.load(path)
    model.load_state_dict(ckpt)
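  • If the checkpoint was saved on a GPU but is loaded on a CPU-only machine, one option is ckpt = torch.load(path, map_location='cpu') to remap the stored tensors.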

More About PyTorch

  • torchaudio: speech/audio processing
  • torchtext: natural language processing
  • torchvision: computer vision
  • skorch: scikit-learn + PyTorch

Useful GitHub repositories using PyTorch

  • Huggingface Transformers (transformer models: BERT, GPT, …)
  • Fairseq (sequence modeling for NLP & speech)
  • ESPnet (speech recognition, translation, synthesis, …)

Reference

  1. PyTorch Tutorial from NTU