PyTorch tensor

Posted under » Machine Learning on 11 April 2025

Machine learning is about teaching computers to tell pieces of digital data apart. The data could be text, images, sound and so on. PyTorch offers domain-specific libraries for these, such as TorchText, TorchVision and TorchAudio.

By default, the data is processed on the CPU. You can usually do the same work faster on a GPU through CUDA, because a GPU processes data in parallel. Here at anoneh, we are at the beginner stage and we try to keep things basic, so we will use the CPU for now.
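The CPU-or-GPU choice described above can be sketched in a few lines. This is a minimal example, not part of the quickstart code that follows; the variable names `device`, `x` and `y` are only for illustration. It falls back to the CPU when no CUDA GPU is found, so it should run on any machine with PyTorch installed.

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create a small tensor directly on the chosen device.
x = torch.ones(2, 3, device=device)

# The multiplication runs on whichever device holds x.
y = x * 2

print(f"Using {device} device")
print(y.device.type)
```

On a machine without a GPU, both lines report cpu, which matches what we use in the rest of this post.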

The easiest way to install PyTorch is via Anaconda. You can also do it via pip. Let's go through the quickstart tutorial. The imports look like this:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

Now load the vision data:

# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

Fashion-MNIST is a dataset of images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes. We load the FashionMNIST Dataset with the following parameters:

root is the path where the train/test data is stored,
train specifies whether to load the training or the test dataset,
download=True downloads the data from the internet if it’s not available at root,
transform and target_transform specify the feature and label transformations (only transform is set in the code above).
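The code above only sets transform. To give an idea of what a target_transform could look like, here is a small sketch that turns an integer label into a one-hot tensor. The function name `one_hot_label` is hypothetical and not part of the quickstart; it is applied to a single label here so nothing needs to be downloaded.

```python
import torch

NUM_CLASSES = 10  # Fashion-MNIST has 10 classes

def one_hot_label(y):
    """Hypothetical target_transform: integer label -> one-hot float tensor."""
    one_hot = torch.nn.functional.one_hot(torch.tensor(y), num_classes=NUM_CLASSES)
    return one_hot.float()

# Apply the transform to one label, the way a Dataset would for each sample.
encoded = one_hot_label(3)
print(encoded)  # tensor([0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
```

Since target_transform accepts any callable, a function like this could be passed as target_transform=one_hot_label when creating the dataset. The quickstart keeps the plain integer labels, so we do not do that here.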

NumPy has the array, and PyTorch's equivalent is called a tensor. After the data has been loaded, we pass the Dataset as an argument to DataLoader. This wraps an iterable over our dataset, and supports automatic batching, sampling, shuffling and multiprocess data loading.
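To make the array and tensor comparison concrete, here is a short sketch that converts between the two. It assumes NumPy is installed alongside PyTorch; the variable names are only for illustration.

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])

# from_numpy wraps the array without copying: tensor and array share memory.
t = torch.from_numpy(arr)

# torch.tensor makes an independent copy of the data it is given.
t_copy = torch.tensor([[1, 2], [3, 4]])

# .numpy() goes back from a CPU tensor to a NumPy array.
back = t.numpy()

# Changing the array is visible through t, but not through t_copy.
arr[0, 0] = 99
print(t)
print(t_copy)
```

The shared memory is worth remembering: use torch.tensor when you want a tensor that does not change when the original array does.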

Every TorchVision Dataset accepts two arguments, transform and target_transform, to modify the samples and labels respectively. Here we define a batch size of 64, i.e. each element in the dataloader iterable will return a batch of 64 features and labels.

batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break
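With Fashion-MNIST, the loop above should report X as torch.Size([64, 1, 28, 28]) and y as torch.Size([64]) with dtype torch.int64. To see how batching behaves without downloading anything, the sketch below uses a synthetic stand-in: 200 blank images in a TensorDataset. The sizes and names here are assumptions chosen for illustration, not part of the quickstart.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for Fashion-MNIST: 200 blank 1x28x28 images with labels 0-9.
images = torch.zeros(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
dataset = TensorDataset(images, labels)

# shuffle=True reorders the samples on every pass over the data.
loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Collect the shape of every batch of images.
batch_shapes = [tuple(X.shape) for X, y in loader]

print(len(loader))       # 4 batches: 64 + 64 + 64 + 8 = 200
print(batch_shapes[0])   # (64, 1, 28, 28)
print(batch_shapes[-1])  # (8, 1, 28, 28)
```

Note that the last batch is smaller because 200 is not a multiple of 64. DataLoader keeps that partial batch unless you pass drop_last=True.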