Transforms

Transforms are a fundamental component of any deep learning data pipeline. They are functions that take in a data sample (e.g., an image, a piece of text, or an audio clip) and return a modified version of it.

Transforms serve two main purposes:

Preprocessing: Converts data into a format the neural network can accept, for example by resizing images to a fixed size, normalizing pixel values, or converting text tokens into numerical IDs.
Data Augmentation: Artificially increases the diversity of the training dataset by applying random transformations, such as random rotations or flips. This acts as a regularizer and helps the model generalize to unseen data.
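
The difference between the two purposes can be sketched in plain standard C++, without xTorch. The `GrayImage` type and the function names below are hypothetical and exist only for illustration; they are not part of the xTorch API. `normalize` is deterministic preprocessing, while `random_horizontal_flip` is a stochastic augmentation that may return a different result on each call.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical minimal image type for illustration: one channel, row-major.
struct GrayImage {
    std::size_t height;
    std::size_t width;
    std::vector<float> pixels;  // size == height * width
};

// Preprocessing (deterministic): maps each pixel x to (x - mean) / stddev.
GrayImage normalize(GrayImage img, float mean, float stddev) {
    for (float& p : img.pixels) {
        p = (p - mean) / stddev;
    }
    return img;
}

// Mirrors the image left to right by reversing every row in place.
GrayImage horizontal_flip(GrayImage img) {
    for (std::size_t r = 0; r < img.height; ++r) {
        float* row = img.pixels.data() + r * img.width;
        std::reverse(row, row + img.width);
    }
    return img;
}

// Augmentation (stochastic): flips with probability p,
// otherwise returns the input unchanged.
GrayImage random_horizontal_flip(GrayImage img, double p, std::mt19937& rng) {
    std::bernoulli_distribution flip(p);
    if (flip(rng)) {
        return horizontal_flip(std::move(img));
    }
    return img;
}
```

Preprocessing of this kind is normally applied to training and evaluation data alike, whereas random augmentation is usually restricted to the training set.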

xTorch provides a library of transforms for a wide variety of data types. It is modeled on, and extends, the functionality of Python libraries such as torchvision.transforms.

`xt::transforms::Compose`

A single transform performs one operation. To build a complete preprocessing or augmentation pipeline, you need to chain multiple transforms together. This is the job of xt::transforms::Compose.

Compose takes a list of transform modules and applies them to the data sequentially, in the order given, so the output of each transform becomes the input of the next.
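
The idea of sequential composition can be sketched in a few lines of standard C++. This is not the xTorch implementation: `SimpleCompose` is a hypothetical stand-in that stores plain `std::function` objects rather than xTorch modules, and is shown only to make the ordering behavior concrete.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Compose-style pipeline, templated on the
// sample type. It is an illustration, not xt::transforms::Compose itself.
template <typename Sample>
class SimpleCompose {
public:
    using Transform = std::function<Sample(Sample)>;

    explicit SimpleCompose(std::vector<Transform> transforms)
        : transforms_(std::move(transforms)) {}

    // Applies each transform in insertion order, feeding each output
    // into the next transform.
    Sample operator()(Sample sample) const {
        for (const auto& transform : transforms_) {
            sample = transform(std::move(sample));
        }
        return sample;
    }

private:
    std::vector<Transform> transforms_;
};
```

Because the output of one step is the input of the next, order matters: composing "add 1" then "multiply by 2" maps 3 to 8, while the reverse order maps 3 to 7. The same holds for image pipelines, where cropping before resizing gives a different result than resizing before cropping.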

General Usage

The standard workflow is to create a Compose object containing a list of the desired transform instances. This Compose object is then passed to a Dataset during its construction. The dataset will automatically apply this pipeline to each data sample it retrieves.

#include <xtorch/xtorch.h>
#include <cstdint>
#include <iostream>
#include <memory>
#include <vector>

int main() {
    // --- 1. Create a list of transform instances ---
    // This pipeline performs common data augmentation for image classification.
    std::vector<std::shared_ptr<xt::Module>> transform_list;
    transform_list.push_back(
        std::make_shared<xt::transforms::image::Resize>(std::vector<int64_t>{256, 256})
    );
    transform_list.push_back(
        std::make_shared<xt::transforms::image::RandomCrop>(std::vector<int64_t>{224, 224})
    );
    transform_list.push_back(
        std::make_shared<xt::transforms::image::RandomHorizontalFlip>(/*p=*/0.5)
    );
    transform_list.push_back(
        std::make_shared<xt::transforms::general::Normalize>(
            std::vector<float>{0.485, 0.456, 0.406}, // Mean for ImageNet
            std::vector<float>{0.229, 0.224, 0.225}  // Std for ImageNet
        )
    );
 
    // --- 2. Create the Compose object ---
    auto transform_pipeline = std::make_unique<xt::transforms::Compose>(transform_list);
 
    // --- 3. Pass the pipeline to a Dataset ---
    // The ImageFolderDataset will now apply these augmentations to every image it loads.
    auto dataset = xt::datasets::ImageFolderDataset(
        "/path/to/your/image/data/",
        std::move(transform_pipeline)
    );
 
    // The rest of the workflow (DataLoader, Trainer) remains the same.
    xt::dataloaders::ExtendedDataLoader data_loader(dataset, 32);
    // ...
}

Transforms by Data Modality

The xTorch transform library is organized by the type of data it operates on. Each category below has its own page with a detailed list of the available transforms.

Appliers: Meta-transforms that control how other transforms are applied, such as RandomApply or OneOf.
Image: The largest collection, containing transforms for geometric adjustments (resize, crop, rotate, flip), color jittering, normalization, and advanced augmentations like Cutout and MixUp.
Signal (Audio): Transforms for processing audio waveforms, including creating spectrograms (MelSpectrogram), applying time and frequency masking, and changing pitch or speed.
Text: Transforms for NLP, including tokenizers (BertTokenizer, SentencePieceTokenizer), and utilities for padding and truncating sequences.
Graph: Transforms for augmenting graph-structured data, such as dropping nodes (NodeDrop) or edges (EdgeDrop).
Video: Transforms for processing sequences of images, such as temporal subsampling.
Weather: Transforms that simulate weather conditions (rain, fog, snow) on images, useful for training models that must stay robust in adverse conditions, such as those used in autonomous driving.
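
To make the Appliers category more concrete, here is a minimal sketch of the two meta-transforms named above, written in standard C++ only. The class names `RandomApplySketch` and `OneOfSketch`, their constructor parameters, and the uniform choice in `OneOfSketch` are assumptions for illustration, based on how similarly named transforms commonly behave in other libraries; the actual xTorch classes may differ in signature and behavior.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical RandomApply: runs the wrapped transform with probability p,
// otherwise passes the sample through unchanged.
template <typename Sample>
class RandomApplySketch {
public:
    using Transform = std::function<Sample(Sample)>;

    RandomApplySketch(Transform inner, double p, unsigned seed)
        : inner_(std::move(inner)), coin_(p), rng_(seed) {}

    Sample operator()(Sample sample) {
        if (coin_(rng_)) {
            return inner_(std::move(sample));
        }
        return sample;
    }

private:
    Transform inner_;
    std::bernoulli_distribution coin_;
    std::mt19937 rng_;
};

// Hypothetical OneOf: picks exactly one transform per call, uniformly at
// random (an assumption; a real implementation might support weights).
template <typename Sample>
class OneOfSketch {
public:
    using Transform = std::function<Sample(Sample)>;

    OneOfSketch(std::vector<Transform> choices, unsigned seed)
        : choices_(std::move(choices)), rng_(seed) {
        if (choices_.empty()) {
            throw std::invalid_argument("OneOfSketch needs at least one transform");
        }
    }

    Sample operator()(Sample sample) {
        std::uniform_int_distribution<std::size_t> pick(0, choices_.size() - 1);
        return choices_[pick(rng_)](std::move(sample));
    }

private:
    std::vector<Transform> choices_;
    std::mt19937 rng_;
};
```

Each object owns its own seeded random engine, which keeps the sketch reproducible and avoids dangling references to an external generator. Sharing one engine across a pipeline is an equally reasonable design.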