Section Navigation

▼ Api
- Index
- Activations
- Dataloaders
- Dropouts
- Losses
- Normalizations
- Optimizations
- Regularizations
- Trainers
- Utils
- ▼ Datasets
- ▼ Models
- ▼ Transforms
▼ Comparisons
- Comparison
▼ Examples
- Index
- ▼ Audio
  - ▼ Audio classification
    Environmental sounds
    Music genre
  - ▼ Speech recognition
    E2e ctc
    Keyword spotting
- ▼ Computer vision
  - ▼ Image classification
    Finetuning resnet cifar10
    Lenet mnist
    Transfer learning custom
  - ▼ Image generation
    Cyclegan
    Dcgan
  - ▼ Object detection
    Faster rcnn
    Yolov3 coco
  - ▼ Semantic segmentation
    Deeplab v3
    Mask rcnn
- ▼ Data handling
  - ▼ Dataloaders
    Efficient loading
  - ▼ Datasets
    Builtin datasets
    Custom datasets
  - ▼ Transforms
    Image augmentation
- ▼ Deployment
  - ▼ Inference
    Cpp app
    Tensorrt
  - ▼ Serialization
    Export torchscript
    Save load
  - ▼ Web services
    Rest api
- ▼ Distributed
  - ▼ Data parallelism
    Multi gpu
  - ▼ Model parallelism
    Model splitting
  - ▼ Multi machine
    Setup
- ▼ Generative
  - ▼ Autoencoders
    Denoising ae
    Vae
  - ▼ Diffusion
    Ddpm
  - ▼ Gans
    Mnist gan
    Progressive gan
- ▼ Getting started
- ▼ Gnn
  - ▼ Graph level
    Diffpool
    Mpnn
  - ▼ Node level
    Gcn
    Graphsage
- ▼ Nlp
  - ▼ Language modeling
    Finetuning bert
    Training gpt
  - ▼ Seq2seq
    Machine translation
    Summarization
  - ▼ Text classification
    Sentiment rnn
    Transformer classification
- ▼ Optimization
  - ▼ Lr schedulers
    Cosine annealing
    Step decay
  - ▼ Optimizers
    Adamw
    Sgd momentum
  - ▼ Regularization
    Dropout
    Weight decay
- ▼ Performance
  - ▼ Memory
    Data loading
    Gradient checkpointing
  - ▼ Speed
    Mixed precision
    Profiling
- ▼ Rl
  - ▼ Policy based
    Ppo
    Reinforce
  - ▼ Value based
    Dqn atari
    Q learning
- ▼ Time series
  - ▼ Anomaly detection
    Autoencoders
  - ▼ Forecasting
    Lstm
    Multivariate
▼ Getting started
- Installation
- Quick start cnn
▼ User guide

Image Transforms

Image transforms are essential for nearly every computer vision task. They are used for preprocessing (e.g., resizing images to a uniform size, normalizing pixel values) and, critically, for data augmentation (e.g., applying random rotations, crops, and color shifts to the training data). Data augmentation is a key technique for preventing overfitting and improving model robustness.

xTorch provides a massive library of image transforms, from basic operations to advanced, state-of-the-art augmentation techniques.

All image transforms are located under the xt::transforms::image namespace and their headers can be found in the <xtorch/transforms/image/> directory.

General Usage

The most common way to use image transforms is to chain them together into a pipeline using xt::transforms::Compose. This pipeline is then passed to your Dataset during construction, which will apply the transformations to each image as it is loaded.

#include <xtorch/xtorch.h>
#include <iostream>
 
int main() {
    // This is a standard data augmentation pipeline for training on ImageNet
    auto training_transforms = std::make_unique<xt::transforms::Compose>(
        // Resize the smaller edge to 256, maintaining aspect ratio
        std::make_shared<xt::transforms::image::Resize>(256),
        // Randomly crop a 224x224 patch
        std::make_shared<xt::transforms::image::RandomCrop>(std::vector<int64_t>{224, 224}),
        // Randomly flip the image horizontally with a 50% probability
        std::make_shared<xt::transforms::image::RandomHorizontalFlip>(/*p=*/0.5),
        // Apply some color jitter
        std::make_shared<xt::transforms::image::ColorJitter>(),
        // Normalize the image using ImageNet mean and stddev
        std::make_shared<xt::transforms::general::Normalize>(
            std::vector<float>{0.485, 0.456, 0.406},
            std::vector<float>{0.229, 0.224, 0.225}
        )
    );
 
    // This pipeline would be passed to a dataset
    // auto dataset = xt::datasets::ImageFolderDataset("./data", std::move(training_transforms));
}

!!! info "Constructor Options" Nearly all transforms are configurable through their constructors. This includes parameters like sizes, probabilities (p), rotation degrees, and more. Always refer to the specific header file in <xtorch/transforms/image/> for a full list of available settings.

Available Transforms by Category

Geometric Transforms

These transforms alter the spatial properties of the image.

Transform	Description
`Resize`	Resizes the input image to a given size.
`Scale`	An alias for `Resize`.
`LongestMaxSize`	Resizes the longest edge to a max size, maintaining aspect ratio.
`SmallestMaxSize`	Resizes the smallest edge to a max size, maintaining aspect ratio.
`Crop`	Crops the image at a specified location and size.
`CenterCrop`	Crops the central part of the image.
`RandomCrop`	Crops a random part of the image.
`RandomResizedCrop`	Crops a random part of the image and resizes it to a specific size. A common training augmentation.
`Flip`	Flips the image vertically, horizontally, or both.
`HorizontalFlip`	Flips the image horizontally.
`VerticalFlip`	Flips the image vertically.
`RandomHorizontalFlip`	Randomly flips the image horizontally with a given probability.
`RandomVerticalFlip`	Randomly flips the image vertically with a given probability.
`RandomFlip`	Randomly flips the image horizontally and/or vertically.
`Pad`	Pads the image on all sides with a given value.
`PadIfNeeded`	Pads the image to a minimum height and width.
`Rotation`	Rotates the image by a specified angle.
`RandomRotation`	Rotates the image by a random angle within a given range.
`Affine`	Applies a general affine transformation to the image.
`RandomAffine`	Applies a random affine transformation.
`Perspective`	Applies a perspective transformation.
`RandomPerspective`	Applies a random perspective transformation.
`ElasticTransform`	Applies an elastic deformation to the image.
`GridDistortion`	Applies a grid distortion effect.
`OpticalDistortion`	Applies an optical barrel/pincushion distortion.

Color & Photometric Transforms

These transforms alter the pixel values, colors, brightness, and contrast of the image.

Transform	Description
`ColorJitter`	Randomly changes the brightness, contrast, saturation, and hue of an image.
`RandomBrightnessContrast`	Randomly changes the brightness and contrast.
`Grayscale`	Converts the image to grayscale.
`RandomGrayscale`	Randomly converts the image to grayscale with a given probability.
`Posterize`	Reduces the number of bits for each color channel.
`RandomPosterize`	Randomly applies posterization.
`Solarize`	Inverts all pixel values above a threshold.
`RandomSolarize`	Randomly applies solarization.
`Invert`	Inverts the colors of the image.
`RandomInvert`	Randomly inverts the colors.
`Equalize`	Applies histogram equalization to the image.
`RandomEqualize`	Randomly applies histogram equalization.
`CLAHE`	Applies Contrast Limited Adaptive Histogram Equalization.
`ChannelShuffle`	Randomly shuffles the color channels of the image.
`RandomGamma`	Applies random gamma correction.
`RandomAdjustSharpness`	Randomly adjusts the sharpness of the image.
`RandomAutoContrast`	Randomly applies automatic contrast adjustment.
`FancyPCA`	Applies PCA-based color augmentation.

Augmentation & Erasing Transforms

These are advanced augmentation techniques that often involve erasing or mixing parts of images.

Transform	Description
`Cutout`	Randomly erases one or more rectangular patches from an image.
`CoarseDropout`	An alternative name for `Cutout`.
`GridDropout`	Erases a grid of patches from an image.
`MaskDropout`	Applies dropout to a mask.
`MixUp`	Creates a new image by taking a weighted linear interpolation of two images.
`CutMix`	Creates a new image by cutting a patch from one image and pasting it onto another.
`RandomMosaic`	Combines four images into a single mosaic.
`RandomAugment`	Automatically applies a sequence of randomly selected augmentations (similar to AutoAugment).
`GridShuffle`	Shuffles patches of the image arranged in a grid.

Blur & Noise Transforms

Transform	Description
`Blur`	Blurs the image using a normalized box filter.
`GaussianBlur`	Blurs the image using a Gaussian filter.
`MedianBlur`	Blurs the image using a median filter.
`MotionBlur`	Applies motion blur to the image.

Stylistic & Filter-Based Transforms

Transform	Description
`Sharpen`	Sharpens the image.
`Emboss`	Applies an embossing filter to the image.
`ToSepia`	Applies a sepia filter to the image.
`BlackWhite`	Converts the image to black and white.
`Spatter`	Adds a "spatter" effect to the image, like drops on a camera lens.