Image Transforms
Image transforms are essential for nearly every computer vision task. They are used for preprocessing (e.g., resizing images to a uniform size, normalizing pixel values) and, critically, for data augmentation (e.g., applying random rotations, crops, and color shifts to the training data). Data augmentation is a key technique for preventing overfitting and improving model robustness.
xTorch provides a large library of image transforms, ranging from basic operations to advanced augmentation techniques.
All image transforms are located under the `xt::transforms::image` namespace, and their headers can be found in the `<xtorch/transforms/image/>` directory.
General Usage
The most common way to use image transforms is to chain them into a pipeline with `xt::transforms::Compose`. The pipeline is then passed to your dataset's constructor, and the dataset applies the transforms to each image as it is loaded.
```cpp
#include <xtorch/xtorch.h>
#include <iostream>
#include <memory>
#include <vector>

int main() {
    // This is a standard data augmentation pipeline for training on ImageNet
    auto training_transforms = std::make_unique<xt::transforms::Compose>(
        // Resize the smaller edge to 256, maintaining aspect ratio
        std::make_shared<xt::transforms::image::Resize>(256),
        // Randomly crop a 224x224 patch
        std::make_shared<xt::transforms::image::RandomCrop>(std::vector<int64_t>{224, 224}),
        // Randomly flip the image horizontally with a 50% probability
        std::make_shared<xt::transforms::image::RandomHorizontalFlip>(/*p=*/0.5),
        // Apply some color jitter
        std::make_shared<xt::transforms::image::ColorJitter>(),
        // Normalize the image using ImageNet mean and stddev
        std::make_shared<xt::transforms::general::Normalize>(
            std::vector<float>{0.485, 0.456, 0.406},
            std::vector<float>{0.229, 0.224, 0.225}
        )
    );

    // This pipeline would be passed to a dataset
    // auto dataset = xt::datasets::ImageFolderDataset("./data", std::move(training_transforms));
}
```

!!! info "Constructor Options"
    Nearly all transforms are configurable through their constructors, including sizes, probabilities (`p`), rotation degrees, and more. Always refer to the specific header file in `<xtorch/transforms/image/>` for the full list of available settings.
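
To make the idea of a composed pipeline concrete, here is a minimal, dependency-free sketch of what chaining transforms amounts to: each step receives the output of the previous one. `ComposeSketch`, `Image`, `Transform`, `make_scale`, and `make_normalize` are hypothetical names invented for this illustration. They are not xTorch types, and the real `xt::transforms::Compose` may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not xTorch types.
using Image = std::vector<float>;               // flat buffer of pixel values
using Transform = std::function<Image(Image)>;  // one step of a pipeline

// Applies each step in order, feeding the output of one step into the next.
class ComposeSketch {
public:
    explicit ComposeSketch(std::vector<Transform> steps) : steps_(std::move(steps)) {}

    Image operator()(Image img) const {
        for (const auto& step : steps_) {
            img = step(std::move(img));
        }
        return img;
    }

private:
    std::vector<Transform> steps_;
};

// Scales every value by a constant factor (for example 1/255 to map bytes to [0, 1]).
Transform make_scale(float factor) {
    return [factor](Image img) {
        for (float& v : img) v *= factor;
        return img;
    };
}

// Normalizes every value as (x - mean) / stddev.
Transform make_normalize(float mean, float stddev) {
    assert(stddev != 0.0f);
    return [mean, stddev](Image img) {
        for (float& v : img) v = (v - mean) / stddev;
        return img;
    };
}
```

With these helpers, `ComposeSketch pipeline({make_scale(0.5f), make_normalize(1.0f, 2.0f)});` followed by `pipeline(Image{2.0f, 6.0f})` scales first and normalizes second. Order matters: swapping the two steps would generally produce different values.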
Available Transforms by Category
Geometric Transforms
These transforms alter the spatial properties of the image.
| Transform | Description |
|---|---|
| Resize | Resizes the input image to a given size. |
| Scale | An alias for Resize. |
| LongestMaxSize | Resizes the longest edge to a max size, maintaining aspect ratio. |
| SmallestMaxSize | Resizes the smallest edge to a max size, maintaining aspect ratio. |
| Crop | Crops the image at a specified location and size. |
| CenterCrop | Crops the central part of the image. |
| RandomCrop | Crops a random part of the image. |
| RandomResizedCrop | Crops a random part of the image and resizes it to a specific size. A common training augmentation. |
| Flip | Flips the image vertically, horizontally, or both. |
| HorizontalFlip | Flips the image horizontally. |
| VerticalFlip | Flips the image vertically. |
| RandomHorizontalFlip | Randomly flips the image horizontally with a given probability. |
| RandomVerticalFlip | Randomly flips the image vertically with a given probability. |
| RandomFlip | Randomly flips the image horizontally and/or vertically. |
| Pad | Pads the image on all sides with a given value. |
| PadIfNeeded | Pads the image to a minimum height and width. |
| Rotation | Rotates the image by a specified angle. |
| RandomRotation | Rotates the image by a random angle within a given range. |
| Affine | Applies a general affine transformation to the image. |
| RandomAffine | Applies a random affine transformation. |
| Perspective | Applies a perspective transformation. |
| RandomPerspective | Applies a random perspective transformation. |
| ElasticTransform | Applies an elastic deformation to the image. |
| GridDistortion | Applies a grid distortion effect. |
| OpticalDistortion | Applies an optical barrel/pincushion distortion. |
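
As an illustration of what two of the simpler geometric transforms do at the pixel level, the sketch below implements a horizontal flip and a center crop on a tiny single-channel, row-major image. `GrayImage`, `horizontal_flip`, and `center_crop` are hypothetical names for this example only. The xTorch classes operate on tensors and their exact semantics (for instance, how an odd margin is split when cropping) should be checked in the headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image in row-major order; not an xTorch type.
struct GrayImage {
    std::size_t height;
    std::size_t width;
    std::vector<int> pixels;  // expected size: height * width

    int at(std::size_t row, std::size_t col) const { return pixels[row * width + col]; }
};

// Mirrors the image left to right: output column c reads input column (width - 1 - c).
GrayImage horizontal_flip(const GrayImage& src) {
    GrayImage dst{src.height, src.width, std::vector<int>(src.pixels.size())};
    for (std::size_t r = 0; r < src.height; ++r) {
        for (std::size_t c = 0; c < src.width; ++c) {
            dst.pixels[r * src.width + c] = src.at(r, src.width - 1 - c);
        }
    }
    return dst;
}

// Copies the crop_h x crop_w window centered in the source image.
// Assumption: when the margin is odd, the extra row/column is left at the bottom/right.
GrayImage center_crop(const GrayImage& src, std::size_t crop_h, std::size_t crop_w) {
    assert(crop_h <= src.height && crop_w <= src.width);
    const std::size_t top = (src.height - crop_h) / 2;
    const std::size_t left = (src.width - crop_w) / 2;
    GrayImage dst{crop_h, crop_w, std::vector<int>(crop_h * crop_w)};
    for (std::size_t r = 0; r < crop_h; ++r) {
        for (std::size_t c = 0; c < crop_w; ++c) {
            dst.pixels[r * crop_w + c] = src.at(top + r, left + c);
        }
    }
    return dst;
}
```

A random variant such as `RandomCrop` differs only in how `top` and `left` are chosen: they would be sampled from the valid range instead of computed from the center.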
Color & Photometric Transforms
These transforms alter the pixel values, colors, brightness, and contrast of the image.
| Transform | Description |
|---|---|
| ColorJitter | Randomly changes the brightness, contrast, saturation, and hue of an image. |
| RandomBrightnessContrast | Randomly changes the brightness and contrast. |
| Grayscale | Converts the image to grayscale. |
| RandomGrayscale | Randomly converts the image to grayscale with a given probability. |
| Posterize | Reduces the number of bits for each color channel. |
| RandomPosterize | Randomly applies posterization. |
| Solarize | Inverts all pixel values above a threshold. |
| RandomSolarize | Randomly applies solarization. |
| Invert | Inverts the colors of the image. |
| RandomInvert | Randomly inverts the colors. |
| Equalize | Applies histogram equalization to the image. |
| RandomEqualize | Randomly applies histogram equalization. |
| CLAHE | Applies Contrast Limited Adaptive Histogram Equalization. |
| ChannelShuffle | Randomly shuffles the color channels of the image. |
| RandomGamma | Applies random gamma correction. |
| RandomAdjustSharpness | Randomly adjusts the sharpness of the image. |
| RandomAutoContrast | Randomly applies automatic contrast adjustment. |
| FancyPCA | Applies PCA-based color augmentation. |
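
The per-pixel arithmetic behind `Invert`, `Solarize`, and `Posterize` is simple enough to sketch on 8-bit values. The functions below are hypothetical helpers written for this illustration, not the xTorch implementations. In particular, whether a value exactly equal to the threshold is inverted by `Solarize` is an assumption here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Pixels = std::vector<std::uint8_t>;  // hypothetical 8-bit pixel buffer

// Invert: every value v becomes 255 - v.
Pixels invert(Pixels px) {
    for (auto& v : px) v = static_cast<std::uint8_t>(255 - v);
    return px;
}

// Solarize: invert only the values at or above the threshold.
// Assumption: the boundary value itself is inverted; a library may use a strict '>'.
Pixels solarize(Pixels px, std::uint8_t threshold) {
    for (auto& v : px) {
        if (v >= threshold) v = static_cast<std::uint8_t>(255 - v);
    }
    return px;
}

// Posterize: keep only the top `bits` bits of each 8-bit value and zero the rest.
Pixels posterize(Pixels px, int bits) {
    assert(bits >= 1 && bits <= 8);
    const auto mask = static_cast<std::uint8_t>(0xFF << (8 - bits));
    for (auto& v : px) v &= mask;
    return px;
}
```

For example, posterizing to 2 bits uses the mask `0xC0`, so every value collapses to one of four levels: 0, 64, 128, or 192.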
Augmentation & Erasing Transforms
These are advanced augmentation techniques that often involve erasing or mixing parts of images.
| Transform | Description |
|---|---|
| Cutout | Randomly erases one or more rectangular patches from an image. |
| CoarseDropout | An alternative name for Cutout. |
| GridDropout | Erases a grid of patches from an image. |
| MaskDropout | Applies dropout to a mask. |
| MixUp | Creates a new image by taking a weighted linear interpolation of two images. |
| CutMix | Creates a new image by cutting a patch from one image and pasting it onto another. |
| RandomMosaic | Combines four images into a single mosaic. |
| RandomAugment | Automatically applies a sequence of randomly selected augmentations (similar to AutoAugment). |
| GridShuffle | Shuffles patches of the image arranged in a grid. |
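
The following sketch shows the core arithmetic of two entries in this table: the weighted interpolation used by MixUp and the rectangular erase used by Cutout. `mixup` and `cutout` are hypothetical free functions for illustration, and they take the mixing weight and patch location as explicit arguments. A real augmentation would sample those values randomly, and MixUp is normally applied to the labels with the same weight as well. How xTorch exposes these choices is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// MixUp on flat pixel buffers: out = lambda * a + (1 - lambda) * b.
std::vector<float> mixup(const std::vector<float>& a, const std::vector<float>& b, float lambda) {
    assert(a.size() == b.size());
    assert(lambda >= 0.0f && lambda <= 1.0f);
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = lambda * a[i] + (1.0f - lambda) * b[i];
    }
    return out;
}

// Cutout on a row-major single-channel image: overwrites one rectangle with `fill`.
void cutout(std::vector<float>& img, std::size_t height, std::size_t width,
            std::size_t top, std::size_t left,
            std::size_t patch_h, std::size_t patch_w, float fill = 0.0f) {
    assert(img.size() == height * width);
    assert(top + patch_h <= height && left + patch_w <= width);
    for (std::size_t r = 0; r < patch_h; ++r) {
        for (std::size_t c = 0; c < patch_w; ++c) {
            img[(top + r) * width + (left + c)] = fill;
        }
    }
}
```

CutMix can be viewed as a combination of the two ideas: instead of filling the rectangle with a constant, it copies the corresponding pixels from a second image.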
Blur & Noise Transforms
| Transform | Description |
|---|---|
| Blur | Blurs the image using a normalized box filter. |
| GaussianBlur | Blurs the image using a Gaussian filter. |
| MedianBlur | Blurs the image using a median filter. |
| MotionBlur | Applies motion blur to the image. |
| GlassBlur | Applies a glass-like blur effect. |
| ZoomBlur | Applies a blur that simulates zooming. |
| GaussianNoise | Adds Gaussian noise to the image. |
| NoiseInjection | Injects random noise into the image. |
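
A normalized box filter, as mentioned for `Blur`, replaces each pixel with the plain average of its neighborhood. The sketch below uses a fixed 3x3 window on a row-major single-channel image. `box_blur3x3` is a hypothetical helper, and its border handling (averaging only the neighbors that fall inside the image) is an assumption made for this example. Libraries often pad or reflect the border instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 3x3 normalized box filter on a row-major single-channel image.
// Assumption: at the borders, only in-bounds neighbors contribute to the average.
std::vector<float> box_blur3x3(const std::vector<float>& img, int height, int width) {
    assert(height > 0 && width > 0);
    assert(img.size() == static_cast<std::size_t>(height) * static_cast<std::size_t>(width));

    std::vector<float> out(img.size());
    for (int r = 0; r < height; ++r) {
        for (int c = 0; c < width; ++c) {
            float sum = 0.0f;
            int count = 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr;
                    const int cc = c + dc;
                    if (rr < 0 || rr >= height || cc < 0 || cc >= width) continue;
                    sum += img[static_cast<std::size_t>(rr * width + cc)];
                    ++count;
                }
            }
            out[static_cast<std::size_t>(r * width + c)] = sum / static_cast<float>(count);
        }
    }
    return out;
}
```

A Gaussian blur follows the same loop structure but weights each neighbor by its distance from the center, while a median blur replaces the average with the median of the window.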
Stylistic & Filter-Based Transforms
| Transform | Description |
|---|---|
| Sharpen | Sharpens the image. |
| Emboss | Applies an embossing filter to the image. |
| ToSepia | Applies a sepia filter to the image. |
| BlackWhite | Converts the image to black and white. |
| Spatter | Adds a "spatter" effect to the image, like drops on a camera lens. |
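
A sepia filter is typically a fixed 3x3 color matrix applied to every RGB pixel. The sketch below uses one widely circulated set of sepia coefficients. Whether `ToSepia` in xTorch uses these exact values is an assumption, and `Rgb`, `to_byte`, and `to_sepia` are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 8-bit RGB pixel; not an xTorch type.
struct Rgb {
    std::uint8_t r;
    std::uint8_t g;
    std::uint8_t b;
};

// Rounds to the nearest integer and saturates at 255 (inputs are never negative here).
inline std::uint8_t to_byte(float value) {
    assert(value >= 0.0f);
    return static_cast<std::uint8_t>(std::min(255L, std::lround(value)));
}

// Applies a commonly used sepia color matrix to one pixel.
// Assumption: these coefficients are illustrative and may differ from the library's.
Rgb to_sepia(Rgb in) {
    const float r = in.r;
    const float g = in.g;
    const float b = in.b;
    return Rgb{
        to_byte(0.393f * r + 0.769f * g + 0.189f * b),
        to_byte(0.349f * r + 0.686f * g + 0.168f * b),
        to_byte(0.272f * r + 0.534f * g + 0.131f * b),
    };
}
```

Because the red and green rows sum to more than 1, bright pixels saturate in those channels, which is what gives the result its warm brown tint.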
