Models

The xt::models module provides a comprehensive "Model Zoo" containing pre-built, ready-to-use implementations of many popular and state-of-the-art deep learning architectures.

This allows you to get started on standard tasks quickly without having to implement well-known models from scratch. These implementations are well suited to benchmarking, transfer learning, and serving as a starting point for your own custom architectures.

Core Concept

All models in the xt::models namespace are implemented as standard torch::nn::Modules. This means they integrate seamlessly with the entire LibTorch and xTorch ecosystem. You can inspect their parameters, move them between devices, and pass them to any xTorch Trainer or standard LibTorch Optimizer.
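
Because the models are plain torch::nn::Modules, any generic LibTorch code that operates on a Module works on them unchanged. The sketch below is illustrative only: the helper name is hypothetical, and it assumes nothing beyond the standard Module interface (named_parameters, to, parameters).

#include <xtorch/xtorch.h>

// A minimal sketch: any model from the zoo can be passed in, because
// they are all standard torch::nn::Modules.
void inspect_and_optimize(torch::nn::Module& model) {
    // List every learnable parameter and its shape
    for (const auto& p : model.named_parameters()) {
        std::cout << p.key() << ": " << p.value().sizes() << std::endl;
    }

    // Move the whole model to the GPU if one is available
    if (torch::cuda::is_available()) {
        model.to(torch::kCUDA);
    }

    // Hand the parameters to any standard LibTorch optimizer
    torch::optim::Adam optimizer(model.parameters(), torch::optim::AdamOptions(1e-3));
}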

General Usage

Using a pre-built model from xTorch is straightforward. The typical workflow is:

  1. Include the header for the specific model you want to use.
  2. Instantiate the model class, providing any necessary configuration (e.g., number of classes).
  3. Move the model to the desired device (CPU or CUDA).
  4. Use the model for training or inference.
#include <xtorch/xtorch.h>
 
int main() {
    // Define the device
    torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
    std::cout << "Using device: " << device << std::endl;
 
    // 1. & 2. Instantiate a pre-built ResNet18 model for a 100-class problem
    xt::models::ResNet model(
        xt::models::ResNetImpl::BlockType::BASIC, // BASIC block for ResNet18/34
        {2, 2, 2, 2},                             // Layers in each stage for ResNet18
        /*num_classes=*/100
    );
 
    // 3. Move the model to the GPU
    model.to(device);
 
    // 4. Perform a dummy forward pass
    // Create a random input tensor: Batch size 16, 3 channels, 224x224 image
    auto input = torch::randn({16, 3, 224, 224}).to(device);
    auto output = model.forward(input);
 
    std::cout << "Model instantiated successfully." << std::endl;
    std::cout << "Output shape: " << output.sizes() << std::endl; // Should be
 
    // The model can now be passed to an optimizer and the xt::Trainer
    // torch::optim::Adam optimizer(model.parameters());
    // xt::Trainer trainer;
    // trainer.fit(model, ...);
}
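
If you prefer to drive the optimization loop yourself rather than using xt::Trainer, the model plugs directly into a manual LibTorch training step. The following sketch reuses model, input, and device from the example above; the random targets are placeholders for real labels and are purely illustrative.

// Minimal manual training step (sketch). Assumes `model`, `input`, and `device`
// from the example above; random targets stand in for real labels.
auto targets = torch::randint(0, 100, {16},
                              torch::TensorOptions().dtype(torch::kLong).device(device));

torch::optim::Adam optimizer(model.parameters(), torch::optim::AdamOptions(1e-3));

optimizer.zero_grad();                                              // clear old gradients
auto logits = model.forward(input);                                 // forward pass
auto loss = torch::nn::functional::cross_entropy(logits, targets); // classification loss
loss.backward();                                                    // back-propagate
optimizer.step();                                                   // update weights

std::cout << "Loss: " << loss.item<float>() << std::endl;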

!!! info "Model Variants" Many model families (like ResNet or EfficientNet) have multiple variants (e.g., ResNet18, ResNet50). These are typically configured through constructor arguments. Please refer to the specific model's header file for details on the available options.
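
For example, a ResNet50-style configuration would, with the same constructor shown in the usage example above, look roughly like the sketch below. The BOTTLENECK block name is an assumption, not verified against the library; check the xt::models::ResNetImpl header for the exact enum values.

// Sketch only: the BOTTLENECK enum value is assumed, not taken from the header.
xt::models::ResNet resnet50(
    xt::models::ResNetImpl::BlockType::BOTTLENECK, // bottleneck blocks for ResNet50/101/152
    {3, 4, 6, 3},                                  // stage depths for ResNet50
    /*num_classes=*/1000
);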


Available Models by Domain

The xTorch Model Zoo is organized by machine learning domain. Follow the links below for a detailed list of available architectures in each category.

  • Computer Vision: Includes classic and modern architectures for image classification (ResNet, VGG, EfficientNet), object detection (YOLO, Faster R-CNN), segmentation (U-Net, DeepLab), and Vision Transformers (ViT, Swin).

  • Generative Models: Contains implementations of popular generative architectures, including Generative Adversarial Networks (DCGAN, StyleGAN), Variational Autoencoders (VAE), and Diffusion Models (DDPM).

  • Natural Language Processing: A collection of models for text-based tasks, including recurrent architectures (Seq2Seq) and a wide range of Transformer-based models (BERT, GPT, T5, Llama).

  • Graph Neural Networks: Implementations of common GNN architectures for learning on graph-structured data, such as GCN, GraphSAGE, and GAT.

  • Reinforcement Learning: A collection of models and policies for reinforcement learning, including DQN, A3C, and PPO.

  • Multimodal: Models designed to process and fuse information from multiple data types, such as CLIP (text and image) and ViLBERT.