Natural Language Processing (NLP) Models

The field of Natural Language Processing has been revolutionized by deep learning, particularly by the advent of the Transformer architecture. To empower developers to build state-of-the-art NLP applications, xTorch provides a comprehensive zoo of pre-built models, ranging from classic recurrent architectures to a wide variety of modern Transformers.

All NLP models are located under the xt::models namespace and their headers can be found in the <xtorch/models/natural_language_processing/> directory.
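For example, the transformers/bert.h header listed in the tables below can be pulled in through the umbrella header used throughout this page, or presumably as a direct include (the direct path below is an assumption based on the directory layout given above):

#include <xtorch/xtorch.h>  // umbrella header used in the examples on this page
#include <xtorch/models/natural_language_processing/transformers/bert.h>  // assumed direct include of a single model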

General Usage

NLP models do not operate on raw text. Instead, they require the input text to be preprocessed into a numerical format. This typically involves three steps (steps 2 and 3 are sketched in the snippet after the list):

  1. Tokenization: Breaking the text into sub-word units (tokens).
  2. Numericalization: Converting each token into a unique integer ID from a vocabulary.
  3. Formatting: Adding special tokens (like [CLS], [SEP]), creating an attention mask to handle padding, and arranging the data into tensors.
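The snippet below sketches steps 2 and 3 for a single pre-tokenized sentence. The token IDs and the special-token IDs ([CLS] = 101, [SEP] = 102, [PAD] = 0) are placeholder values standing in for the output of a real tokenizer and vocabulary; only the tensor handling uses standard LibTorch calls.

#include <xtorch/xtorch.h>
#include <vector>

int main() {
    const int64_t cls_id = 101, sep_id = 102, pad_id = 0;  // placeholder special-token IDs
    const int64_t max_length = 8;                          // target sequence length after padding

    // Step 2: token IDs produced by a tokenizer (placeholder values)
    std::vector<int64_t> ids = {cls_id, 7592, 2088, sep_id};

    // Step 3: pad to max_length and build a matching attention mask (1 = real token, 0 = padding)
    std::vector<int64_t> mask(ids.size(), 1);
    while (static_cast<int64_t>(ids.size()) < max_length) {
        ids.push_back(pad_id);
        mask.push_back(0);
    }

    // Arrange into [1, max_length] integer tensors of the kind the models expect
    auto input_ids = torch::tensor(ids).unsqueeze(0);
    auto attention_mask = torch::tensor(mask).unsqueeze(0);
}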

The forward pass of a typical Transformer-based model then takes these tensors as input, as in the complete example below.

#include <xtorch/xtorch.h>
#include <iostream>
 
int main() {
    torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
 
    // --- Model Configuration (Example for a BERT-like model) ---
    const int vocab_size = 30522;
    const int hidden_size = 768;
    const int num_attention_heads = 12;
    const int num_hidden_layers = 12;
    const int max_position_embeddings = 512;
 
    // --- Instantiate a BERT Model ---
    xt::models::BERT model(
        vocab_size,
        hidden_size,
        num_hidden_layers,
        num_attention_heads,
        max_position_embeddings
    );
    model.to(device);
    model.train();
 
    std::cout << "BERT Model Instantiated." << std::endl;
 
    // --- Create Dummy Input Data ---
    const int batch_size = 8;
    const int sequence_length = 128;
 
    // Batch of token IDs (must be an integer tensor for the embedding lookup)
    auto input_ids = torch::randint(0, vocab_size, {batch_size, sequence_length}, torch::kLong).to(device);
    // Attention mask marking which positions are real tokens (1) vs. padding (0)
    auto attention_mask = torch::ones({batch_size, sequence_length}, torch::kLong).to(device);
 
    // --- Perform a Forward Pass ---
    auto output = model.forward(input_ids, attention_mask);
    // The output is typically a struct exposing fields such as last_hidden_state and pooler_output
    auto last_hidden_state = output.last_hidden_state;
 
    std::cout << "Output hidden state shape: " << last_hidden_state.sizes() << std::endl;
}
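For inference rather than training, the forward pass would typically run with dropout disabled and gradient tracking turned off. A minimal sketch, continuing inside main() above and assuming the model exposes the usual eval() method alongside the train() call shown earlier:

    // Inference: disable dropout and gradient tracking
    model.eval();
    {
        torch::NoGradGuard no_grad;
        auto inference_output = model.forward(input_ids, attention_mask);
    }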

Available Models by Family

Transformer Architectures

This is the largest and most powerful family of models, forming the basis of modern NLP.

Model Family | Description | Header File
BERT | Bidirectional Encoder Representations from Transformers, a powerful pre-trained encoder model. | transformers/bert.h
RoBERTa | A Robustly Optimized BERT Pretraining Approach. | transformers/roberta.h
ALBERT | A Lite BERT for Self-supervised Learning of Language Representations. | transformers/albert.h
DistilBERT | A smaller, faster, and lighter version of BERT, trained using knowledge distillation. | transformers/distil_bert.h
ELECTRA | A pre-training method that is more efficient than masked language modeling. | transformers/electra.h
GPT | Generative Pre-trained Transformer, a family of powerful auto-regressive language models. | transformers/gpt.h
Llama | A family of large language models released by Meta AI. | transformers/llama.h
Mistral | A family of high-performance large language models. | transformers/mistral.h
Grok | The open-source version of xAI's large language model. | transformers/grok.h
DeepSeek | A family of open-source LLMs from DeepSeek AI. | transformers/deepseek.h
T5 | Text-To-Text Transfer Transformer, which frames all NLP tasks as a text-to-text problem. | transformers/t5.h
BART | A denoising autoencoder for pretraining sequence-to-sequence models. | transformers/bart.h
XLNet | A generalized autoregressive pretraining method that combines ideas from autoregressive and autoencoding models. | transformers/xlnet.h
Longformer | A Transformer variant with an attention mechanism that scales linearly with sequence length. | transformers/long_former.h
Reformer | An efficient Transformer variant that uses locality-sensitive hashing. | transformers/reformer.h
BigBird | A sparse attention mechanism that can handle long sequences. | transformers/big_bird.h
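Every model in this family is instantiated the same way as the BERT example in General Usage. As an additional sketch, a decoder-only model such as GPT could be driven autoregressively with greedy decoding. The constructor arguments, the single-tensor forward signature, and the assumed [batch, seq_len, vocab_size] logits shape are illustrative assumptions rather than the exact xTorch API; the snippet reuses the configuration constants and device from the earlier example.

    // Hypothetical greedy decoding with a decoder-only model (signatures are illustrative)
    xt::models::GPT gpt(vocab_size, hidden_size, num_hidden_layers, num_attention_heads, max_position_embeddings);
    gpt.to(device);
    gpt.eval();

    auto tokens = torch::randint(0, vocab_size, {1, 16}, torch::kLong).to(device);  // prompt token IDs
    torch::NoGradGuard no_grad;
    for (int step = 0; step < 32; ++step) {
        auto logits = gpt.forward(tokens);                // assumed shape: [1, seq_len, vocab_size]
        auto next_id = logits.index({0, -1}).argmax(-1);  // most likely next token at the last position
        tokens = torch::cat({tokens, next_id.view({1, 1})}, /*dim=*/1);
    }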

Recurrent Architectures (RNNs)

These models process sequences step by step and are foundational to sequence-to-sequence tasks.

Model | Description | Header File
Seq2Seq | A standard sequence-to-sequence model using an Encoder-Decoder architecture with RNNs. | rnn/seq2seq.h
AttentionBasedSeq2Seq | An extension of Seq2Seq that incorporates an attention mechanism to improve performance. | rnn/attention_based_seq2seq.h
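These are regular modules under xt::models and follow the same workflow as above. The sketch below is hypothetical: the constructor parameters (vocabulary sizes, embedding and hidden dimensions) and the two-tensor teacher-forcing forward signature are assumptions for illustration, and batch_size and device are reused from the General Usage example.

    // Hypothetical Seq2Seq usage; parameter names, order, and forward signature are illustrative
    const int src_vocab_size = 10000, tgt_vocab_size = 10000;
    const int embedding_dim = 256, hidden_dim = 512;
    xt::models::Seq2Seq seq2seq(src_vocab_size, tgt_vocab_size, embedding_dim, hidden_dim);
    seq2seq.to(device);
    seq2seq.train();

    // Source and target token IDs (teacher forcing during training)
    auto source = torch::randint(0, src_vocab_size, {batch_size, 20}, torch::kLong).to(device);
    auto target = torch::randint(0, tgt_vocab_size, {batch_size, 20}, torch::kLong).to(device);
    auto decoder_logits = seq2seq.forward(source, target);  // assumed to return per-step vocabulary logits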

Other & Classic Models

These models are primarily used for learning word embeddings, from the static vectors of Word2Vec, GloVe, and FastText to the contextualized representations of ELMo.

Model | Description | Header File
Word2Vec | A classic model that learns word associations from a large corpus of text. | others/word2vec.h
GloVe | Global Vectors for Word Representation, an unsupervised learning algorithm for obtaining vector representations for words. | others/glove.h
FastText | An extension of Word2Vec that learns vectors for character n-grams, allowing it to handle out-of-vocabulary words. | others/fast_text.h
ELMo | Embeddings from Language Models, a deep contextualized word representation. | others/elmo.h
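Each of these ultimately learns an embedding matrix that maps token IDs to dense vectors. The sketch below shows a hypothetical Word2Vec skip-gram step; the constructor and forward signatures are assumptions, and batch_size and device are again reused from the General Usage example.

    // Hypothetical Word2Vec (skip-gram) usage; constructor and forward signatures are illustrative
    const int embedding_vocab_size = 50000;
    const int word_vector_dim = 300;
    xt::models::Word2Vec word2vec(embedding_vocab_size, word_vector_dim);
    word2vec.to(device);
    word2vec.train();

    // Center-word and context-word ID pairs sampled from a corpus
    auto center_ids  = torch::randint(0, embedding_vocab_size, {batch_size}, torch::kLong).to(device);
    auto context_ids = torch::randint(0, embedding_vocab_size, {batch_size}, torch::kLong).to(device);
    auto pair_scores = word2vec.forward(center_ids, context_ids);  // assumed per-pair similarity scores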