Natural Language Processing (NLP) Models
The field of Natural Language Processing has been revolutionized by deep learning, particularly by the advent of the Transformer architecture. To empower developers to build state-of-the-art NLP applications, xTorch provides a comprehensive zoo of pre-built models, ranging from classic recurrent architectures to a wide variety of modern Transformers.
All NLP models are located under the xt::models namespace and their headers can be found in the <xtorch/models/natural_language_processing/> directory.
General Usage
NLP models do not operate on raw text. Instead, they require the input text to be preprocessed into a numerical format. This typically involves:
- Tokenization: Breaking the text into sub-word units (tokens).
- Numericalization: Converting each token into a unique integer ID from a vocabulary.
- Formatting: Adding special tokens (like [CLS] and [SEP]), creating an attention mask to handle padding, and arranging the data into tensors (a minimal sketch of these steps follows this list).
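As a concrete illustration of these three steps, here is a minimal sketch that turns two sentences into `input_ids` and `attention_mask` tensors. The toy whitespace tokenizer, vocabulary map, and special-token IDs are assumptions made purely for illustration; in practice you would use a tokenizer that matches the pretrained vocabulary of your model.

```cpp
#include <torch/torch.h>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    // Toy vocabulary and special-token IDs (assumptions for illustration only).
    std::unordered_map<std::string, int64_t> vocab = {
        {"[PAD]", 0}, {"[UNK]", 1}, {"[CLS]", 2}, {"[SEP]", 3},
        {"hello", 4}, {"world", 5}, {"xtorch", 6}, {"rocks", 7}};

    std::vector<std::string> sentences = {"hello world", "xtorch rocks hello"};
    const int64_t max_len = 6;

    std::vector<torch::Tensor> id_rows, mask_rows;
    for (const auto& sentence : sentences) {
        // 1. Tokenization (naive whitespace split) and 2. Numericalization.
        std::vector<int64_t> ids = {vocab["[CLS]"]};
        std::istringstream stream(sentence);
        std::string token;
        while (stream >> token) {
            auto it = vocab.find(token);
            ids.push_back(it != vocab.end() ? it->second : vocab["[UNK]"]);
        }
        ids.push_back(vocab["[SEP]"]);

        // 3. Formatting: pad to a fixed length and build the attention mask.
        std::vector<int64_t> mask(ids.size(), 1);
        while (static_cast<int64_t>(ids.size()) < max_len) {
            ids.push_back(vocab["[PAD]"]);
            mask.push_back(0);
        }

        id_rows.push_back(torch::tensor(ids, torch::kLong));
        mask_rows.push_back(torch::tensor(mask, torch::kLong));
    }

    auto input_ids = torch::stack(id_rows);        // shape [2, 6]
    auto attention_mask = torch::stack(mask_rows); // shape [2, 6]
}
```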
The forward pass of a typical Transformer-based model takes these tensors as input, as the following example demonstrates.
```cpp
#include <xtorch/xtorch.h>
#include <iostream>

int main() {
    torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);

    // --- Model Configuration (Example for a BERT-like model) ---
    const int vocab_size = 30522;
    const int hidden_size = 768;
    const int num_attention_heads = 12;
    const int num_hidden_layers = 12;
    const int max_position_embeddings = 512;

    // --- Instantiate a BERT Model ---
    xt::models::BERT model(
        vocab_size,
        hidden_size,
        num_hidden_layers,
        num_attention_heads,
        max_position_embeddings
    );
    model.to(device);
    model.train();
    std::cout << "BERT Model Instantiated." << std::endl;

    // --- Create Dummy Input Data ---
    const int batch_size = 8;
    const int sequence_length = 128;

    // Batch of token IDs (embedding lookups require an integer dtype)
    auto input_ids = torch::randint(0, vocab_size, {batch_size, sequence_length}, torch::kLong).to(device);
    // Attention mask to indicate which tokens are real vs. padding
    auto attention_mask = torch::ones({batch_size, sequence_length}).to(device);

    // --- Perform a Forward Pass ---
    auto output = model.forward(input_ids, attention_mask);

    // Output is often a tuple or struct containing last_hidden_state, pooler_output, etc.
    auto last_hidden_state = output.last_hidden_state;
    std::cout << "Output hidden state shape: " << last_hidden_state.sizes() << std::endl;
}
```
Available Models by Family
Transformer Architectures
This is the largest and most powerful family of models, forming the basis of modern NLP.
| Model Family | Description | Header File |
|---|---|---|
| BERT | Bidirectional Encoder Representations from Transformers, a powerful pre-trained encoder model. | transformers/bert.h |
| RoBERTa | A Robustly Optimized BERT Pretraining Approach. | transformers/roberta.h |
| ALBERT | A Lite BERT for Self-supervised Learning of Language Representations. | transformers/albert.h |
| DistilBERT | A smaller, faster, and lighter version of BERT, trained using knowledge distillation. | transformers/distil_bert.h |
| ELECTRA | A pre-training method that is more efficient than masked language modeling. | transformers/electra.h |
| GPT | Generative Pre-trained Transformer, a family of powerful auto-regressive language models. | transformers/gpt.h |
| Llama | A family of large language models released by Meta AI. | transformers/llama.h |
| Mistral | A family of high-performance large language models. | transformers/mistral.h |
| Grok | The open-source version of xAI's large language model. | transformers/grok.h |
| DeepSeek | A family of open-source LLMs from DeepSeek AI. | transformers/deepseek.h |
| T5 | Text-To-Text Transfer Transformer, which frames all NLP tasks as a text-to-text problem. | transformers/t5.h |
| BART | A denoising autoencoder for pretraining sequence-to-sequence models. | transformers/bart.h |
| XLNet | A generalized autoregressive pretraining method that combines ideas from autoregressive and autoencoding models. | transformers/xlnet.h |
| Longformer | A Transformer variant with an attention mechanism that scales linearly with sequence length. | transformers/long_former.h |
| Reformer | An efficient Transformer variant that uses locality-sensitive hashing. | transformers/reformer.h |
| BigBird | A sparse attention mechanism that can handle long sequences. | transformers/big_bird.h |
Recurrent Architectures (RNNs)
These models process sequences step-by-step and are foundational to sequence-based tasks.
| Model | Description | Header File |
|---|---|---|
| Seq2Seq | A standard sequence-to-sequence model using an Encoder-Decoder architecture with RNNs. | rnn/seq2seq.h |
| AttentionBasedSeq2Seq | An extension of Seq2Seq that incorporates an attention mechanism to improve performance. | rnn/attention_based_seq2seq.h |
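Like the Transformer models, a sequence-to-sequence RNN consumes batches of integer token IDs, here for both a source and a target sequence. The following is a hypothetical sketch only: the constructor arguments and forward signature are assumptions for illustration, and the authoritative interface is defined in rnn/seq2seq.h.

```cpp
#include <xtorch/xtorch.h>

int main() {
    // Hypothetical constructor arguments: source/target vocabulary sizes,
    // embedding dimension, and hidden size (see rnn/seq2seq.h for the real signature).
    xt::models::Seq2Seq model(/*src_vocab=*/8000, /*tgt_vocab=*/8000,
                              /*embed_dim=*/256, /*hidden_size=*/512);
    model.train();

    // Dummy batches of token IDs: [batch, seq_len], integer dtype for embedding lookups.
    auto source = torch::randint(0, 8000, {16, 40}, torch::kLong);
    auto target = torch::randint(0, 8000, {16, 35}, torch::kLong);

    // Assumed forward signature: (source, target) -> per-token logits over the target vocabulary.
    auto logits = model.forward(source, target);
}
```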
Other & Classic Models
These models are primarily used for learning word embeddings: static vectors in the case of Word2Vec, GloVe, and FastText, and deep contextualized representations in the case of ELMo.
| Model | Description | Header File |
|---|---|---|
| Word2Vec | A classic model that learns word associations from a large corpus of text. | others/word2vec.h |
| GloVe | Global Vectors for Word Representation, an unsupervised learning algorithm for obtaining vector representations for words. | others/glove.h |
| FastText | An extension of Word2Vec that learns vectors for character n-grams, allowing it to handle out-of-vocabulary words. | others/fast_text.h |
| ELMo | Embeddings from Language Models, a deep contextualized word representation. | others/elmo.h |
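Conceptually, the static-embedding models in this table reduce to a learned lookup table from token IDs to dense vectors. The snippet below illustrates that idea with plain libtorch's torch::nn::Embedding; it is not the xt::models::Word2Vec interface (see others/word2vec.h for that), just a sketch of what an embedding lookup looks like.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    // A learned lookup table: 10,000-word vocabulary, 300-dimensional vectors.
    torch::nn::Embedding embedding(torch::nn::EmbeddingOptions(10000, 300));

    // Look up the vectors for a small batch of word IDs.
    auto word_ids = torch::tensor({12, 857, 4031}, torch::kLong).unsqueeze(0); // shape [1, 3]
    auto vectors = embedding->forward(word_ids);                               // shape [1, 3, 300]

    std::cout << vectors.sizes() << std::endl; // [1, 3, 300]
}
```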
