# Optimizers
An optimizer is the algorithm that updates a neural network's learnable parameters (its weights), and in the case of adaptive methods also its effective per-parameter learning rates, to minimize the loss function. The choice of optimizer can have a significant impact on training speed and final model performance.
## Standard LibTorch Optimizers
LibTorch provides a robust set of the most common and well-established optimization algorithms, which are suitable for a wide range of tasks.
Common Examples:
- `torch::optim::SGD`
- `torch::optim::Adam`
- `torch::optim::RMSprop`
- `torch::optim::Adagrad`
These optimizers are used by passing the model's parameters and an Options struct to their constructor. For a complete guide, please refer to the official PyTorch C++ optimizer documentation.
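For example, constructing a standard Adam optimizer looks like the following minimal sketch (it uses a single `torch::nn::Linear` layer as a stand-in model):

```cpp
#include <torch/torch.h>

int main() {
    // Any torch::nn module works here; a single linear layer keeps the example short.
    torch::nn::Linear model(784, 10);

    // The constructor takes the parameters to optimize plus an Options struct
    // carrying the hyperparameters (chained setters configure the rest).
    torch::optim::Adam optimizer(
        model->parameters(),
        torch::optim::AdamOptions(1e-3).weight_decay(1e-4));

    // optimizer.zero_grad(), loss.backward(), and optimizer.step()
    // then form the usual update cycle inside the training loop.
    return 0;
}
```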
## xTorch Extended Optimizers
The field of optimization is an active area of research, with new and improved algorithms published regularly. To let developers and researchers take advantage of these advances, xTorch includes an extensive library of modern and specialized optimizers.
This allows you to easily replace Adam with variants like RAdam, AdamW, AdaBelief, or LAMB to see if they improve your model's convergence or generalization.
### Usage
All xTorch optimizers live in the `xt::optimizations` namespace and are designed as drop-in replacements for the standard `torch::optim` optimizers. You construct them in the same way: by providing the model's parameters and an options struct.
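Construction and a manual update step might look like the following minimal sketch. It assumes the drop-in claim holds literally, i.e. that `xt::optimizations::RAdam` exposes the familiar `zero_grad()` / `step()` interface of `torch::optim` optimizers:

```cpp
#include <xtorch/xtorch.hh>

int main() {
    // Stand-in model; any module exposing parameters() will do.
    torch::nn::Linear model(4, 2);

    // Construct the xTorch optimizer just like a torch::optim one.
    xt::optimizations::RAdam optimizer(
        model->parameters(),
        xt::optimizations::RAdamOptions(1e-3));

    auto input  = torch::randn({8, 4});
    auto target = torch::randint(0, 2, {8}, torch::kLong);

    // Standard manual update cycle (assumes the torch::optim-style interface).
    optimizer.zero_grad();
    auto loss = torch::nll_loss(
        torch::log_softmax(model->forward(input), /*dim=*/1), target);
    loss.backward();
    optimizer.step();
    return 0;
}
```

The only xTorch-specific lines are the two that name the optimizer and its options struct; everything else is standard LibTorch.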
They integrate perfectly with the xt::Trainer.
```cpp
#include <xtorch/xtorch.hh>
#include <iostream>

int main() {
    // 1. Assume 'model' is an initialized torch::nn::Module
    xt::models::LeNet5 model(10);
    model.to(torch::kCPU);

    // 2. Instantiate an xTorch optimizer.
    // It takes the model parameters and an Options struct, just like a standard optimizer.
    xt::optimizations::RAdam optimizer(
        model.parameters(),
        xt::optimizations::RAdamOptions(1e-3) // Learning rate of 0.001
    );

    // 3. Integrate with the xt::Trainer
    xt::Trainer trainer;
    trainer.set_max_epochs(10)
           .set_optimizer(optimizer)      // Pass the xTorch optimizer to the trainer
           .set_loss_fn(torch::nll_loss);

    // The trainer will now use RAdam to update the model's weights.
    // trainer.fit(model, data_loader, nullptr, torch::kCPU);

    std::cout << "Trainer configured with RAdam optimizer." << std::endl;
    return 0;
}
```

### Available Optimizers
Below is the comprehensive list of optimizers available in the xt::optimizations module.
|  |  |  |
|---|---|---|
| OneBitAdam | OneBitLamb | AdaBelief |
| AdaBound | Adafactor | AdaFisher |
| AdaHessian | AdaMax | AdamMini |
| AdaMod | AdamW | AdaShift |
| AdaSmooth | AdaSqrt | ADOPT |
| AggMo | AMSBound | AMSGrad |
| AO | Apollo | Atmo |
| DeepEnsembles | DemonAdam | DemonCM |
| Demon | DFA | DiagAdaFisher |
| DistributedShampoo | DSPT | ECO |
| FA | FASFA | FATA |
| ForwardGradient | GCANS | GradientCheckpointing |
| GradientSparsification | Gravity | HGS |
| Info | KP | LAMB |
| LARS | LocalSGD | Lookahead |
| MadGrad | MAS | MPSO |
| Nadam | NTASGD | PLO |
| PO | PowerPropagation | PowerSGD |
| QHAdam | QHM | RAdam |
| SLamb | SM3 | SMA |
| SRMM | StochasticWeightAveraging | YellowFin |
!!! info "Constructor Options"
    Each optimizer has its own set of hyperparameters (e.g., `lr`, `betas`, `eps`, `weight_decay`). These are configured via a dedicated Options struct passed to the constructor. Please refer to the specific header file in `<xtorch/optimizations/>` for details on the available settings for each optimizer.
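
As an illustration only: if the Options structs follow LibTorch's chained-setter convention, configuring extra hyperparameters might look like the sketch below. The `weight_decay()` setter on `RAdamOptions` is an assumption based on the drop-in design, not a documented guarantee; the header files remain authoritative.

```cpp
#include <xtorch/xtorch.hh>

int main() {
    torch::nn::Linear model(4, 2);

    // Hypothetical sketch: the chained weight_decay() setter mirrors
    // torch::optim's convention and may differ in the actual header
    // under <xtorch/optimizations/>.
    auto options = xt::optimizations::RAdamOptions(1e-3).weight_decay(1e-2);

    xt::optimizations::RAdam optimizer(model->parameters(), options);
    return 0;
}
```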
