Hey there! We’re Aryorithm

xTorch: The C++ Toolkit for LibTorch You've Been Waiting For.

Write cleaner, more intuitive C++ code for your PyTorch models.


Less Code, More Clarity.

Standard LibTorch
  • Verbose, Low-Level APIs that require significant boilerplate for fundamental tasks like model loading and device management.
  • Fragmented Data Pipelines forcing developers to mix libraries and perform manual tensor math for pre-processing.
  • Raw, Unstructured Outputs where models return basic tensors that require manual parsing of scores and indices.
  • Manual GPU Resource Management that makes the developer responsible for all complex CUDA memory allocation and data transfers.
  • Complex Extensibility where integrating custom CUDA kernels requires fighting the difficult native TensorRT Plugin API.
  • The 'Last Mile' Deployment Gap where there is no native path to expose a model as a web service without a massive DevOps effort.
With xTorch
  • Expressive, High-Level API with single-line functions for complex operations.
  • Integrated Data Processing with fluent, chainable pipelines for transforming data.
  • C++ Idiomatic Results via utilities that return clean, structured std::vectors of prediction objects.
  • Automated GPU Memory Management which handles all CUDA boilerplate behind a simple .infer() call (sketched after this list).
  • Simplified Plugin API providing a streamlined interface to register and use your own custom CUDA kernels.
  • Instant Cloud Deployment through an optional service that converts any TensorRT engine into a scalable REST API.
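To make the contrast concrete, here is a minimal before-and-after sketch. The LibTorch half uses the standard TorchScript loading API; the xTorch half is illustrative only, since the engine class, prediction type, and .infer() call shown are assumptions drawn from the bullets above, not confirmed signatures from the xTorch headers.

standard_libtorch.cpp (sketch)
#include <torch/script.h>
#include <iostream>

int main() {
    // Load a TorchScript model and manage the device by hand.
    torch::jit::script::Module module = torch::jit::load("model.pt");
    module.to(torch::kCUDA);
    module.eval();

    // Manual pre-processing and GPU transfer.
    torch::Tensor input = torch::rand({1, 3, 224, 224}).to(torch::kCUDA);
    torch::Tensor output = module.forward({input}).toTensor();

    // Manual post-processing: softmax, then pull out the top score and index.
    auto probs = torch::softmax(output, /*dim=*/1);
    auto [score, index] = probs.max(/*dim=*/1);
    std::cout << "class " << index.item<int64_t>()
              << " score " << score.item<float>() << '\n';
    return 0;
}

xtorch_inference.cpp (hypothetical sketch)
#include <xtorch/xtorch.h>
#include <iostream>

int main() {
    // Hypothetical names: Engine and infer() illustrate the high-level API
    // described above; they are not verified xTorch identifiers.
    auto engine = xt::inference::Engine("model.pt");
    auto results = engine.infer("cat.jpg");  // returns structured predictions

    // Clean, C++-idiomatic results instead of raw tensors.
    for (const auto& p : results)
        std::cout << p.label << " : " << p.score << '\n';
    return 0;
}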

Effortless Power

01
Simplified Workflow

Our intuitive APIs reduce boilerplate and let you focus on logic, not setup.

02
GPU-Accelerated

Built from the ground up for high-throughput inference on NVIDIA GPUs.

03
Embedded Systems

Bring C++ performance to embedded systems such as the NVIDIA Jetson.

Get Started in Minutes

main.cpp
#include <xtorch/xtorch.h>
#include <iostream>

int main() {
    std::cout.precision(10);

    // Pre-processing pipeline: resize MNIST images to 32x32 (the input
    // size LeNet-5 expects), then normalize with mean 0.5 and std 0.5.
    std::vector<std::shared_ptr<xt::Module>> transform_list;
    transform_list.push_back(std::make_shared<xt::transforms::image::Resize>(std::vector<int64_t>{32, 32}));
    transform_list.push_back(
        std::make_shared<xt::transforms::general::Normalize>(std::vector<float>{0.5}, std::vector<float>{0.5}));
    auto compose = std::make_unique<xt::transforms::Compose>(transform_list);

    // Load MNIST with the pipeline attached; adjust the path for your machine.
    auto dataset = xt::datasets::MNIST("/home/kami/Documents/datasets/", xt::datasets::DataMode::TRAIN, false,
                                       std::move(compose));

    // Shuffled mini-batches of 64 with background workers and prefetching.
    xt::dataloaders::ExtendedDataLoader data_loader(dataset, 64, true, 2, /*prefetch_factor=*/2);

    // LeNet-5 with 10 output classes, trained on the CPU.
    xt::models::LeNet5 model(10);
    model.to(torch::Device(torch::kCPU));
    model.train();

    torch::optim::Adam optimizer(model.parameters(), torch::optim::AdamOptions(1e-3));
    auto logger = std::make_shared<xt::LoggingCallback>("[MyTrain]", /*log_every_N_batches=*/20, /*log_time=*/true);

    // Configure the training loop: 10 epochs, Adam, negative log-likelihood
    // loss (the model is expected to output log-probabilities).
    xt::Trainer trainer;
    trainer.set_max_epochs(10)
           .set_optimizer(optimizer)
           .set_loss_fn([](const auto& output, const auto& target) {
               return torch::nll_loss(output, target);
           })
           .add_callback(logger);
    trainer.fit(model, data_loader, &data_loader, torch::Device(torch::kCPU));
    return 0;
}
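To build and run main.cpp, a minimal CMake configuration along the following lines should work. The xtorch package and target names here are assumptions for illustration; the installation guide below has the exact names exported by your install.

CMakeLists.txt (sketch)
cmake_minimum_required(VERSION 3.18)
project(xtorch_quickstart LANGUAGES CXX)

# LibTorch ships an official CMake package.
find_package(Torch REQUIRED)
# Assumption: xTorch exposes a CMake package of the same name.
find_package(xtorch REQUIRED)

add_executable(main main.cpp)
target_link_libraries(main PRIVATE xtorch "${TORCH_LIBRARIES}")
set_property(TARGET main PROPERTY CXX_STANDARD 17)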
View Full Installation Guide
Read the Tutorials

Follow our step-by-step guides to learn the fundamentals and build your first application with xTorch.

Start Learning
Explore the API Reference

Dive deep into the detailed documentation for every class and function.

Browse API
See Practical Examples

Check out our repository of complete, runnable examples for common use-cases.

View Code

Have a question or want to contribute?

Report an Issue on GitHub
Start a Discussion