Image Generation: DCGAN on CelebA

This tutorial dives into the exciting field of generative modeling by demonstrating how to train a Deep Convolutional Generative Adversarial Network (DCGAN). Our goal is to train a model that can generate novel, realistic images of celebrity faces.

We will be using the CelebA dataset and the pre-built xt::models::DCGAN architecture provided by xTorch.

This example is more advanced than a standard classification task because it involves:

  1. Managing two separate models: a Generator and a Discriminator.
  2. Implementing a custom training loop where these two models compete against each other.
  3. Using a specific loss function (BCELoss) and optimizer configuration (Adam with beta1=0.5) that are known to work well for GANs.

The GAN Training Process

A Generative Adversarial Network is trained as a zero-sum game between two competing neural networks:

  • The Generator (G): Its job is to create realistic-looking images from random noise. It starts out producing incoherent noise and steadily improves as training progresses.
  • The Discriminator (D): Its job is to act as a detective, trying to distinguish between "real" images (from the CelebA dataset) and "fake" images created by the Generator.

The training loop alternates between these two players.
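
For reference, this game corresponds to the original GAN minimax objective (Goodfellow et al., 2014), where D maximizes and G minimizes the value function:

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]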

Step 1: Train the Discriminator

The Discriminator is a standard binary classifier. Its training is done in two parts:

  1. Real Batch: We show it a batch of real images from the dataset and teach it to classify them as "real" (label = 1).
  2. Fake Batch: The Generator creates a batch of fake images. We show these to the Discriminator and teach it to classify them as "fake" (label = 0).

The gradients from both parts are combined, and the Discriminator's weights are updated.
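
In code, this two-part update looks like the condensed excerpt below (taken from the full program later in this tutorial). The detach() call matters: it blocks gradients from flowing back into the Generator, so this step trains only the Discriminator.

netD.zero_grad();
// Real batch: push D's output towards 1
auto errD_real = criterion(torch::sigmoid(netD.forward(real_data)).view(-1), real_labels);
errD_real.backward();
// Fake batch: push D's output towards 0; detach() keeps G out of this graph
auto errD_fake = criterion(torch::sigmoid(netD.forward(fake_data.detach())).view(-1), fake_labels);
errD_fake.backward(); // gradients accumulate with those from the real batch
optimD.step();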

Step 2: Train the Generator

The Generator's goal is to fool the Discriminator.

  1. The Generator creates a batch of fake images.
  2. These fake images are passed to the Discriminator.
  3. We calculate the Generator's loss based on how well it tricked the Discriminator. Specifically, the Generator wants the Discriminator to classify its fake images as "real" (label = 1).
  4. We then compute gradients and update only the Generator's weights. (The backward pass also writes gradients into the Discriminator, but those are discarded when netD.zero_grad() runs at the start of the next iteration.)

This adversarial process forces the Generator to produce increasingly realistic images to keep up with the improving Discriminator.
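
Training G against the "real" label implements the widely used non-saturating generator loss, which provides stronger gradients early in training than directly minimizing \log(1 - D(G(z))):

    L_G = -\mathbb{E}_{z \sim p_z}[\log D(G(z))]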


Full C++ Code

Below is the complete source code for training the DCGAN. The original file can be found at computer_vision/image_generation/generating_images_with_dcgan.cpp.

#include <xtorch/xtorch.h>
#include <iostream>
#include <chrono>
 
int main() {
    try {
        // --- 1. Hyperparameters ---
        const int latent_vector_size = 100; // Size of input noise vector
        const int generator_feature_maps = 64;
        const int discriminator_feature_maps = 64;
        const int num_channels = 3;
        const int num_epochs = 5;
        const int batch_size = 128;
        const double lr = 0.0002;
        const double beta1 = 0.5;
        const std::vector<int64_t> image_size = {64, 64};
 
        // --- 2. Setup Device, Models, and Optimizers ---
        torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
        std::cout << "Using device: " << (device.is_cuda() ? "CUDA" : "CPU") << std::endl;
 
        // Initialize Generator and Discriminator from the xTorch model zoo
        xt::models::DCGAN::Generator netG(latent_vector_size, generator_feature_maps, num_channels);
        xt::models::DCGAN::Discriminator netD(num_channels, discriminator_feature_maps);
        netG.to(device);
        netD.to(device);
 
        // Setup Adam optimizers for both models (beta1 = 0.5 follows the DCGAN paper's recommendation)
        torch::optim::Adam optimG(netG.parameters(), torch::optim::AdamOptions(lr).betas({beta1, 0.999}));
        torch::optim::Adam optimD(netD.parameters(), torch::optim::AdamOptions(lr).betas({beta1, 0.999}));
 
        // Loss function. BCELoss expects probabilities, so the discriminator's
        // raw outputs are passed through a sigmoid before computing the loss.
        torch::nn::BCELoss criterion;
 
        // --- 3. Data Pipeline ---
        auto transforms = std::make_unique<xt::transforms::Compose>(
            std::make_shared<xt::transforms::image::Resize>(image_size),
            std::make_shared<xt::transforms::image::CenterCrop>(image_size),
            // Normalize with per-channel mean 0.5 and std 0.5: (x - 0.5) / 0.5
            // maps [0, 1] to [-1, 1], matching the tanh output range typical for GANs
            std::make_shared<xt::transforms::general::Normalize>({0.5, 0.5, 0.5}, {0.5, 0.5, 0.5})
        );
 
        auto dataset = xt::datasets::CelebA(
            "/path/to/your/datasets/", // IMPORTANT: Change this path
            xt::datasets::DataMode::TRAIN,
            /*download=*/true,
            std::move(transforms)
        );
        xt::dataloaders::ExtendedDataLoader data_loader(dataset, batch_size, true, 4, 2);
 
        // --- 4. The GAN Training Loop ---
        std::cout << "\nStarting GAN training loop..." << std::endl;
        auto start_time = std::chrono::steady_clock::now();
 
        for (int epoch = 0; epoch < num_epochs; ++epoch) {
            int batch_idx = 0;
            for (auto& batch : data_loader) {
                // ------------ Part 1: Train the Discriminator ------------
                netD.zero_grad();
                // 1a. Train with a real batch
                auto real_data = batch.first.to(device);
                auto current_batch_size = real_data.size(0);
                auto real_labels = torch::full({current_batch_size}, 1.0, torch::kFloat).to(device);
 
                auto output = torch::sigmoid(netD.forward(real_data)).view(-1);
                auto errD_real = criterion(output, real_labels);
                errD_real.backward();
 
                // 1b. Train with a fake batch
                auto noise = torch::randn({current_batch_size, latent_vector_size, 1, 1}).to(device);
                auto fake_data = netG.forward(noise);
                auto fake_labels = torch::full({current_batch_size}, 0.0, torch::kFloat).to(device);
 
                output = torch::sigmoid(netD.forward(fake_data.detach())).view(-1);
                auto errD_fake = criterion(output, fake_labels);
                errD_fake.backward();
 
                // Update the discriminator (errD sums both parts for logging only)
                auto errD = errD_real + errD_fake;
                optimD.step();
 
                // ------------ Part 2: Train the Generator ------------
                netG.zero_grad();
                // The generator's goal is to make the discriminator think its fakes are real (label=1)
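                // Note: unlike in Part 1, fake_data is NOT detached, so gradients flow through D back into G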
                output = torch::sigmoid(netD.forward(fake_data)).view(-1);
                auto errG = criterion(output, real_labels);
                errG.backward();
 
                // Update the generator
                optimG.step();
 
                if (++batch_idx % 50 == 0) {
                    std::cout << "Epoch [" << epoch + 1 << "/" << num_epochs << "] Batch [" << batch_idx << "/"
                              << *dataset.size() / batch_size << "] D_Loss: " << errD.item<float>()
                              << " G_Loss: " << errG.item<float>() << std::endl;
                }
            }
        }
        auto end_time = std::chrono::steady_clock::now();
        auto duration_ms = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);
        std::cout << "\nTotal training duration: " << duration_ms.count() << " milliseconds." << std::endl;
 
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}

How to Compile and Run

This example can be found in the xtorch-examples repository.

  1. Navigate to the computer_vision/image_generation/ directory.
  2. Build using CMake (a minimal CMakeLists.txt sketch follows this list):
    mkdir build
    cd build
    cmake ..
    make
  3. Run the executable:
    ./generate_images_dcgan
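
If you are building this file outside the examples repository, a minimal CMakeLists.txt might look like the sketch below. The package name (xTorch) and target name (xtorch::xtorch) are assumptions for illustration; check how your xTorch installation actually exports its CMake targets.

cmake_minimum_required(VERSION 3.18)
project(generate_images_dcgan CXX)

find_package(Torch REQUIRED)    # LibTorch
find_package(xTorch REQUIRED)   # assumed package name; adjust to your install

add_executable(generate_images_dcgan generating_images_with_dcgan.cpp)
# xtorch::xtorch is an assumed target name
target_link_libraries(generate_images_dcgan PRIVATE xtorch::xtorch ${TORCH_LIBRARIES})
set_property(TARGET generate_images_dcgan PROPERTY CXX_STANDARD 17)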

Expected Output

You will see the progress of the training loop, with the Discriminator Loss (D_Loss) and Generator Loss (G_Loss) printed to the console. Ideally, the two losses remain in rough equilibrium. If the Discriminator's loss collapses toward zero, it has overpowered the Generator, whose gradients then vanish; if the Generator's loss collapses, the Discriminator is no longer providing a useful training signal.

Using device: CUDA
Starting GAN training loop...
Epoch [1/5] Batch [50/1583] D_Loss: 0.543211 G_Loss: 3.12346
Epoch [1/5] Batch [100/1583] D_Loss: 0.43211 G_Loss: 3.56789
...
Total training duration: 219000 milliseconds.

At the end of training, the generator (netG) will be capable of producing novel images of faces from random noise vectors. You could modify the code to save some generated image samples to disk to visualize the results.
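
As a starting point, the sketch below (run after the training loop) samples a batch of faces from fixed noise and serializes the raw tensor to disk; it assumes the Generator ends in a tanh, the standard DCGAN design, so its outputs lie in [-1, 1]. Converting the saved tensor into viewable image files (for example with OpenCV or a short Python script) is left as an exercise.

netG.eval();
torch::NoGradGuard no_grad; // sampling only, no gradients needed
auto fixed_noise = torch::randn({64, latent_vector_size, 1, 1}).to(device);
auto samples = netG.forward(fixed_noise); // assumed tanh output in [-1, 1]
samples = (samples + 1.0) / 2.0;          // undo the [-1, 1] normalization for viewing
torch::save(samples.cpu(), "dcgan_samples.pt"); // write the raw tensor to disk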