Graph Datasets

xTorch provides a collection of standard graph datasets for graph-based machine learning tasks. These datasets are widely used to develop and benchmark Graph Neural Networks (GNNs).

Graph datasets are located under the xt::datasets namespace and can be found in the <xtorch/datasets/graph_data/> header directory.

Graph Data Representation

Unlike image or text data, which is typically represented as a pair of (data, target) tensors, graph data has a more complex structure. Conceptually, a graph sample in xTorch consists of the following components. They are returned through a `torch::data::Example`, and how they are packed into its `data` and `target` fields may differ per dataset:

- `x`: a `[num_nodes, num_node_features]` tensor of node features.
- `edge_index`: a `[2, num_edges]` tensor describing the graph's connectivity in COO (coordinate) format. Each column `(source, target)` is one edge.
- `y`: a tensor of node labels or graph labels, depending on the task.
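
To make the COO layout concrete, here is a plain C++ sketch that stores `edge_index` as two rows (`edge_src`, `edge_dst`) and walks its columns to compute out-degrees and neighbour lists. `GraphSketch`, `out_degree` and `to_adjacency` are hypothetical names used only for illustration; xTorch datasets return tensors, not `std::vector`s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-C++ stand-in for the tensors described above (not xTorch API).
struct GraphSketch {
    std::int64_t num_nodes = 0;
    std::int64_t num_node_features = 0;
    std::vector<float> x;                // [num_nodes * num_node_features], row-major
    std::vector<std::int64_t> edge_src;  // edge_index[0]: source node of each edge
    std::vector<std::int64_t> edge_dst;  // edge_index[1]: target node of each edge
    std::vector<std::int64_t> y;         // one label per node (node-level task)
};

// Count outgoing edges per node by walking the columns of edge_index.
std::vector<std::int64_t> out_degree(const GraphSketch& g) {
    assert(g.edge_src.size() == g.edge_dst.size());
    std::vector<std::int64_t> deg(static_cast<std::size_t>(g.num_nodes), 0);
    for (std::size_t e = 0; e < g.edge_src.size(); ++e) {
        // Column e is the edge edge_src[e] -> edge_dst[e].
        ++deg[static_cast<std::size_t>(g.edge_src[e])];
    }
    return deg;
}

// Convert the COO columns into per-node neighbour lists.
std::vector<std::vector<std::int64_t>> to_adjacency(const GraphSketch& g) {
    assert(g.edge_src.size() == g.edge_dst.size());
    std::vector<std::vector<std::int64_t>> adj(static_cast<std::size_t>(g.num_nodes));
    for (std::size_t e = 0; e < g.edge_src.size(); ++e) {
        adj[static_cast<std::size_t>(g.edge_src[e])].push_back(g.edge_dst[e]);
    }
    return adj;
}
```

Undirected graphs are commonly stored with one column per direction, so an undirected edge between nodes 1 and 2 appears as both `(1, 2)` and `(2, 1)` in `edge_index`.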

A data loader for graph data has to keep these components consistent when it creates mini-batches: for example, the node indices in `edge_index` must still refer to the correct rows of `x`.

General Usage

To use a graph dataset, instantiate the dataset class and pass it to a data loader. Complex transformations are applied less often to graph data than to images or text, but they are still possible.

```cpp
#include <xtorch/xtorch.h>
#include <iostream>

int main() {
    // 1. Instantiate a dataset for the Cora citation network.
    //    This dataset is commonly used for node classification.
    auto dataset = xt::datasets::Cora(
        "./data",
        /*download=*/true
    );

    // Note: Graph datasets often represent a single large graph.
    // The "size" might be 1, and batching is handled differently by specialized GNN data loaders.
    std::cout << "Cora dataset loaded." << std::endl;

    // For demonstration, get the single graph object from the dataset.
    auto graph_data = dataset.get(0);
    auto node_features = graph_data.data;
    auto edge_index = graph_data.target; // Example structure; the field layout may differ per dataset

    std::cout << "Node feature shape: " << node_features.sizes() << std::endl;
    std::cout << "Edge index shape: " << edge_index.sizes() << std::endl;

    // 2. Pass the dataset to a DataLoader.
    //    For full-graph training, a standard loader with a batch size of 1 is enough;
    //    batching many graphs may require a specialized graph data loader.
    xt::dataloaders::ExtendedDataLoader data_loader(dataset, /*batch_size=*/1, /*shuffle=*/false);

    // The data loader is now ready for use in a training loop.
    for (auto& batch : data_loader) {
        // ... training step with a GNN model ...
    }
    return 0;
}
```

!!! warning "Graph Batching"
    Batching multiple graphs into a single larger graph (a common technique in GNNs) is a specialized process: each graph's edge indices must be offset by the number of nodes that precede it in the batch. While the `ExtendedDataLoader` can iterate over datasets, you may need custom collation logic for advanced GNN training scenarios. For full-graph training, where the entire graph is processed at once, a batch size of 1 is appropriate.
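
The custom collation mentioned in the warning is commonly done by merging several small graphs into one graph with disconnected components. The plain C++ sketch below shows that idea; `SmallGraph`, `BatchedGraph` and `collate_graphs` are hypothetical names (not xTorch API), and `std::vector` stands in for tensors to keep the example self-contained. How such a function would be registered with `ExtendedDataLoader` is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal graph record; not an xTorch type.
struct SmallGraph {
    std::int64_t num_nodes = 0;
    std::int64_t num_node_features = 0;
    std::vector<float> x;                // [num_nodes * num_node_features], row-major
    std::vector<std::int64_t> edge_src;  // edge_index[0]
    std::vector<std::int64_t> edge_dst;  // edge_index[1]
};

// One merged graph made of disconnected components, plus a vector that
// records which input graph each node came from.
struct BatchedGraph {
    SmallGraph merged;
    std::vector<std::int64_t> batch;     // batch[node] = index of source graph
};

BatchedGraph collate_graphs(const std::vector<SmallGraph>& graphs) {
    BatchedGraph out;
    std::int64_t node_offset = 0;
    for (std::size_t gi = 0; gi < graphs.size(); ++gi) {
        const SmallGraph& g = graphs[gi];
        if (gi == 0) {
            out.merged.num_node_features = g.num_node_features;
        }
        // All graphs must share the same feature width to be stacked.
        assert(g.num_node_features == out.merged.num_node_features);
        assert(g.x.size() == static_cast<std::size_t>(g.num_nodes * g.num_node_features));
        assert(g.edge_src.size() == g.edge_dst.size());

        // Stack node features row-wise.
        out.merged.x.insert(out.merged.x.end(), g.x.begin(), g.x.end());
        // Shift edge endpoints so they point at this graph's rows in the merged x.
        for (std::size_t e = 0; e < g.edge_src.size(); ++e) {
            out.merged.edge_src.push_back(g.edge_src[e] + node_offset);
            out.merged.edge_dst.push_back(g.edge_dst[e] + node_offset);
        }
        // Record graph membership for later graph-level pooling.
        out.batch.insert(out.batch.end(), static_cast<std::size_t>(g.num_nodes),
                         static_cast<std::int64_t>(gi));
        node_offset += g.num_nodes;
    }
    out.merged.num_nodes = node_offset;
    return out;
}
```

Because the merged graph has no edges between components, message passing over it gives the same per-node results as processing each graph separately, while the `batch` vector keeps graph-level outputs separable.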

Available Datasets by Task

Node Classification

Node classification is the task of predicting a label for each node in a graph, given the labels of some nodes.

| Dataset Class | Description | Header File |
|---|---|---|
| `Cora` | A citation network dataset where nodes are documents and edges are citation links. The task is to classify each document into one of seven classes. | `node_classification/cora.h` |
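
In the common semi-supervised setup, the whole graph is passed through the model, but the loss and metrics are computed only on the nodes whose labels are used for training or evaluation. The plain C++ sketch below computes accuracy over a boolean node mask. `masked_accuracy` and the mask itself are assumptions for illustration; this page does not say whether xTorch's `Cora` exposes train/validation/test masks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Accuracy over the nodes selected by `mask` only. Nodes outside the
// split (e.g. unlabeled or held-out nodes) have mask[i] == false.
double masked_accuracy(const std::vector<std::int64_t>& predicted,
                       const std::vector<std::int64_t>& labels,
                       const std::vector<bool>& mask) {
    assert(predicted.size() == labels.size() && labels.size() == mask.size());
    std::size_t correct = 0;
    std::size_t counted = 0;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (!mask[i]) continue;  // skip nodes outside the split
        ++counted;
        if (predicted[i] == labels[i]) ++correct;
    }
    // Return 0.0 for an empty split instead of dividing by zero.
    return counted == 0 ? 0.0
                        : static_cast<double>(correct) / static_cast<double>(counted);
}
```

The same masking idea applies to the loss: per-node losses are computed for every node, and only the masked entries contribute to the value that is back-propagated.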

Graph-Level Tasks (Graph Classification/Regression)

Graph-level tasks involve predicting a single property for an entire graph.

| Dataset Class | Description | Header File |
|---|---|---|
| `OGBMolHIV` | A molecular property prediction dataset from the Open Graph Benchmark. The task is to predict whether a molecule inhibits HIV replication. | `molecular_property_prediction/ogb_mo_ihiv.h` |
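
For graph-level tasks, a GNN typically produces one embedding per node and then applies a readout (pooling) step to obtain one vector per graph. The following plain C++ sketch shows mean pooling driven by a `batch` vector that maps each node to its graph, as produced when several graphs are merged into one batch. `global_mean_pool` is a hypothetical helper, not an xTorch function, and mean pooling is only one readout choice (sum and max pooling are also common).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean-pool node features into one vector per graph.
// x:     [num_nodes * dim] row-major node features or embeddings
// batch: batch[n] = graph index of node n, in [0, num_graphs)
std::vector<float> global_mean_pool(const std::vector<float>& x,
                                    const std::vector<std::int64_t>& batch,
                                    std::size_t dim, std::size_t num_graphs) {
    assert(dim > 0 && x.size() == batch.size() * dim);
    std::vector<float> out(num_graphs * dim, 0.0f);
    std::vector<std::size_t> counts(num_graphs, 0);
    // Sum the features of every node into its graph's slot.
    for (std::size_t n = 0; n < batch.size(); ++n) {
        const auto g = static_cast<std::size_t>(batch[n]);
        assert(g < num_graphs);
        for (std::size_t d = 0; d < dim; ++d) {
            out[g * dim + d] += x[n * dim + d];
        }
        ++counts[g];
    }
    // Divide by the node count; graphs without nodes keep a zero vector.
    for (std::size_t g = 0; g < num_graphs; ++g) {
        if (counts[g] == 0) continue;
        for (std::size_t d = 0; d < dim; ++d) {
            out[g * dim + d] /= static_cast<float>(counts[g]);
        }
    }
    return out;
}
```

The pooled `[num_graphs, dim]` result is what a final classifier or regressor head would consume, e.g. to predict the HIV-inhibition label for each molecule.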

Knowledge Graph Reasoning

| Dataset Class | Description | Header File |
|---|---|---|
| `Freebase` | A subset of the Freebase knowledge graph used for link prediction tasks. | `knowledge_graph_reasoning/freebase.h` |
| `Wikidata5M` | A large-scale knowledge graph distilled from Wikidata and Wikipedia. | `knowledge_graph_reasoning/wikidata_5m.h` |
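
Link prediction on knowledge graphs is usually framed over (head, relation, tail) triples, and training commonly contrasts true triples with corrupted ones. The plain C++ sketch below shows one simple negative-sampling scheme over integer IDs: replace the tail with a random entity and reject candidates that are known facts. The `Triple` alias and `corrupt_tails` function are hypothetical; this page does not specify how xTorch's knowledge-graph datasets expose their triples.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <set>
#include <tuple>
#include <vector>

// A knowledge-graph fact (head entity, relation, tail entity) as integer IDs.
using Triple = std::tuple<std::int64_t, std::int64_t, std::int64_t>;

// For each positive triple, draw a random replacement tail and keep the
// first corrupted triple that is not itself a known fact.
std::vector<Triple> corrupt_tails(const std::vector<Triple>& positives,
                                  std::int64_t num_entities,
                                  std::uint32_t seed) {
    assert(num_entities > 1);
    const std::set<Triple> known(positives.begin(), positives.end());
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::int64_t> pick(0, num_entities - 1);
    std::vector<Triple> negatives;
    negatives.reserve(positives.size());
    for (const Triple& p : positives) {
        const std::int64_t h = std::get<0>(p);
        const std::int64_t r = std::get<1>(p);
        // Bounded retries avoid an endless loop if (h, r) links to every entity.
        for (int attempt = 0; attempt < 100; ++attempt) {
            Triple candidate{h, r, pick(rng)};
            if (known.count(candidate) == 0) {
                negatives.push_back(candidate);
                break;
            }
        }
    }
    return negatives;
}
```

If a (head, relation) pair already links to every entity, that positive triple yields no negative, so the output can be shorter than the input.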