General Questions
This is the most important question for understanding our ecosystem. XTorch is your starting point for converting and optimizing your PyTorch models into TensorRT engines. Ignition-Hub is the cloud platform where you upload that engine file to deploy it as a scalable API. XInfer is our client-side library (SDK) that makes it simple to call your deployed API from your own application. Think of it as a three-step pipeline: Prepare (XTorch) -> Deploy (Ignition-Hub) -> Integrate (XInfer).
XTorch - Model Conversion & Optimization
XTorch is a command-line tool and Python library designed to streamline the conversion of PyTorch models into optimized TensorRT engines. It intelligently handles the conversion to ONNX and then to TensorRT, applying optimizations like FP16 or INT8 quantization to maximize inference speed.
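For a first conversion, the workflow typically looks like the sketch below. Note that the command and flag names here are illustrative assumptions rather than the documented CLI; consult the XTorch reference for the exact syntax.
```bash
# Hypothetical XTorch invocation (command and flag names are illustrative
# assumptions, not the documented CLI): convert a PyTorch checkpoint into a
# TensorRT engine with FP16 optimization enabled.
xtorch convert --model resnet50.pt --output resnet50_fp16.engine --precision fp16
```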
No, but it is highly recommended. Ignition-Hub accepts any valid TensorRT .engine file. If you have your own complex conversion pipeline, you can absolutely use that. XTorch is provided to make the process easier and more reliable for 90% of use cases.
XTorch is designed to work with models from PyTorch 1.8 and newer. We always recommend using the latest stable version of PyTorch for the best results, as ONNX export support improves with each release.
- FP16 (Half Precision): This optimization reduces your model's size by half and can significantly speed up inference with minimal loss in accuracy. It's a great default choice.
- INT8 (8-bit Integer): This offers the highest performance boost and smallest model size, but it requires a calibration step with a representative dataset. It can sometimes cause a noticeable drop in accuracy, so use it carefully and validate the results. XTorch provides tools to help with the calibration process (see the sketch after this list).
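As a rough sketch of the calibration workflow, an INT8 build might be driven like this; the command and flag names are assumptions, not the documented CLI, and only illustrate the idea of pointing the converter at a representative calibration set:
```bash
# Hypothetical INT8 conversion with a calibration dataset (flag names are
# illustrative assumptions, not the documented CLI).
xtorch convert --model resnet50.pt --output resnet50_int8.engine \
  --precision int8 --calibration-data ./calibration_samples/
```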
Conversion failures are often due to a model containing operations (ops) that are not supported by the ONNX exporter or TensorRT. Common culprits include:
- Dynamic shapes or control flow: Models with input sizes that change dramatically or contain complex loops can be difficult to convert.
- Custom PyTorch layers: If you've written custom C++ extensions for PyTorch, they will not have a standard ONNX export path.
- Unsupported operators: A specific function or layer in your model might not have a corresponding operator in the ONNX version you're targeting. Our documentation provides a list of commonly problematic ops and potential workarounds. To isolate these failures, you can run the ONNX export step on its own, as shown below.
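Running the standard PyTorch ONNX exporter directly, independent of XTorch, is a quick diagnostic: if this step already raises an unsupported-operator error, the issue lies in the model or exporter rather than in the TensorRT build. A minimal check (the model and shapes are placeholders):
```python
import torch
import torchvision

# Export the model to ONNX directly with stock PyTorch. An error here points
# at the model/exporter, not at XTorch or TensorRT.
model = torchvision.models.resnet50(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    opset_version=17,                      # the ONNX opset you plan to target
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # mark the batch axis as dynamic
)
```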
XTorch allows you to specify a 'dynamic shape profile' during conversion. You'll define a minimum, optimal, and maximum dimension for each dynamic axis of your input tensor (e.g., batch size, sequence length). XTorch will then pass this optimization profile to TensorRT, which will create an engine that can accept inputs within that defined range.
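Under the hood this corresponds to a TensorRT optimization profile. The sketch below uses the TensorRT Python API directly to show the min/opt/max mechanism XTorch drives for you; the input tensor name and shape ranges are assumptions:
```python
import tensorrt as trt

# Build-time sketch of a dynamic-shape optimization profile. The tensor name
# "input" and the shape ranges are illustrative assumptions.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("resnet50.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min / opt / max shapes for the dynamic batch axis.
profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
config.add_optimization_profile(profile)
engine_bytes = builder.build_serialized_network(network, config)
```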
Accuracy degradation after INT8 quantization usually points to a problem with the calibration dataset or to a few quantization-sensitive layers:
- Dataset not representative: The calibration data must accurately reflect the distribution of data your model will see in production.
- Insufficient data: Using too few calibration samples (e.g., fewer than 500) can lead to poor quantization scaling factors.
- Per-layer sensitivity: XTorch has an 'expert mode' that allows you to fall back to FP16 precision for specific, problematic layers, giving you a balance of performance and accuracy (see the sketch after this list).
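The TensorRT mechanism underlying this kind of fallback is a per-layer precision constraint. The sketch below shows the idea with the TensorRT Python API; the layer name is a placeholder, and XTorch's expert mode may expose this differently:
```python
import tensorrt as trt

# Sketch: keep one quantization-sensitive layer in FP16 while the rest of the
# network is built with INT8. The layer name is an illustrative placeholder.
def relax_sensitive_layer(network: trt.INetworkDefinition,
                          config: trt.IBuilderConfig,
                          layer_name: str) -> None:
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)
    # Ask the builder to honor explicit per-layer precision requests.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.name == layer_name:
            layer.precision = trt.DataType.HALF
            layer.set_output_type(0, trt.DataType.HALF)
```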
Yes. If your PyTorch model uses a custom layer that has a corresponding TensorRT plugin, XTorch provides a command-line flag (--plugins) where you can pass the path to your compiled plugin library (.so file). XTorch will load this library during the TensorRT build phase.
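The --plugins flag itself comes from the answer above; everything else in this example (command name, other flags, file names) is an illustrative assumption:
```bash
# Hypothetical invocation: load a compiled TensorRT plugin library (.so)
# during the engine build. Only the --plugins flag is documented above.
xtorch convert --model detector.pt --output detector.engine \
  --precision fp16 --plugins ./libmy_custom_ops.so
```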
By default, XTorch operates in a flexible mode. When you enable --strict mode, it forces the use of the latest stable ONNX opset and enables more aggressive TensorRT optimizations. This can lead to better performance but may fail if your model uses deprecated operators.
XInfer - Client-Side Inference SDK
XInfer is our official client library (SDK) for interacting with models deployed on Ignition-Hub. It simplifies the process of making API requests by handling authentication, data serialization, and response parsing for you, so you can focus on your application logic.
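As an illustration only, a Python call through XInfer might look roughly like the sketch below; the module, class, and method names are assumptions rather than the actual SDK surface, so check the XInfer reference for the real API:
```python
import numpy as np
from xinfer import Client  # assumed module and class name, not the real SDK surface

# Hypothetical flow: authenticate once, send a tensor, get raw outputs back.
client = Client(api_key="YOUR_API_KEY")
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = client.infer(model="my-resnet", inputs=input_tensor)  # assumed method name
print(result.outputs[0].shape)  # raw output tensor(s), no post-processing applied
```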
Currently, we have official SDKs for Python and C++. We also provide clear REST API documentation for developers who wish to make requests from other languages like JavaScript, Go, or Rust.
Yes. Every model deployed on Ignition-Hub has a standard REST API endpoint. You can use any HTTP client, like curl or Python's requests library, to call it. XInfer is simply a convenience wrapper.
XInfer is a pure inference client. It does not perform pre-processing (like image resizing or normalization) or post-processing (like non-maximum suppression). This logic should remain in your application code for maximum flexibility. You prepare your input tensor, pass it to the XInfer client, and receive the raw output tensor(s) back.
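For example, an image pipeline keeps resizing and normalization in your own code and hands the client a ready-made NCHW tensor; the commented client call reuses the hypothetical API from the earlier sketch:
```python
import numpy as np
from PIL import Image

# Pre-processing stays in your application: resize, scale to [0, 1],
# normalize with ImageNet statistics, and lay the data out as NCHW float32.
def preprocess(path: str) -> np.ndarray:
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return x.transpose(2, 0, 1)[np.newaxis].astype(np.float32)  # (1, 3, 224, 224)

# input_tensor = preprocess("cat.jpg")
# raw_output = client.infer(model="my-resnet", inputs=input_tensor)  # hypothetical client
# Post-processing (softmax, top-k, NMS, ...) also belongs in your application code.
```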
The XInfer client provides timing metadata in its response. The response.timing_info attribute contains a breakdown of time spent on network_latency_ms, queue_wait_ms, and inference_ms. This helps you identify whether the bottleneck is your network, platform load, or the model itself.
This error means your API key is either incorrect, disabled, or not being sent correctly. Please check that you are using the correct and complete API key, that the key is 'active' in your settings, and that it is correctly placed in the Authorization: Bearer YOUR_API_KEY header.
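If you call the REST endpoint directly instead of going through XInfer, the header looks like this; the endpoint URL and request body are placeholders, and only the Authorization header format is taken from the answer above:
```bash
# Placeholder endpoint and body; only the Authorization header format is
# documented above.
curl -X POST "https://api.ignition-hub.example/v1/models/my-resnet/infer" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @request.json   # your serialized input tensor
```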
The recommended way is to batch your inputs on the client side. If your TensorRT engine was built with a dynamic batch axis (e.g., shape [-1, 3, 224, 224]), you can stack your inputs into a single tensor and send it as one request. This is far more efficient than sending multiple single-item requests.
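A minimal client-side batching sketch, again using the hypothetical client API from the earlier examples:
```python
import numpy as np

# Stack several pre-processed (3, 224, 224) inputs into one (8, 3, 224, 224)
# batch and send a single request instead of eight separate ones. Random data
# stands in for real pre-processed images.
items = [np.random.rand(3, 224, 224).astype(np.float32) for _ in range(8)]
batch = np.stack(items, axis=0)
# outputs = client.infer(model="my-resnet", inputs=batch)  # hypothetical client call
```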
Ignition-Hub - Cloud Platform
A cold start refers to the first request made to a model that has been idle for a period. The platform needs a few seconds to load your model onto a GPU before it can run inference. Subsequent, 'warm' requests are nearly instantaneous. Our Pro and Enterprise plans offer features to minimize or eliminate cold starts.
Ignition-Hub uses a serverless architecture. We maintain a pool of warm GPU workers. When request traffic increases beyond the capacity of the current workers, we automatically and instantly provision new ones to handle the load. This ensures your application can handle sudden traffic spikes without you managing any servers.
You can create and manage API keys from the 'API Keys' section of your Ignition-Hub dashboard. When you create a key, we show you the full secret key only once. You must copy and store it in a secure location, as we can never show it to you again.
The maximum file size depends on your subscription plan:
- Free Plan: up to 250 MB.
- Pro Plan: up to 2 GB.
- Enterprise Plan: custom limits are available.
Your Ignition-Hub dashboard provides real-time analytics, showing your total number of requests, average inference time, and total GPU usage for the current billing cycle. You can view usage per model and per API key.
Absolutely. Your privacy and intellectual property are our top priority. All uploaded models are encrypted at rest and in transit. They are stored in a private, isolated environment and can only be accessed by API calls using your securely hashed API keys.
Yes. We fully support model versioning. When you upload a new engine file with the same model_name as an existing one, it is tagged as a new version (e.g., my-resnet:v2). Your API endpoint can be configured to point to a specific version or to a 'latest' tag that always serves the most recently uploaded version.
Our Pro plan offers a 99.9% uptime guarantee, and our Enterprise plan includes a formal Service Level Agreement (SLA) with options for dedicated support channels, faster response times, and service credits. Our platform status can always be monitored at status.aryorithm.com.