General Questions
This is the most important question for understanding our ecosystem. XTorch is your starting point for converting and optimizing your PyTorch models into TensorRT engines. Ignition-Hub is the cloud platform where you upload that engine file to deploy it as a scalable API. XInfer is our client-side library (SDK) that makes it simple to call your deployed API from your own application. Think of it as a three-step pipeline: Prepare (XTorch) -> Deploy (Ignition-Hub) -> Integrate (XInfer).
XTorch - Model Conversion & Optimization
XTorch is a command-line tool and Python library designed to streamline the conversion of PyTorch models into optimized TensorRT engines. It intelligently handles the conversion to ONNX and then to TensorRT, applying optimizations like FP16 or INT8 quantization to maximize inference speed.
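For a first conversion, the workflow typically looks like the sketch below. Note that the command and flag names here are illustrative assumptions rather than the documented CLI; consult the XTorch reference for the exact syntax.
```bash
# Hypothetical XTorch invocation (command and flag names are illustrative
# assumptions, not the documented CLI): convert a PyTorch checkpoint into a
# TensorRT engine with FP16 optimization enabled.
xtorch convert --model resnet50.pt --output resnet50_fp16.engine --precision fp16
```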
No, but it is highly recommended. Ignition-Hub accepts any valid TensorRT .engine file. If you have your own complex conversion pipeline, you can absolutely use that. XTorch is provided to make the process easier and more reliable for 90% of use cases.
XTorch is designed to work with models from PyTorch 1.8 and newer. We always recommend using the latest stable version of PyTorch for the best results, as ONNX export support improves with each release.
- FP16 (Half Precision): This optimization reduces your model's size by half and can significantly speed up inference with minimal loss in accuracy. It's a great default choice.
- INT8 (8-bit Integer): This offers the highest performance boost and smallest model size, but it requires a calibration step with a representative dataset. It can sometimes cause a noticeable drop in accuracy, so use it carefully and validate the results. XTorch provides tools to help with the calibration process (see the sketch after this list).
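As a rough sketch of the calibration workflow, an INT8 build might be driven like this; the command and flag names are assumptions, not the documented CLI, and only illustrate the idea of pointing the converter at a representative calibration set:
```bash
# Hypothetical INT8 conversion with a calibration dataset (flag names are
# illustrative assumptions, not the documented CLI).
xtorch convert --model resnet50.pt --output resnet50_int8.engine \
  --precision int8 --calibration-data ./calibration_samples/
```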
Conversion failures are often due to a model containing operations (ops) that are not supported by the ONNX exporter or TensorRT. Common culprits include:
- Dynamic shapes or control flow: Models with input sizes that change dramatically or contain complex loops can be difficult to convert.
- Custom PyTorch layers: If you've written custom C++ extensions for PyTorch, they will not have a standard ONNX export path.
- Unsupported operators: A specific function or layer in your model might not have a corresponding operator in the ONNX version you're targeting. Our documentation provides a list of commonly problematic ops and potential workarounds. To isolate these failures, you can run the ONNX export step on its own, as shown below.
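Running the standard PyTorch ONNX exporter directly, independent of XTorch, is a quick diagnostic: if this step already raises an unsupported-operator error, the issue lies in the model or exporter rather than in the TensorRT build. A minimal check (the model and shapes are placeholders):
```python
import torch
import torchvision

# Export the model to ONNX directly with stock PyTorch. An error here points
# at the model/exporter, not at XTorch or TensorRT.
model = torchvision.models.resnet50(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    opset_version=17,                      # the ONNX opset you plan to target
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # mark the batch axis as dynamic
)
```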
XTorch allows you to specify a 'dynamic shape profile' during conversion. You'll define a minimum, optimal, and maximum dimension for each dynamic axis of your input tensor (e.g., batch size, sequence length). XTorch will then pass this optimization profile to TensorRT, which will create an engine that can accept inputs within that defined range.
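Under the hood this corresponds to a TensorRT optimization profile. The sketch below uses the TensorRT Python API directly to show the min/opt/max mechanism XTorch drives for you; the input tensor name and shape ranges are assumptions:
```python
import tensorrt as trt

# Build-time sketch of a dynamic-shape optimization profile. The tensor name
# "input" and the shape ranges are illustrative assumptions.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("resnet50.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min / opt / max shapes for the dynamic batch axis.
profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (32, 3, 224, 224))
config.add_optimization_profile(profile)
engine_bytes = builder.build_serialized_network(network, config)
```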
Accuracy degradation after INT8 quantization usually points to a problem with the calibration dataset or to a few quantization-sensitive layers:
- Dataset not representative: The calibration data must accurately reflect the distribution of data your model will see in production.
- Insufficient data: Using too few calibration samples (e.g., fewer than 500) can lead to poor quantization scaling factors.
- Per-layer sensitivity: XTorch has an 'expert mode' that allows you to fall back to FP16 precision for specific, problematic layers, giving you a balance of performance and accuracy (see the sketch after this list).
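The TensorRT mechanism underlying this kind of fallback is a per-layer precision constraint. The sketch below shows the idea with the TensorRT Python API; the layer name is a placeholder, and XTorch's expert mode may expose this differently:
```python
import tensorrt as trt

# Sketch: keep one quantization-sensitive layer in FP16 while the rest of the
# network is built with INT8. The layer name is an illustrative placeholder.
def relax_sensitive_layer(network: trt.INetworkDefinition,
                          config: trt.IBuilderConfig,
                          layer_name: str) -> None:
    config.set_flag(trt.BuilderFlag.INT8)
    config.set_flag(trt.BuilderFlag.FP16)
    # Ask the builder to honor explicit per-layer precision requests.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.name == layer_name:
            layer.precision = trt.DataType.HALF
            layer.set_output_type(0, trt.DataType.HALF)
```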
Yes. If your PyTorch model uses a custom layer that has a corresponding TensorRT plugin, XTorch provides a command-line flag (--plugins) where you can pass the path to your compiled plugin library (.so file). XTorch will load this library during the TensorRT build phase.
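The --plugins flag itself comes from the answer above; everything else in this example (command name, other flags, file names) is an illustrative assumption:
```bash
# Hypothetical invocation: load a compiled TensorRT plugin library (.so)
# during the engine build. Only the --plugins flag is documented above.
xtorch convert --model detector.pt --output detector.engine \
  --precision fp16 --plugins ./libmy_custom_ops.so
```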
By default, XTorch operates in a flexible mode. When you enable --strict mode, it forces the use of the latest stable ONNX opset and enables more aggressive TensorRT optimizations. This can lead to better performance but may fail if your model uses deprecated operators.
XInfer - Client-Side Inference SDK
XInfer is our official client library (SDK) for interacting with models deployed on Ignition-Hub. It simplifies the process of making API requests by handling authentication, data serialization, and response parsing for you, so you can focus on your application logic.
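As an illustration only, a Python call through XInfer might look roughly like the sketch below; the module, class, and method names are assumptions rather than the actual SDK surface, so check the XInfer reference for the real API:
```python
import numpy as np
from xinfer import Client  # assumed module and class name, not the real SDK surface

# Hypothetical flow: authenticate once, send a tensor, get raw outputs back.
client = Client(api_key="YOUR_API_KEY")
input_tensor = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = client.infer(model="my-resnet", inputs=input_tensor)  # assumed method name
print(result.outputs[0].shape)  # raw output tensor(s), no post-processing applied
```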
Currently, we have official SDKs for Python and C++. We also provide clear REST API documentation for developers who wish to make requests from other languages like JavaScript, Go, or Rust.
Yes. Every model deployed on Ignition-Hub has a standard REST API endpoint. You can use any HTTP client, like curl or Python's requests library, to call it. XInfer is simply a convenience wrapper.
XInfer is a pure inference client. It does not perform pre-processing (like image resizing or normalization) or post-processing (like non-maximum suppression). This logic should remain in your application code for maximum flexibility. You prepare your input tensor, pass it to the XInfer client, and receive the raw output tensor(s) back.
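For example, an image pipeline keeps resizing and normalization in your own code and hands the client a ready-made NCHW tensor; the commented client call reuses the hypothetical API from the earlier sketch:
```python
import numpy as np
from PIL import Image

# Pre-processing stays in your application: resize, scale to [0, 1],
# normalize with ImageNet statistics, and lay the data out as NCHW float32.
def preprocess(path: str) -> np.ndarray:
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return x.transpose(2, 0, 1)[np.newaxis].astype(np.float32)  # (1, 3, 224, 224)

# input_tensor = preprocess("cat.jpg")
# raw_output = client.infer(model="my-resnet", inputs=input_tensor)  # hypothetical client
# Post-processing (softmax, top-k, NMS, ...) also belongs in your application code.
```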
The XInfer client provides timing metadata in its response. The response.timing_info attribute contains a breakdown of time spent on network_latency_ms, queue_wait_ms, and inference_ms. This helps you identify whether the bottleneck is your network, platform load, or the model itself.
This error means your API key is either incorrect, disabled, or not being sent correctly. Please check that you are using the correct and complete API key, that the key is 'active' in your settings, and that it is correctly placed in the Authorization: Bearer YOUR_API_KEY header.
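If you call the REST endpoint directly instead of going through XInfer, the header looks like this; the endpoint URL and request body are placeholders, and only the Authorization header format is taken from the answer above:
```bash
# Placeholder endpoint and body; only the Authorization header format is
# documented above.
curl -X POST "https://api.ignition-hub.example/v1/models/my-resnet/infer" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @request.json   # your serialized input tensor
```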
The recommended way is to batch your inputs on the client side. If your TensorRT engine was built with a dynamic batch axis (e.g., shape [-1, 3, 224, 224]), you can stack your inputs into a single tensor and send it as one request. This is far more efficient than sending multiple single-item requests.
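A minimal client-side batching sketch, again using the hypothetical client API from the earlier examples:
```python
import numpy as np

# Stack several pre-processed (3, 224, 224) inputs into one (8, 3, 224, 224)
# batch and send a single request instead of eight separate ones. Random data
# stands in for real pre-processed images.
items = [np.random.rand(3, 224, 224).astype(np.float32) for _ in range(8)]
batch = np.stack(items, axis=0)
# outputs = client.infer(model="my-resnet", inputs=batch)  # hypothetical client call
```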
Ignition-Hub - Cloud Platform
A cold start refers to the first request made to a model that has been idle for a period. The platform needs a few seconds to load your model onto a GPU before it can run inference. Subsequent, 'warm' requests are nearly instantaneous. Our Pro and Enterprise plans offer features to minimize or eliminate cold starts.
Ignition-Hub uses a serverless architecture. We maintain a pool of warm GPU workers. When request traffic increases beyond the capacity of the current workers, we automatically and instantly provision new ones to handle the load. This ensures your application can handle sudden traffic spikes without you managing any servers.
You can create and manage API keys from the 'API Keys' section of your Ignition-Hub dashboard. When you create a key, we show you the full secret key only once. You must copy and store it in a secure location, as we can never show it to you again.
The maximum file size depends on your subscription plan:
- Free Plan: up to 250 MB.
- Pro Plan: up to 2 GB.
- Enterprise Plan: custom limits are available.
Your Ignition-Hub dashboard provides real-time analytics, showing your total number of requests, average inference time, and total GPU usage for the current billing cycle. You can view usage per model and per API key.
Absolutely. Your privacy and intellectual property are our top priority. All uploaded models are encrypted at rest and in transit. They are stored in a private, isolated environment and can only be accessed by API calls using your securely hashed API keys.
Yes. We fully support model versioning. When you upload a new engine file with the same model_name as an existing one, it is tagged as a new version (e.g., my-resnet:v2). Your API endpoint can be configured to point to a specific version or to a 'latest' tag that always serves the most recently uploaded version.
Our Pro plan offers a 99.9% uptime guarantee, and our Enterprise plan includes a formal Service Level Agreement (SLA) with options for dedicated support channels, faster response times, and service credits. Our platform status can always be monitored at status.aryorithm.com.