High-Performance C++ AI, Simplified

Understanding Your Bill: A Breakdown of Inference Time on Ignition-Hub

One of our core principles is transparent pricing. When you get your monthly bill from Ignition-Hub, we want you to know exactly what you're paying for. The primary billing metric is 'GPU Inference Time', measured in milliseconds.

The Lifecycle of an API Request

When you call your endpoint via the XInfer SDK, several things happen:

  1. Network Transit: Your data travels from your client to our API gateway.
  2. Queueing: Your request is placed in a queue to be picked up by a GPU worker. (This is usually near-instantaneous on Pro plans.)
  3. Model Execution: The worker runs your model on your provided input. This is the 'GPU Inference Time' that we bill for.
  4. Network Return: The results are sent back to your client.

You are not billed for network transit time or queueing time. You only pay for the exact duration your model is actively using the GPU's computational resources.
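
To see how this plays out in practice, here is a minimal client-side sketch that compares the round-trip time you observe with the GPU time you are billed for. The xinfer::Client class, its infer() call, and the billed_inference_ms field are assumed names used for illustration; consult the XInfer SDK reference for the actual API.

    // A minimal sketch, not the documented XInfer API: the client class, the
    // infer() call, and the billed_inference_ms field are assumed names used
    // to illustrate round-trip latency versus billed GPU time.
    #include <chrono>
    #include <iostream>
    #include <vector>

    #include "xinfer/client.hpp"  // hypothetical SDK header

    int main() {
        xinfer::Client client("my-endpoint-id");        // hypothetical constructor
        std::vector<float> input(224 * 224 * 3, 0.5f);  // placeholder input tensor

        auto start = std::chrono::steady_clock::now();
        auto result = client.infer(input);              // hypothetical inference call
        auto end = std::chrono::steady_clock::now();

        double round_trip_ms =
            std::chrono::duration<double, std::milli>(end - start).count();

        // Round trip = network transit + queueing + GPU execution + network return.
        // Only the GPU execution portion, as reported by the service, is billed.
        std::cout << "Client round trip: " << round_trip_ms << " ms\n"
                  << "Billed GPU time:   " << result.billed_inference_ms << " ms\n";
        return 0;
    }

The gap between the two numbers is the network and queueing overhead you are not charged for.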

How to Reduce Your Bill

The best way to lower your bill is to reduce your model's inference time. This can be achieved by:

  • Using FP16 or INT8 quantization in XTorch to make your model run faster.
  • Implementing client-side batching to process more data in a single request, which is more efficient than sending many small requests (see the sketch after this list).
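
As a rough illustration of the batching pattern, the sketch below accumulates several inputs and submits them in one request instead of looping over single-item calls. The infer_batch() method is an assumed name rather than a documented SDK call; the point is amortizing per-request overhead across many items.

    // A minimal sketch, assuming a hypothetical xinfer::Client with a batched
    // infer_batch() call; the real SDK surface may differ. One request carrying
    // many inputs amortizes per-request overhead and lets the GPU process the
    // whole batch in a single pass.
    #include <cstddef>
    #include <vector>

    #include "xinfer/client.hpp"  // hypothetical SDK header

    int main() {
        xinfer::Client client("my-endpoint-id");  // hypothetical constructor

        // Instead of sending 32 separate single-item requests...
        const std::size_t batch_size = 32;
        std::vector<std::vector<float>> batch;
        batch.reserve(batch_size);
        for (std::size_t i = 0; i < batch_size; ++i) {
            batch.emplace_back(224 * 224 * 3, 0.5f);  // placeholder input tensors
        }

        // ...send them together as one batched request.
        auto results = client.infer_batch(batch);  // hypothetical batched call
        return 0;
    }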

Your dashboard provides a detailed breakdown of the average inference time for each of your models, helping you identify which ones are your biggest cost drivers.