Architecture Overview
Agnitra combines profiling, AI-assisted kernel tuning, and usage-based billing so you can prove performance gains before pushing new models to production. This overview explains how the pieces fit together from a user’s point of view.
Optimization Loop at a Glance
- Profile & capture telemetry – The CLI or SDK traces your TorchScript/ONNX model, capturing latency, throughput, memory, and GPU utilization.
- Plan optimizations – Telemetry is fed to the Agnitra optimizer, which pairs OpenAI Responses API hints with reinforcement learning to propose kernel tweaks.
- Patch & validate – Agnitra generates Triton/CUDA kernels, validates correctness against your baseline, and swaps them into the runtime.
- Report uplift – Before/after telemetry becomes a structured usage event you can forward to finance systems, marketplaces, or dashboards.
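To make the last step concrete, here is a minimal sketch of how before/after telemetry could be folded into a structured usage event. It is illustrative only: the `Telemetry` class, the `build_usage_event` helper, and every field name are assumptions made for this example, not Agnitra's actual event schema or SDK API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Telemetry:
    """Hypothetical telemetry snapshot; field names are illustrative only."""
    latency_ms: float
    tokens_per_sec: float
    peak_memory_mb: float


def pct_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


def build_usage_event(model_id: str, baseline: Telemetry, optimized: Telemetry) -> dict:
    """Combine two snapshots into one JSON-serializable event (assumed layout)."""
    return {
        "model_id": model_id,
        "baseline": asdict(baseline),
        "optimized": asdict(optimized),
        "uplift": {
            # Higher throughput is better, so the signed change is reported as-is.
            "tokens_per_sec_pct": round(
                pct_change(baseline.tokens_per_sec, optimized.tokens_per_sec), 2
            ),
            # Lower latency and memory are better, so the sign is flipped.
            "latency_reduction_pct": round(
                -pct_change(baseline.latency_ms, optimized.latency_ms), 2
            ),
            "memory_reduction_pct": round(
                -pct_change(baseline.peak_memory_mb, optimized.peak_memory_mb), 2
            ),
        },
    }


# Made-up numbers, chosen only to exercise the arithmetic.
baseline = Telemetry(latency_ms=120.0, tokens_per_sec=1000.0, peak_memory_mb=8000.0)
optimized = Telemetry(latency_ms=96.0, tokens_per_sec=1250.0, peak_memory_mb=6000.0)

event = build_usage_event("demo-model", baseline, optimized)
print(json.dumps(event, indent=2))
```

Keeping both raw snapshots alongside the derived percentages means a downstream finance system or dashboard can recompute the uplift rather than trusting a single pre-computed number.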
Core Surfaces You’ll Use
- CLI & Python SDK – Run `agnitra optimize` locally or embed `agnitra.optimize()` inside services to automate tuning.
- Agentic Optimization API (`agnitra-api`) – Offload work to a remote control plane, queue jobs, and replay usage events via REST.
- Telemetry exporters – Push JSON snapshots to S3, HTTP endpoints, Kafka, or your observability stack for long-term analysis.
- Marketplace dispatchers – Enable adapters (Stripe, AWS Marketplace, GCP Marketplace) to translate usage events into billable records.
Deployment Options
- Managed – Point the CLI/SDK at the hosted control plane (`https://api.agnitra.ai`) and use preconfigured telemetry dashboards.
- Self-hosted – Run `agnitra-api` in your environment, connect it to GPU workers, and plug telemetry into your data stores.
- Offline / regulated – Enable offline mode with enterprise licenses to generate optimizations without external network access; usage events sync once connectivity returns.
Performance Goals
| Metric | Target |
|---|---|
| Tokens per second uplift | ≥ 20% |
| Latency reduction | ≥ 15% |
| Memory efficiency | ≥ 25% |
| Integration time | < 10 minutes |
| Output correctness | ≥ 99.9% parity with baseline |
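One way to use these targets is as an automated gate in a release pipeline. The sketch below encodes the thresholds from the table and checks a set of measured values against them. The metric key names and the `check_targets` helper are hypothetical, and the code assumes each metric has already been computed as a percentage (or, for integration time, in minutes); how Agnitra itself measures "memory efficiency" is not specified here.

```python
import operator

# Thresholds copied from the table above; the key names are hypothetical.
TARGETS = {
    "tokens_per_sec_uplift_pct": (operator.ge, 20.0),
    "latency_reduction_pct": (operator.ge, 15.0),
    "memory_efficiency_pct": (operator.ge, 25.0),
    "integration_minutes": (operator.lt, 10.0),
    "output_parity_pct": (operator.ge, 99.9),
}


def check_targets(measured: dict) -> dict:
    """Return {metric: bool}; a metric missing from `measured` counts as a miss."""
    results = {}
    for name, (compare, threshold) in TARGETS.items():
        value = measured.get(name)
        results[name] = value is not None and compare(value, threshold)
    return results


# Made-up measurements for illustration; latency deliberately misses its target.
measured = {
    "tokens_per_sec_uplift_pct": 25.0,
    "latency_reduction_pct": 12.0,
    "memory_efficiency_pct": 25.0,
    "integration_minutes": 8.0,
    "output_parity_pct": 99.95,
}

report = check_targets(measured)
for name, ok in report.items():
    print(f"{name}: {'PASS' if ok else 'MISS'}")
```

Treating a missing metric as a miss is a deliberately conservative choice: a run that failed to record a measurement should not pass the gate by default.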
Continue Exploring
- Follow the Quickstart to install the SDK and run your first optimization.
- Deep dive into automation patterns in the SDK & CLI Guide.
- Wire telemetry to billing systems using the Marketplace & Billing guide.