
Architecture Overview

Agnitra combines profiling, AI-assisted kernel tuning, and usage-based billing so you can prove performance gains before pushing new models to production. This overview explains how the pieces fit together from a user’s point of view.

Optimization Loop at a Glance

  1. Profile & capture telemetry – The CLI or SDK traces your TorchScript/ONNX model, capturing latency, throughput, memory, and GPU utilization.
  2. Plan optimizations – Telemetry is fed to the Agnitra optimizer, which pairs OpenAI Responses API hints with reinforcement learning to propose kernel tweaks.
  3. Patch & validate – Agnitra generates Triton/CUDA kernels, validates correctness against your baseline, and swaps them into the runtime.
  4. Report uplift – Before/after telemetry becomes a structured usage event you can forward to finance systems, marketplaces, or dashboards.

Every stage emits artifacts (telemetry JSON, optimized models, usage events) so you can audit changes or roll back quickly.
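The final step of the loop, folding before/after telemetry into a structured usage event, can be sketched as a small pure function. The field names below (latency_ms, tokens_per_s, event_type) are illustrative assumptions, not Agnitra's actual telemetry schema:

```python
import json

def build_usage_event(baseline: dict, optimized: dict, model_id: str) -> dict:
    """Fold two telemetry snapshots into one structured usage event (hypothetical schema)."""
    uplift = (optimized["tokens_per_s"] - baseline["tokens_per_s"]) / baseline["tokens_per_s"]
    latency_cut = (baseline["latency_ms"] - optimized["latency_ms"]) / baseline["latency_ms"]
    return {
        "event_type": "optimization.completed",
        "model_id": model_id,
        "baseline": baseline,
        "optimized": optimized,
        "tokens_per_s_uplift_pct": round(uplift * 100, 1),
        "latency_reduction_pct": round(latency_cut * 100, 1),
    }

baseline = {"latency_ms": 42.0, "tokens_per_s": 1200.0, "gpu_util_pct": 71.0}
optimized = {"latency_ms": 33.6, "tokens_per_s": 1500.0, "gpu_util_pct": 88.0}
event = build_usage_event(baseline, optimized, "llama-3-8b")
payload = json.dumps(event, sort_keys=True)  # ready to forward to a dispatcher
```

Because the event carries both raw snapshots and the derived percentages, downstream systems (finance, marketplaces, dashboards) can re-derive or audit the uplift claim independently.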

Core Surfaces You’ll Use

  • CLI & Python SDK – Run agnitra optimize locally or embed agnitra.optimize() inside services to automate tuning.
  • Agentic Optimization API (agnitra-api) – Offload work to a remote control plane, queue jobs, and replay usage events via REST.
  • Telemetry exporters – Push JSON snapshots to S3, HTTP endpoints, Kafka, or your observability stack for long-term analysis.
  • Marketplace dispatchers – Enable adapters (Stripe, AWS Marketplace, GCP Marketplace) to translate usage events into billable records.
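The exporter surface amounts to fanning one telemetry snapshot out to several sinks. A minimal sketch follows; TelemetryHub and the exporter signature are hypothetical names, and real sinks (S3, Kafka, an HTTP endpoint) would replace the callables:

```python
import json
from typing import Callable

# An exporter receives one serialized snapshot; real ones would write to S3, Kafka, etc.
Exporter = Callable[[str], None]

class TelemetryHub:
    """Fan a telemetry snapshot out to every registered exporter (illustrative name)."""

    def __init__(self) -> None:
        self._exporters: list[Exporter] = []

    def register(self, exporter: Exporter) -> None:
        self._exporters.append(exporter)

    def publish(self, snapshot: dict) -> None:
        payload = json.dumps(snapshot, sort_keys=True)
        for export in self._exporters:
            export(payload)

received: list[str] = []
hub = TelemetryHub()
hub.register(received.append)  # stand-in for a real sink
hub.publish({"latency_ms": 33.6, "model": "llama-3-8b"})
```

Serializing once and handing the same payload to every sink keeps the exporters uniform and makes long-term analysis stores byte-identical to what the dashboards saw.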

Deployment Options

  • Managed – Point the CLI/SDK at the hosted control plane (https://api.agnitra.ai) and use preconfigured telemetry dashboards.
  • Self-hosted – Run agnitra-api in your environment, connect it to GPU workers, and plug telemetry into your data stores.
  • Offline / regulated – Enable offline mode with enterprise licenses to generate optimizations without external network access; usage events sync once connectivity returns.
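The offline mode's "sync once connectivity returns" behavior can be approximated with a local spool file. OfflineEventQueue and its methods are illustrative, not the enterprise feature's real interface:

```python
import json
import tempfile
from pathlib import Path

class OfflineEventQueue:
    """Spool usage events to disk while offline; drain them on reconnect (illustrative)."""

    def __init__(self, spool_path: Path) -> None:
        self.spool = Path(spool_path)

    def enqueue(self, event: dict) -> None:
        # One JSON line per event; the spool survives process restarts.
        with self.spool.open("a") as f:
            f.write(json.dumps(event) + "\n")

    def drain(self, send) -> int:
        # Replay every spooled event through `send`, then clear the spool.
        if not self.spool.exists():
            return 0
        events = [json.loads(line) for line in self.spool.read_text().splitlines()]
        for event in events:
            send(event)
        self.spool.unlink()
        return len(events)

with tempfile.TemporaryDirectory() as d:
    queue = OfflineEventQueue(Path(d) / "usage.jsonl")
    queue.enqueue({"event_type": "optimization.completed", "model_id": "llama-3-8b"})
    queue.enqueue({"event_type": "optimization.completed", "model_id": "resnet50"})
    delivered: list = []
    sent = queue.drain(delivered.append)  # "connectivity returns"
```

An append-only JSON-lines spool is a common pattern for this: events stay auditable on disk in regulated environments, and replay order matches capture order.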

Performance Goals

Metric                     Target
Tokens per second uplift   ≥ 20%
Latency reduction          ≥ 15%
Memory efficiency          ≥ 25%
Integration time           < 10 minutes
Output correctness         ≥ 99.9% parity with baseline
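The output-correctness target can be made concrete with a token-level parity measure: the fraction of positions where the optimized model reproduces the baseline token. This is one plausible definition, assumed here rather than taken from Agnitra's validator:

```python
def parity(baseline_tokens: list[int], optimized_tokens: list[int]) -> float:
    """Fraction of positions where the optimized output matches the baseline token."""
    if not baseline_tokens:
        return 1.0
    matches = sum(a == b for a, b in zip(baseline_tokens, optimized_tokens))
    return matches / len(baseline_tokens)

baseline_out = [101, 7, 42, 9, 13] * 200   # 1000 baseline tokens
optimized_out = list(baseline_out)
optimized_out[3] = 10                      # a single divergent token
score = parity(baseline_out, optimized_out)
meets_target = score >= 0.999              # the ≥ 99.9% bar from the table
```

For models with floating-point outputs rather than discrete tokens, an elementwise tolerance check would replace exact equality, but the pass/fail threshold works the same way.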
