Agnitra Platform
Agnitra gives your team an opinionated workflow for profiling, optimizing, and shipping large language models with built-in telemetry and billing. The SDK and CLI handle model compilation, surface runtime hotspots, and emit marketplace-ready usage events so you can move from prototype to production without standing up separate infrastructure.
- Optimization agents adapt TorchScript/ONNX graphs with LLM- and RL-tuned kernels, preserving model accuracy while boosting throughput.
- Telemetry-first workflows capture latency, GPU hours, and savings in structured JSON so finance and infra teams share the same source of truth.
- Usage-based monetization maps optimization runs to Stripe, AWS Marketplace, and internal ledgers with auditable metadata and license enforcement.
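To make the telemetry and monetization bullets concrete, here is a minimal sketch of what a structured, marketplace-ready usage event could look like. The field names (`run_id`, `gpu_hours`, `latency_ms_p50`, `estimated_savings_usd`) are illustrative assumptions, not Agnitra's published schema:

```python
import json
from datetime import datetime, timezone

def build_usage_event(run_id: str, gpu_hours: float, latency_ms_p50: float,
                      baseline_cost_usd: float, optimized_cost_usd: float) -> dict:
    """Assemble a hypothetical usage event: runtime metrics for infra teams,
    cost deltas for finance, all in one JSON-serializable record."""
    return {
        "run_id": run_id,
        "emitted_at": datetime.now(timezone.utc).isoformat(),
        "metrics": {
            "gpu_hours": gpu_hours,
            "latency_ms_p50": latency_ms_p50,
        },
        "billing": {
            "baseline_cost_usd": baseline_cost_usd,
            "optimized_cost_usd": optimized_cost_usd,
            "estimated_savings_usd": round(baseline_cost_usd - optimized_cost_usd, 2),
        },
    }

event = build_usage_event("run-001", gpu_hours=1.5, latency_ms_p50=42.0,
                          baseline_cost_usd=12.00, optimized_cost_usd=7.40)
print(json.dumps(event, indent=2))
```

Keeping metrics and billing in a single event is what lets finance and infra teams share one source of truth: both views are derived from the same record rather than reconciled after the fact.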
Start Building Fast
- Install the SDK from PyPI and ship your first optimized artifact via the Quickstart.
- Explore repeatable automations and code samples in the SDK & CLI Guide.
- Understand what telemetry and usage events look like in practice with the Telemetry Playbook.
Architecture Overview
The MVP couples a telemetry collector, FX graph extractor, AI optimizer, kernel generator, and runtime patcher behind a unified CLI and control plane. Read the Architecture Deep Dive for module responsibilities, data flows, and performance targets drawn from the PRD.
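The data flow described above can be sketched as a chain of stages threading a shared context, which is one plausible shape for the pipeline behind the CLI. The stage names mirror the overview; the interfaces, sample values, and kernel-naming scheme are assumptions for illustration only:

```python
# Each stage is a plain function that enriches a context dict and passes it on.
def collect_telemetry(ctx):
    ctx["telemetry"] = {"latency_ms": 120.0, "gpu_util": 0.61}  # stand-in sample
    return ctx

def extract_fx_graph(ctx):
    ctx["graph"] = ["embed", "attention", "mlp", "lm_head"]  # toy op list
    return ctx

def optimize_graph(ctx):
    # A real optimizer would rank hotspots from telemetry; here we just flag attention.
    ctx["hotspots"] = [op for op in ctx["graph"] if op == "attention"]
    return ctx

def generate_kernels(ctx):
    ctx["kernels"] = {op: f"{op}_fused_v1" for op in ctx["hotspots"]}
    return ctx

def patch_runtime(ctx):
    ctx["patched"] = sorted(ctx["kernels"].values())
    return ctx

stages = [collect_telemetry, extract_fx_graph, optimize_graph,
          generate_kernels, patch_runtime]
ctx = {}
for stage in stages:
    ctx = stage(ctx)
print(ctx["patched"])
```

The linear stage list keeps each module independently testable while the CLI owns only the orchestration, which matches the unified-control-plane framing in the overview.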
Key Workflows
Reference Materials
What’s Next?
Run `pip install agnitra`, export `AGNITRA_API_KEY`, and launch `agnitra optimize` against the TinyLlama fixture to validate your workstation. From there, follow the linked guides to embed Agnitra into pipelines, dashboards, and billing loops.