Telemetry Playbook
Agnitra treats telemetry as a first-class artifact. Every optimization captures before/after metrics so engineering, infra, and finance teams agree on the impact of a rollout. This guide explains how telemetry is produced and how to route it to your observability stack.
What the CLI & SDK Emit
| Artifact | File | Contents | Primary Consumers |
|---|---|---|---|
| Telemetry snapshot | `telemetry.json` (configurable via `--telemetry-out`) | Latency, throughput, GPU utilization, kernel-level hotspots, PPO scores. | Performance engineers, dashboards. |
| Usage event | Printed to stdout and returned from SDK calls (`result.usage_event`) | GPU hours saved, cost deltas, currency, marketplace payloads, project metadata. | Billing, finance, marketplace exporters. |
| Optimization artifact | `dist/<model>_optimized.pt` | TorchScript/ONNX artifact with patched kernels and metadata. | Serving teams, registries. |
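The before/after comparison in a telemetry snapshot comes down to percentage arithmetic. The following is a minimal sketch that assumes a hypothetical snapshot layout, with `baseline` and `optimized` sections holding `latency_ms` and `throughput_per_s`; the real `telemetry.json` schema may use different keys, so map them before adapting this.

```python
import json


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline value must be non-zero")
    return (after - before) / before * 100.0


def summarize(snapshot: dict) -> dict:
    """Compare the hypothetical 'baseline' and 'optimized' sections of a snapshot."""
    base, opt = snapshot["baseline"], snapshot["optimized"]
    return {
        # Positive means the optimized model responds faster (lower latency).
        "latency_reduction_pct": -pct_change(base["latency_ms"], opt["latency_ms"]),
        # Positive means the optimized model serves more requests per second.
        "throughput_gain_pct": pct_change(base["throughput_per_s"], opt["throughput_per_s"]),
    }


# Illustrative values only, shaped like a hypothetical telemetry.json.
example = json.loads("""
{"baseline":  {"latency_ms": 12.0, "throughput_per_s": 800.0},
 "optimized": {"latency_ms": 9.0,  "throughput_per_s": 1100.0}}
""")
print(summarize(example))  # {'latency_reduction_pct': 25.0, 'throughput_gain_pct': 37.5}
```

To run it against a real file, load it with `json.loads(pathlib.Path("telemetry.json").read_text())` after mapping the key names. Reporting latency as a reduction and throughput as a gain keeps both numbers positive when the optimization helps.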
Routing Telemetry
- File drops: `agnitra optimize --telemetry-out telemetry.json` writes a structured JSON file. Persist it to S3, GCS, or your artifact store.
- Programmatic export: use the `agnitra.telemetry_collector` and `agnitra.telemetry.usage_meter` helpers to push directly to HTTP, Kafka, or Snowflake.
- Marketplace dispatchers: extras such as `agnitra[marketplace]` register AWS, GCP, and Stripe exporters (for example `StripeUsageDispatcher` and `AwsMarketplaceDispatcher`) that run asynchronously after each optimization.
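Programmatic export generally means fanning each payload out to one or more sinks. This is a sketch of that pattern, not the API of `agnitra.telemetry_collector`: `TelemetrySink`, `FileDropSink`, `HttpSink`, and `export_all` are hypothetical names, the endpoint URL is a placeholder, and the payload is assumed to contain `project_id` and `model_name`. The HTTP request is only built, not sent, unless `send=True`, so the example runs offline.

```python
from __future__ import annotations

import json
import pathlib
import tempfile
import urllib.request
from typing import Protocol


class TelemetrySink(Protocol):
    """Hypothetical interface: anything that accepts one telemetry payload."""

    def export(self, payload: dict) -> None: ...


class FileDropSink:
    """Writes each payload as a JSON file; sync the directory to S3/GCS separately."""

    def __init__(self, directory: pathlib.Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def export(self, payload: dict) -> None:
        # Replace "/" so model names like "org/model" stay a single file name.
        name = f"{payload['project_id']}_{payload['model_name']}".replace("/", "_")
        (self.directory / f"{name}.json").write_text(json.dumps(payload, sort_keys=True))


class HttpSink:
    """Builds a JSON POST request; it is only sent when send=True."""

    def __init__(self, url: str, send: bool = False) -> None:
        self.url = url
        self.send = send
        self.last_request: urllib.request.Request | None = None

    def export(self, payload: dict) -> None:
        req = urllib.request.Request(
            self.url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        self.last_request = req
        if self.send:
            with urllib.request.urlopen(req, timeout=10) as resp:
                resp.read()


def export_all(payload: dict, sinks: list[TelemetrySink]) -> None:
    """Fan one payload out to every configured sink, in order."""
    for sink in sinks:
        sink.export(payload)


if __name__ == "__main__":
    payload = {"project_id": "demo", "model_name": "resnet50", "latency_ms": 9.0}
    with tempfile.TemporaryDirectory() as tmp:
        http = HttpSink("https://telemetry.example.com/ingest")  # placeholder URL
        export_all(payload, [FileDropSink(pathlib.Path(tmp)), http])
        print([p.name for p in pathlib.Path(tmp).iterdir()])  # ['demo_resnet50.json']
        print(http.last_request.get_method())  # POST
```

Kafka or Snowflake sinks would implement the same `export` method with their own client libraries; keeping the interface to one method lets you add or swap destinations without touching the optimization code.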
Exported telemetry carries `project_id`, `model_name`, and timestamps, so you can join records in downstream jobs.
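Joining on those fields is an ordinary keyed join. The minimal sketch below assumes that a snapshot and its usage event both carry `project_id`, `model_name`, and an identical `timestamp` for the same run; if the two timestamps can differ, you would need a tolerance window instead of an exact match. `run_key` and `join_runs` are hypothetical helpers and the records are illustrative.

```python
from __future__ import annotations


def run_key(record: dict) -> tuple[str, str, str]:
    """Join key built from the fields both record types are assumed to carry."""
    return (record["project_id"], record["model_name"], record["timestamp"])


def join_runs(snapshots: list[dict], usage_events: list[dict]) -> list[dict]:
    """Inner join: attach the matching usage event to each telemetry snapshot."""
    # If several usage events share a key, the last one wins.
    usage_by_key = {run_key(event): event for event in usage_events}
    joined = []
    for snap in snapshots:
        event = usage_by_key.get(run_key(snap))
        if event is None:
            continue  # no usage event yet, e.g. dispatch still pending
        joined.append({**snap, "usage_event": event})
    return joined


# Illustrative records only; real payloads carry many more fields.
snapshots = [
    {"project_id": "p1", "model_name": "bert", "timestamp": "2024-05-01T12:00:00Z", "latency_ms": 9.0},
    {"project_id": "p1", "model_name": "gpt2", "timestamp": "2024-05-01T13:00:00Z", "latency_ms": 21.0},
]
usage_events = [
    {"project_id": "p1", "model_name": "bert", "timestamp": "2024-05-01T12:00:00Z", "gpu_hours_saved": 3.5},
]
print(join_runs(snapshots, usage_events))
```

In a pandas-based job the same inner join is `snapshots_df.merge(usage_df, on=["project_id", "model_name", "timestamp"])`.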
Dashboards & Alerting
- `agnitra-dashboard` renders telemetry bundles locally, highlighting speedups, GPU-hour savings, and license compliance.
- Push aggregated snapshots into your metrics system (Prometheus, Datadog, Grafana) to track optimization coverage and ROI over time.
- Alert when `expected_speedup_pct` drops below target, or when `usage_event.status != "delivered"`, to catch marketplace backoffs.
Best Practices
- Store raw telemetry before aggregating so you can retroactively re-price or inspect kernels.
- Sign usage events before dispatching to marketplaces to meet compliance requirements.
- Attach `job_metadata` (CLI flag) or `metadata` (SDK argument) to correlate runs with CI pipelines, pull requests, or customer tenants.
- Rotate `AGNITRA_API_KEY` and audit outbound webhook targets to avoid leaking telemetry to untrusted endpoints.
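Marketplace signing requirements vary, so treat this as a generic sketch: it signs a usage event with HMAC-SHA256 from Python's standard library over a canonical JSON encoding. The secret, the event fields, and the nested `metadata` dict (standing in for what you might pass through the `metadata` SDK argument) are assumptions, and a given marketplace may mandate its own scheme instead.

```python
import hashlib
import hmac
import json


def canonical_bytes(event: dict) -> bytes:
    # Sorted keys and compact separators: the same event always yields the same bytes.
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_event(event: dict, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the canonical event encoding."""
    return hmac.new(secret, canonical_bytes(event), hashlib.sha256).hexdigest()


def verify_event(event: dict, signature: str, secret: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_event(event, secret), signature)


# Placeholder secret; load the real one from a secrets manager.
secret = b"replace-with-a-managed-secret"
event = {
    "project_id": "p1",
    "model_name": "bert",
    "gpu_hours_saved": 3.5,
    "currency": "USD",
    # Hypothetical correlation fields, e.g. passed via the SDK's metadata argument.
    "metadata": {"ci_pipeline": "nightly", "pull_request": "example-pr"},
}
signature = sign_event(event, secret)
print(verify_event(event, signature, secret))  # True
print(verify_event({**event, "gpu_hours_saved": 35.0}, signature, secret))  # False
```

Canonical serialization matters: two semantically identical events must produce the same bytes, or verification fails spuriously. `hmac.compare_digest` avoids leaking timing information when signatures are checked.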
Related Reading
- SDK & CLI Guide — command references and return types.
- Marketplace & Billing — how telemetry powers pricing workflows.
- Runtime Configuration — environment variables that toggle telemetry exporters.