Control Plane Reference

agnitra-api exposes a Starlette/FastAPI service that backs the CLI and SDK when you run optimizations remotely. This reference summarises the main endpoints, request formats, and operational notes.

Authentication

The API expects an Authorization: Bearer <AGNITRA_API_KEY> header on every request.
When running self-hosted, set AGNITRA_API_KEY and AGNITRA_API_BASE_URL (or AGNITRA_CONTROL_PLANE_URL) in your environment before starting the server.
Optional headers:
- Agnitra-Project — override the project associated with the API key.
- Agnitra-Operator — identify the human or service triggering the optimization.
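
As a rough illustration of the header rules above, the sketch below assembles the required and optional headers in Python. The helper name `build_headers` and the sample values are hypothetical and not part of the SDK; only the header names and the AGNITRA_API_KEY variable come from this page.

```python
import os


def build_headers(api_key=None, project=None, operator=None):
    """Return request headers for the control plane (hypothetical helper).

    Falls back to the AGNITRA_API_KEY environment variable when no key is passed.
    """
    key = api_key or os.environ.get("AGNITRA_API_KEY")
    if not key:
        raise ValueError("AGNITRA_API_KEY is not set")
    headers = {"Authorization": f"Bearer {key}"}
    # Optional headers are only sent when a value is supplied.
    if project:
        headers["Agnitra-Project"] = project
    if operator:
        headers["Agnitra-Operator"] = operator
    return headers


# Sample values are placeholders for illustration only.
headers = build_headers(api_key="dev-key", project="demo", operator="ci-bot")
```

Leaving the optional headers out entirely, rather than sending empty values, keeps the project derived from the API key in effect.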

Endpoints

`POST /optimize`

Submit a model graph and telemetry bundle for optimization.

curl -X POST "$AGNITRA_API_BASE_URL/optimize" \
  -H "Authorization: Bearer $AGNITRA_API_KEY" \
  -F model_graph=@graph_ir.json \
  -F telemetry=@telemetry.json \
  -F target=A100 \
  -F project_id=demo

Payload fields

model_graph — TorchScript FX graph JSON generated by the SDK/CLI.
telemetry — Profiler snapshot generated by agnitra optimize --telemetry-out.
target — Hardware target (e.g. A100, H100, L40S).
project_id (optional) — Overrides the project derived from the API key.
options (optional JSON) — Extra flags such as {"enable_rl": true}.
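
For callers who are not using curl, the following is a minimal sketch of how the same fields could be packed into a multipart/form-data body with only the Python standard library. The function `encode_multipart`, the placeholder file contents, and the file names are assumptions for illustration; the field names are the ones listed above, and `options` is serialized as a JSON string.

```python
import json
import uuid


def encode_multipart(fields, files):
    """Encode text fields and file parts as multipart/form-data.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(value.encode("utf-8"))
    for name, (filename, content) in files.items():
        parts.append(f"--{boundary}".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode()
        )
        parts.append(b"Content-Type: application/json")
        parts.append(b"")
        parts.append(content)
    # Closing boundary, followed by a final CRLF.
    parts.append(f"--{boundary}--".encode())
    parts.append(b"")
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


fields = {
    "target": "A100",
    "project_id": "demo",
    "options": json.dumps({"enable_rl": True}),  # optional JSON flags
}
# Placeholder contents; real payloads come from the SDK/CLI.
files = {
    "model_graph": ("graph_ir.json", b"{}"),
    "telemetry": ("telemetry.json", b"{}"),
}
body, content_type = encode_multipart(fields, files)
```

The returned `content_type` value would be sent as the Content-Type header alongside the Authorization header.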

Response

{
  "job_id": "job_123",
  "status": "queued",
  "expected_speedup_pct": 28.5,
  "outputs": {
    "optimized_artifact_url": "s3://...",
    "patch_instructions": {...}
  }
}

Depending on configuration, the API may perform optimizations synchronously (returning the final artifact) or queue the job for asynchronous processing.
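
Because the same endpoint can answer synchronously or with a queued job, client code may want to branch on the response. The sketch below assumes, without confirmation from the service, that a finished response carries one of the terminal statuses listed for GET /jobs/{job_id} (completed or failed); the helper name is hypothetical.

```python
def summarize_optimize_response(payload):
    """Return (job_id, is_finished, artifact_url) from a /optimize response dict."""
    status = payload.get("status")
    outputs = payload.get("outputs") or {}
    # Assumption: "completed" and "failed" are terminal; anything else needs polling.
    is_finished = status in ("completed", "failed")
    return payload.get("job_id"), is_finished, outputs.get("optimized_artifact_url")


queued = {"job_id": "job_123", "status": "queued", "outputs": {}}
job_id, is_finished, artifact_url = summarize_optimize_response(queued)
```

When `is_finished` is false, the caller would keep `job_id` and poll the jobs endpoint.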

`GET /jobs/{job_id}`

Poll queued optimizations.

curl -H "Authorization: Bearer $AGNITRA_API_KEY" \
  "$AGNITRA_API_BASE_URL/jobs/job_123"

Response fields include status (queued, running, completed, failed), progress percentages, and final artifact locations. Failed jobs return a structured error payload with remediation hints.

`POST /usage`

Record billable usage explicitly or replay stored telemetry.

curl -X POST "$AGNITRA_API_BASE_URL/usage" \
  -H "Authorization: Bearer $AGNITRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "project_id": "demo",
        "model_name": "tinyllama",
        "baseline": {"latency_ms": 120, "tokens_per_sec": 90},
        "optimized": {"latency_ms": 80, "tokens_per_sec": 140},
        "providers": ["stripe", "aws"]
      }'

The service responds with the computed UsageEvent, dispatch results, and any deferred providers. Use this endpoint to replay events when marketplaces were unavailable during the original optimization run.
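
The same request can be prepared from Python with the standard library, as in the sketch below. `build_usage_request` is a hypothetical helper and the base URL is a local placeholder; the body mirrors the curl example above, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_usage_request(base_url, api_key, event):
    """Build (but do not send) a POST /usage request carrying a JSON body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/usage",
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


event = {
    "project_id": "demo",
    "model_name": "tinyllama",
    "baseline": {"latency_ms": 120, "tokens_per_sec": 90},
    "optimized": {"latency_ms": 80, "tokens_per_sec": 140},
    "providers": ["stripe", "aws"],
}
# Placeholder base URL and key for a locally running server.
request = build_usage_request("http://127.0.0.1:8080", "dev-key", event)
# To send it: urllib.request.urlopen(request)
```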

Status Codes & Error Handling

401 Unauthorized — missing or invalid API key.
403 Forbidden — project mismatch or insufficient privileges for the requested operation.
422 Unprocessable Entity — invalid telemetry or graph payload. Inspect the error body for validation failures.
429 Too Many Requests — control plane throttling. Back off and retry with jitter.
5xx — unexpected error. Include the Agnitra-Request-Id header in support tickets.
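
The "back off and retry with jitter" advice for 429 responses can be sketched as follows. This uses exponential backoff with full jitter, which is one common choice rather than a scheme prescribed by the service; `HTTPStatusError` and `call_with_backoff` are hypothetical names, and only 429 is treated as retryable here.

```python
import random
import time


class HTTPStatusError(Exception):
    """Raised by the caller's HTTP layer for non-2xx responses (hypothetical)."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_backoff(call, max_attempts=5, base_delay_s=0.5, max_delay_s=30.0,
                      sleep=time.sleep, rng=random.random):
    """Invoke call(), retrying on 429 with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except HTTPStatusError as exc:
            # Other statuses (401, 403, 422, ...) will not succeed on retry.
            if exc.status != 429 or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random fraction of the capped exponential delay.
            cap = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(rng() * cap)
```

Randomizing the delay spreads retries from many clients over time instead of having them all return at the same instant.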

Running Locally

pip install -e .[openai,rl,nvml,marketplace]
export AGNITRA_API_KEY="dev-key"
agnitra-api --host 127.0.0.1 --port 8080

Combine with uvicorn options (--reload, --workers) for iterative development.
Set AGNITRA_OFFLINE_MODE=1 to bypass upstream calls while debugging.

Observability

The service emits structured logs with request IDs and latency buckets.
Prometheus metrics are exposed at /metrics when AGNITRA_ENABLE_METRICS=1.
Enable distributed tracing by setting AGNITRA_TRACING_EXPORTER=otlp and providing OTEL_EXPORTER_OTLP_ENDPOINT.

Related Documentation

SDK & CLI Guide — local entry points that call these endpoints when --offline is disabled.
Marketplace & Billing — usage event schema powering the /usage endpoint.
Responses API contract — how AI optimization hints are generated before kernels are patched.
