Control Plane Reference

agnitra-api exposes a Starlette/FastAPI service that backs the CLI and SDK when you run optimizations remotely. This reference summarises the main endpoints, request formats, and operational notes.

Authentication

  • Every request must carry an Authorization: Bearer <AGNITRA_API_KEY> header.
  • When running self-hosted, set AGNITRA_API_KEY and AGNITRA_API_BASE_URL (or AGNITRA_CONTROL_PLANE_URL) in your environment before starting the server.
  • Optional headers:
    • Agnitra-Project — override the project associated with the API key.
    • Agnitra-Operator — identify the human or service triggering the optimization.
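The header rules above can be sketched as a small Python helper. This is an illustrative sketch, not part of the Agnitra SDK: the function name build_headers and its parameters are hypothetical, and only the three header names come from this reference.

```python
import os


def build_headers(api_key, project=None, operator=None):
    """Build control-plane headers (hypothetical helper, not an SDK function)."""
    if not api_key:
        raise ValueError("an API key is required for every request")
    headers = {"Authorization": f"Bearer {api_key}"}
    # Optional overrides are only sent when the caller supplies them.
    if project:
        headers["Agnitra-Project"] = project
    if operator:
        headers["Agnitra-Operator"] = operator
    return headers


# Falls back to the "dev-key" value used in the Running Locally section.
example_headers = build_headers(
    os.environ.get("AGNITRA_API_KEY", "dev-key"), operator="ci-bot"
)
```

Leaving out the optional headers when they are unset keeps the project tied to the API key, which is the default behaviour described above.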

Endpoints

POST /optimize

Submit a model graph and telemetry bundle for optimization.
curl -X POST "$AGNITRA_API_BASE_URL/optimize" \
  -H "Authorization: Bearer $AGNITRA_API_KEY" \
  -F model_graph=@graph_ir.json \
  -F telemetry=@telemetry.json \
  -F target=A100 \
  -F project_id=demo
Payload fields
  • model_graph — TorchScript FX graph JSON generated by the SDK/CLI.
  • telemetry — Profiler snapshot generated by agnitra optimize --telemetry-out.
  • target — Hardware target (e.g. A100, H100, L40S).
  • project_id (optional) — Overrides the project derived from the API key.
  • options (optional JSON) — Extra flags such as {"enable_rl": true}.
Response
{
  "job_id": "job_123",
  "status": "queued",
  "expected_speedup_pct": 28.5,
  "outputs": {
    "optimized_artifact_url": "s3://...",
    "patch_instructions": {...}
  }
}
Depending on configuration, the API may perform optimizations synchronously (returning the final artifact) or queue the job for asynchronous processing.
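Because the same endpoint can answer synchronously or asynchronously, a client may want to branch on the response body. The sketch below is one way to do that. It assumes that a synchronous response reports status "completed" with outputs.optimized_artifact_url populated, which this reference does not state explicitly. The function name next_step is hypothetical.

```python
def next_step(response):
    """Decide what to do with a parsed POST /optimize response (hypothetical helper)."""
    status = response.get("status")
    outputs = response.get("outputs") or {}
    # Assumption: a synchronous run reports "completed" and includes the artifact URL.
    if status == "completed" and outputs.get("optimized_artifact_url"):
        return ("download", outputs["optimized_artifact_url"])
    # Queued or running jobs are followed up through GET /jobs/{job_id}.
    if status in ("queued", "running"):
        return ("poll", response["job_id"])
    # Anything else (for example "failed") is handed back for inspection.
    return ("inspect", response)
```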

GET /jobs/{job_id}

Poll queued optimizations.
curl -H "Authorization: Bearer $AGNITRA_API_KEY" \
  "$AGNITRA_API_BASE_URL/jobs/job_123"
Response fields include status (queued, running, completed, failed), progress percentages, and final artifact locations. Failed jobs return a structured error payload with remediation hints.
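A polling loop over this endpoint might look like the following sketch. The four status values come from the list above. The helper names, the polling interval, and the timeout are assumptions chosen for illustration. The HTTP call is passed in as a function so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request

TERMINAL_STATES = {"completed", "failed"}


def fetch_job(base_url, api_key, job_id):
    """GET /jobs/{job_id} and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/jobs/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def wait_for_job(fetch, job_id, interval_s=2.0, timeout_s=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch(job_id) until the job reaches a terminal state or time runs out."""
    deadline = clock() + timeout_s
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
```

A caller would typically bind the connection details first, for example wait_for_job(lambda job_id: fetch_job(base_url, api_key, job_id), "job_123"), and then check whether the returned status is completed or failed.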

POST /usage

Record billable usage explicitly or replay stored telemetry.
curl -X POST "$AGNITRA_API_BASE_URL/usage" \
  -H "Authorization: Bearer $AGNITRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "project_id": "demo",
        "model_name": "tinyllama",
        "baseline": {"latency_ms": 120, "tokens_per_sec": 90},
        "optimized": {"latency_ms": 80, "tokens_per_sec": 140},
        "providers": ["stripe", "aws"]
      }'
The service responds with the computed UsageEvent, dispatch results, and any deferred providers. Use this endpoint to replay events when marketplaces were unavailable during the original optimization run.
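To replay a stored event from Python instead of curl, the same request can be assembled with the standard library. This is a sketch under assumptions: build_usage_request is a hypothetical helper, and the event dictionary simply mirrors the JSON body shown above. The request object is built but not sent, so it can be inspected first.

```python
import json
import urllib.request


def build_usage_request(base_url, api_key, event):
    """Assemble (but do not send) a POST /usage request for a stored event."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/usage",
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Same payload as the curl example above.
stored_event = {
    "project_id": "demo",
    "model_name": "tinyllama",
    "baseline": {"latency_ms": 120, "tokens_per_sec": 90},
    "optimized": {"latency_ms": 80, "tokens_per_sec": 140},
    "providers": ["stripe", "aws"],
}
usage_request = build_usage_request("http://127.0.0.1:8080", "dev-key", stored_event)
```

Passing usage_request to urllib.request.urlopen sends it. The response format beyond the description above is not documented here, so the sketch stops before parsing it.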

Status Codes & Error Handling

  • 401 Unauthorized — missing or invalid API key.
  • 403 Forbidden — project mismatch or insufficient privileges for the requested operation.
  • 422 Unprocessable Entity — invalid telemetry or graph payload. Inspect the error body for validation failures.
  • 429 Too Many Requests — control plane throttling. Back off and retry with jitter.
  • 5xx — unexpected error. Include the Agnitra-Request-Id header in support tickets.
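The advice to back off and retry with jitter on 429 can be sketched as follows. The "full jitter" strategy (a random delay between zero and an exponentially growing cap) is one common choice and is not prescribed by this reference. The function names, base delay, cap, and attempt limit are all assumptions. Other status codes are returned to the caller unchanged.

```python
import random
import time


def backoff_delay(attempt, base_s=0.5, cap_s=30.0, rng=random.random):
    """Full-jitter delay: a random fraction of min(cap_s, base_s * 2**attempt)."""
    return rng() * min(cap_s, base_s * (2 ** attempt))


def call_with_backoff(send, max_attempts=5, base_s=0.5, cap_s=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() -> (status_code, body), retrying only while it returns 429."""
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        sleep(backoff_delay(attempt, base_s, cap_s, rng))
        status, body = send()
    # The last result is returned even if it is still 429.
    return status, body
```

Here send is any zero-argument callable that performs one request and returns its status code and body, which keeps the retry policy independent of the HTTP client in use.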

Running Locally

pip install -e ".[openai,rl,nvml,marketplace]"
export AGNITRA_API_KEY="dev-key"
agnitra-api --host 127.0.0.1 --port 8080
  • Combine with uvicorn options (--reload, --workers) for iterative development.
  • Set AGNITRA_OFFLINE_MODE=1 to bypass upstream calls while debugging.

Observability

  • The service emits structured logs with request IDs and latency buckets.
  • Prometheus metrics are exposed at /metrics when AGNITRA_ENABLE_METRICS=1.
  • Use distributed tracing by setting AGNITRA_TRACING_EXPORTER=otlp and providing OTEL_EXPORTER_OTLP_ENDPOINT.