Responses API Contract

Agnitra uses the OpenAI Responses API to translate profiler telemetry into kernel tuning recommendations. This reference covers request schemas, tooling payloads, and operational guardrails.

Base URL

https://api.openai.com/v1/responses
Send requests with the POST method and a JSON body. All calls must be made over HTTPS.

Authentication

| Header | Value |
| --- | --- |
| Authorization | Bearer <OPENAI_API_KEY> |
| OpenAI-Project | Optional. Overrides the project context associated with the API key. |
| OpenAI-Organization | Optional. Use when the key belongs to multiple organisations. |
Store API keys in a secrets manager or environment variable. Never expose them in client-side code.
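
As a minimal sketch of the header rules above, the helper below assembles the request headers from environment variables. The function name `build_headers` and the variable names `OPENAI_PROJECT_ID` and `OPENAI_ORG_ID` are assumptions made for illustration; only `OPENAI_API_KEY` and the three header names come from this reference.

```python
import os
from typing import Mapping, Optional


def build_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers from environment-style variables."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError("OPENAI_API_KEY is not set")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional headers are only sent when configured.
    # OPENAI_PROJECT_ID and OPENAI_ORG_ID are assumed variable names.
    project = env.get("OPENAI_PROJECT_ID")
    if project:
        headers["OpenAI-Project"] = project
    org = env.get("OPENAI_ORG_ID")
    if org:
        headers["OpenAI-Organization"] = org
    return headers
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to exercise in tests without touching real credentials.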

Request Schema

{
  "model": "gpt-5-codex",
  "input": [
    { "role": "system", "content": "Provide kernel tuning recommendations for the supplied telemetry." },
    { "role": "user", "content": "Telemetry summary: {...}" }
  ],
  "tools": [
    {
      "type": "function",
      "name": "record_patch_plan",
      "description": "Persist selected kernel parameters.",
      "parameters": {
        "type": "object",
        "properties": {
          "block_size": { "type": "integer" },
          "tile_shape": {
            "type": "array",
            "items": { "type": "integer" },
            "minItems": 2,
            "maxItems": 2
          },
          "expected_latency_ms": { "type": "number" }
        },
        "required": ["block_size", "tile_shape", "expected_latency_ms"]
      }
    }
  ],
  "metadata": {
    "project_id": "demo",
    "model_name": "tinyllama",
    "target": "A100"
  }
}
  • model must reference a Responses-capable deployment (e.g. gpt-5-codex or gpt-5-mini). Do not set temperature or max_output_tokens on these requests.
  • input accepts text or image content. Agnitra sends JSON-formatted telemetry snippets.
  • tools enables structured output through function calling; a schema with explicit types and required fields lets malformed payloads be rejected before they are applied.
  • metadata captures attribution for usage metering and billing.
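
The fields above can be assembled programmatically. The sketch below is one possible shape, not Agnitra's actual client code: `build_request` and the telemetry keys used in the usage note are hypothetical, and the tool definitions are passed in unchanged rather than constructed here.

```python
import json

SYSTEM_PROMPT = "Provide kernel tuning recommendations for the supplied telemetry."


def build_request(telemetry: dict, tools: list, metadata: dict,
                  model: str = "gpt-5-codex") -> dict:
    """Assemble a request body with the model, input, tools, and metadata fields."""
    # Compact, key-sorted JSON keeps the prompt text deterministic across runs.
    summary = json.dumps(telemetry, sort_keys=True, separators=(",", ":"))
    return {
        "model": model,
        "input": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Telemetry summary: {summary}"},
        ],
        "tools": tools,
        # Metadata is treated as a string-to-string map, so values are coerced.
        "metadata": {str(k): str(v) for k, v in metadata.items()},
    }
```

For example, `build_request({"kernel": "matmul", "latency_ms": 3.4}, tools=[], metadata={"project_id": "demo"})` yields a body whose user message carries the telemetry as a single JSON string. No sampling parameters are added, in line with the guidance on `model` above.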

Response Structure

{
  "id": "resp_123",
  "usage": { "total_tokens": 512 },
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Suggested block size: 128 ..." }
      ]
    },
    {
      "type": "function_call",
      "call_id": "call_123",
      "name": "record_patch_plan",
      "arguments": "{\"block_size\":128,\"tile_shape\":[64,64],\"expected_latency_ms\":1.8}"
    }
  ]
}
  • Function calls arrive as output items with type function_call, and arguments is a JSON-encoded string. Parse it and validate it against the schema before mutating kernels.
  • usage.total_tokens feeds into Agnitra’s cost telemetry pipeline.
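
To illustrate the validation step, the sketch below checks the `arguments` string of a `record_patch_plan` call against the three required fields from the request schema. It is a hand-written, standard-library check under the assumption that no JSON Schema library is available; `validate_patch_plan` is a hypothetical name, and a real deployment might prefer a full schema validator.

```python
import json


def validate_patch_plan(arguments: str) -> dict:
    """Parse a record_patch_plan arguments string; raise ValueError if invalid."""
    try:
        plan = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments is not valid JSON: {exc}") from exc
    if not isinstance(plan, dict):
        raise ValueError("arguments must decode to an object")

    missing = {"block_size", "tile_shape", "expected_latency_ms"} - plan.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")

    def is_int(value) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    if not is_int(plan["block_size"]):
        raise ValueError("block_size must be an integer")
    shape = plan["tile_shape"]
    if not (isinstance(shape, list) and len(shape) == 2
            and all(is_int(v) for v in shape)):
        raise ValueError("tile_shape must be a list of exactly two integers")
    latency = plan["expected_latency_ms"]
    if isinstance(latency, bool) or not isinstance(latency, (int, float)):
        raise ValueError("expected_latency_ms must be a number")
    return plan
```

A payload that omits `expected_latency_ms`, or supplies a one-element `tile_shape`, raises `ValueError` and should never reach the kernel patching step. This check is slightly stricter than JSON Schema, which would also accept `128.0` as an integer.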

Rate Limits & Diagnostics

  • Inspect x-ratelimit-* headers to understand remaining token/request budgets.
  • Log the x-request-id header for each call to accelerate support escalation.
  • Retry with exponential backoff on 429 and 5xx responses; avoid retrying validation errors.
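
The first two points can be sketched as a small helper that pulls the request id and remaining budgets out of a response's headers for logging. The function name `summarize_rate_limits` and the returned key names are assumptions; the header names `x-ratelimit-remaining-requests` and `x-ratelimit-remaining-tokens` are two members of the `x-ratelimit-*` family.

```python
from typing import Mapping, Optional


def summarize_rate_limits(headers: Mapping[str, str]) -> dict:
    """Extract the request id and remaining budgets from response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        try:
            return int(value) if value is not None else None
        except ValueError:
            # Leave unparseable values as None instead of failing the call.
            return None

    return {
        "request_id": lowered.get("x-request-id"),
        "remaining_requests": as_int("x-ratelimit-remaining-requests"),
        "remaining_tokens": as_int("x-ratelimit-remaining-tokens"),
    }
```

Logging the returned dictionary alongside each call gives support a request id to search for and shows how close the client is to its limits.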

Error Handling

| Status | Typical Cause | Resolution |
| --- | --- | --- |
| 400 Bad Request | Invalid JSON schema or tool payload. | Inspect error body, correct request data. |
| 401 Unauthorized | API key missing/invalid. | Refresh credentials and retry. |
| 403 Forbidden | Project/org mismatch. | Confirm OpenAI-Project/OpenAI-Organization headers. |
| 429 Too Many Requests | Rate limit exceeded. | Use Retry-After header and exponential backoff. |
| 500/503 | Upstream transient error. | Retry with jitter (max 3 attempts). |
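
The retry policy in the table can be sketched as follows. This is an illustration under stated assumptions, not a prescribed client: `send` is a hypothetical callable that performs one HTTP request and returns `(status, headers, body)`, "max 3 attempts" is read as three calls in total, and the `base` and `cap` delays are arbitrary illustrative values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], dict]


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 8.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall back to backoff.
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(send: Callable[[], Response], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts - 1:
            # Success, a non-retryable error such as 400, or the final attempt.
            return status, headers, body
        retry_after = next(
            (v for k, v in headers.items() if k.lower() == "retry-after"), None
        )
        sleep(backoff_delay(attempt, retry_after))
    raise AssertionError("unreachable")
```

Injecting `sleep` makes the loop testable without real delays. Status codes such as 400, 401, and 403 return immediately because repeating an invalid or unauthorized request cannot succeed.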

Security Checklist

  • Rotate API keys regularly and scope them to the minimum required access.
  • Sanitize telemetry content to avoid leaking customer identifiers.
  • Validate every tool-call payload against the defined JSON schema prior to execution.
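
As one way to approach the sanitization item, the sketch below redacts fields by key name before telemetry is serialized into a request. The key names in `SENSITIVE_KEYS` and the function name `sanitize` are hypothetical; which fields actually identify a customer depends on the telemetry format in use.

```python
from typing import Any, FrozenSet

# Hypothetical field names; adjust to the real telemetry schema.
SENSITIVE_KEYS: FrozenSet[str] = frozenset(
    {"customer_id", "tenant", "hostname", "user_email"}
)


def sanitize(value: Any, sensitive: FrozenSet[str] = SENSITIVE_KEYS) -> Any:
    """Return a copy of `value` with sensitive fields replaced by a placeholder."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if str(k).lower() in sensitive
            else sanitize(v, sensitive)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v, sensitive) for v in value]
    # Scalars are returned unchanged.
    return value
```

The function builds new containers rather than mutating its input, so the original telemetry stays available for local analysis. Key-based redaction does not catch identifiers embedded in free-text values, which would need separate handling.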