Responses API Contract

Agnitra uses the OpenAI Responses API to translate profiler telemetry into kernel tuning recommendations. This reference covers request schemas, tooling payloads, and operational guardrails.

Base URL

https://api.openai.com/v1/responses

All calls must be made over HTTPS.

Authentication

Header	Value
`Authorization`	`Bearer <OPENAI_API_KEY>`
`OpenAI-Project`	Optional. Overrides the project context associated with the API key.
`OpenAI-Organization`	Optional. Use when the key belongs to multiple organisations.

Store API keys in a secrets manager or environment variable. Never expose them in client-side code.
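As a minimal sketch of the guidance above, the headers can be assembled from environment variables at call time so the key never appears in source code. The variable names `OPENAI_API_KEY`, `OPENAI_PROJECT`, and `OPENAI_ORGANIZATION` are assumptions made for this example; substitute whatever your secrets manager injects.

```python
import os


def build_headers(env=os.environ):
    """Build request headers from environment variables.

    The variable names used here are assumptions for this sketch,
    not names mandated by this reference.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional headers are added only when they are configured.
    project = env.get("OPENAI_PROJECT")
    if project:
        headers["OpenAI-Project"] = project
    organization = env.get("OPENAI_ORGANIZATION")
    if organization:
        headers["OpenAI-Organization"] = organization
    return headers
```

Failing fast on a missing key surfaces a configuration problem locally instead of as a `401` from the API.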

Request Schema

{
  "model": "gpt-5-codex",
  "input": [
    { "role": "system", "content": "Provide kernel tuning recommendations for the supplied telemetry." },
    { "role": "user", "content": "Telemetry summary: {...}" }
  ],
  "tools": [
    {
      "type": "function",
      "name": "record_patch_plan",
      "description": "Persist selected kernel parameters.",
      "parameters": {
        "type": "object",
        "properties": {
          "block_size": { "type": "integer" },
          "tile_shape": {
            "type": "array",
            "items": { "type": "integer" },
            "minItems": 2,
            "maxItems": 2
          },
          "expected_latency_ms": { "type": "number" }
        },
        "required": ["block_size", "tile_shape", "expected_latency_ms"]
      }
    }
  ],
  "metadata": {
    "project_id": "demo",
    "model_name": "tinyllama",
    "target": "A100"
  }
}

`model` must reference a Responses-capable deployment (e.g. `gpt-5-codex` or `gpt-5-mini`). Do not send `temperature` or `max_output_tokens` with these requests.
`input` accepts text or image content. Agnitra sends JSON-formatted telemetry snippets.
`tools` unlock structured responses via function calling; strict schemas avoid invalid payloads.
`metadata` captures attribution for usage metering and billing.
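The field notes above can be sketched as a small request builder. This is an illustrative helper, not part of Agnitra: the function name `build_request`, the compact JSON encoding of the telemetry, and the stringified metadata values are all assumptions made for the example.

```python
import json


def build_request(telemetry, tools, metadata, model="gpt-5-codex"):
    """Assemble a request body with the telemetry embedded as compact JSON."""
    # Sorted keys keep the serialized telemetry stable across runs.
    summary = json.dumps(telemetry, sort_keys=True, separators=(",", ":"))
    return {
        "model": model,
        "input": [
            {
                "role": "system",
                "content": "Provide kernel tuning recommendations for the supplied telemetry.",
            },
            {"role": "user", "content": f"Telemetry summary: {summary}"},
        ],
        "tools": tools,
        # Values are stringified so attribution fields stay uniform (an assumption).
        "metadata": {key: str(value) for key, value in metadata.items()},
    }


if __name__ == "__main__":
    body = build_request(
        telemetry={"kernel": "matmul", "latency_ms": 3.4},  # made-up sample values
        tools=[],  # pass the tool definitions from the request schema here
        metadata={"project_id": "demo", "model_name": "tinyllama", "target": "A100"},
    )
    print(json.dumps(body, indent=2))
```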

Response Structure

{
  "id": "resp_123",
  "usage": { "total_tokens": 512 },
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Suggested block size: 128 ..." }
      ]
    },
    {
      "type": "function_call",
      "call_id": "call_123",
      "name": "record_patch_plan",
      "arguments": "{\"block_size\":128,\"tile_shape\":[64,64],\"expected_latency_ms\":4.2}"
    }
  ]
}

Function calls arrive as `function_call` items in `output`. Parse each item's `arguments` string and validate it against the schema before mutating kernels.
`usage.total_tokens` feeds into Agnitra’s cost telemetry pipeline.
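A minimal sketch of the validation step, assuming the tool call's `arguments` value arrives as a JSON string. The helper name `parse_patch_plan` is hypothetical, and the checks are hand-written to mirror the `record_patch_plan` schema rather than produced by a JSON Schema library.

```python
import json

REQUIRED_FIELDS = {"block_size", "tile_shape", "expected_latency_ms"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def parse_patch_plan(arguments):
    """Decode a record_patch_plan arguments string and check its fields."""
    plan = json.loads(arguments)
    if not isinstance(plan, dict):
        raise ValueError("arguments must decode to a JSON object")

    missing = REQUIRED_FIELDS - set(plan)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")

    if not _is_int(plan["block_size"]):
        raise ValueError("block_size must be an integer")

    tile = plan["tile_shape"]
    if not (isinstance(tile, list) and len(tile) == 2 and all(_is_int(v) for v in tile)):
        raise ValueError("tile_shape must be a list of exactly two integers")

    latency = plan["expected_latency_ms"]
    if isinstance(latency, bool) or not isinstance(latency, (int, float)):
        raise ValueError("expected_latency_ms must be a number")

    return plan
```

Raising on the first violation keeps a malformed plan from reaching any code that mutates kernels.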

Rate Limits & Diagnostics

Inspect `x-ratelimit-*` headers to understand remaining token/request budgets.
Log the `x-request-id` header for each call to accelerate support escalation.
Retry with exponential backoff on 429 and 5xx responses; avoid retrying validation errors.
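One way to capture these diagnostics is to pull the relevant headers out of each response before logging. This is a sketch under the assumption that response headers are available as a plain mapping; `extract_diagnostics` is a hypothetical helper name.

```python
def extract_diagnostics(headers):
    """Return the request id and every x-ratelimit-* header from a response.

    HTTP header names are case-insensitive, so keys are lowercased first.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "request_id": lowered.get("x-request-id"),
        "rate_limits": {
            name: value
            for name, value in lowered.items()
            if name.startswith("x-ratelimit-")
        },
    }
```

Matching on the `x-ratelimit-` prefix avoids hard-coding individual header names.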

Error Handling

Status	Typical Cause	Resolution
`400 Bad Request`	Invalid JSON schema or tool payload.	Inspect error body, correct request data.
`401 Unauthorized`	API key missing/invalid.	Refresh credentials and retry.
`403 Forbidden`	Project/org mismatch.	Confirm `OpenAI-Project`/`OpenAI-Organization` headers.
`429 Too Many Requests`	Rate limit exceeded.	Use `Retry-After` header and exponential backoff.
`500`/`503`	Upstream transient error.	Retry with jitter (max 3 attempts).
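The retry guidance in the table can be sketched as a small loop. Several details are assumptions for this example: `send` is a hypothetical callable returning `(status, headers, body)`, `Retry-After` is assumed to carry a number of seconds, and the three-attempt cap listed for `500`/`503` is applied to `429` as well.

```python
import random
import time


def is_retryable(status):
    """429 and 5xx are retried; validation and auth errors are not."""
    return status == 429 or 500 <= status < 600


def call_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if not is_retryable(status) or attempt == max_attempts:
            return status, headers, body

        retry_after = headers.get("Retry-After")
        if status == 429 and retry_after is not None:
            # Assumes the delta-seconds form of Retry-After.
            delay = float(retry_after)
        else:
            # Full jitter: a random delay up to base_delay * 2^(attempt - 1).
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
        sleep(delay)
```

Injecting `sleep` keeps the loop testable without real waiting, and returning the last response lets the caller decide how to report an exhausted retry budget.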

Security Checklist

Rotate API keys regularly and scope them to the minimum required access.
Sanitize telemetry content to avoid leaking customer identifiers.
Validate every tool-call payload against the defined JSON schema prior to execution.
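Telemetry sanitization can be sketched as a recursive redaction pass that runs before the payload is serialized. The key names in `SENSITIVE_KEYS` are hypothetical; the real list depends on what your telemetry actually contains.

```python
# Hypothetical identifiers to redact; adjust to your telemetry format.
SENSITIVE_KEYS = {"customer_id", "customer_name", "email", "hostname"}


def sanitize(value, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of value with sensitive dictionary keys redacted."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if str(key).lower() in sensitive_keys
            else sanitize(item, sensitive_keys)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item, sensitive_keys) for item in value]
    # Scalars are returned unchanged.
    return value
```

Key-based redaction does not catch identifiers embedded in free-text values, so treat it as one layer rather than a complete safeguard.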

Related Documentation

Runtime Configuration: environment variables that toggle Responses API usage.
SDK & CLI Guide: how responses inform local optimisation flows.
Architecture Overview: where the Responses API fits in the optimization loop.
