Chat completions

POST /v1/chat/completions

Sends a list of messages and returns a completion. The request and response format follows the OpenAI Chat Completions API, and the usage object also reports cached_tokens (see Prefix caching). Clients written for OpenAI should work once they are configured with this base URL and an sk-ps- API key.

Request body

Parameter	Type	Required	Description
model	string	Yes	Model ID from /v1/models, e.g. ps/qwen3.6-35b-a3b.
messages	array	Yes	Array of {role, content} objects. Roles: system, user, assistant.
stream	boolean	No	Stream tokens via SSE. Default false.
temperature	number	No	Sampling temperature, 0–2. Higher = more random. Default 1.
max_tokens	integer	No	Maximum number of completion tokens to generate.
top_p	number	No	Nucleus sampling threshold. Default 1 (disabled).
stop	string | array	No	One or more sequences at which generation stops.
frequency_penalty	number	No	-2 to 2. Penalises tokens in proportion to how often they have already appeared. Default 0.
presence_penalty	number	No	-2 to 2. Penalises any token that has already appeared, regardless of how often. Default 0.
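A client can check the documented ranges before sending a request. The sketch below assembles a request body from the parameters in the table and rejects out-of-range values locally. build_payload is a hypothetical helper, not part of any SDK. The server remains the authority on what it accepts, and leaving an optional field out lets its documented default apply.

```python
from typing import Any, Dict, Optional, Sequence, Union

# Roles and numeric ranges as listed in the request-body table.
VALID_ROLES = {"system", "user", "assistant"}
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_payload(
    model: str,
    messages: Sequence[Dict[str, str]],
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    stop: Optional[Union[str, Sequence[str]]] = None,
    frequency_penalty: Optional[float] = None,
    presence_penalty: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a chat-completions request body, omitting unset optional fields."""
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")

    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        # stop may be a single string or a list of strings.
        "stop": stop if stop is None or isinstance(stop, str) else list(stop),
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }
    for name, (low, high) in RANGES.items():
        value = optional[name]
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low:g} and {high:g}")

    payload: Dict[str, Any] = {"model": model, "messages": list(messages)}
    if stream:
        payload["stream"] = True  # default is false, so only send it when enabling
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting unset fields keeps the body minimal, so a request that passes no sampling options behaves exactly like one built by hand with only model and messages.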

Example

curl -X POST https://api.pinstripes.io/v1/chat/completions \
  -H "Authorization: Bearer sk-ps-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ps/qwen3.6-35b-a3b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is prefix caching?"}
    ],
    "temperature": 0.7
  }'
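The same request can be built with Python's standard library. This is a minimal sketch: make_request and send are hypothetical helper names, and the caller supplies the API key instead of it being hard-coded.

```python
import json
import urllib.request

API_URL = "https://api.pinstripes.io/v1/chat/completions"


def make_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST request that the curl command above sends."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Perform the call and decode the JSON body.

    Raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling send(make_request(payload, api_key)) returns the decoded body in the non-streaming shape shown below.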

Response (non-streaming)

{
  "id": "chatcmpl-a1b2c3d4",
  "object": "chat.completion",
  "created": 1748995200,
  "model": "ps/qwen3.6-35b-a3b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Prefix caching reuses KV cache entries for repeated leading tokens..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 84,
    "cached_tokens": 0,
    "total_tokens": 116
  }
}
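Most callers only need the reply text, the finish reason, and the token counts from this response. The sketch below extracts them. Completion and parse_completion are hypothetical names. The truncated property assumes the OpenAI convention that finish_reason is "length" when max_tokens cuts the output short; this page only shows "stop".

```python
import json
from dataclasses import dataclass


@dataclass
class Completion:
    content: str
    finish_reason: str
    prompt_tokens: int
    completion_tokens: int
    cached_tokens: int

    @property
    def truncated(self) -> bool:
        # ASSUMPTION: "length" means max_tokens was reached (OpenAI convention).
        return self.finish_reason == "length"


def parse_completion(body: str) -> Completion:
    """Pull the assistant reply and token counts out of a non-streaming response."""
    data = json.loads(body)
    choice = data["choices"][0]  # a single choice at index 0, as in the example
    usage = data["usage"]
    return Completion(
        content=choice["message"]["content"],
        finish_reason=choice["finish_reason"],
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        cached_tokens=usage.get("cached_tokens", 0),
    )
```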

Response (streaming)

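With stream: true, the response is a series of server-sent events. Each event is a line that starts with data: followed by a JSON chunk, and events are separated by blank lines. The first chunk carries the role in delta, later chunks carry content fragments in delta.content, and the final chunk has an empty delta and a finish_reason. The stream ends with data: [DONE].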

data: {"id":"chatcmpl-a1b2c3d4","object":"chat.completion.chunk","created":1748995200,"model":"ps/qwen3.6-35b-a3b","choices":[{"index":0,"delta":{"role":"assistant","content":"Prefix"},"finish_reason":null}]}

data: {"id":"chatcmpl-a1b2c3d4","object":"chat.completion.chunk","created":1748995200,"model":"ps/qwen3.6-35b-a3b","choices":[{"index":0,"delta":{"content":" caching"},"finish_reason":null}]}

data: {"id":"chatcmpl-a1b2c3d4","object":"chat.completion.chunk","created":1748995200,"model":"ps/qwen3.6-35b-a3b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
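A client has to reassemble the reply from the delta fragments. The sketch below handles only the event shape shown above: it skips blank separator lines, stops at data: [DONE], and yields delta.content for choice 0. iter_deltas is a hypothetical name. It is not a complete SSE implementation; for example, it does not handle event or id fields.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments for choice 0 until the data: [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators between events carry no data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            # The first chunk may carry only the role; the last has an empty delta.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

When reading from urllib.request.urlopen, the response yields raw byte lines, so decode each one before passing it in: for piece in iter_deltas(line.decode("utf-8") for line in resp): print(piece, end="", flush=True).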

Prefix caching

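When consecutive requests begin with the same tokens (a system prompt, a few-shot block, a long document), later requests reuse the KV cache computed for the first one instead of recomputing it. Only an identical leading sequence can be reused, so put stable content first and variable content, such as the latest user message, last. Cached tokens are billed at $0.05 per 1M tokens regardless of model, compared to $0.07–$0.51 per 1M for regular input tokens, depending on the model. The cached_tokens field in the usage object reports how many tokens of each request were served from cache.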
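The per-request input cost can be estimated from the usage object. This sketch assumes, as in OpenAI's usage accounting, that cached_tokens is counted inside prompt_tokens; the example response above has cached_tokens of 0, so it does not settle the question. The model's regular input rate is a parameter because it varies between $0.07 and $0.51 per 1M. Output-token pricing is not covered. input_cost_usd is a hypothetical helper name.

```python
CACHED_RATE_PER_M = 0.05  # USD per 1M cached tokens, the same for every model


def input_cost_usd(usage: dict, input_rate_per_m: float) -> float:
    """Estimate the input-token cost of one request in USD.

    ASSUMPTION: cached_tokens is a subset of prompt_tokens, so only the
    remaining prompt tokens are billed at the model's regular input rate.
    """
    cached = usage.get("cached_tokens", 0)
    uncached = usage["prompt_tokens"] - cached
    return (uncached * input_rate_per_m + cached * CACHED_RATE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a 100K-token prompt whose first 80K tokens hit the
    # cache, on a model billed at the top of the range ($0.51 per 1M).
    usage = {"prompt_tokens": 100_000, "cached_tokens": 80_000}
    print(f"${input_cost_usd(usage, 0.51):.4f}")  # prints $0.0142
```

Under that assumption, the same 100K-token prompt with no cache hits would cost $0.051, so the cached prefix cuts input cost by roughly 72%.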