Made-to-measure inference

Inference cut to measure,
priced off the rack

We run open-weight models on hardware tuned for throughput. Latency is predictable. Efficiency is built into the stack, so you get the same output quality at lower cost. Nothing is retained after the response ships.

Get API key Read the docs

Cut for throughput

Consumer hardware running our inference stack matches enterprise cluster throughput, with predictable time to first token (TTFT). Capacity scales in micro-increments, so you add exactly what fits without stepping up to the next reservation tier.
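If you want to check TTFT from your side, a rough client-side measurement is enough. The sketch below is illustrative only: `measure_ttft` is a hypothetical helper, not part of any SDK, and it treats the arrival of the first streamed chunk as the first token, which is an approximation.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_ttft(
    open_stream: Callable[[], Iterable[object]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (seconds until the first chunk arrived, number of chunks seen).

    `open_stream` is a zero-argument callable that sends the request and
    returns an iterable of chunks. The timer starts before the request is
    sent, so connection and queueing time are included.
    """
    start = clock()
    ttft = None
    chunk_count = 0
    for _ in open_stream():
        if ttft is None:
            # The first chunk stands in for the first token (an approximation).
            ttft = clock() - start
        chunk_count += 1
    return ttft, chunk_count
```

With an OpenAI SDK client you would pass something like `lambda: client.chat.completions.create(..., stream=True)` as `open_stream`. Repeating the call and comparing the spread of results gives a feel for how predictable the latency is for your workload.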

Slips over your existing client

Change base_url in your OpenAI SDK client and leave the rest of your codebase as it is. Like a slip stitch, the change is invisible from the outside: streaming, function calling, and structured output all work identically.
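To make that concrete, here is a minimal sketch of the request shapes involved, using the standard Chat Completions `tools` and `response_format` parameters. The tool `get_order_status`, the schema `ticket_summary`, and the helper names are hypothetical and exist only for illustration. The helpers build plain dictionaries, so they can be inspected without a network call.

```python
import json


def build_tool_call_request(model: str, question: str) -> dict:
    """Keyword arguments for a chat completion that may call one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_order_status",  # hypothetical tool
                    "description": "Look up the status of an order by ID.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                },
            }
        ],
    }


def build_structured_request(model: str, text: str) -> dict:
    """Keyword arguments for a chat completion constrained to a JSON schema."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"Summarise this ticket: {text}"}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "ticket_summary",  # hypothetical schema
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "summary": {"type": "string"},
                        "urgent": {"type": "boolean"},
                    },
                    "required": ["summary", "urgent"],
                    "additionalProperties": False,
                },
            },
        },
    }


def parse_tool_arguments(arguments: str) -> dict:
    """Tool-call arguments arrive as a JSON string; decode them to a dict."""
    return json.loads(arguments)
```

With a configured client, `client.chat.completions.create(**build_tool_call_request("ps/gpt-oss-120b", "Where is order 1234?"))` would send the first request. Check the docs to confirm which models support tools and structured output.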

Nothing leaves the fitting room

Requests are discarded from our infrastructure as soon as the response ships. We don't log them or train on them.

Integrate

One alteration, and nothing else needs touching.

Any library that accepts a custom base URL routes through pinstripes. The integration guides walk through the common clients.

your_app.py

from openai import OpenAI

client = OpenAI(
    api_key="sk-ps-...",
    base_url="https://api.pinstripes.io/v1",  # ← this line
)

# everything else unchanged
response = client.chat.completions.create(
    model="ps/qwen3.6-35b-a3b",
    messages=[{"role": "user", "content": prompt}],  # prompt comes from your app
    stream=True,
)

Python SDK · Node.js SDK · curl · LangChain · LlamaIndex · Vercel AI SDK
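A streamed response arrives as a sequence of chunks, each carrying a small piece of text in `choices[0].delta.content`. The sketch below shows one way to stitch them back together. `collect_text` and `fake_chunk` are hypothetical helpers: the first works on any iterable of chunk-shaped objects, and the second only exists so the example can be tried offline.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_text(stream: Iterable[object]) -> str:
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Some chunks (for example usage-only chunks) carry no choices.
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            # content is None or empty on role-only and final chunks.
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content: Optional[str]) -> SimpleNamespace:
    """Build an object shaped like a streaming chunk, for offline testing."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Offline demonstration with stand-in chunks.
demo_stream = [fake_chunk(None), fake_chunk("Made "), fake_chunk("to measure"), fake_chunk(None)]
demo_text = collect_text(demo_stream)
```

In an application, `collect_text(response)` would consume the streaming response returned by `client.chat.completions.create(..., stream=True)`. To show text as it arrives, print each piece inside the loop.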

Models & Pricing

Same models, sharper price.

We optimise the stack for token efficiency, so the same request costs less here than it does at the name-brand providers, with no difference in output quality.

| Model | Model ID | Input ($ / 1M tokens) | Output ($ / 1M tokens) | Cached ($ / 1M tokens) |
|---|---|---|---|---|
| DeepSeek V4 Flash | ps/deepseek-v4-flash | 0.087 | 0.173 | 0.05 |
| DeepSeek V4 Pro | ps/deepseek-v4-pro | 0.39 | 0.78 | 0.05 |
| Qwen3.5 35B A3B | ps/qwen3.5-35b-a3b | 0.12 | 0.85 | 0.05 |
| Qwen3.6 35B A3B | ps/qwen3.6-35b-a3b | 0.12 | 0.85 | 0.05 |
| GLM 4.5 Air | ps/glm-4.5-air | 0.11 | 0.75 | 0.05 |
| GPT OSS 120B | ps/gpt-oss-120b | 0.033 | 0.153 | 0.05 |
| Llama 4 Maverick | ps/llama-4-maverick | 0.13 | 0.53 | 0.05 |
| Llama 4 Scout | ps/llama-4-scout | 0.071 | 0.265 | 0.05 |
| Kimi K2 | ps/kimi-k2 | 0.51 | 2.07 | 0.05 |
| Step 3.7 Flash | ps/step-3.7-flash | 0.175 | 1.01 | 0.05 |
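To turn the per-million prices listed above into a per-request figure, a short calculation is enough. The sketch below copies the listed prices into a lookup table. `estimate_cost` is a hypothetical helper, and it assumes that cached tokens are a subset of the input tokens and are billed at the cached rate instead of the input rate. Check the docs for the actual billing rules.

```python
# USD per 1M tokens, copied from the price list: (input, output, cached).
PRICES_PER_MILLION = {
    "ps/deepseek-v4-flash": (0.087, 0.173, 0.05),
    "ps/deepseek-v4-pro": (0.39, 0.78, 0.05),
    "ps/qwen3.5-35b-a3b": (0.12, 0.85, 0.05),
    "ps/qwen3.6-35b-a3b": (0.12, 0.85, 0.05),
    "ps/glm-4.5-air": (0.11, 0.75, 0.05),
    "ps/gpt-oss-120b": (0.033, 0.153, 0.05),
    "ps/llama-4-maverick": (0.13, 0.53, 0.05),
    "ps/llama-4-scout": (0.071, 0.265, 0.05),
    "ps/kimi-k2": (0.51, 2.07, 0.05),
    "ps/step-3.7-flash": (0.175, 1.01, 0.05),
}


def estimate_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
) -> float:
    """Estimate the cost of one request in USD.

    Assumption: `cached_tokens` counts the part of `input_tokens` served
    from cache, billed at the cached rate rather than the input rate.
    """
    if cached_tokens < 0 or cached_tokens > input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    input_price, output_price, cached_price = PRICES_PER_MILLION[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * input_price
        + cached_tokens * cached_price
        + output_tokens * output_price
    )
    return total / 1_000_000
```

For example, under these assumptions a Kimi K2 request with 2,000,000 input tokens and 500,000 output tokens comes to about $2.055 (2 × $0.51 + 0.5 × $2.07).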