4 min read
- [SillyTavern](https://github.com/SillyTavern/SillyTavern) (any recent version)
- A pinstripes API key — get one at [pinstripes.io/signup](/signup)
In SillyTavern, click the **API** button (plug icon, top bar). In the **Chat Completion Source** dropdown, select **OpenAI**.
Expand **OpenAI / Compatible** settings. Fill in:
| Field | Value |
|---|---|
| Custom Endpoint (Base URL) | https://api.pinstripes.io/v1 |
| API Key | sk-ps-... |
Enable **Use External API endpoint** if the checkbox is present.
In the **Model** dropdown, type or select:
- ps/qwen3.6-a3b — fast, strong reasoning
- ps/deepseek-v4-flash — best for speed in long RP sessions
- ps/gpt-oss-120b — highest quality for detailed prose
Click **Test Message** (or start a new chat). You should see a response stream in immediately.
All OpenAI chat parameters SillyTavern sends work as expected.
For long-form RP, enable **Prefix Caching** in your SillyTavern context settings. pinstripes automatically caches repeated prompt prefixes (persona cards, world info) at **$0.05/1M tokens** instead of the full input rate. A 4,000-token system prompt used across 100 messages saves around 98% of those prompt costs.
**Model not found**: Make sure the model ID includes the ps/ prefix exactly as shown above.
**Streaming stops mid-response**: This is usually a SillyTavern timeout. In SillyTavern settings, increase **Response Streaming Timeout** to 120 seconds.
**402 Payment Required**: Your balance is empty. Top up at [pinstripes.io/dashboard](/dashboard).
Ready to build?