SillyTavern + pinstripes: cloud inference for your roleplay setup

What you'll need

- [SillyTavern](https://github.com/SillyTavern/SillyTavern) (any recent version)

- A pinstripes API key — get one at [pinstripes.io/signup](/signup)

Step 1 — Open API settings

In SillyTavern, click the **API** button (plug icon, top bar). In the **Chat Completion Source** dropdown, select **OpenAI**.

Step 2 — Point to pinstripes

Expand **OpenAI / Compatible** settings. Fill in:

Field	Value
Custom Endpoint (Base URL)	`https://api.pinstripes.io/v1`
API Key	`sk-ps-...`

Enable **Use External API endpoint** if the checkbox is present.

Step 3 — Pick a model

In the **Model** dropdown, type or select:

- ps/qwen3.6-a3b — fast, strong reasoning

- ps/deepseek-v4-flash — best for speed in long RP sessions

- ps/gpt-oss-120b — highest quality for detailed prose

Step 4 — Test the connection

Click **Test Message** (or start a new chat). You should see a response stream in immediately.

Recommended settings for roleplay

All OpenAI chat parameters SillyTavern sends work as expected.

For long-form RP, enable **Prefix Caching** in your SillyTavern context settings. pinstripes automatically caches repeated prompt prefixes (persona cards, world info) at **$0.05/1M tokens** instead of the full input rate. A 4,000-token system prompt used across 100 messages saves around 98% of those prompt costs.

Troubleshooting

**Model not found**: Make sure the model ID includes the ps/ prefix exactly as shown above.

**Streaming stops mid-response**: This is usually a SillyTavern timeout. In SillyTavern settings, increase **Response Streaming Timeout** to 120 seconds.

**402 Payment Required**: Your balance is empty. Top up at [pinstripes.io/dashboard](/dashboard).