Data Policy

Last updated: June 8, 2026

Pinstripes is an inference provider operated by Redacted Labs, Corp., a Delaware corporation. This page explains, in plain terms, how we handle the data in requests you send to our API — including traffic routed to us through partners such as OpenRouter. For the full legal detail, see our Privacy Policy and Terms of Use.

Training

We do not use your prompts or completions — or any data derived from them — to train, fine-tune, or improve any model, our own or anyone else’s. Your content is never added to a training set.

Prompt and completion logging

We do not log or retain the content of your prompts or completions in any durable log, and we never share that content. Apart from the content-free operational metadata described under Metadata retention, the only place request-derived data persists is the transient inference cache described below.

Inference cache

To reduce latency, our worker nodes maintain a tiered prefix/key-value cache across GPU memory, system memory, and local NVMe storage. This cache holds derived key/value tensors, not stored copies of your prompts. It is transient and node-local: entries are evicted when higher-priority entries need the space, and the entire cache is destroyed when the node is shut down (our nodes scale to zero when idle). Cache data never leaves the worker node, is never backed up, and is used only to serve subsequent requests faster. The cached_tokens count we report is derived from this cache.
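The behavior described above can be illustrated with a minimal sketch. It is not our production code. The block size, tier capacities, hashing scheme, and least-recently-used eviction order are all assumptions chosen for illustration; the actual priority policy is not specified here. The class and function names are hypothetical, and a placeholder string stands in for the key/value tensors.

```python
import hashlib
from collections import OrderedDict

BLOCK = 4  # tokens per cached block (assumed; illustrative only)


def block_keys(tokens):
    """One key per full block; each key is a hash of the entire prefix so far."""
    keys = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK
    for i in range(0, full, BLOCK):
        h.update(repr(tokens[i:i + BLOCK]).encode())
        keys.append(h.copy().hexdigest())
    return keys


class TieredPrefixCache:
    """Node-local cache with tiers ordered fastest first (hypothetical sketch)."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), e.g. GPU, RAM, NVMe
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _pop(self, key):
        for _, _, store in self.tiers:
            if key in store:
                return store.pop(key)
        return None

    def _put(self, level, key, value):
        if level >= len(self.tiers):
            return  # fell off the slowest tier: the entry is gone
        _, cap, store = self.tiers[level]
        store[key] = value
        while len(store) > cap:
            # LRU stands in for the unspecified priority policy (assumption)
            old_key, old_value = store.popitem(last=False)
            self._put(level + 1, old_key, old_value)

    def insert(self, tokens):
        for key in block_keys(tokens):
            self._pop(key)  # avoid duplicates across tiers
            self._put(0, key, "kv-tensor-placeholder")

    def lookup(self, tokens):
        """Return the number of leading tokens served from cache."""
        cached = 0
        for key in block_keys(tokens):
            value = self._pop(key)
            if value is None:
                break  # a prefix match ends at the first missing block
            self._put(0, key, value)  # a hit promotes the block to the fastest tier
            cached += BLOCK
        return cached

    def shutdown(self):
        # Node shutdown destroys every tier; nothing is backed up
        for _, _, store in self.tiers:
            store.clear()
```

In this sketch, the value returned by lookup plays the role of the cached_tokens count: it reflects only how many leading tokens matched entries still present on the node, and it drops to zero after shutdown.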

Metadata retention

We retain only operational metadata for each request — token counts (prompt, completion, and cached), model identifier, latency, request outcome, and timestamp — for up to 30 days, for billing reconciliation and abuse prevention. This metadata contains no prompt or completion content. After 30 days it is deleted or aggregated.
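As an illustration of the shape of such a record, here is a minimal sketch. The field names, types, and the helper function are hypothetical and are not our actual schema; only the list of fields and the 30-day window come from the paragraph above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention window stated in this policy


@dataclass(frozen=True)
class RequestMetadata:
    """Operational metadata only; no prompt or completion content (hypothetical field names)."""
    prompt_tokens: int
    completion_tokens: int
    cached_tokens: int
    model: str
    latency_ms: float
    outcome: str
    timestamp: datetime


def is_past_retention(record: RequestMetadata, now: datetime) -> bool:
    """True once the record is older than the retention window and is due for deletion or aggregation."""
    return now - record.timestamp > RETENTION


record = RequestMetadata(
    prompt_tokens=120,
    completion_tokens=48,
    cached_tokens=64,
    model="example-model",  # placeholder identifier
    latency_ms=350.0,
    outcome="success",
    timestamp=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
```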

Processing locations

Inference, including all cache tiers, runs only in United States and European Union data-center regions. No request content is processed or cached outside the US/EU.

Sharing

We do not sell data, and we do not share your request content with any third party for their own purposes. The only subprocessor that touches inference traffic is the provider of the GPU infrastructure on which our worker nodes run; it does not access your content for its own use.

Contact

Questions about how we handle data can be sent to [email protected].