2024-10-01 · OpenAI

Prompt Caching in the API

pricinginfrastructure

read at source ↗ openai.com

Prompt Caching in the API

Source: OpenAI Date: 2024-10-01 URL: https://openai.com/index/api-prompt-caching

Summary

Summary

OpenAI launched prompt caching in the API in October 2024 — a feature that automatically caches frequently repeated prompt prefixes (system prompts, documents, few-shot examples) and charges reduced rates for cache hits. Applications that repeatedly send the same long context prefix benefit significantly from reduced latency and cost.

Implications

Platform/developer thread. Prompt caching is a significant cost reduction for production applications: many enterprise deployments use long system prompts, documents, or context windows that are identical across requests. Without caching, each call re-processes the full context; with caching, the shared prefix is computed once. Anthropic had launched a similar feature earlier in 2024, making this a competitive parity move. The implications for application architecture are meaningful: caching encourages keeping large context in the system prompt rather than reconstructing it per call, which changes how long-context applications are designed. It also makes RAG less necessary for some use cases where the full document set fits in a cached context.

← all signals