Prompt Caching in the API
Source: OpenAI Date: 2024-10-01 URL: https://openai.com/index/api-prompt-caching
Summary
OpenAI launched prompt caching in the API in October 2024. The feature automatically caches frequently repeated prompt prefixes (system prompts, documents, few-shot examples) and bills cache hits at a reduced rate. Applications that repeatedly send the same long context prefix see lower latency and lower cost.
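The billing effect can be sketched with simple arithmetic. The snippet below is a rough estimate, not OpenAI's billing logic: the 1,024-token minimum, the 128-token increments, and the 50% discount are assumptions used as defaults here and should be checked against current pricing documentation. The function names are hypothetical.

```python
def cached_tokens(shared_prefix_tokens: int,
                  min_tokens: int = 1024,   # assumed minimum cacheable prefix
                  increment: int = 128) -> int:  # assumed cache granularity
    """Estimate how many prefix tokens would be billed as cache hits."""
    if shared_prefix_tokens < min_tokens:
        return 0
    extra_blocks = (shared_prefix_tokens - min_tokens) // increment
    return min_tokens + extra_blocks * increment


def input_cost(prompt_tokens: int,
               shared_prefix_tokens: int,
               price_per_token: float,
               cached_discount: float = 0.5) -> float:  # assumed discount
    """Estimate input cost for one request whose prefix is already cached."""
    hit = min(cached_tokens(shared_prefix_tokens), prompt_tokens)
    miss = prompt_tokens - hit
    return miss * price_per_token + hit * price_per_token * (1 - cached_discount)


if __name__ == "__main__":
    # A 5,200-token prompt whose first 5,000 tokens repeat across requests.
    uncached = input_cost(5200, 0, price_per_token=1.0)
    cached = input_cost(5200, 5000, price_per_token=1.0)
    print(f"uncached: {uncached}, cached: {cached}")
```

Under these assumptions, the saving grows with the share of the prompt that is a stable prefix; short prompts below the minimum see no change.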
Implications
Platform/developer thread. Prompt caching is a significant cost reduction for production applications: many enterprise deployments send long system prompts, documents, or other context that is identical across requests. Without caching, each call re-processes the full context; with caching, the shared prefix is processed once and reused by later requests that begin with the same tokens. Anthropic had launched a similar feature earlier in 2024, making this a competitive parity move. The change also affects application architecture: because only a matching prefix can be reused, caching encourages keeping large, stable context at the start of the prompt (for example, in the system prompt) rather than reconstructing it per call, which changes how long-context applications are designed. It also makes RAG less necessary for some use cases where the full document set fits in a cached context.
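The architectural point about prompt layout can be illustrated with a small sketch. It assumes that caching works on exact prefix matches and uses whitespace-separated words as a crude stand-in for real tokens; the prompt text, `build_prompt`, and `shared_prefix_len` are hypothetical names invented for this example, and no API call is made.

```python
from itertools import takewhile

# Hypothetical static context that is identical across requests.
STATIC_INSTRUCTIONS = "You are a support assistant. Answer using the policy document."
POLICY_DOCUMENT = "Refunds are accepted within 30 days of purchase. " * 50


def build_prompt(question: str, static_first: bool = True) -> str:
    """Assemble a prompt with the stable context before or after the question."""
    static = STATIC_INSTRUCTIONS + "\n" + POLICY_DOCUMENT
    if static_first:
        return static + "\n" + question
    return question + "\n" + static


def shared_prefix_len(a: str, b: str) -> int:
    """Count leading words two prompts share (words approximate tokens here)."""
    pairs = zip(a.split(), b.split())
    return sum(1 for _ in takewhile(lambda p: p[0] == p[1], pairs))


if __name__ == "__main__":
    q1, q2 = "How do refunds work?", "Can I return a gift?"
    good = shared_prefix_len(build_prompt(q1), build_prompt(q2))
    bad = shared_prefix_len(build_prompt(q1, static_first=False),
                            build_prompt(q2, static_first=False))
    print(f"static context first: {good} shared words")
    print(f"question first: {bad} shared words")
```

With the static context first, the whole instruction and document block is a common prefix across requests; with the question first, the prompts diverge at the first word, so under a prefix-matching cache nothing would be reusable.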