Prompt-cache hit rate per provider, April 2026
cached_tokens divided by input_tokens. Higher is cheaper and faster.
Which AI providers have the highest prompt-cache hit rate? In April 2026, Anthropic-direct led the Requesty gateway's routes at 77% (cached_tokens / input_tokens), Bedrock Claude was healthy at 57%, and Vertex (Claude) trailed at 24%: the same Claude model family with a roughly 3× lower hit rate. Vertex (Gemini) sat at 10% and Mistral at 4%, the floor among major routes.
Why it matters
Prompt caching directly cuts the per-request cost of long, repeated context. The difference between a 77% hit rate and a 24% hit rate on the same model family is roughly a 3× difference in input tokens billed at full price (23% of tokens at the full rate versus 76%). The Vertex-Claude gap looks like a configuration issue rather than a platform limitation, which means Claude users on Vertex are leaving substantial savings on the table that could likely be recovered without a code change.
Key findings
- Anthropic-direct: 77% cache hit, the leader by a wide margin.
- Bedrock Claude: 57%; DeepSeek: 48%; OpenAI: 36%. Healthy.
- Vertex (Claude): 24%. Same model family as Anthropic-direct (77%) and Bedrock (57%), yet a roughly 3× lower hit rate: a configuration gap.
- Vertex (Gemini): 10%. Near the floor among major routes.
- Mistral: 4%. The floor; prompt caching is not a meaningful lever on that route today.
- Moonshot reports 88%, but that is a measurement artefact at a 6% success rate; do not quote it.
Data
| Provider | Cache hit rate |
|---|---|
| Anthropic | 77.50% |
| Bedrock | 56.90% |
| DeepSeek | 48.30% |
| Azure | 41.00% |
| OpenAI | 36.40% |
| xAI | 35.70% |
| Novita | 31.90% |
| Vertex (Claude) | 23.50% |
| Vertex (Gemini) | 9.60% |
| Mistral | 4.10% |
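The table's metric (cached_tokens / input_tokens) can be reproduced from request logs with a sketch like the following. The record fields (provider, input_tokens, cached_tokens, ok) and the min_success_rate cutoff are hypothetical; the cutoff illustrates why a figure like Moonshot's 88% at a 6% success rate is excluded rather than reported.

```python
from collections import defaultdict

def cache_hit_rates(records, min_success_rate=0.5):
    """Per-provider cached_tokens / input_tokens over successful requests.

    records: dicts with provider, input_tokens, cached_tokens, and ok
    (whether the request succeeded). Providers whose success rate falls
    below min_success_rate are dropped, since a handful of surviving
    requests can produce a misleading hit rate.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])  # input, cached, ok, all
    for r in records:
        t = totals[r["provider"]]
        t[3] += 1
        if r["ok"]:
            t[0] += r["input_tokens"]
            t[1] += r["cached_tokens"]
            t[2] += 1
    return {
        p: cached / inp
        for p, (inp, cached, ok, n) in totals.items()
        if inp and ok / n >= min_success_rate
    }
```
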
Cited in
- What the gateway saw in April 2026 (/blog/provider-trends-april-2026-agentic-share-latency)
