Prompt-cache hit rate per provider, April 2026
cached_tokens divided by input_tokens. Higher is cheaper and faster.
Which AI providers have the highest prompt-cache hit rate? In April 2026, Anthropic-direct led the Requesty gateway's routes at 77% (cached_tokens / input_tokens), Bedrock Claude was healthy at 57%, and Vertex (Claude) trailed at 24%: the same Claude model family with a roughly 3× lower hit rate. Vertex (Gemini) sat at 10% and Mistral at 4%, the floor among major routes.
Why it matters
Prompt caching directly cuts the per-request cost of long, repeated context. The difference between a 77% hit rate and a 24% hit rate on the same model family is roughly a 3× difference in input tokens billed at full price (23% of tokens at the full rate versus 76%). The Vertex-Claude gap looks like a configuration issue rather than a platform limitation, which means Claude users on Vertex are leaving substantial savings on the table that could likely be recovered without a code change.
Key findings
- Anthropic-direct: 77% cache hit, the leader by a wide margin.
- Bedrock Claude: 57%; DeepSeek: 48%; OpenAI: 36%. Healthy.
- Vertex (Claude): 24%. Same model family as Anthropic-direct (77%) and Bedrock (57%), yet a roughly 3× lower hit rate: a configuration gap.
- Vertex (Gemini): 10%. Near the floor among major routes.
- Mistral: 4%. The floor; prompt caching is not a meaningful lever on that route today.
- Moonshot reports 88%, but that is a measurement artefact at a 6% success rate; do not quote it.
Data
| Provider | Cache hit rate |
|---|---|
| Anthropic | 77.50% |
| Bedrock | 56.90% |
| DeepSeek | 48.30% |
| Azure | 41.00% |
| OpenAI | 36.40% |
| xAI | 35.70% |
| Novita | 31.90% |
| Vertex (Claude) | 23.50% |
| Vertex (Gemini) | 9.60% |
| Mistral | 4.10% |
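The table's metric (cached_tokens / input_tokens) can be reproduced from request logs with a sketch like the following. The record fields (provider, input_tokens, cached_tokens, ok) and the min_success_rate cutoff are hypothetical; the cutoff illustrates why a figure like Moonshot's 88% at a 6% success rate is excluded rather than reported.

```python
from collections import defaultdict

def cache_hit_rates(records, min_success_rate=0.5):
    """Per-provider cached_tokens / input_tokens over successful requests.

    records: dicts with provider, input_tokens, cached_tokens, and ok
    (whether the request succeeded). Providers whose success rate falls
    below min_success_rate are dropped, since a handful of surviving
    requests can produce a misleading hit rate.
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])  # input, cached, ok, all
    for r in records:
        t = totals[r["provider"]]
        t[3] += 1
        if r["ok"]:
            t[0] += r["input_tokens"]
            t[1] += r["cached_tokens"]
            t[2] += 1
    return {
        p: cached / inp
        for p, (inp, cached, ok, n) in totals.items()
        if inp and ok / n >= min_success_rate
    }
```
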
Cited in
- What the gateway saw in April 2026 (/blog/provider-trends-april-2026-agentic-share-latency)
