p50 latency YoY: April 2025 vs April 2026
Same provider tag, ≥50k requests in both months. Lower bars = faster requests.
Has LLM latency improved over the past year? On the Requesty gateway, open-source aggregator routes compressed dramatically between April 2025 and April 2026. xAI fell 93% (9.1 s to 0.6 s), DeepInfra 91% (15.8 s to 1.4 s), DeepSeek 62% (24.3 s to 9.2 s). Frontier providers barely moved (OpenAI -5%, Anthropic 0%). Vertex (Claude) is the only major route that got slower, +131%, as heavy agentic Claude Code workloads landed on it.
Why it matters
The OSS-aggregator tier closed most of the latency gap to frontier providers in 12 months: routing easy work onto a cheap OSS path used to cost 5-25 seconds at p50 and now costs under a second on the fastest routes (xAI, Alibaba, Novita). Workload composition is the dominant force on aggregate latency. Vertex (Claude) got 2.3× slower while the underlying inference stack barely changed, which shows that "is provider X fast?" is the wrong question to ask in isolation.
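To make the composition effect concrete, here is a minimal sketch with made-up numbers (not gateway data). It assumes a route serves two hypothetical request classes, "light" (2-8 s) and "heavy" (10-30 s), whose per-class latencies never change; only the share of heavy requests moves. The function names and ranges are illustrative assumptions.

```python
from statistics import median


def evenly_spaced(lo, hi, count):
    """Return `count` values spread evenly from lo to hi (deterministic stand-in for a latency sample)."""
    if count <= 0:
        return []
    if count == 1:
        return [(lo + hi) / 2]
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


def route_p50(heavy_share, total=1000):
    """p50 latency in seconds for a route where `heavy_share` of requests are heavy."""
    n_heavy = round(total * heavy_share)
    light = evenly_spaced(2.0, 8.0, total - n_heavy)   # hypothetical short completions
    heavy = evenly_spaced(10.0, 30.0, n_heavy)         # hypothetical long agentic runs
    return median(light + heavy)


# Same per-class latencies in both cases; only the traffic mix differs.
before = route_p50(heavy_share=0.1)
after = route_p50(heavy_share=0.6)
print(f"p50 at 10% heavy: {before:.1f} s")
print(f"p50 at 60% heavy: {after:.1f} s")
print(f"slowdown factor:  {after / before:.1f}x")
```

In this toy setup the route's p50 more than doubles even though neither request class got slower, which is the same shape of effect the Vertex (Claude) numbers are attributed to.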
Key findings
- OSS aggregator routes (xAI, DeepInfra, Alibaba, Novita, Nebius) compressed 89-93% YoY.
- xAI: 9.1 s to 0.6 s (-93%). DeepInfra: 15.8 s to 1.4 s (-91%).
- DeepSeek: 24.3 s to 9.2 s (-62%). Still slow, but dramatically faster than a year earlier.
- Frontier providers barely moved: OpenAI -5%, Anthropic 0%.
- Vertex (Claude) is the lone exception: 6.0 s to 13.8 s (+131%). The route stayed put while heavy agentic Claude Code workloads moved onto it, so the work itself got bigger.
- Practical implication: routing easy work to a cheap OSS path used to cost 5-25 seconds at p50; on the fastest routes it now costs under a second.
Data
| Provider | Apr 2025 p50 (s) | Apr 2026 p50 (s) | YoY delta |
|---|---|---|---|
| xAI | 9.1 | 0.6 | -93% |
| DeepInfra | 15.8 | 1.4 | -91% |
| Alibaba | 5.8 | 0.5 | -91% |
| Novita | 8.8 | 0.8 | -91% |
| Nebius | 22.1 | 2.3 | -89% |
| DeepSeek | 24.3 | 9.2 | -62% |
| Coding | 7.9 | 6.1 | -23% |
| OpenAI | 2.6 | 2.5 | -5% |
| Anthropic | 5.9 | 5.9 | 0% |
| Vertex (Claude) | 6.0 | 13.8 | +131% |
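The YoY delta column can be recomputed from the two p50 columns as a plain percent change. The sketch below does that for the rounded values shown in the table; the dictionary and function names are illustrative. Because the inputs are rounded, a recomputed delta can land about a point away from the published figure, presumably because the published column was derived from unrounded measurements.

```python
# p50 latency in seconds, (April 2025, April 2026), copied from the table above.
P50_SECONDS = {
    "xAI": (9.1, 0.6),
    "DeepInfra": (15.8, 1.4),
    "Alibaba": (5.8, 0.5),
    "Novita": (8.8, 0.8),
    "Nebius": (22.1, 2.3),
    "DeepSeek": (24.3, 9.2),
    "Coding": (7.9, 6.1),
    "OpenAI": (2.6, 2.5),
    "Anthropic": (5.9, 5.9),
    "Vertex (Claude)": (6.0, 13.8),
}


def yoy_delta_percent(before_s, after_s):
    """Percent change from before_s to after_s; negative means the route got faster."""
    return (after_s - before_s) / before_s * 100.0


for provider, (before_s, after_s) in P50_SECONDS.items():
    delta = yoy_delta_percent(before_s, after_s)
    print(f"{provider:16s} {before_s:5.1f} s -> {after_s:5.1f} s  {delta:+7.1f}%")
```

A negative delta is an improvement here, since the metric is latency rather than throughput.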
Cited in
- What the gateway saw in April 2026 (/blog/provider-trends-april-2026-agentic-share-latency)
