Requesty

p50 latency YoY: April 2025 vs April 2026

Same provider tag, ≥50k requests in both months. Lower bars = faster requests.

[Bar chart: p50 latency per provider, Apr 2025 vs Apr 2026, with YoY change]
Open-source aggregator routes (xAI, DeepInfra, Alibaba, Novita, Nebius) compressed by 89-93%. Frontier providers were already fast and barely moved.

Note: the semantics of the `successful` flag likely changed between 2025 and 2026. The YoY latency comparison is robust to that change because it is computed as quantiles over wall-clock duration, which does not depend on the flag.
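The comparison rule described above (p50 of wall-clock duration per provider tag, kept only where both months have at least 50k requests) could be sketched roughly as follows. This is a minimal illustration, not Requesty's pipeline: the record shape `(provider, month, duration_seconds)` and both function names are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

MIN_REQUESTS = 50_000  # threshold stated in the chart caption


def p50_by_provider_month(records, min_requests=MIN_REQUESTS):
    """Median wall-clock duration per (provider, month).

    `records` is an iterable of (provider, month, duration_seconds) tuples.
    That shape is hypothetical. Every request counts, regardless of any
    success flag, so the result does not depend on that flag's semantics.
    """
    durations = defaultdict(list)
    for provider, month, duration_s in records:
        durations[(provider, month)].append(duration_s)
    # Drop groups that are too small to compare reliably.
    return {
        key: median(values)
        for key, values in durations.items()
        if len(values) >= min_requests
    }


def comparable_providers(p50s, month_a, month_b):
    """Keep only providers that cleared the threshold in both months."""
    in_a = {provider for (provider, month) in p50s if month == month_a}
    in_b = {provider for (provider, month) in p50s if month == month_b}
    return {
        provider: (p50s[(provider, month_a)], p50s[(provider, month_b)])
        for provider in sorted(in_a & in_b)
    }
```

Calling `comparable_providers(p50_by_provider_month(records), "2025-04", "2026-04")` would return one `(before, after)` pair per provider that qualifies in both months. The month strings are placeholders.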

Has LLM latency improved over the past year? On the Requesty gateway, open-source aggregator routes compressed dramatically between April 2025 and April 2026. xAI fell 93% (9.1 s to 0.6 s), DeepInfra 91% (15.8 s to 1.4 s), DeepSeek 62% (24.3 s to 9.2 s). Frontier providers barely moved (OpenAI -5%, Anthropic 0%). Vertex (Claude) is the only major route that got slower, +131%, as heavy agentic Claude Code workloads landed on it.

Why it matters

The OSS-aggregator tier closed most of the latency gap to frontier providers in 12 months: routing easy work onto a cheap OSS path used to cost 5-25 seconds and now costs under a second on the fastest of these routes (xAI, Alibaba, Novita). Workload composition is the dominant force on aggregate latency. Vertex (Claude) got 2.3× slower while the underlying inference stack barely changed, which shows that "is provider X fast?" is the wrong question to ask in isolation.
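The workload-composition point can be illustrated with a toy model: if each request class keeps exactly the same latency but the share of heavy requests grows, the route's p50 still jumps. The numbers below are synthetic and chosen only for illustration. They are not Requesty measurements, and `mixed_p50` is a hypothetical helper.

```python
from statistics import median


def mixed_p50(light_latency_s, heavy_latency_s, heavy_share, n=1000):
    """p50 of a synthetic mix in which each request class has a fixed latency."""
    heavy_count = round(n * heavy_share)
    sample = [heavy_latency_s] * heavy_count + [light_latency_s] * (n - heavy_count)
    return median(sample)


# Illustrative values only: per-class latency is identical in both scenarios.
before = mixed_p50(light_latency_s=5.0, heavy_latency_s=15.0, heavy_share=0.2)
after = mixed_p50(light_latency_s=5.0, heavy_latency_s=15.0, heavy_share=0.7)
print(before, after)
```

In this toy setup the median moves from the light-request latency to the heavy-request latency once heavy requests become the majority, even though neither class got slower.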

Period: Apr 2025 to Apr 2026
Updated: May 9, 2026
ID: latency-yoy-april-2026
§ 01

Key findings

  • 01 OSS aggregator routes (xAI, DeepInfra, Alibaba, Novita, Nebius) compressed 89-93% YoY.
  • 02 xAI: 9.1 s to 0.6 s (-93%). DeepInfra: 15.8 s to 1.4 s (-91%).
  • 03 DeepSeek: 24.3 s to 9.2 s (-62%). Still slow, but dramatically faster.
  • 04 Frontier providers barely moved: OpenAI -5%, Anthropic 0%.
  • 05 Vertex (Claude) is the lone exception: 6.0 s to 13.8 s (+131%). The route stayed put while heavy agentic Claude Code workloads moved onto it, so the work itself got bigger.
  • 06 Practical implication: routing easy work to a cheap OSS path used to cost 5-25 seconds and now costs under a second on the fastest routes.
§ 02

Data

| Provider | Apr 2025 p50 (s) | Apr 2026 p50 (s) | YoY delta (%) |
|---|---|---|---|
| xAI | 9.1 | 0.6 | -93 |
| DeepInfra | 15.8 | 1.4 | -91 |
| Alibaba | 5.8 | 0.5 | -91 |
| Novita | 8.8 | 0.8 | -91 |
| Nebius | 22.1 | 2.3 | -89 |
| DeepSeek | 24.3 | 9.2 | -62 |
| Coding | 7.9 | 6.1 | -23 |
| OpenAI | 2.6 | 2.5 | -5 |
| Anthropic | 5.9 | 5.9 | 0 |
| Vertex (Claude) | 6.0 | 13.8 | +131 |
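The YoY delta column is a relative change in p50. As a minimal sketch, it can be recomputed from the table's p50 values as follows. The function names and the `P50_SECONDS` dictionary are introduced here for illustration, and the inputs are the rounded values shown in the table, all expressed in seconds.

```python
def yoy_delta_percent(p50_before_s, p50_after_s):
    """Relative change in p50 latency, in percent (negative means faster)."""
    return (p50_after_s - p50_before_s) / p50_before_s * 100.0


def slowdown_factor(p50_before_s, p50_after_s):
    """How many times slower the later period is (below 1 means faster)."""
    return p50_after_s / p50_before_s


# (Apr 2025 p50, Apr 2026 p50) in seconds, copied from the table.
P50_SECONDS = {
    "xAI": (9.1, 0.6),
    "DeepInfra": (15.8, 1.4),
    "Alibaba": (5.8, 0.5),
    "Novita": (8.8, 0.8),
    "Nebius": (22.1, 2.3),
    "DeepSeek": (24.3, 9.2),
    "Coding": (7.9, 6.1),
    "OpenAI": (2.6, 2.5),
    "Anthropic": (5.9, 5.9),
    "Vertex (Claude)": (6.0, 13.8),
}

for provider, (before_s, after_s) in P50_SECONDS.items():
    delta = yoy_delta_percent(before_s, after_s)
    print(f"{provider:<16} {delta:+7.1f}%")
```

Because the table shows rounded p50s, recomputed deltas can differ from the published column by a point or so (OpenAI, for instance, comes out near -3.8% from the rounded inputs). The published figures were presumably derived from unrounded values. `slowdown_factor(6.0, 13.8)` gives the 2.3× figure quoted for Vertex (Claude).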