p50 latency YoY: April 2025 vs April 2026

Name: p50 latency YoY: April 2025 vs April 2026
Creator: Requesty
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Latency and performance, LLM, gateway, provider, metrics, How has LLM latency changed from 2025 to 2026?, Are open-source LLMs as fast as OpenAI now?, Which AI providers got faster in 2026?, Why are some LLM routes getting slower year-over-year?

Has LLM latency improved over the past year? On the Requesty gateway, median (p50) latency on open-source aggregator routes fell sharply between April 2025 and April 2026: xAI dropped 93% (9.1 s to 0.6 s), DeepInfra 91% (15.8 s to 1.4 s), and DeepSeek 62% (24.3 s to 9.2 s). Frontier providers barely moved (OpenAI -5%, Anthropic 0%). Vertex (Claude) is the only major route that got slower, rising 131% (6.0 s to 13.8 s) as heavy agentic Claude Code workloads landed on it.

Why it matters

The OSS-aggregator tier closed most of the latency gap to frontier providers in 12 months: routing easy work onto a cheap OSS path used to cost 5-25 seconds at the median and now costs under a second on the fastest routes. Workload composition is the dominant force on aggregate latency. Vertex (Claude) got 2.3× slower while its underlying inference stack barely changed, which shows that "is provider X fast?" is the wrong question to ask in isolation.
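To see how the mix of work can move a route's median without any change in serving speed, here is a toy Python sketch. The two request types, their latencies (2 s and 20 s), and the before/after mixes are hypothetical values chosen for illustration, not Requesty data; the point is only that p50 reflects the traffic as much as the provider.

```python
from statistics import median

# Hypothetical per-request latencies (seconds) on a single route. Each kind of
# request is served exactly as fast in both periods; only the mix changes.
SHORT_CHAT_S = 2.0     # assumed latency of a short chat completion
AGENTIC_TURN_S = 20.0  # assumed latency of a long agentic coding turn


def sample_mix(n_short: int, n_agentic: int) -> list[float]:
    """Build the list of observed latencies for a given workload mix."""
    return [SHORT_CHAT_S] * n_short + [AGENTIC_TURN_S] * n_agentic


before = sample_mix(n_short=90, n_agentic=10)  # mostly short requests
after = sample_mix(n_short=40, n_agentic=60)   # agentic turns now dominate

print(median(before), median(after))  # 2.0 20.0
```

Every request type is served at the same speed in both periods, yet the median jumps tenfold because long agentic turns went from a minority to a majority of the traffic.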

Period: Apr 2025 to Apr 2026
Updated: May 9, 2026
ID: latency-yoy-april-2026

§ 01

Key findings

01 OSS aggregator routes (xAI, DeepInfra, Alibaba, Novita, Nebius): p50 latency fell 89-93% YoY.
02 xAI: 9.1 s to 0.6 s (-93%). DeepInfra: 15.8 s to 1.4 s (-91%).
03 DeepSeek: 24.3 s to 9.2 s (-62%). Still slow in absolute terms, but far faster than a year earlier.
04 Frontier providers barely moved: OpenAI -5%, Anthropic 0%.
05 Vertex (Claude) is the lone exception: 6.0 s to 13.8 s (+131%). The route itself stayed put while heavy agentic Claude Code workloads moved onto it, so the requests themselves got heavier.
06 Practical implication: routing easy work to a cheap OSS path used to cost 5-25 seconds at the median; on the fastest routes it now costs under a second.
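The routing implication in finding 06 can be sketched as a simple latency-budget filter over the April 2026 medians from the data table. The helper name and the choice to filter on p50 alone are illustrative assumptions, not a Requesty API; the route list is the OSS aggregator tier named in finding 01.

```python
# April 2026 p50 latency in seconds, copied from the data table on this page.
P50_2026_S = {
    "xAI": 0.6,
    "DeepInfra": 1.4,
    "Alibaba": 0.5,
    "Novita": 0.8,
    "Nebius": 2.3,
    "DeepSeek": 9.2,
    "OpenAI": 2.5,
    "Anthropic": 5.9,
    "Vertex (Claude)": 13.8,
}

# The OSS aggregator tier as listed in key finding 01.
OSS_ROUTES = ("xAI", "DeepInfra", "Alibaba", "Novita", "Nebius")


def oss_routes_within_budget(budget_s: float) -> list[str]:
    """Hypothetical helper: OSS routes whose p50 fits the budget, fastest first."""
    fits = [route for route in OSS_ROUTES if P50_2026_S[route] <= budget_s]
    return sorted(fits, key=lambda route: P50_2026_S[route])


if __name__ == "__main__":
    print(oss_routes_within_budget(1.0))  # ['Alibaba', 'xAI', 'Novita']
```

A 1-second budget keeps Alibaba, xAI, and Novita. Because p50 says nothing about the slower half of requests, a production router would likely also check tail percentiles such as p95 or p99.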

§ 02

Data

Provider	Apr 2025 p50	Apr 2026 p50	YoY change
xAI	9.1 s	0.6 s	-93%
DeepInfra	15.8 s	1.4 s	-91%
Alibaba	5.8 s	0.5 s	-91%
Novita	8.8 s	0.8 s	-91%
Nebius	22.1 s	2.3 s	-89%
DeepSeek	24.3 s	9.2 s	-62%
Coding	7.9 s	6.1 s	-23%
OpenAI	2.6 s	2.5 s	-5%
Anthropic	5.9 s	5.9 s	0%
Vertex (Claude)	6.0 s	13.8 s	+131%
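The YoY column is the relative change in median latency, (p50 in Apr 2026 - p50 in Apr 2025) / p50 in Apr 2025. The short Python sketch below recomputes it from the rounded table values. It matches the published deltas to within about a point: OpenAI comes out near -4% and Vertex (Claude) near +130%, presumably because the published figures were computed from unrounded medians (an assumption).

```python
# p50 latency per route in seconds: (April 2025, April 2026), copied from the
# table above. These are the rounded published values, not raw measurements.
P50_SECONDS = {
    "xAI": (9.1, 0.6),
    "DeepInfra": (15.8, 1.4),
    "Alibaba": (5.8, 0.5),
    "Novita": (8.8, 0.8),
    "Nebius": (22.1, 2.3),
    "DeepSeek": (24.3, 9.2),
    "Coding": (7.9, 6.1),
    "OpenAI": (2.6, 2.5),
    "Anthropic": (5.9, 5.9),
    "Vertex (Claude)": (6.0, 13.8),
}


def yoy_change_pct(before: float, after: float) -> float:
    """Relative change in p50 latency, in percent (negative = faster)."""
    return (after - before) / before * 100.0


def slowdown_factor(before: float, after: float) -> float:
    """Ratio of new to old p50: above 1 means slower, below 1 means faster."""
    return after / before


if __name__ == "__main__":
    # Print routes from the largest improvement to the largest regression.
    for route, (before, after) in sorted(
        P50_SECONDS.items(), key=lambda item: yoy_change_pct(*item[1])
    ):
        change = yoy_change_pct(before, after)
        factor = slowdown_factor(before, after)
        print(f"{route:<16}{change:+7.1f}%  ({factor:.2f}x)")
```

The same ratio gives the "2.3× slower" figure for Vertex (Claude): 13.8 s / 6.0 s = 2.3.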
