
Latency leaderboard per provider, April 2026

Name: Latency leaderboard per provider, April 2026
Creator: Requesty
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Latency and performance, LLM, gateway, provider, metrics, Which LLM provider has the lowest latency in 2026?, What is the fastest LLM provider for chat completions?, Why is Vertex Claude so slow compared to Anthropic direct?, What is the p95 latency of OpenAI vs Anthropic?

Which AI provider has the lowest latency in April 2026? On the Requesty gateway, xAI led on p50 latency at 0.6 s, with Novita (0.8 s), Azure (1.0 s) and Mistral (1.4 s) close behind. Vertex (Claude) was the slowest at 13.7 s: roughly 23× the fastest provider and 2.8× the 4.9 s of Vertex (Gemini) on the same Vertex route. Anthropic direct sat mid-pack at 5.8 s, with a 52.6 s p95 long tail.
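The headline ratios follow directly from the p50 column of the Data section. A minimal Python sketch that reproduces them; `P50_S` and `slowdown` are illustrative names, not part of any Requesty API:

```python
# p50 total latency in seconds, copied from the April 2026 Data table
P50_S: dict[str, float] = {
    "xAI": 0.6,
    "Novita": 0.8,
    "Azure": 1.0,
    "Mistral": 1.4,
    "OpenAI": 2.5,
    "Bedrock": 2.8,
    "Vertex (Gemini)": 4.9,
    "Anthropic": 5.8,
    "Moonshot": 5.9,
    "DeepSeek": 9.0,
    "Vertex (Claude)": 13.7,
}


def slowdown(slow: str, fast: str, table: dict[str, float] = P50_S) -> float:
    """How many times slower `slow` is than `fast`, by p50 latency."""
    return table[slow] / table[fast]


fastest = min(P50_S, key=P50_S.get)
slowest = max(P50_S, key=P50_S.get)
spread = slowdown(slowest, fastest)
vertex_split = slowdown("Vertex (Claude)", "Vertex (Gemini)")

print(f"{slowest} vs {fastest}: {spread:.1f}x")  # 22.8x, reported as 23x
print(f"Vertex (Claude) vs Vertex (Gemini): {vertex_split:.1f}x")  # 2.8x
```

The exact spread is 13.7 / 0.6 ≈ 22.8, which the summary rounds to 23×.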

Why it matters

Total p50 latency is dominated by workload type, not raw provider speed. The 23× spread is partly silicon and partly streaming behaviour, but mostly the size and tool-call complexity of the requests being sent. The Vertex (Claude) tail reflects heavy agentic Claude Code traffic, not slow inference. Read literally, without that context, the leaderboard will mislead any provider-selection decision.
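A first-order way to see why workload can outweigh provider speed: assume total latency ≈ TTFT + output tokens ÷ decode throughput. The sketch below uses that simplified model with hypothetical numbers (not taken from the dataset); the `Request` class and its fields are illustrative only, and the model ignores tool-call round trips and retries.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ttft_s: float            # time to first token, seconds (hypothetical)
    output_tokens: int       # tokens generated after the first token (hypothetical)
    decode_tok_per_s: float  # steady-state decode throughput (hypothetical)

    def total_s(self) -> float:
        # Simplified model: queueing and prefill are folded into TTFT,
        # then tokens stream at a constant rate.
        return self.ttft_s + self.output_tokens / self.decode_tok_per_s


# Two hypothetical workloads on the SAME backend speed (100 tokens/s decode)
short_chat = Request(ttft_s=0.5, output_tokens=50, decode_tok_per_s=100.0)
agentic_edit = Request(ttft_s=1.5, output_tokens=1200, decode_tok_per_s=100.0)

print(short_chat.total_s())    # 1.0
print(agentic_edit.total_s())  # 13.5
```

With identical decode speed, the larger request takes 13.5× longer, so a route carrying mostly agentic traffic can land at the slow end of a p50 leaderboard without running on slower hardware.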

Period: Apr 2026
Updated: May 9, 2026
ID: latency-leaderboard-april-2026

§ 01

Key findings

01 p50 spans 23× from fastest to slowest: xAI 0.6 s to Vertex (Claude) 13.7 s.
02 Fast tier: xAI (0.6 s), Novita (0.8 s), Azure (1.0 s), Mistral (1.4 s).
03 The Vertex split is striking: Vertex (Gemini) 4.9 s vs Vertex (Claude) 13.7 s. Same provider routing, very different workload weight.
04 Frontier-Claude tier: Anthropic at 5.8 s p50. Long tails are heavy here: Anthropic p95 52.6 s, DeepSeek p95 74.0 s.
05 TTFT is decoupled from total latency. Azure is fastest to first token (0.6 s) yet ranks third on total p50 (1.0 s).
06 xAI is fastest on total latency but slow to first token (3.27 s TTFT), which suggests buffered or non-streaming upstream behaviour.
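One way to check the TTFT decoupling numerically is to rank providers separately by total p50 and by p50 TTFT, then compute Spearman's rank correlation. This sketch uses the no-ties formula rho = 1 - 6Σd² / (n(n² - 1)), which applies here because neither column contains tied values; `ranks` and `spearman_no_ties` are hypothetical helper names.

```python
# (p50 total latency s, p50 TTFT s) per provider, from the April 2026 Data table
DATA: dict[str, tuple[float, float]] = {
    "xAI": (0.60, 3.27),
    "Novita": (0.80, 3.10),
    "Azure": (1.00, 0.60),
    "Mistral": (1.40, 1.01),
    "OpenAI": (2.50, 1.84),
    "Bedrock": (2.80, 1.86),
    "Vertex (Gemini)": (4.90, 1.28),
    "Anthropic": (5.80, 2.14),
    "Moonshot": (5.90, 2.62),
    "DeepSeek": (9.00, 1.17),
    "Vertex (Claude)": (13.7, 1.44),
}


def ranks(values: dict[str, float]) -> dict[str, int]:
    """Rank providers from fastest (1) to slowest; assumes no tied values."""
    order = sorted(values, key=values.get)
    return {name: i + 1 for i, name in enumerate(order)}


def spearman_no_ties(a: dict[str, int], b: dict[str, int]) -> float:
    """Spearman's rho as 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(a)
    d2 = sum((a[k] - b[k]) ** 2 for k in a)
    return 1 - 6 * d2 / (n * (n * n - 1))


total_rank = ranks({p: v[0] for p, v in DATA.items()})
ttft_rank = ranks({p: v[1] for p, v in DATA.items()})
rho = spearman_no_ties(total_rank, ttft_rank)

for p in DATA:
    print(f"{p:16} total #{total_rank[p]:>2}   TTFT #{ttft_rank[p]:>2}")
print(f"Spearman rho = {rho:.2f}")
```

On this data xAI ranks first on total latency but last on TTFT, and rho comes out around -0.22: a provider's TTFT rank says little about its total-latency rank. With only 11 providers, the exact value should not be over-read.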

§ 02

Data

Provider	p50 latency (s)	p95 latency (s)	p50 TTFT (s)
xAI	0.60	10.9	3.27
Novita	0.80	18.5	3.10
Azure	1.00	8.80	0.60
Mistral	1.40	9.80	1.01
OpenAI	2.50	17.9	1.84
Bedrock	2.80	23.8	1.86
Vertex (Gemini)	4.90	27.2	1.28
Anthropic	5.80	52.6	2.14
Moonshot	5.90	64.1	2.62
DeepSeek	9.00	74.0	1.17
Vertex (Claude)	13.7	115.2	1.44
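The p95 column is easier to compare across providers once it is normalised by p50. A minimal sketch computing a p95/p50 tail ratio per provider; `tail_ratio` is an illustrative helper, and the ratio is a simple spread measure, not a metric this dataset reports.

```python
# (provider, p50 latency s, p95 latency s), from the April 2026 Data table
ROWS: list[tuple[str, float, float]] = [
    ("xAI", 0.60, 10.9),
    ("Novita", 0.80, 18.5),
    ("Azure", 1.00, 8.80),
    ("Mistral", 1.40, 9.80),
    ("OpenAI", 2.50, 17.9),
    ("Bedrock", 2.80, 23.8),
    ("Vertex (Gemini)", 4.90, 27.2),
    ("Anthropic", 5.80, 52.6),
    ("Moonshot", 5.90, 64.1),
    ("DeepSeek", 9.00, 74.0),
    ("Vertex (Claude)", 13.7, 115.2),
]


def tail_ratio(p50: float, p95: float) -> float:
    """How many times longer the p95 request takes than the median one."""
    return p95 / p50


ratios = {name: tail_ratio(p50, p95) for name, p50, p95 in ROWS}

# Widest relative tail first
for name, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16} p95/p50 = {r:5.1f}")
```

On these numbers the fast tier has the widest relative tails (Novita about 23×, xAI about 18×), while Vertex (Claude)'s 115.2 s p95 is about 8.4× its p50, close to Anthropic's roughly 9.1×.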
