Requesty

Latency leaderboard per provider, April 2026

Latency leaderboard, April 2026 (top 10 providers by volume)


p50 is the median time from request to last token. Latency is counted only on successful requests; TTFT is counted only on requests that were both streamed and successful.
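The counting rules above can be sketched in a few lines. This is a minimal illustration, not the gateway's actual pipeline: the record fields (`ok`, `streamed`, `total_s`, `ttft_s`) and the sample values are hypothetical, and the nearest-rank percentile method is an assumption, since the page does not say which method is used.

```python
import math

# Hypothetical per-request log records. Field names and values are
# illustrative assumptions, not the gateway's real schema or data.
RECORDS = [
    {"ok": True,  "streamed": True,  "total_s": 1.0,  "ttft_s": 0.3},
    {"ok": True,  "streamed": False, "total_s": 2.0,  "ttft_s": None},
    {"ok": True,  "streamed": True,  "total_s": 4.0,  "ttft_s": 0.5},
    {"ok": False, "streamed": True,  "total_s": 30.0, "ttft_s": 9.0},
    {"ok": True,  "streamed": True,  "total_s": 3.0,  "ttft_s": 0.9},
    {"ok": True,  "streamed": False, "total_s": 8.0,  "ttft_s": None},
]


def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    # Total latency: successful requests only.
    totals = [r["total_s"] for r in records if r["ok"]]
    # TTFT: requests that were both streamed and successful.
    ttfts = [r["ttft_s"] for r in records if r["ok"] and r["streamed"]]
    return {
        "p50_s": percentile(totals, 50),
        "p95_s": percentile(totals, 95),
        "p50_ttft_s": percentile(ttfts, 50),
    }


summary = summarize(RECORDS)
print(summary)
```

The failed 30 s request affects neither metric, and the two non-streamed requests contribute to total latency but not to TTFT, so the two metrics are computed over different populations.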

Which AI provider had the lowest latency in April 2026? On the Requesty gateway, xAI led on p50 at 0.6 s, with Novita (0.8 s), Azure (1.0 s) and Mistral (1.4 s) close behind. Vertex (Claude) was the slowest at 13.7 s: 23× the fastest, and 2.8× the latency of Vertex (Gemini) at 4.9 s on the same Vertex route. Anthropic-direct sat mid-pack at 5.8 s, with a long p95 tail of 52.6 s.

Why it matters
Total p50 latency is dominated by workload type, not pure provider speed. The 23× spread is partly silicon, partly streaming behaviour, but mostly the size and tool-call complexity of requests being sent. The Vertex-Claude tail is heavy agentic Claude Code traffic, not slow inference. Reading the leaderboard literally without that context will mislead any provider-selection decision.
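A toy model shows how workload mix alone can move the median. Everything here is invented for illustration: the decode speed, the time to first token and the output-token counts are assumptions, not measurements from the gateway. Both traffic mixes run against the same simulated provider speed.

```python
from statistics import median

# Assumed constants for a single simulated provider (not measured values).
TOKENS_PER_SECOND = 100.0  # decode speed, identical for both mixes
TTFT_S = 0.5               # time to first token, identical for both mixes


def total_latency(output_tokens):
    """Request-to-last-token time under the toy model."""
    return TTFT_S + output_tokens / TOKENS_PER_SECOND


# Invented output sizes, in tokens.
chat_mix = [50, 80, 100, 120, 150]          # short chat completions
agentic_mix = [400, 900, 1200, 1500, 3000]  # long agentic / tool-call outputs

chat_p50 = median(total_latency(n) for n in chat_mix)
agentic_p50 = median(total_latency(n) for n in agentic_mix)

print(f"chat p50:    {chat_p50:.1f} s")
print(f"agentic p50: {agentic_p50:.1f} s")
print(f"ratio:       {agentic_p50 / chat_p50:.1f}x")
```

With identical inference speed, the agentic mix still shows a much higher p50 simply because its requests produce more tokens, which is the effect the paragraph above attributes to the Vertex (Claude) traffic.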

Period: Apr 2026
Updated: May 9, 2026
ID: latency-leaderboard-april-2026

§ 01

Key findings

  • 01 p50 spans 23× from fastest to slowest: xAI at 0.6 s to Vertex (Claude) at 13.7 s.
  • 02 Fast tier: xAI (0.6 s), Novita (0.8 s), Azure (1.0 s), Mistral (1.4 s).
  • 03 The Vertex split is striking: Vertex (Gemini) 4.9 s, Vertex (Claude) 13.7 s. Same provider routing, very different workload weight.
  • 04 Frontier-Claude tier: Anthropic at 5.8 s p50. Long-tail variance is high: Anthropic p95 is 52.6 s and DeepSeek p95 is 74.0 s.
  • 05 TTFT is decoupled from total latency. Azure is fastest to first token (0.6 s) while ranking third on total p50 (1.0 s).
  • 06 xAI is fast on total latency but slow to first token (3.27 s TTFT), which suggests buffered or non-streaming upstream behaviour.
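The decoupling in findings 05 and 06 can be checked by ranking providers on each metric separately. The sketch below uses the p50 and TTFT figures published on this page; the helper `rank_by` is a name introduced here for illustration.

```python
# (p50 total latency in s, p50 TTFT in s), taken from this page's April 2026 data.
LATENCY = {
    "xAI": (0.6, 3.27),
    "Novita": (0.8, 3.10),
    "Azure": (1.0, 0.6),
    "Mistral": (1.4, 1.01),
    "OpenAI": (2.5, 1.84),
    "Bedrock": (2.8, 1.86),
    "Vertex (Gemini)": (4.9, 1.28),
    "Anthropic": (5.8, 2.14),
    "Moonshot": (5.9, 2.62),
    "DeepSeek": (9.0, 1.17),
    "Vertex (Claude)": (13.7, 1.44),
}


def rank_by(index):
    """Map provider name to its 1-based rank on the chosen metric (lower is better)."""
    ordered = sorted(LATENCY, key=lambda name: LATENCY[name][index])
    return {name: position for position, name in enumerate(ordered, start=1)}


total_rank = rank_by(0)
ttft_rank = rank_by(1)

for name in sorted(LATENCY, key=total_rank.get):
    print(f"{name:<16} total #{total_rank[name]:<3} TTFT #{ttft_rank[name]}")
```

xAI comes first on total p50 but last on TTFT, while Vertex (Claude) is last on total p50 yet fifth on TTFT. Keep in mind that the two metrics cover different request populations, since TTFT only includes streamed requests.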
§ 02

Data

Provider          p50 latency   p95 latency   p50 TTFT
xAI               600 ms        10.9 s        3.27 s
Novita            800 ms        18.5 s        3.10 s
Azure             1.00 s        8.80 s        600 ms
Mistral           1.40 s        9.80 s        1.01 s
OpenAI            2.50 s        17.9 s        1.84 s
Bedrock           2.80 s        23.8 s        1.86 s
Vertex (Gemini)   4.90 s        27.2 s        1.28 s
Anthropic         5.80 s        52.6 s        2.14 s
Moonshot          5.90 s        64.1 s        2.62 s
DeepSeek          9.00 s        74.0 s        1.17 s
Vertex (Claude)   13.7 s        115.2 s       1.44 s
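The headline ratios can be reproduced directly from the table. The sketch below recomputes the fastest-to-slowest p50 spread and the Vertex (Claude) to Vertex (Gemini) ratio. It also derives a p95/p50 "tail ratio" per provider, which is not a metric this page reports and is included only as one possible way to compare long tails.

```python
# (p50, p95) total latency in seconds, copied from the table above.
P50_P95 = {
    "xAI": (0.6, 10.9),
    "Novita": (0.8, 18.5),
    "Azure": (1.0, 8.8),
    "Mistral": (1.4, 9.8),
    "OpenAI": (2.5, 17.9),
    "Bedrock": (2.8, 23.8),
    "Vertex (Gemini)": (4.9, 27.2),
    "Anthropic": (5.8, 52.6),
    "Moonshot": (5.9, 64.1),
    "DeepSeek": (9.0, 74.0),
    "Vertex (Claude)": (13.7, 115.2),
}

p50 = {name: values[0] for name, values in P50_P95.items()}
fastest = min(p50, key=p50.get)
slowest = max(p50, key=p50.get)

# Spread between the slowest and fastest provider on p50.
spread = p50[slowest] / p50[fastest]
# Same Vertex route, different model family.
vertex_ratio = p50["Vertex (Claude)"] / p50["Vertex (Gemini)"]
# Illustrative derived quantity: how far p95 sits above p50.
tail_ratio = {name: hi / lo for name, (lo, hi) in P50_P95.items()}
widest_tail = max(tail_ratio, key=tail_ratio.get)

print(f"{slowest} vs {fastest}: {spread:.1f}x")
print(f"Vertex (Claude) vs Vertex (Gemini): {vertex_ratio:.1f}x")
print(f"widest p95/p50 tail: {widest_tail} ({tail_ratio[widest_tail]:.1f}x)")
```

A provider with a low p50 can still have a wide tail relative to its median, so the tail ratio is best read alongside the absolute p95 rather than instead of it.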