Streaming TTFT vs total latency, April 2026

Name: Streaming TTFT vs total latency, April 2026
Creator: Requesty
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Latency and performance, LLM, gateway, provider, metrics, What is the fastest streaming LLM provider?, Which LLM has the lowest time to first token in 2026?, Does xAI actually stream or is it buffered?, How does streaming affect perceived AI latency?

Which AI provider has the fastest time-to-first-token? In April 2026, on successful streaming requests through Requesty, Azure paired a 593 ms p50 TTFT with a 960 ms p50 total, the strongest combined result for streaming UX: the lowest p50 total in the table and the second-lowest p50 TTFT, behind only Alibaba (235 ms). xAI was mid-table on p50 total latency (5.68 s) but slowest to first token (3.27 s), which suggests buffered upstream behaviour rather than true streaming. Vertex (Gemini) and Vertex (Claude) sit at very different points on the same Vertex route: Gemini totals 3.05 s, Claude 8.03 s.

Why it matters

Time-to-first-token is what users actually feel as latency in chat UIs. A 600 ms TTFT feels instantaneous; a 3 s TTFT feels broken even if total latency is the same. Buffered streaming masquerading as real streaming is a common antipattern in this dataset, and any latency benchmark that only quotes total p50 will miss it entirely.
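The distinction between TTFT and total latency can be made concrete with a minimal sketch. It assumes a client that exposes the response as an iterator of text chunks; `FakeClock`, `simulated_stream`, and `measure_stream` are hypothetical names used for illustration, not part of any Requesty or provider API, and the fake clock stands in for `time.perf_counter` so the numbers are deterministic.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def simulated_stream(clock: FakeClock, gaps: Iterable[float]) -> Iterator[str]:
    """Yield one chunk after each gap (in seconds) has elapsed on the clock."""
    for gap in gaps:
        clock.advance(gap)
        yield "tok"


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float]) -> Tuple[Optional[float], float]:
    """Consume a stream and return (ttft_seconds, total_seconds).

    TTFT is the time until the first non-empty chunk. It is None if the
    stream ends without delivering any content.
    """
    start = clock()
    ttft: Optional[float] = None
    for chunk in chunks:
        if chunk and ttft is None:
            ttft = clock() - start
    return ttft, clock() - start


# Two streams with the same 1.0 s total but a different first-token delay.
clock = FakeClock()
incremental = measure_stream(simulated_stream(clock, [0.5, 0.25, 0.25]), clock)
clock = FakeClock()
buffered = measure_stream(simulated_stream(clock, [1.0, 0.0, 0.0]), clock)
print("incremental:", incremental)
print("buffered:   ", buffered)
```

Against a real streaming client, passing `time.perf_counter` as `clock` should give wall-clock measurements. The buffered case is the one a total-only benchmark cannot tell apart from the incremental one.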

Period: Apr 2026
Updated: May 9, 2026
ID: streaming-ttft-april-2026

§ 01

Key findings

01 Azure: 593 ms p50 TTFT, 960 ms p50 total. The lowest p50 total in the table, and only Alibaba (235 ms) has a lower p50 TTFT.
02 Nebius (659 ms TTFT) and OpenAI Responses (731 ms TTFT) are also strong on first-token speed.
03 Vertex (Gemini) 1.29 s TTFT vs Vertex (Claude) 1.44 s TTFT. Gemini totals 3.05 s, Claude totals 8.03 s. The Claude variant carries the heavy agentic completions on this route.
04 xAI: 5.68 s p50 total with 3.27 s p50 TTFT, which suggests the upstream buffers responses before flushing rather than streaming them as they are generated.
05 Anthropic: 2.14 s TTFT, 5.87 s total. Slowest first token among the very large providers, but a consistent shape.
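One rough way to turn the buffering observation into a check is to flag providers whose p50 TTFT is both large in absolute terms and a large share of p50 total. The sketch below uses the figures quoted in the key findings. The 2 s and 50% thresholds are illustrative assumptions, not values from the dataset, and `likely_buffered` is a hypothetical helper. Because p50 TTFT and p50 total are medians of separate distributions rather than per-request pairs, the share is only a coarse signal.

```python
from typing import Dict, List, Tuple

# (p50 TTFT in ms, p50 total in ms), copied from the key findings.
P50_MS: Dict[str, Tuple[int, int]] = {
    "Azure": (593, 960),
    "Vertex (Gemini)": (1290, 3050),
    "Vertex (Claude)": (1440, 8030),
    "xAI": (3270, 5680),
    "Anthropic": (2140, 5870),
}


def ttft_share(ttft_ms: float, total_ms: float) -> float:
    """Fraction of the median total latency spent before the first token."""
    return ttft_ms / total_ms


def likely_buffered(data: Dict[str, Tuple[int, int]],
                    min_ttft_ms: float = 2000.0,   # assumed threshold
                    min_share: float = 0.5) -> List[str]:  # assumed threshold
    """Return providers with a slow first token that also dominates the total.

    The absolute floor matters: a short completion (Azure here) can have a
    high TTFT share simply because there is little left to stream.
    """
    flagged = []
    for name, (ttft_ms, total_ms) in data.items():
        if ttft_ms >= min_ttft_ms and ttft_share(ttft_ms, total_ms) >= min_share:
            flagged.append(name)
    return flagged


for name, (ttft_ms, total_ms) in P50_MS.items():
    print(f"{name:16s} share={ttft_share(ttft_ms, total_ms):.2f}")
print("flagged:", likely_buffered(P50_MS))
```

With these assumed thresholds only xAI is flagged; Anthropic has a slow first token but spends most of its median total streaming.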

§ 02

Data

Provider	p50 TTFT	p50 total	p95 TTFT	p95 total
Alibaba	235 ms	1.03 s	4.82 s	13.4 s
Azure	593 ms	960 ms	1.32 s	3.35 s
Nebius	659 ms	4.14 s	4.21 s	41.1 s
OpenAI Responses	731 ms	6.69 s	2.59 s	41.5 s
DeepInfra	769 ms	2.19 s	1.26 s	3.63 s
Mistral	1.01 s	1.25 s	5.35 s	18.0 s
DeepSeek	1.17 s	5.29 s	3.04 s	31.7 s
Vertex (Gemini)	1.29 s	3.05 s	19.6 s	29.0 s
Vertex (Claude)	1.44 s	8.03 s	4.89 s	100.3 s
Bedrock	1.85 s	5.86 s	7.72 s	38.4 s
OpenAI	2.00 s	6.36 s	15.2 s	26.0 s
Anthropic	2.14 s	5.87 s	4.46 s	31.9 s
Moonshot	2.62 s	7.49 s	12.6 s	52.9 s
Minimaxi	2.77 s	6.14 s	7.27 s	24.7 s
Novita	3.13 s	7.42 s	9.67 s	27.9 s
xAI	3.27 s	5.67 s	14.8 s	20.9 s
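Because the cells mix "ms" and "s", comparing rows programmatically needs a unit-normalising step first. The following is a small sketch, assuming the cell format seen in the table (a number, a space, then `ms` or `s`); `parse_ms` and `tail_ratio` are hypothetical helper names, and only a subset of rows is copied in for brevity.

```python
import re
from typing import List, Tuple

# (provider, p50 TTFT, p50 total, p95 TTFT, p95 total), copied from the table.
ROWS: List[Tuple[str, str, str, str, str]] = [
    ("Alibaba", "235 ms", "1.03 s", "4.82 s", "13.4 s"),
    ("Azure", "593 ms", "960 ms", "1.32 s", "3.35 s"),
    ("Mistral", "1.01 s", "1.25 s", "5.35 s", "18.0 s"),
    ("Vertex (Gemini)", "1.29 s", "3.05 s", "19.6 s", "29.0 s"),
    ("xAI", "3.27 s", "5.67 s", "14.8 s", "20.9 s"),
]

_CELL = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*$")


def parse_ms(cell: str) -> int:
    """Convert a cell such as '593 ms' or '1.03 s' to whole milliseconds."""
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognised latency cell: {cell!r}")
    value, unit = float(match.group(1)), match.group(2)
    return round(value * 1000) if unit == "s" else round(value)


def tail_ratio(row: Tuple[str, str, str, str, str]) -> float:
    """p95 TTFT divided by p50 TTFT: how much worse the tail is than the median."""
    return parse_ms(row[3]) / parse_ms(row[1])


# Rank by median TTFT, then show how wide each provider's TTFT tail is.
by_p50_ttft = sorted(ROWS, key=lambda row: parse_ms(row[1]))
for row in by_p50_ttft:
    print(f"{row[0]:16s} p50 TTFT={parse_ms(row[1]):5d} ms  p95/p50={tail_ratio(row):.1f}x")
```

The tail ratio is worth reading alongside the median: in this subset Alibaba has the lowest p50 TTFT but the widest gap between p50 and p95, while Azure has the narrowest.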

