Requesty

Streaming TTFT vs total latency, April 2026

Streamed-and-successful requests only. Sorted ascending by p50 TTFT, fastest at the top.

Time-to-first-token (TTFT), median: the latency users perceive on streaming responses. TTFT (first_token_latency_ns) was not populated before 2026, and only April 2026 is included here.
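To make the two measurements concrete, here is a minimal Python sketch of timing one streamed response. The `measure_stream` helper and its injectable `clock` parameter are illustrative assumptions, not Requesty's instrumentation; only the nanosecond unit mirrors the `first_token_latency_ns` field.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], int] = time.perf_counter_ns,
) -> Tuple[Optional[int], int]:
    """Return (ttft_ns, total_ns) for one streamed response.

    Hypothetical helper: `chunks` is any iterable that yields text
    pieces as they arrive. TTFT is the time until the first non-empty
    chunk; total is the time until the stream is exhausted.
    """
    start = clock()
    ttft_ns = None
    for chunk in chunks:
        # Record the first non-empty chunk only; later chunks are ignored.
        if ttft_ns is None and chunk:
            ttft_ns = clock() - start
    total_ns = clock() - start
    return ttft_ns, total_ns


def fake_stream():
    """Stand-in for a provider stream: two chunks, 50 ms apart."""
    time.sleep(0.05)
    yield "Hello"
    time.sleep(0.05)
    yield " world"


ttft_ns, total_ns = measure_stream(fake_stream())
print(f"TTFT {ttft_ns / 1e6:.0f} ms, total {total_ns / 1e6:.0f} ms")
```

A response that is buffered upstream would show a TTFT close to its total, because the first chunk arrives only once most of the output already exists.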

Which AI provider has the fastest time-to-first-token? In April 2026, on streamed-and-successful Requesty requests, Azure paired a 593 ms p50 TTFT with a 960 ms p50 total, the strongest combined streaming-UX result: only Alibaba (235 ms) reached first token sooner, and no provider had a lower p50 total. xAI was mid-pack on total latency (5.68 s) but slowest to first token (3.27 s), which suggests buffered upstream behaviour rather than true streaming. Vertex (Gemini) and Vertex (Claude) sit at very different points: Gemini totals 3.05 s and Claude totals 8.03 s on the same Vertex route.

Why it matters

Time-to-first-token is what users actually feel as latency in chat UIs. A 600 ms TTFT feels instantaneous; a 3 s TTFT feels broken even if total latency is the same. Buffered streaming masquerading as real streaming is a common antipattern in this dataset, and any latency benchmark that only quotes total p50 will miss it entirely.
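The point about total-only benchmarks can be illustrated with a small sketch. The two sample sets below are invented for illustration and are not Requesty data: both have the same median total latency but very different median TTFT. The `p50` helper is likewise hypothetical.

```python
from statistics import median

# Hypothetical (ttft_ms, total_ms) samples; invented for illustration only.
streaming = [(550, 5900), (600, 6000), (650, 6100)]
buffered = [(2900, 5900), (3000, 6000), (3100, 6100)]


def p50(samples, index):
    """Median of one column of (ttft_ms, total_ms) tuples."""
    return median(row[index] for row in samples)


for name, samples in (("streaming", streaming), ("buffered", buffered)):
    print(f"{name}: p50 TTFT {p50(samples, 0)} ms, p50 total {p50(samples, 1)} ms")
```

Both sets report a 6000 ms p50 total, so a total-only benchmark cannot tell them apart. The difference appears only in the TTFT column (600 ms vs 3000 ms).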

Period: Apr 2026
Updated: May 9, 2026
ID: streaming-ttft-april-2026
§ 01

Key findings

  • 01 Azure: 593 ms p50 TTFT, 960 ms p50 total. The lowest p50 total in the table and the second-lowest p50 TTFT, behind Alibaba (235 ms).
  • 02 Nebius (659 ms TTFT) and OpenAI Responses (731 ms) are also strong on first-token speed.
  • 03 Vertex (Gemini) 1.29 s TTFT vs Vertex (Claude) 1.44 s TTFT. Gemini totals 3.05 s, Claude totals 8.03 s. The Claude variant carries the heavy agentic completions on this route.
  • 04 xAI: 5.68 s p50 total with 3.27 s TTFT, which suggests the upstream buffers responses before flushing rather than truly streaming.
  • 05 Anthropic: 2.14 s TTFT, 5.87 s total. Slowest first token among the very large providers, but a consistent shape.
§ 02

Data

| Provider         | p50 TTFT | p50 total | p95 TTFT | p95 total |
|------------------|----------|-----------|----------|-----------|
| Alibaba          | 235 ms   | 1.03 s    | 4.82 s   | 13.4 s    |
| Azure            | 593 ms   | 960 ms    | 1.32 s   | 3.35 s    |
| Nebius           | 659 ms   | 4.14 s    | 4.21 s   | 41.1 s    |
| OpenAI Responses | 731 ms   | 6.69 s    | 2.59 s   | 41.5 s    |
| DeepInfra        | 769 ms   | 2.19 s    | 1.26 s   | 3.63 s    |
| Mistral          | 1.01 s   | 1.25 s    | 5.35 s   | 18.0 s    |
| DeepSeek         | 1.17 s   | 5.29 s    | 3.04 s   | 31.7 s    |
| Vertex (Gemini)  | 1.29 s   | 3.05 s    | 19.6 s   | 29.0 s    |
| Vertex (Claude)  | 1.44 s   | 8.03 s    | 4.89 s   | 100.3 s   |
| Bedrock          | 1.85 s   | 5.86 s    | 7.72 s   | 38.4 s    |
| OpenAI           | 2.00 s   | 6.36 s    | 15.2 s   | 26.0 s    |
| Anthropic        | 2.14 s   | 5.87 s    | 4.46 s   | 31.9 s    |
| Moonshot         | 2.62 s   | 7.49 s    | 12.6 s   | 52.9 s    |
| Minimaxi         | 2.77 s   | 6.14 s    | 7.27 s   | 24.7 s    |
| Novita           | 3.13 s   | 7.42 s    | 9.67 s   | 27.9 s    |
| xAI              | 3.27 s   | 5.67 s    | 14.8 s   | 20.9 s    |
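One way to see how far the first-token and total rankings diverge is to rank the two p50 columns separately. The sketch below copies the p50 values from the table, converted to milliseconds. The `rank` helper is an illustrative assumption, and the "gap" figure is a plain difference of medians, which is not the same as the median of per-request differences.

```python
# p50 values from the table, in milliseconds: provider -> (ttft, total)
P50_MS = {
    "Alibaba": (235, 1030),
    "Azure": (593, 960),
    "Nebius": (659, 4140),
    "OpenAI Responses": (731, 6690),
    "DeepInfra": (769, 2190),
    "Mistral": (1010, 1250),
    "DeepSeek": (1170, 5290),
    "Vertex (Gemini)": (1290, 3050),
    "Vertex (Claude)": (1440, 8030),
    "Bedrock": (1850, 5860),
    "OpenAI": (2000, 6360),
    "Anthropic": (2140, 5870),
    "Moonshot": (2620, 7490),
    "Minimaxi": (2770, 6140),
    "Novita": (3130, 7420),
    "xAI": (3270, 5670),
}


def rank(column: int) -> dict:
    """Map provider -> 1-based rank on a column (0 = TTFT, 1 = total)."""
    ordered = sorted(P50_MS, key=lambda name: P50_MS[name][column])
    return {name: position for position, name in enumerate(ordered, start=1)}


ttft_rank = rank(0)
total_rank = rank(1)

for name, (ttft, total) in P50_MS.items():
    print(
        f"{name:<17} TTFT #{ttft_rank[name]:<2} "
        f"total #{total_rank[name]:<2} gap {total - ttft} ms"
    )
```

On these numbers xAI ranks last of 16 on p50 TTFT but 8th on p50 total, while OpenAI Responses ranks 4th on TTFT and 13th on total. Neither column predicts the other.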