Provider throughput density, April 2026
Metric: p50 of (output_tokens / total_latency_seconds), i.e. the median number of output tokens emitted per second of total wall-clock latency. It reads as 'how fast does this provider stream tokens once a request is in flight?' Higher is faster.
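The definition above can be sketched in a few lines. This is only an illustration of the formula: the record fields `output_tokens` and `total_latency_seconds` mirror the names in the definition, while the `requests` list, the sample values, and the function name are hypothetical, and the gateway's actual percentile method may differ from a plain median.

```python
from statistics import median

def throughput_density_p50(requests):
    """Median of per-request output_tokens / total_latency_seconds."""
    rates = [
        r["output_tokens"] / r["total_latency_seconds"]
        for r in requests
        if r["total_latency_seconds"] > 0  # skip rows that would divide by zero
    ]
    return median(rates)

# Hypothetical sample: three requests routed to one provider.
sample = [
    {"output_tokens": 200, "total_latency_seconds": 4.0},   # 50 tok/sec
    {"output_tokens": 900, "total_latency_seconds": 15.0},  # 60 tok/sec
    {"output_tokens": 120, "total_latency_seconds": 3.0},   # 40 tok/sec
]

p50 = throughput_density_p50(sample)
print(p50)  # 50.0
```

Note that the ratio is taken per request before the median is computed, so the result is not necessarily the same as dividing median tokens by median latency.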
How many tokens per second can each LLM provider sustain? In April 2026, on the Requesty gateway, Groq led at 320 output tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon. Vertex (Gemini) was second at 130 tok/sec and Mistral third at 120 tok/sec; OSS aggregator routes (Nebius, Minimaxi, DeepInfra) clustered at 23-26 tok/sec; Bedrock was slowest at 15 tok/sec, 21× behind Groq.
Why it matters
Throughput density (output tokens per second of total wall-clock latency) is the right number to optimise for streaming UX, rather than raw p50 latency. Two providers with identical p50 totals can deliver wildly different perceived speed depending on token rate. Vertex (Claude) is actually faster per-token than Anthropic-direct, despite its higher total latency, because Vertex (Claude) requests emit roughly 3× more output tokens on average.
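The Vertex (Claude) comparison is easier to follow as arithmetic. The sketch below uses the figures reported in this brief (roughly 3× more output tokens, 2.4× higher total latency, and p50 rates of 56 and 46 tok/sec); the function name is hypothetical, and since these are ratios of rounded aggregates the two results are only expected to agree approximately.

```python
def relative_throughput(token_ratio, latency_ratio):
    """If a request emits token_ratio times the tokens in latency_ratio
    times the wall-clock time, throughput scales by their quotient."""
    return token_ratio / latency_ratio

# ~3x more output tokens in 2.4x the total latency.
implied = relative_throughput(3.0, 2.4)

# Ratio of the reported p50 rates: Vertex (Claude) vs Anthropic-direct.
observed = 56 / 46

print(f"implied speed-up:  {implied:.2f}x")
print(f"observed speed-up: {observed:.2f}x")
```

Both figures land a little above 1.2×, which is consistent with the claim that the higher total latency is more than offset by the longer outputs.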
Key findings
- Groq leads at 320 tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon.
- Vertex (Gemini) is second at 130 tok/sec, followed by Mistral at 120 tok/sec.
- Vertex (Claude) at 56 tok/sec is faster per-token than Anthropic-direct at 46 tok/sec, even though its total request latency is 2.4× higher, because Vertex (Claude) requests emit ~3× more output tokens on average.
- OSS-aggregator routes (Nebius, Minimaxi, DeepInfra) cluster in the 23-26 tok/sec band.
- Bedrock is the slowest at 15 tok/sec, 21× behind Groq.
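The multiples quoted in the findings can be reproduced from the published p50 rates. A minimal sketch, assuming the rounded table values are the inputs (the dictionary and variable names are illustrative only):

```python
# Published p50 output tokens per second, as reported in this brief.
p50_tok_per_sec = {
    "Groq": 320,
    "Vertex (Gemini)": 130,
    "Bedrock": 15,
}

groq = p50_tok_per_sec["Groq"]
lead_over_second = groq / p50_tok_per_sec["Vertex (Gemini)"]
lead_over_slowest = groq / p50_tok_per_sec["Bedrock"]

print(f"Groq vs Vertex (Gemini): {lead_over_second:.1f}x")
print(f"Groq vs Bedrock: {lead_over_slowest:.0f}x")
```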
Data
| Provider | p50 tokens / sec | p50 ms / token |
|---|---|---|
| Groq | 320 | 3 ms |
| Vertex (Gemini) | 130 | 8 ms |
| Mistral | 120 | 8 ms |
| xAI | 65 | 16 ms |
| OpenAI | 57 | 18 ms |
| Novita | 56 | 18 ms |
| Vertex (Claude) | 56 | 18 ms |
| Anthropic | 46 | 22 ms |
| OpenAI Responses | 44 | 23 ms |
| Azure | 39 | 26 ms |
| DeepSeek | 31 | 32 ms |
| Alibaba | 28 | 36 ms |
| Moonshot | 27 | 37 ms |
| Nebius | 26 | 39 ms |
| Minimaxi | 24 | 41 ms |
| DeepInfra | 24 | 42 ms |
| Bedrock | 15 | 66 ms |
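The two numeric columns are approximately reciprocals of each other: milliseconds per token is 1000 divided by tokens per second. The sketch below shows the conversion for three rows; the helper name is hypothetical. Because the table publishes rounded p50 values, recomputing one column from the other can differ from the printed figure by about a millisecond.

```python
def ms_per_token(tok_per_sec):
    """Convert a token rate (tokens/second) to milliseconds per token."""
    return 1000.0 / tok_per_sec

# Rates taken from the table above.
for provider, rate in [("Groq", 320), ("Anthropic", 46), ("Bedrock", 15)]:
    print(f"{provider}: {ms_per_token(rate):.1f} ms/token")
```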
