
Provider throughput density, April 2026

p50 of output tokens emitted per second of total wall-clock latency. Reads as 'how fast does this provider stream tokens once a request is in flight?'

Higher is faster. p50 of (output_tokens / total_latency_seconds).
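The definition above can be sketched as a small aggregation. This is a minimal illustration, not Requesty's pipeline: the field names `output_tokens` and `total_latency_seconds` and the list-of-dicts log shape are assumptions.

```python
import statistics

def p50_throughput(requests):
    """p50 of per-request throughput density: output tokens emitted per
    second of total wall-clock latency. Only successful completions with
    output_tokens > 0 and total_latency > 0 are counted, matching the
    filters in the methodology note."""
    densities = [
        r["output_tokens"] / r["total_latency_seconds"]
        for r in requests
        if r["output_tokens"] > 0 and r["total_latency_seconds"] > 0
    ]
    return statistics.median(densities)

# Toy example: three requests for one provider.
sample = [
    {"output_tokens": 640, "total_latency_seconds": 2.0},  # 320 tok/sec
    {"output_tokens": 300, "total_latency_seconds": 1.0},  # 300 tok/sec
    {"output_tokens": 0,   "total_latency_seconds": 0.5},  # filtered out
]
print(p50_throughput(sample))  # median of [320.0, 300.0] -> 310.0
```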

Groq leads at 320 tok/sec on custom silicon, about 2.5× the next-fastest provider and 21× the slowest.

Successful completions only; output_tokens > 0 and total_latency > 0. Vertex split applied; direct Google traffic excluded; providers with fewer than 50,000 April requests excluded. Source: Requesty production gateway.

How many tokens per second can each LLM provider sustain? In April 2026 on the Requesty gateway, Groq led at 320 output tok/sec, 2.5× the next-fastest provider, a lead attributable to its custom inference silicon. Vertex (Gemini) was second at 130 tok/sec and Mistral third at 120 tok/sec; OSS aggregator routes (Nebius, Minimaxi, DeepInfra) clustered at 23-26 tok/sec; Bedrock was slowest at 15 tok/sec, 21× behind Groq.

Why it matters

Throughput density (output tokens per second of total wall-clock latency) is the right number to optimise streaming UX, not raw p50 latency. Two providers with identical p50 totals can deliver wildly different perceived speed depending on token rate. Vertex (Claude) is actually faster per-token than Anthropic-direct, despite higher total latency, because Vertex Claude requests emit roughly 3× more output tokens on average.
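A back-of-the-envelope check of the Vertex (Claude) vs Anthropic-direct comparison, using only the approximate ratios quoted in this report:

```python
# Illustrative numbers from the findings in this report (approximate).
anthropic_tps = 46   # tok/sec, Anthropic-direct
latency_ratio = 2.4  # Vertex (Claude) total latency vs Anthropic-direct
token_ratio = 3.0    # Vertex (Claude) output tokens vs Anthropic-direct

# Throughput density scales with output tokens and inversely with
# latency, so Vertex (Claude) comes out faster per-token despite its
# higher total request latency:
vertex_tps = anthropic_tps * token_ratio / latency_ratio
print(round(vertex_tps, 1))  # -> 57.5, close to the measured 56 tok/sec
```

The rounded ratios reproduce the measured figure to within a couple of tok/sec, which is consistent with the claim that the token-count gap, not raw latency, drives the per-token ranking.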

Period
Apr 2026
Updated
May 10, 2026
ID
provider-throughput-april-2026
§ 01

Key findings

  • Groq leads at 320 tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon.
  • Vertex (Gemini) is second at 130 tok/sec, followed by Mistral at 120 tok/sec.
  • Vertex (Claude), at 56 tok/sec, is faster per-token than Anthropic-direct at 46 tok/sec, even though its total request latency is 2.4× higher: Vertex (Claude) requests emit roughly 3× more output tokens on average.
  • OSS-aggregator routes (Nebius, Minimaxi, DeepInfra) cluster in the 23-26 tok/sec band.
  • Bedrock is the slowest at 15 tok/sec, 21× behind Groq.
§ 02

Data

Provider            p50 tokens/sec    p50 ms/token
Groq                           320               3
Vertex (Gemini)                130               8
Mistral                        120               8
xAI                             65              16
OpenAI                          57              18
Novita                          56              18
Vertex (Claude)                 56              18
Anthropic                       46              22
OpenAI Responses                44              23
Azure                           39              26
DeepSeek                        31              32
Alibaba                         28              36
Moonshot                        27              37
Nebius                          26              39
Minimaxi                        24              41
DeepInfra                       24              42
Bedrock                         15              66
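The two columns are reciprocals of one another: p50 ms/token ≈ 1000 / p50 tokens/sec. A quick sanity check (the table's ms/token values appear to come from unrounded medians, so reconstructing them from the rounded tok/sec column can drift by a millisecond; Bedrock, for example, comes out at 67 ms rather than the table's 66 ms):

```python
# p50 ms/token is the reciprocal of p50 tokens/sec, in milliseconds.
for provider, tps in [("Groq", 320), ("Vertex (Gemini)", 130), ("Bedrock", 15)]:
    print(f"{provider}: {1000 / tps:.0f} ms/token")
# Groq: 3 ms/token, Vertex (Gemini): 8 ms/token, Bedrock: 67 ms/token
```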