
Provider throughput density, April 2026

p50 of output tokens emitted per second of total wall-clock latency. Reads as 'how fast does this provider stream tokens once a request is in flight?'

Higher is faster. p50 of (output_tokens / total_latency_seconds).
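The definition above can be sketched as a small aggregation. This is a minimal illustration, not Requesty's pipeline: the field names `output_tokens` and `total_latency_seconds` and the list-of-dicts log shape are assumptions.

```python
import statistics

def p50_throughput(requests):
    """p50 of per-request throughput density: output tokens emitted per
    second of total wall-clock latency. Only successful completions with
    output_tokens > 0 and total_latency > 0 are counted, matching the
    filters in the methodology note."""
    densities = [
        r["output_tokens"] / r["total_latency_seconds"]
        for r in requests
        if r["output_tokens"] > 0 and r["total_latency_seconds"] > 0
    ]
    return statistics.median(densities)

# Toy example: three requests for one provider.
sample = [
    {"output_tokens": 640, "total_latency_seconds": 2.0},  # 320 tok/sec
    {"output_tokens": 300, "total_latency_seconds": 1.0},  # 300 tok/sec
    {"output_tokens": 0,   "total_latency_seconds": 0.5},  # filtered out
]
print(p50_throughput(sample))  # median of [320.0, 300.0] -> 310.0
```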

Groq leads at 320 tok/sec on custom silicon, about 2.5× the next-fastest provider and 21× the slowest.

Successful completions only; output_tokens > 0 and total_latency > 0. Vertex split applied; direct Google traffic excluded; providers with fewer than 50,000 April requests excluded. Source: Requesty production gateway.

How many tokens per second can each LLM provider sustain? In April 2026 on the Requesty gateway, Groq led at 320 output tok/sec, 2.5× the next-fastest provider, a lead attributable to its custom inference silicon. Vertex (Gemini) was second at 130 tok/sec and Mistral third at 120 tok/sec; OSS aggregator routes (Nebius, Minimaxi, DeepInfra) clustered at 23-26 tok/sec; Bedrock was slowest at 15 tok/sec, 21× behind Groq.

Why it matters

Throughput density (output tokens per second of total wall-clock latency) is the right number to optimise streaming UX, not raw p50 latency. Two providers with identical p50 totals can deliver wildly different perceived speed depending on token rate. Vertex (Claude) is actually faster per-token than Anthropic-direct, despite higher total latency, because Vertex Claude requests emit roughly 3× more output tokens on average.
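A back-of-the-envelope check of the Vertex (Claude) vs Anthropic-direct comparison, using only the approximate ratios quoted in this report:

```python
# Illustrative numbers from the findings in this report (approximate).
anthropic_tps = 46   # tok/sec, Anthropic-direct
latency_ratio = 2.4  # Vertex (Claude) total latency vs Anthropic-direct
token_ratio = 3.0    # Vertex (Claude) output tokens vs Anthropic-direct

# Throughput density scales with output tokens and inversely with
# latency, so Vertex (Claude) comes out faster per-token despite its
# higher total request latency:
vertex_tps = anthropic_tps * token_ratio / latency_ratio
print(round(vertex_tps, 1))  # -> 57.5, close to the measured 56 tok/sec
```

The rounded ratios reproduce the measured figure to within a couple of tok/sec, which is consistent with the claim that the token-count gap, not raw latency, drives the per-token ranking.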

Period
Apr 2026
Updated
May 10, 2026
ID
provider-throughput-april-2026
§ 01

Key findings

  • Groq leads at 320 tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon.
  • Vertex (Gemini) is second at 130 tok/sec, followed by Mistral at 120 tok/sec.
  • Vertex (Claude), at 56 tok/sec, is faster per-token than Anthropic-direct at 46 tok/sec, even though its total request latency is 2.4× higher: Vertex (Claude) requests emit roughly 3× more output tokens on average.
  • OSS-aggregator routes (Nebius, Minimaxi, DeepInfra) cluster in the 23-26 tok/sec band.
  • Bedrock is the slowest at 15 tok/sec, 21× behind Groq.
§ 02

Data

Provider            p50 tokens/sec    p50 ms/token
Groq                           320               3
Vertex (Gemini)                130               8
Mistral                        120               8
xAI                             65              16
OpenAI                          57              18
Novita                          56              18
Vertex (Claude)                 56              18
Anthropic                       46              22
OpenAI Responses                44              23
Azure                           39              26
DeepSeek                        31              32
Alibaba                         28              36
Moonshot                        27              37
Nebius                          26              39
Minimaxi                        24              41
DeepInfra                       24              42
Bedrock                         15              66
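The two columns are reciprocals of one another: p50 ms/token ≈ 1000 / p50 tokens/sec. A quick sanity check (the table's ms/token values appear to come from unrounded medians, so reconstructing them from the rounded tok/sec column can drift by a millisecond; Bedrock, for example, comes out at 67 ms rather than the table's 66 ms):

```python
# p50 ms/token is the reciprocal of p50 tokens/sec, in milliseconds.
for provider, tps in [("Groq", 320), ("Vertex (Gemini)", 130), ("Bedrock", 15)]:
    print(f"{provider}: {1000 / tps:.0f} ms/token")
# Groq: 3 ms/token, Vertex (Gemini): 8 ms/token, Bedrock: 67 ms/token
```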