# Requesty Data

> Open data notes from the Requesty LLM gateway. Each note has a permanent URL, an interactive chart, key findings, caveats, and machine-readable downloads. Free to cite under CC BY 4.0.

Catalog homepage: https://requesty.ai/data. Each note exposes a JSON endpoint at `<slug>/data.json`, a CSV at `<slug>/data.csv`, and a markdown export at `<slug>/data.md`.

For agents that prefer a single fetch over crawling each note: the extended `llms-full.txt` variant inlines every note's full content (abstract, key findings, data table, caveats, citation) at https://requesty.ai/data/llms-full.txt.

## Agentic workloads

- [finish_reason mix per provider, April 2026](https://requesty.ai/data/finish-reason-mix-by-provider-april-2026): Which AI providers serve the most agentic traffic? In April 2026 Anthropic-direct returned `finish_reason = tool_calls` on 52% of successful completions on the Requesty gateway, about 2× the next provider and 17× higher than OpenAI direct. OpenAI Responses (26%), Vertex (Claude) (23%) and Azure (23%) formed a clear second tier. Splitting Vertex into Gemini and Claude cohorts shows the gap inside that route: Vertex (Claude) 23% vs Vertex (Gemini) 13%.
  - questions answered: "Which LLM provider is best for agentic workloads?"; "What share of LLM traffic uses tool calls in 2026?"; "Which AI providers are best for AI agents?"; "Why does Anthropic dominate agent traffic vs OpenAI?"
  - data.json: https://requesty.ai/data/finish-reason-mix-by-provider-april-2026/data.json
  - data.csv: https://requesty.ai/data/finish-reason-mix-by-provider-april-2026/data.csv
  - data.md: https://requesty.ai/data/finish-reason-mix-by-provider-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: finish-reason-april-2026
- [finish_reason mix per model, April 2026](https://requesty.ai/data/finish-reason-mix-by-model-april-2026): Which AI models are used most for tool calling? In April 2026 Claude Opus 4.6 returned `finish_reason = tool_calls` 59% of the time on the Requesty gateway, the most agentic model on the platform. Gemini 2.5 Flash came second at 37%. Same-family Claude Sonnet 4.5 only 9%, and the entire OpenAI lineup (GPT-4o, GPT-4.1-mini, GPT-4.1-nano, GPT-5-mini) sat under 4%.
  - questions answered: "Which AI models are used most for tool calling?"; "Is Claude Opus more agentic than Claude Sonnet in production?"; "Which OpenAI models do AI agents use?"; "How agentic is Gemini 2.5 Flash compared to Claude?"
  - data.json: https://requesty.ai/data/finish-reason-mix-by-model-april-2026/data.json
  - data.csv: https://requesty.ai/data/finish-reason-mix-by-model-april-2026/data.csv
  - data.md: https://requesty.ai/data/finish-reason-mix-by-model-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: finish-reason-by-model-april-2026
- [Token-weighted tool_calls share per provider, April 2026](https://requesty.ai/data/tool-call-token-share-april-2026): What share of LLM output tokens is spent on tool calls vs chat? In April 2026 on the Requesty gateway, Anthropic emitted 38.8% of its output tokens on `tool_calls` vs 54.2% of requests, so agentic completions are roughly 30% smaller than chat ones. OpenAI Responses showed the opposite: 34.2% of tokens vs 26.4% of requests. Vertex (Claude) had the biggest negative gap (6.1% of tokens vs 27.6% of requests).
  - questions answered: "What share of AI output tokens is spent on tool calls?"; "Are tool-call payloads bigger or smaller than chat replies?"; "Why do request-counts and token-counts disagree on agentic share?"; "Which providers have the most token-heavy tool calls?"
  - data.json: https://requesty.ai/data/tool-call-token-share-april-2026/data.json
  - data.csv: https://requesty.ai/data/tool-call-token-share-april-2026/data.csv
  - data.md: https://requesty.ai/data/tool-call-token-share-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: tool-call-token-share-april-2026
- [Family share within OSS-routed traffic, Nov 2025 - Apr 2026](https://requesty.ai/data/oss-family-share-jan-apr-2026): Which open-weight AI model is most popular in 2026? On the Requesty gateway, OSS-routed traffic went from Qwen-dominated in late 2025 (34-38% share in Nov-Dec) to DeepSeek-dominated in January 2026 (77% after the R1 launch), and back to a genuinely diversified state by April (DeepSeek 47%, Kimi 17%, MiniMax 15%). Qwen collapsed from 38% to under 4% almost overnight when DeepSeek R1 shipped.
  - questions answered: "Which open-source LLM is most popular in 2026?"; "Has DeepSeek overtaken Qwen for open-weight traffic?"; "How fast does open-source AI model leadership change?"; "Is Kimi K2 gaining real production traction?"
  - data.json: https://requesty.ai/data/oss-family-share-jan-apr-2026/data.json
  - data.csv: https://requesty.ai/data/oss-family-share-jan-apr-2026/data.csv
  - data.md: https://requesty.ai/data/oss-family-share-jan-apr-2026/data.md
  - period: Nov 2025 - Apr 2026; updated: 2026-05-10; id: oss-family-share-jan-apr-2026
- [Reasoning-token share of provider output, April 2026](https://requesty.ai/data/reasoning-token-share-by-provider-april-2026): How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline.
  - questions answered: "How much LLM output is reasoning tokens?"; "Which providers use the most reasoning models in 2026?"; "Why does Anthropic show 0% reasoning tokens?"; "Are AI agents mostly thinking or mostly responding?"
  - data.json: https://requesty.ai/data/reasoning-token-share-by-provider-april-2026/data.json
  - data.csv: https://requesty.ai/data/reasoning-token-share-by-provider-april-2026/data.csv
  - data.md: https://requesty.ai/data/reasoning-token-share-by-provider-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: reasoning-share-april-2026

## Latency and performance

- [Latency leaderboard per provider, April 2026](https://requesty.ai/data/provider-latency-leaderboard-april-2026): Which AI provider has the lowest latency in April 2026? On the Requesty gateway xAI led p50 at 0.6 s, with Novita (0.8 s), Azure (1.0 s) and Mistral (1.4 s) close behind. Vertex (Claude) was the slowest at 13.7 s, 23× the fastest and 2.8× slower than Vertex (Gemini) at 4.9 s on the same Vertex route. Anthropic-direct sat mid-pack at 5.8 s with a 52.6 s p95 long tail.
  - questions answered: "Which LLM provider has the lowest latency in 2026?"; "What is the fastest LLM provider for chat completions?"; "Why is Vertex Claude so slow compared to Anthropic direct?"; "What is the p95 latency of OpenAI vs Anthropic?"
  - data.json: https://requesty.ai/data/provider-latency-leaderboard-april-2026/data.json
  - data.csv: https://requesty.ai/data/provider-latency-leaderboard-april-2026/data.csv
  - data.md: https://requesty.ai/data/provider-latency-leaderboard-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: latency-leaderboard-april-2026
- [Provider throughput density, April 2026](https://requesty.ai/data/provider-throughput-density-april-2026): How many tokens per second can each LLM provider sustain? In April 2026 on the Requesty gateway Groq led at 320 output tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon. Vertex (Gemini) was second at 130 tok/sec, Mistral 120 tok/sec; OSS aggregator routes (Nebius, Minimaxi, DeepInfra) clustered at 23-26 tok/sec; Bedrock was slowest at 15 tok/sec, 21× behind Groq.
  - questions answered: "What is the fastest LLM provider in tokens per second?"; "How fast does Groq stream compared to Anthropic?"; "Which LLM has the best streaming throughput?"; "Is Vertex Claude faster than Anthropic direct in practice?"
  - data.json: https://requesty.ai/data/provider-throughput-density-april-2026/data.json
  - data.csv: https://requesty.ai/data/provider-throughput-density-april-2026/data.csv
  - data.md: https://requesty.ai/data/provider-throughput-density-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-10; id: provider-throughput-april-2026
- [Streaming TTFT vs total latency, April 2026](https://requesty.ai/data/streaming-ttft-vs-total-april-2026): Which AI provider has the fastest time-to-first-token? In April 2026 on streaming-and-successful Requesty requests, Azure led TTFT at 593 ms with a 960 ms p50 total, the streaming-UX winner on both axes. xAI was among the fastest on total latency (5.68 s) but slowest to first token (3.27 s), which suggests buffered upstream behaviour rather than true streaming. Vertex (Gemini) and Vertex (Claude) sit at very different points: Gemini totals 3.05 s, Claude totals 8.03 s on the same Vertex route.
  - questions answered: "What is the fastest streaming LLM provider?"; "Which LLM has the lowest time to first token in 2026?"; "Does xAI actually stream or is it buffered?"; "How does streaming affect perceived AI latency?"
  - data.json: https://requesty.ai/data/streaming-ttft-vs-total-april-2026/data.json
  - data.csv: https://requesty.ai/data/streaming-ttft-vs-total-april-2026/data.csv
  - data.md: https://requesty.ai/data/streaming-ttft-vs-total-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: streaming-ttft-april-2026
- [p50 latency YoY: April 2025 vs April 2026](https://requesty.ai/data/provider-latency-yoy-april-2026): Has LLM latency improved over the past year? On the Requesty gateway, open-source aggregator routes compressed dramatically between April 2025 and April 2026. xAI fell 93% (9.1 s to 0.6 s), DeepInfra 91% (15.8 s to 1.4 s), DeepSeek 62% (24.3 s to 9.2 s). Frontier providers barely moved (OpenAI -5%, Anthropic 0%). Vertex (Claude) is the only major route that got slower, +131%, as heavy agentic Claude Code workloads landed on it.
  - questions answered: "How has LLM latency changed from 2025 to 2026?"; "Are open-source LLMs as fast as OpenAI now?"; "Which AI providers got faster in 2026?"; "Why are some LLM routes getting slower year-over-year?"
  - data.json: https://requesty.ai/data/provider-latency-yoy-april-2026/data.json
  - data.csv: https://requesty.ai/data/provider-latency-yoy-april-2026/data.csv
  - data.md: https://requesty.ai/data/provider-latency-yoy-april-2026/data.md
  - period: Apr 2025  to  Apr 2026; updated: 2026-05-09; id: latency-yoy-april-2026
- [Prompt-cache hit rate per provider, April 2026](https://requesty.ai/data/cache-hit-rate-by-provider-april-2026): Which AI providers have the highest prompt-cache hit rate? In April 2026 Anthropic-direct led the Requesty gateway at 77% (cached_tokens / input_tokens), Bedrock Claude was healthy at 57%, and Vertex (Claude) trailed at 24%. Same Claude model family, 3× lower hit rate. Vertex (Gemini) sat at 10% and Mistral at 4%, the floor among major routes.
  - questions answered: "Which AI providers have the best prompt caching hit rate?"; "Why is prompt caching so much worse on Vertex Claude than on Anthropic direct?"; "How much does prompt caching reduce LLM inference cost in production?"; "Which providers should I avoid if I rely on prompt caching?"
  - data.json: https://requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.json
  - data.csv: https://requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.csv
  - data.md: https://requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: cache-hit-april-2026

## Reliability and ops

- [Operational metrics per provider, April 2026](https://requesty.ai/data/operational-metrics-by-provider-april-2026): How reliable is each LLM provider in production? In April 2026 the top eight providers on the Requesty gateway (OpenAI, Anthropic, Vertex (Gemini), Bedrock, DeepSeek, Novita, xAI) sat at 95-99% success rate. Azure trailed at 78%, Vertex (Claude) at 84%, Mistral at 86%, and Moonshot at 6%, a real reliability outlier. Streaming adoption is bimodal too: Azure 68%, Anthropic 57%, everyone else under 30%.
  - questions answered: "Which LLM provider is most reliable in production?"; "What is the success rate of OpenAI vs Anthropic vs Vertex?"; "Why do some LLM providers fail more often than others?"; "How widely is streaming adopted across LLM providers?"
  - data.json: https://requesty.ai/data/operational-metrics-by-provider-april-2026/data.json
  - data.csv: https://requesty.ai/data/operational-metrics-by-provider-april-2026/data.csv
  - data.md: https://requesty.ai/data/operational-metrics-by-provider-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: ops-metrics-april-2026
- [Provider error code distribution, April 2026](https://requesty.ai/data/status-code-distribution-april-2026): Why do LLM provider requests fail? Among April 2026 requests on the Requesty gateway where the upstream provider returned a non-success response, 65.8% were 429 (rate limit), 19.4% were 400 (bad request: schema mismatches, oversized payloads), and 9.4% were 403 (forbidden). 5xx availability incidents (503, 502, 529, 500, 504, 520) summed to ~4.8%. Router- and gateway-level rejections are filtered out so the chart shows only what providers themselves emit when they fail.
  - questions answered: "Why do LLM API requests fail?"; "What is the most common LLM provider error code?"; "How often do AI providers rate-limit requests?"; "What HTTP errors return from OpenAI and Anthropic?"
  - data.json: https://requesty.ai/data/status-code-distribution-april-2026/data.json
  - data.csv: https://requesty.ai/data/status-code-distribution-april-2026/data.csv
  - data.md: https://requesty.ai/data/status-code-distribution-april-2026/data.md
  - period: Apr 2026; updated: 2026-05-09; id: status-codes-april-2026
- [Policy vs direct eventual success rate, Jan-Apr 2026](https://requesty.ai/data/policy-eventual-success-trend-jan-april-2026): How much does using a routing policy improve LLM reliability? In April 2026 the Requesty managed-fallback policy cohort hit 99.25% eventual success rate, vs 85.01% for users calling a single provider directly. That is a 14.2 pp lift, up from a +3.0 pp gap in January. Policy reliability held a tight 97.5-99.3% band across all four months while the direct cohort swung 12 pp; the widening is driven by direct-cohort regressions, not policy degradation.
  - questions answered: "How reliable are LLM routing policies vs calling providers directly?"; "Does using an LLM gateway actually improve reliability?"; "What success rate do AI gateways deliver in 2026?"; "How much do managed fallback chains improve LLM uptime?"
  - data.json: https://requesty.ai/data/policy-eventual-success-trend-jan-april-2026/data.json
  - data.csv: https://requesty.ai/data/policy-eventual-success-trend-jan-april-2026/data.csv
  - data.md: https://requesty.ai/data/policy-eventual-success-trend-jan-april-2026/data.md
  - period: Jan 2026 - Apr 2026; updated: 2026-05-10; id: policy-eventual-success-trend-jan-april-2026

## Optional

- [Blog: Provider trends, April 2026](https://requesty.ai/blog/provider-trends-april-2026-agentic-share-latency): Long-form analysis citing every note above.

---

Source: Requesty production gateway. Server timezone is UTC.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Attribution: "Requesty Data, https://requesty.ai/data".