# Requesty Data — Full

> Open data notes from the Requesty LLM gateway. Each note has a permanent URL, an interactive chart, key findings, caveats, and machine-readable downloads. Free to cite under CC BY 4.0.

This file is the extended `llms-full.txt` variant: it inlines the full content of every note in this catalog so an AI agent can ingest the whole hub in a single fetch. The compact link-only index is at https://www.requesty.ai/data/llms.txt; the human-friendly catalog homepage is at https://www.requesty.ai/data.

Each note also exposes a JSON endpoint at `<slug>/data.json`, a CSV at `<slug>/data.csv`, and an individual Markdown export at `<slug>/data.md`.
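
As a minimal sketch, the three download URLs for a note can be derived from its permanent URL. The helper names `slug_from_permalink` and `download_urls` are illustrative rather than part of any Requesty client, and the code only builds strings; it assumes nothing about the schema of the returned files.

```python
from urllib.parse import urlparse

BASE = "https://www.requesty.ai/data"


def slug_from_permalink(url: str) -> str:
    """Extract the slug (last path segment) from a note's permanent URL."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def download_urls(slug: str) -> dict[str, str]:
    """Return the JSON, CSV and Markdown download URLs for one note slug."""
    slug = slug.strip("/")
    return {ext: f"{BASE}/{slug}/data.{ext}" for ext in ("json", "csv", "md")}


urls = download_urls(
    slug_from_permalink(
        "https://www.requesty.ai/data/finish-reason-mix-by-provider-april-2026"
    )
)
print(urls["csv"])
```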

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Attribution: "Requesty Data, https://www.requesty.ai/data". Source: Requesty production gateway. Server timezone is UTC.

---

## Topic: Agentic workloads

---

# finish_reason mix per provider, April 2026

> Which AI providers serve the most agentic traffic? In April 2026 Anthropic-direct returned `finish_reason = tool_calls` on 52% of requests on the Requesty gateway, about 2× the next provider and roughly 16× the share on OpenAI direct (3.3%). OpenAI Responses (26%), Vertex (Claude) (23%) and Azure (23%) formed a clear second tier. Splitting Vertex into Gemini and Claude cohorts shows the gap inside that route: Vertex (Claude) 23% vs Vertex (Gemini) 13%.

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/finish-reason-mix-by-provider-april-2026.*

## Why it matters

`finish_reason = tool_calls` is the cleanest signal that a model was driving an agent loop rather than answering a chat prompt. Providers cluster into clear agentic and non-agentic tiers, which has direct implications for routing. Sending agent traffic to a non-agentic provider often produces shorter context windows and worse tool-following without users realising why their agent feels "dumber".
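
A minimal sketch of how a finish_reason mix could be computed from per-request values. The function name and the sample list are hypothetical (the sample is not Requesty data), and folding missing values into a "blank/error" bucket is an assumption modeled on the last column of the table in this note.

```python
from collections import Counter


def finish_reason_mix(finish_reasons):
    """Return the percentage share of each finish_reason value.

    Missing values (None or "") are grouped under "blank/error".
    """
    labels = [fr if fr else "blank/error" for fr in finish_reasons]
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical sample of ten requests on one route.
sample = ["tool_calls"] * 5 + ["stop"] * 4 + [None]
mix = finish_reason_mix(sample)
print(mix)
```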

## Questions this answers

- Which LLM provider is best for agentic workloads?
- What share of LLM traffic uses tool calls in 2026?
- Which AI providers are best for AI agents?
- Why does Anthropic dominate agent traffic vs OpenAI?

## Key findings

1. Anthropic-direct: 52% tool_calls, the highest agentic share on the platform.
2. OpenAI Responses (26%), Vertex (Claude) (23%) and Azure (23%) form a clear second tier.
3. Vertex (Claude) at 23% versus Vertex (Gemini) at 13%: same provider routing, but the Claude cohort's tool_calls share is roughly 1.7× higher.
4. OpenAI direct is at 3% tool_calls, roughly 16× lower than Anthropic-direct.
5. Bedrock Claude (7%) versus Anthropic-direct Claude (52%): same model, very different workload mix.
6. NULL finish_reason correlates with successful=false. Moonshot 94% blank is a reliability outlier on that route.

## Data

| Provider | tool_calls (percent) | stop (percent) | length (percent) | blank/error (percent) |
| --- | --- | --- | --- | --- |
| Anthropic | 52.20% | 42.60% | 1.40% | 3.80% |
| OpenAI Responses | 25.90% | 71.00% | 1.00% | 2.10% |
| Vertex (Claude) | 23.40% | 56.20% | 5.30% | 15.10% |
| Azure | 22.60% | 57.50% | 0.40% | 19.50% |
| Vertex (Gemini) | 13.50% | 79.00% | 3.30% | 4.20% |
| Bedrock | 6.70% | 88.50% | 0.50% | 4.30% |
| Moonshot | 4.60% | 1.40% | 0.10% | 93.90% |
| OpenAI | 3.30% | 94.20% | 0.60% | 1.90% |
| xAI | 2.90% | 96.20% | 0.20% | 0.70% |
| DeepSeek | 1.50% | 94.50% | 2.20% | 1.80% |
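
The multiples quoted in this note can be checked directly against the table. The sketch below copies the tool_calls column and divides Anthropic's share by each other provider's; the dictionary and function names are illustrative.

```python
# tool_calls share per provider (percent), copied from the table above.
TOOL_CALLS_SHARE = {
    "Anthropic": 52.2,
    "OpenAI Responses": 25.9,
    "Vertex (Claude)": 23.4,
    "Azure": 22.6,
    "Vertex (Gemini)": 13.5,
    "Bedrock": 6.7,
    "Moonshot": 4.6,
    "OpenAI": 3.3,
    "xAI": 2.9,
    "DeepSeek": 1.5,
}


def multiples_of(reference: str, shares: dict[str, float]) -> dict[str, float]:
    """How many times larger the reference share is than each other share."""
    ref = shares[reference]
    return {name: ref / value for name, value in shares.items() if name != reference}


multiples = multiples_of("Anthropic", TOOL_CALLS_SHARE)
for name, multiple in multiples.items():
    print(f"Anthropic vs {name}: {multiple:.1f}x")
```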

## Caveats

- Apr 2026 only. finish_reason was not populated for any 2025 row.
- Moonshot 94% blank/error is a reliability problem, not a labeling artefact (success rate 6.2%).

## Cite as

**APA.** Requesty (2026). finish_reason mix per provider, April 2026. Requesty Data. https://www.requesty.ai/data/finish-reason-mix-by-provider-april-2026

```bibtex
@misc{requesty_finish_reason_mix_by_provider_april_2026,
  author       = {{Requesty}},
  title        = {finish\_reason mix per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/finish-reason-mix-by-provider-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/finish-reason-mix-by-provider-april-2026/data.json) · [CSV](https://www.requesty.ai/data/finish-reason-mix-by-provider-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/finish-reason-mix-by-provider-april-2026/data.md)

---

# finish_reason mix per model, April 2026

> Which AI models are used most for tool calling? In April 2026 Claude Opus 4.6 returned `finish_reason = tool_calls` 59% of the time on the Requesty gateway, the most agentic model on the platform. Gemini 2.5 Flash came second at 37%. Same-family Claude Sonnet 4.5 reached only 9%, and the entire OpenAI lineup (GPT-4o, GPT-4.1-mini, GPT-4.1-nano, GPT-5-mini) sat under 4%.

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/finish-reason-mix-by-model-april-2026.*

## Why it matters

Two models from the same provider can have completely different agentic profiles, which means choosing a frontier model for an agent based on brand alone is a coin flip. The headline "Anthropic is agentic" framing on the per-provider chart is really an Opus 4.6 effect: Sonnet 4.5 behaves more like a chat model in production traffic, despite both being marketed as agentic-capable.

## Questions this answers

- Which AI models are used most for tool calling?
- Is Claude Opus more agentic than Claude Sonnet in production?
- Which OpenAI models do AI agents use?
- How agentic is Gemini 2.5 Flash compared to Claude?

## Key findings

1. claude-opus-4-6: 59% tool_calls. The single most agentic model on the platform.
2. gemini-2.5-flash: 37% tool_calls. The mid-tier general-purpose model that is doing real agentic work.
3. claude-sonnet-4-5: 9% tool_calls. The same provider, the same family, dramatically less agentic.
4. OpenAI lineup (gpt-4o, gpt-4.1-mini, gpt-4.1-nano, gpt-5-mini): all under 4% tool_calls.
5. Practical implication: the "agentic provider" framing on the per-provider chart is really an "Opus 4.6 effect". Anthropic-direct looks agentic because Opus is.

## Data

| Model | tool_calls (percent) | stop (percent) | length (percent) |
| --- | --- | --- | --- |
| claude-opus-4-6 | 59.40% | 39.50% | 1.10% |
| gemini-2.5-flash | 36.60% | 61.20% | 2.10% |
| claude-sonnet-4-5 | 9.10% | 90.70% | 0.20% |
| gpt-5-mini | 3.50% | 94.00% | 2.40% |
| gpt-4o | 0.20% | 99.80% | 0.00% |
| gpt-4.1-mini | 0.20% | 99.80% | 0.00% |
| deepseek-chat | 0.50% | 97.20% | 2.30% |
| gpt-4.1-nano | 0.00% | 99.90% | 0.00% |
| gemini-2.5-flash-lite | 0.00% | 99.80% | 0.20% |
| grok-4-1-fast | 0.10% | 99.80% | 0.10% |

## Caveats

- finish_reason was not populated before 2026, so this is April 2026 only.
- Aggregating finish_reason at the model level smooths over how the model is invoked. A model used inside an agent loop will show more tool_calls than the same model used in a one-shot chatbot.

## Cite as

**APA.** Requesty (2026). finish_reason mix per model, April 2026. Requesty Data. https://www.requesty.ai/data/finish-reason-mix-by-model-april-2026

```bibtex
@misc{requesty_finish_reason_mix_by_model_april_2026,
  author       = {{Requesty}},
  title        = {finish\_reason mix per model, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/finish-reason-mix-by-model-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/finish-reason-mix-by-model-april-2026/data.json) · [CSV](https://www.requesty.ai/data/finish-reason-mix-by-model-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/finish-reason-mix-by-model-april-2026/data.md)

---

# Token-weighted tool_calls share per provider, April 2026

> What share of LLM output tokens is spent on tool calls vs chat? In April 2026 on the Requesty gateway, Anthropic emitted 38.8% of its output tokens on `tool_calls` vs 54.2% of requests, so the average tool-call completion is roughly 30% smaller than the route's average completion (and about half the size of a non-tool-call one). OpenAI Responses showed the opposite: 34.2% of tokens vs 26.4% of requests. Vertex (Claude) had the biggest negative gap (6.1% of tokens vs 27.6% of requests).

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/tool-call-token-share-april-2026.*

## Why it matters

Counting requests overweights short tool-call payloads; counting tokens overweights long chat replies. Two providers with the same request-level agentic share can have wildly different agentic token shares, which matters for capacity planning, billing reconciliation, and any benchmark that aggregates over tokens rather than calls. Pick the wrong axis and the same provider can look 5× more or less agentic than it actually is.

## Questions this answers

- What share of AI output tokens is spent on tool calls?
- Are tool-call payloads bigger or smaller than chat replies?
- Why do request-counts and token-counts disagree on agentic share?
- Which providers have the most token-heavy tool calls?

## Key findings

1. Anthropic: 38.8% of output tokens vs 54.2% of requests. The average tool_calls completion is ~30% smaller than the route's average completion, and about half the size of a non-tool-call one. tool_calls payloads are compact.
2. OpenAI Responses: 34.2% of output tokens vs 26.4% of requests. The opposite shape: agentic completions emit more tokens than chat ones.
3. Vertex (Claude): 6.1% of tokens vs 27.6% of requests. The biggest negative gap on the chart. Claude on Vertex is dominated by lots of small tool-call payloads, while chat completions on the same route are heavy.
4. Vertex (Gemini): 1.5% of tokens vs 14.1% of requests. Same shape as Vertex (Claude) and more extreme in relative terms (a request-to-token ratio of roughly 9× versus 4.5×). Gemini chat replies are huge, so agentic completions barely register on the token-weighted view.
5. xAI: 17.2% of tokens vs 2.9% of requests. Few agentic calls, but each one is verbose.
6. OpenAI direct: 2.7% of tokens vs 3.4% of requests. The two views agree: there is barely any agentic load on this route in either framing.
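
The two shares also imply how large a tool-call completion is on average. If a fraction `r` of requests and a fraction `t` of output tokens are tool calls, the mean tool-call completion is `t / r` times the route's mean completion, and `(t / r) / ((1 - t) / (1 - r))` times the mean non-tool-call completion. The sketch below applies that arithmetic to two rows of this note; the function name is illustrative.

```python
def relative_sizes(token_share: float, request_share: float) -> tuple[float, float]:
    """Mean tool-call completion size, given shares as fractions in (0, 1).

    Returns (size relative to the route's mean completion,
             size relative to the mean non-tool-call completion).
    """
    vs_average = token_share / request_share
    vs_other = vs_average / ((1 - token_share) / (1 - request_share))
    return vs_average, vs_other


anthropic = relative_sizes(0.388, 0.542)
openai_responses = relative_sizes(0.342, 0.264)
print(f"Anthropic: {anthropic[0]:.2f}x average, {anthropic[1]:.2f}x non-tool-call")
print(f"OpenAI Responses: {openai_responses[0]:.2f}x average, {openai_responses[1]:.2f}x non-tool-call")
```

A value below 1 means tool-call completions are smaller than the comparison group; a value above 1 means they are larger.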

## Data

| Provider | Tool-call output-token share (percent) | Tool-call request share (percent) | Gap (token - request) (percent) |
| --- | --- | --- | --- |
| Moonshot | 54.70% | 75.00% | -20.30% |
| Minimaxi | 52.50% | 50.80% | 1.70% |
| Anthropic | 38.80% | 54.20% | -15.40% |
| OpenAI Responses | 34.20% | 26.40% | 7.80% |
| Azure | 18.00% | 27.90% | -9.90% |
| xAI | 17.20% | 2.90% | 14.30% |
| Bedrock | 14.40% | 7.00% | 7.40% |
| Alibaba | 12.20% | 1.70% | 10.50% |
| Vertex (Claude) | 6.10% | 27.60% | -21.50% |
| Novita | 3.00% | 1.90% | 1.10% |
| OpenAI | 2.70% | 3.40% | -0.70% |
| Vertex (Gemini) | 1.50% | 14.10% | -12.60% |
| DeepSeek | 1.20% | 1.50% | -0.30% |
| Mistral | 1.00% | 1.90% | -0.90% |
| Nebius | 0.90% | 3.50% | -2.60% |
| Groq | 0.80% | 1.00% | -0.20% |
| DeepInfra | 0.30% | 0.10% | 0.20% |

## Cite as

**APA.** Requesty (2026). Token-weighted tool_calls share per provider, April 2026. Requesty Data. https://www.requesty.ai/data/tool-call-token-share-april-2026

```bibtex
@misc{requesty_tool_call_token_share_april_2026,
  author       = {{Requesty}},
  title        = {Token-weighted tool\_calls share per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/tool-call-token-share-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/tool-call-token-share-april-2026/data.json) · [CSV](https://www.requesty.ai/data/tool-call-token-share-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/tool-call-token-share-april-2026/data.md)

---

# Family share within OSS-routed traffic, Nov 2025 - Apr 2026

> Which open-weight AI model is most popular in 2026? On the Requesty gateway, OSS-routed traffic went from Qwen-dominated in late 2025 (34-38% share in Nov-Dec) to DeepSeek-dominated in January 2026 (77% after the R1 launch), and back to a genuinely diversified state by April (DeepSeek 47%, Kimi 17%, MiniMax 15%). Qwen collapsed from 38% to under 4% almost overnight when DeepSeek R1 shipped.

*Topic: Agentic workloads. Period: Nov 2025 - Apr 2026. Last updated 2026-05-10. Permanent URL: https://www.requesty.ai/data/oss-family-share-jan-apr-2026.*

## Why it matters

Open-source LLM leadership rotates on a months-not-years timescale: the "best open model" changes with each new release, and the long tail diversifies fast once any single model loses its lead. For teams hard-coding OSS choices into prompts or routing rules, that means yesterday's default is often already wrong. Kimi's share rising nearly tenfold between January and April (1.7% to 16.8%) is the clearest current example.

## Questions this answers

- Which open-source LLM is most popular in 2026?
- Has DeepSeek overtaken Qwen for open-weight traffic?
- How fast does open-source AI model leadership change?
- Is Kimi K2 gaining real production traction?

## Key findings

1. Qwen (Alibaba): 34% in Nov, 38% in Dec, then collapsed to under 4% from January onward. DeepSeek R1 launch killed Qwen share overnight.
2. DeepSeek: 10% in Nov, exploded to 77% in Jan (R1 launch), declining since to 47% in Apr.
3. Kimi (Moonshot): volatile. 10% Nov, 16% Dec, collapsed to 2% Jan, back to 17% Apr.
4. MiniMax: 14% Nov, near-zero Dec, recovered to 15% by Apr.
5. The OSS tier went from highly concentrated in January (DeepSeek at 77%) to more diversified by April (top family at 47%, three families above 14%) within three months.

## Data

| Month | DeepSeek (percent) | MiniMax (percent) | Kimi (Moonshot) (percent) | Mistral (percent) | GLM (Zhipu) (percent) | Qwen (Alibaba) (percent) | Llama (Meta) (percent) | GPT-OSS (OpenAI) (percent) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| November | 10.29% | 13.59% | 9.64% | 8.66% | 14.01% | 33.64% | 2.51% | 7.64% |
| December | 17.43% | 0.48% | 15.82% | 13.59% | 7.49% | 37.55% | 2.11% | 5.44% |
| January | 76.70% | 6.39% | 1.73% | 6.47% | 3.17% | 3.85% | 0.68% | 1.02% |
| February | 63.75% | 5.39% | 7.03% | 11.89% | 6.39% | 3.38% | 1.00% | 1.19% |
| March | 65.17% | 13.96% | 4.39% | 4.99% | 7.52% | 1.61% | 1.57% | 0.80% |
| April | 46.68% | 14.58% | 16.75% | 9.65% | 6.71% | 1.72% | 3.01% | 0.90% |
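
One way to put a number on "concentrated" versus "diversified" is the top-family share together with a Herfindahl-Hirschman index (the sum of squared shares). This is an illustrative measure chosen for this sketch, not the method used to produce the note; the rows are copied from the table above.

```python
# Family shares (percent) for three months, copied from the table above.
SHARES = {
    "November": {"DeepSeek": 10.29, "MiniMax": 13.59, "Kimi": 9.64, "Mistral": 8.66,
                 "GLM": 14.01, "Qwen": 33.64, "Llama": 2.51, "GPT-OSS": 7.64},
    "January": {"DeepSeek": 76.70, "MiniMax": 6.39, "Kimi": 1.73, "Mistral": 6.47,
                "GLM": 3.17, "Qwen": 3.85, "Llama": 0.68, "GPT-OSS": 1.02},
    "April": {"DeepSeek": 46.68, "MiniMax": 14.58, "Kimi": 16.75, "Mistral": 9.65,
              "GLM": 6.71, "Qwen": 1.72, "Llama": 3.01, "GPT-OSS": 0.90},
}


def top_family(shares: dict[str, float]) -> tuple[str, float]:
    """Return the family with the largest share and that share."""
    name = max(shares, key=shares.get)
    return name, shares[name]


def hhi(shares: dict[str, float]) -> float:
    """Herfindahl-Hirschman index on normalized shares (1.0 = one family has everything)."""
    total = sum(shares.values())
    return sum((value / total) ** 2 for value in shares.values())


for month, row in SHARES.items():
    name, share = top_family(row)
    print(f"{month}: top family {name} at {share:.1f}%, HHI {hhi(row):.3f}")
```

On these figures the index is highest in January and lower in both November and April.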

## Caveats

- OSS is defined as traffic routed through open-source aggregator providers (not frontier APIs). The boundary is imperfect.

## Cite as

**APA.** Requesty (2026). Family share within OSS-routed traffic, Nov 2025 - Apr 2026. Requesty Data. https://www.requesty.ai/data/oss-family-share-jan-apr-2026

```bibtex
@misc{requesty_oss_family_share_jan_apr_2026,
  author       = {{Requesty}},
  title        = {Family share within OSS-routed traffic, Nov 2025 - Apr 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/oss-family-share-jan-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/oss-family-share-jan-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/oss-family-share-jan-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/oss-family-share-jan-apr-2026/data.md)

---

# Reasoning-token share of provider output, April 2026

> How much of LLM output is reasoning/thinking tokens? In April 2026 on the Requesty gateway, Groq led at 82%, followed by Coding (79%), xAI (60%) and z.ai (51%). These routes are dominated by thinking models. Frontier routes ran around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%. Anthropic and Bedrock report 0% because Anthropic does not surface reasoning tokens separately; extended thinking is delivered inline.

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/reasoning-token-share-by-provider-april-2026.*

## Why it matters

The industry narrative is "everything is reasoning now", but the data says reasoning is concentrated in a specific subset of routes, and even there, absolute volume is dwarfed by regular completion output. The Anthropic and Bedrock 0% is a measurement artefact, not a usage signal, which matters for any cost or quality comparison that relies on the reasoning-tokens column.

## Questions this answers

- How much LLM output is reasoning tokens?
- Which providers use the most reasoning models in 2026?
- Why does Anthropic show 0% reasoning tokens?
- Are AI agents mostly thinking or mostly responding?

## Key findings

1. High-reasoning routes: Groq 82%, Coding 79%, xAI 60%, z.ai 51%.
2. Frontier routes around a third: Vertex (Gemini) 40%, OpenAI 36%, OpenAI Responses 33%.
3. Vertex (Claude) does not appear here: Anthropic does not report reasoning tokens separately, so Claude thinking output is not counted.
4. Azure at 18%: it leans on GPT-4.1-class models more than the latest reasoning checkpoints.
5. Anthropic, Bedrock, Mistral, Moonshot: 0%. Anthropic does not report reasoning tokens separately (thinking is inline). Mistral and Moonshot have no reasoning models routed.
6. Industry narrative is "everything is reasoning now". The data says reasoning is concentrated in a specific subset of providers and even there the absolute volume is dwarfed by regular completion output.

## Data

| Provider | Reasoning share (percent) |
| --- | --- |
| Groq | 82.30% |
| Coding | 79.00% |
| xAI | 59.70% |
| z.ai | 51.30% |
| Vertex (Gemini) | 39.90% |
| Minimaxi | 37.20% |
| OpenAI | 35.90% |
| OpenAI Responses | 32.50% |
| Azure | 18.10% |
| Novita | 3.00% |
| DeepSeek | 2.70% |

## Caveats

- Reasoning tokens were not tracked before 2026, so this is April 2026 only. Year-over-year comparison is not possible.
- A 0% reading does not necessarily mean a provider has no reasoning models - only that reasoning output is not reported separately on that route (e.g. Anthropic delivers thinking inline).
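
A minimal sketch of the distinction this caveat draws: when a route does not report reasoning tokens at all, the share is better represented as "unknown" than as 0%. The function and parameter names are hypothetical, and the sketch assumes reasoning tokens are counted inside the output-token total.

```python
from typing import Optional


def reasoning_share(reasoning_tokens: Optional[int], output_tokens: int) -> Optional[float]:
    """Reasoning tokens as a percentage of output tokens.

    Returns None when the route does not report reasoning tokens separately
    or when there is no output, so "not reported" is not confused with 0%.
    """
    if reasoning_tokens is None or output_tokens <= 0:
        return None
    return 100.0 * reasoning_tokens / output_tokens


print(reasoning_share(823, 1000))   # a route that reports reasoning tokens
print(reasoning_share(None, 1000))  # a route that delivers thinking inline
print(reasoning_share(0, 1000))     # a route that reports zero reasoning tokens
```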

## Cite as

**APA.** Requesty (2026). Reasoning-token share of provider output, April 2026. Requesty Data. https://www.requesty.ai/data/reasoning-token-share-by-provider-april-2026

```bibtex
@misc{requesty_reasoning_token_share_by_provider_april_2026,
  author       = {{Requesty}},
  title        = {Reasoning-token share of provider output, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/reasoning-token-share-by-provider-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/reasoning-token-share-by-provider-april-2026/data.json) · [CSV](https://www.requesty.ai/data/reasoning-token-share-by-provider-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/reasoning-token-share-by-provider-april-2026/data.md)

---

# Average cost per user per month by coding agent, May 2025 to April 2026

> How much does a typical coding agent user spend per month? Across nine agents observed over twelve months through the Requesty gateway, the weighted average rose from $14/month to $54/month ($91 for active users with 2+ active days). Claude Code active users average $108/month (median $23, P95 $296) in April 2026. Roo Code active users spend $79/month, OpenCode $104/month, and Cline $49/month.

*Topic: Agentic workloads. Period: May 2025 to Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-cost-per-user-may-2025-apr-2026.*

## Why it matters

Per-user cost is the core unit-economics metric for coding agent businesses. The 3.8x increase is driven by longer agentic sessions and reasoning models, not by more expensive individual calls. Caching improvements kept per-call costs flat even as context windows grew 68%. The cost increase is entirely a function of usage intensity.

## Questions this answers

- How much does it cost to use Claude Code per month?
- Which coding agent has the highest average cost per user?
- How has coding agent cost per user changed over the last year?
- What is the average monthly spend for Roo Code users?
- How do coding agent costs compare across tools?

## Key findings

1. Claude Code active users: $108/month mean, $23 median, $296 P95 in April 2026.
2. Active-user weighted average across all agents: $91/month (vs. $54 including trial users).
3. OpenCode: fastest-growing cost trajectory, active users at $104/month (from $6 in six months).
4. Roo Code active users: $79/month, the most consistent linear growth over twelve months.
5. Large median-to-mean gaps across all agents confirm a power-user-driven spend distribution.

## Data

| Month | Agent | Avg cost/user (USD) |
| --- | --- | --- |
| 2025-08 | Claude Code | 8 |
| 2025-09 | Claude Code | 15 |
| 2025-10 | Claude Code | 17 |
| 2025-11 | Claude Code | 27 |
| 2025-12 | Claude Code | 44 |
| 2026-01 | Claude Code | 33 |
| 2026-02 | Claude Code | 50 |
| 2026-03 | Claude Code | 55 |
| 2026-04 | Claude Code | 78 |
| 2025-05 | Roo Code | 17 |
| 2025-06 | Roo Code | 14 |
| 2025-07 | Roo Code | 14 |
| 2025-08 | Roo Code | 14 |
| 2025-09 | Roo Code | 17 |
| 2025-10 | Roo Code | 22 |
| 2025-11 | Roo Code | 23 |
| 2025-12 | Roo Code | 28 |
| 2026-01 | Roo Code | 35 |
| 2026-02 | Roo Code | 35 |
| 2026-03 | Roo Code | 42 |
| 2026-04 | Roo Code | 46 |
| 2025-11 | OpenCode | 3 |
| 2025-12 | OpenCode | 13 |
| 2026-02 | OpenCode | 19 |
| 2026-03 | OpenCode | 35 |
| 2026-04 | OpenCode | 58 |
| 2025-05 | Cline | 5 |
| 2025-06 | Cline | 5 |
| 2025-07 | Cline | 7 |
| 2025-08 | Cline | 8 |
| 2025-09 | Cline | 10 |
| 2025-10 | Cline | 12 |
| 2025-11 | Cline | 16 |
| 2025-12 | Cline | 18 |
| 2026-01 | Cline | 19 |
| 2026-02 | Cline | 20 |
| 2026-03 | Cline | 24 |
| 2026-04 | Cline | 26 |
| 2025-05 | Kilo Code | 11 |
| 2025-06 | Kilo Code | 9 |
| 2025-07 | Kilo Code | 13 |
| 2025-08 | Kilo Code | 11 |
| 2025-09 | Kilo Code | 19 |
| 2025-10 | Kilo Code | 16 |
| 2025-11 | Kilo Code | 28 |
| 2025-12 | Kilo Code | 38 |
| 2026-01 | Kilo Code | 32 |
| 2026-02 | Kilo Code | 48 |
| 2026-03 | Kilo Code | 51 |
| 2026-04 | Kilo Code | 65 |
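
One way to compare trajectories of different lengths is a compound monthly growth rate between the first and last table values for each agent. This is an illustrative summary rather than a figure published in the note, and it ignores the path between the two endpoints.

```python
def compound_monthly_growth(first: float, last: float, months_elapsed: int) -> float:
    """Constant monthly growth rate that takes `first` to `last` in `months_elapsed` months."""
    return (last / first) ** (1 / months_elapsed) - 1


# First and last "Avg cost/user" values from the table above.
roo_code = compound_monthly_growth(17, 46, 11)    # 2025-05 to 2026-04
cline = compound_monthly_growth(5, 26, 11)        # 2025-05 to 2026-04
claude_code = compound_monthly_growth(8, 78, 8)   # 2025-08 to 2026-04

for name, rate in [("Roo Code", roo_code), ("Cline", cline), ("Claude Code", claude_code)]:
    print(f"{name}: {rate:.1%} per month")
```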

## Caveats

- Even among active users, spend is power-user driven: Claude Code median is $23 vs. $108 mean.
- Our sample reflects users routing through Requesty and is subject to selection bias.
- Cursor and GitHub Copilot have minimal representation because they primarily connect directly to providers.

## Cite as

**APA.** Requesty (2026). Average cost per user per month by coding agent, May 2025 to April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-cost-per-user-may-2025-apr-2026

```bibtex
@misc{requesty_coding_agent_cost_per_user_may_2025_apr_2026,
  author       = {{Requesty}},
  title        = {Average cost per user per month by coding agent, May 2025 to April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-cost-per-user-may-2025-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-cost-per-user-may-2025-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-cost-per-user-may-2025-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-cost-per-user-may-2025-apr-2026/data.md)

---

# Prompt-cache hit rate by coding agent, April 2026

> Which coding agents use prompt caching most effectively? In April 2026, Claude Code led at 92% cache hit rate (cached_tokens / input_tokens), followed by OpenCode at 89%. Kilo Code sits at 46% with 62K avg input tokens. The gap is architectural: agents that maintain consistent context prefixes across sequential calls achieve dramatically higher cache reuse.

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-cache-hit-rate-apr-2026.*

## Why it matters

Cache efficiency is the single biggest lever on coding agent economics. Assuming cached reads bill at $0.30 per million tokens against a $3.00 list price (cache-write surcharges ignored), Claude Code's 92% hit rate works out to roughly $0.52 per million effective input tokens, while Kilo Code at 46% pays roughly $1.76 per million. That 3.4x cost difference compounds across every call in every session, enabling high-cache agents to sustain intensive workflows at a fraction of the cost.
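
The relationship between hit rate and effective input price can be sketched as a simple blend. The sketch assumes the $0.30 figure is the per-million price of a cached read and $3.00 the uncached list price, and it ignores any cache-write surcharge; the function names are illustrative and actual provider pricing may differ.

```python
def cache_hit_rate(cached_tokens: int, input_tokens: int) -> float:
    """Cache hit rate as defined in this note: cached_tokens / input_tokens."""
    return cached_tokens / input_tokens


def effective_input_price(hit_rate: float,
                          list_price: float = 3.00,
                          cached_price: float = 0.30) -> float:
    """Blended USD price per million input tokens at a given cache hit rate.

    Assumes uncached tokens bill at list_price and cached tokens at cached_price.
    """
    return (1 - hit_rate) * list_price + hit_rate * cached_price


high = effective_input_price(0.92)  # Claude Code's hit rate
low = effective_input_price(0.46)   # Kilo Code's hit rate
print(f"92% hit rate: ${high:.2f}/M, 46% hit rate: ${low:.2f}/M, ratio {low / high:.1f}x")
```

Under different assumptions about the cached-read price the absolute figures move, but the blended price still falls linearly as the hit rate rises.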

## Questions this answers

- Which coding agent has the best prompt caching efficiency?
- How much does prompt caching reduce coding agent costs?
- How does Claude Code achieve 92% cache hit rate?

## Key findings

1. Claude Code: 92% cache hit rate, the highest of any agent and narrowly ahead of OpenCode.
2. OpenCode: 89%. Second only to Claude Code despite different architecture.
3. Roo Code: 74%. Solid but significantly behind Claude Code.
4. Kilo Code: 46%. Smaller context windows (62K vs 84K) reduce prefix reuse opportunity.
5. Higher cache rates correlate strongly with lower per-call costs across all agents.

## Data

| Agent | Cache hit rate (percent) |
| --- | --- |
| Claude Code | 91.90% |
| OpenCode | 89.00% |
| Aider | 84.00% |
| Zed | 80.10% |
| Roo Code | 73.60% |
| Forge | 63.90% |
| Cline | 61.40% |
| Kilo Code | 45.50% |

## Caveats

- Cache hit rate depends on both agent architecture and model provider. Anthropic, Bedrock, and Vertex have different caching implementations.
- Agents with very low traffic (Cursor, GitHub Copilot, Codex CLI) are excluded due to insufficient sample size.

## Cite as

**APA.** Requesty (2026). Prompt-cache hit rate by coding agent, April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-cache-hit-rate-apr-2026

```bibtex
@misc{requesty_coding_agent_cache_hit_rate_apr_2026,
  author       = {{Requesty}},
  title        = {Prompt-cache hit rate by coding agent, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-cache-hit-rate-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-cache-hit-rate-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-cache-hit-rate-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-cache-hit-rate-apr-2026/data.md)

---

# Claude model share by coding agent, April 2026

> How much of coding agent spend goes to Claude? In April 2026, Claude models account for 59% to 99.5% of spend across the coding agents observed through the Requesty gateway, and at least 78% for every agent except Zed. Claude Code is nearly 100% locked to Claude (expected, as Anthropic's own product). Zed is the most model-diverse at 59% Claude / 41% OpenAI. OpenCode routes 13% of its spend to OpenAI.

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-model-share-apr-2026.*

## Why it matters

The near-total consolidation to Claude occurred organically across agents whose users freely choose their backend model. This is a revealed preference: when developers can pick any model for coding tasks, they overwhelmingly choose Claude. Gemini's share has declined from 34% (May 2025 via Cline) to under 6% across all agents, despite competitive pricing.

## Questions this answers

- Which AI model do coding agents use most?
- What percentage of coding agent spend goes to Claude?
- Do coding agent users prefer Claude over GPT for coding?
- Which coding agent has the most model diversity?
- Why is Claude so dominant in AI coding tools?

## Key findings

1. Claude Code: 99.5% Claude. Expected, as it is Anthropic's first-party tool.
2. Roo Code: 92% Claude. Open-source, model-agnostic, yet users overwhelmingly choose Claude.
3. OpenCode: 79% Claude, 13% OpenAI. One of the more model-diverse user bases, alongside Aider (78% Claude).
4. Zed: 59% Claude, 41% OpenAI. The most model-diverse agent overall.
5. Gemini share is below 6% across all agents despite competitive pricing.

## Data

| Agent | Claude share (percent) |
| --- | --- |
| Claude Code | 99.50% |
| Forge | 99.50% |
| Kilo Code | 93.70% |
| Roo Code | 91.70% |
| Cline | 89.00% |
| OpenCode | 79.00% |
| Aider | 78.20% |
| Zed | 59.00% |

## Caveats

- Share is measured by spend, not request count. High-cost models appear larger in spend share.
- Cursor and GitHub Copilot have minimal representation because they primarily connect directly to providers.
- Model share reflects user choice, not agent defaults. Most agents allow users to configure any model.

## Cite as

**APA.** Requesty (2026). Claude model share by coding agent, April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-model-share-apr-2026

```bibtex
@misc{requesty_coding_agent_model_share_apr_2026,
  author       = {{Requesty}},
  title        = {Claude model share by coding agent, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-model-share-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-model-share-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-model-share-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-model-share-apr-2026/data.md)

---

# Tool-call finish rate by coding agent, April 2026

> How do coding agent API calls end? In April 2026, Roo Code leads with 91% of calls finishing via tool_calls, the primary agentic pattern. OpenCode (87%) and Forge (79%) follow, with Claude Code at 73%. Cline (81% stop) and Aider (87% stop) favor single-turn completions. Kilo Code shows 63% tool_calls and 28% stop, a more mixed profile that still leans agentic.

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-finish-reason-apr-2026.*

## Why it matters

Finish reasons reveal fundamental architectural differences between coding agents. A high tool_calls rate indicates an agentic loop pattern where the model invokes tools (read files, write code, run tests) as part of a multi-step workflow. A high stop rate indicates single-turn generation. The industry-wide shift from stop-dominant to tool-call-dominant patterns over twelve months reflects the broader move toward autonomous coding workflows.

## Questions this answers

- Which coding agents use tool calls the most?
- How do coding agent API calls typically finish?
- What percentage of Claude Code calls end with tool calls?
- Are coding agents becoming more agentic over time?

## Key findings

1. Roo Code: 91% tool_calls. Transitioned from 100% stop-based (early 2025) to almost entirely tool-call-based.
2. Claude Code: 73% tool_calls, 20% stop. Heavy agentic loop with some single-turn reasoning calls.
3. Cline: 81% stop. Primarily single-turn completions rather than multi-step tool use.
4. Kilo Code: 63% tool_calls, 28% stop. A more mixed profile that still leans agentic.
5. The industry has shifted from stop-dominant to tool-call-dominant patterns over twelve months.

## Data

| Agent | tool_calls rate (percent) |
| --- | --- |
| Roo Code | 91.10% |
| OpenCode | 87.30% |
| Forge | 79.30% |
| Claude Code | 73.20% |
| Zed | 66.70% |
| Kilo Code | 62.50% |
| Cline | 16.00% |
| Aider | 12.40% |

## Caveats

- Finish reasons are reported by the upstream model provider. Interpretation varies slightly across providers.
- "empty/failed" includes both true failures and calls where the model returned an empty response.
- Agents with very low traffic (Cursor, GitHub Copilot, Codex CLI) are excluded.

## Cite as

**APA.** Requesty (2026). Tool-call finish rate by coding agent, April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-finish-reason-apr-2026

```bibtex
@misc{requesty_coding_agent_finish_reason_apr_2026,
  author       = {{Requesty}},
  title        = {Tool-call finish rate by coding agent, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-finish-reason-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-finish-reason-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-finish-reason-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-finish-reason-apr-2026/data.md)

---

# Session depth by coding agent, April 2026

> How many API calls does a single coding session make? In April 2026, Claude Code has the deepest sessions at 16 median calls per trace and reaches 209 calls at P95, reflecting complex multi-step coding workflows. Roo Code sessions are shallower at 11 median calls but more numerous (6,247 traces vs 594 for Claude Code).

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-session-depth-apr-2026.*

## Why it matters

Session depth is a proxy for agentic autonomy. Deeper sessions mean the agent is performing more steps before returning control to the developer. Claude Code's P95 of 209 calls per session means the top 5% of sessions involve 200+ round trips to the model, which drives both cost and cumulative latency.

## Questions this answers

- How many API calls does a typical Claude Code session make?
- Which coding agent has the deepest coding sessions?
- What is the P95 session depth for coding agents?
- Do deeper sessions correlate with higher costs?

## Key findings

1. Claude Code: 16 median calls, but P95 of 209 calls. The longest tail of any agent.
2. Roo Code: 11 median calls, consistent across months. The most predictable session depth.
3. Only 3 agents send trace_id consistently. Cline, OpenCode, and Zed do not populate traces.

## Data

| Agent | Median calls | P95 calls | Max calls | Total traces |
| --- | --- | --- | --- | --- |
| Claude Code | 16 | 209 | 3.2k | 594 |
| Roo Code | 11 | 74 | 1.8k | 6.2k |

## Caveats

- Only agents with trace_id coverage are included (Claude Code, Roo Code).
- Claude Code trace coverage dropped to ~7% in April, so these stats reflect a subset of sessions.
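
The table columns can be reproduced in spirit by grouping calls on trace_id and summarising the per-trace call counts. This is a sketch under assumptions: the note does not say which percentile method is used, so a nearest-rank P95 is shown here, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter
from statistics import median

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def session_depth(trace_ids):
    """Summarise calls per trace from one trace_id per API call."""
    calls_per_trace = list(Counter(trace_ids).values())
    return {
        "median_calls": median(calls_per_trace),
        "p95_calls": nearest_rank_percentile(calls_per_trace, 95),
        "max_calls": max(calls_per_trace),
        "total_traces": len(calls_per_trace),
    }

# Hypothetical log: three sessions with 3, 1 and 10 calls.
sample = ["a"] * 3 + ["b"] + ["c"] * 10
depth = session_depth(sample)
print(depth)
```

Calls without a trace_id would have to be dropped before this step, which is why low trace coverage limits how representative the result is.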

## Cite as

**APA.** Requesty (2026). Session depth by coding agent, April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-session-depth-apr-2026

```bibtex
@misc{requesty_coding_agent_session_depth_apr_2026,
  author       = {{Requesty}},
  title        = {Session depth by coding agent, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-session-depth-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-session-depth-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-session-depth-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-session-depth-apr-2026/data.md)

---

# Streaming adoption by coding agent, April 2026

> Do coding agents stream their API responses? In April 2026, most agents stream nearly 100% of calls. Aider is the major outlier at 22% streaming, preferring batch completions. Claude Code streams 93% of calls. Aider also has the highest reasoning token intensity at 82%, suggesting it relies on reasoning models in non-streaming mode.

*Topic: Agentic workloads. Period: Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-streaming-adoption-apr-2026.*

## Why it matters

Streaming affects both user experience and infrastructure cost. Streaming responses allow coding agents to show partial output in real time, improving perceived latency. Aider takes a different approach: it sends batch requests to reasoning models, waits for the full response, then applies code changes. This architectural choice explains its lower streaming rate and higher reasoning intensity.

## Questions this answers

- Which coding agents use streaming responses?
- Why does Aider not use streaming for most calls?
- How does streaming adoption correlate with reasoning model usage?
- What percentage of Claude Code calls use streaming?

## Key findings

1. Cline, Forge, Zed, and OpenCode: 100% streaming. No batch completions at all.
2. Claude Code: 93% streaming. The 7% non-streaming calls may be health checks or metadata requests.
3. Aider: 22% streaming, 82% reasoning intensity. The only agent that primarily uses batch mode with reasoning models.
4. Zed: 100% streaming with 40% reasoning intensity. Highest reasoning use among fully-streaming agents.
5. Forge: 100% streaming but only 0.6% reasoning intensity. Minimal use of reasoning models.

## Data

| Agent | Streaming (percent) | Reasoning intensity (percent) | Cache hit rate (percent) |
| --- | --- | --- | --- |
| Cline | 100.00% | 12.07% | 61.36% |
| Forge | 100.00% | 0.62% | 63.93% |
| Zed | 100.00% | 39.83% | 80.05% |
| OpenCode | 100.00% | 21.00% | 88.98% |
| Kilo Code | 99.92% | 13.87% | 45.49% |
| Roo Code | 99.87% | 8.07% | 73.63% |
| Claude Code | 93.47% | 2.17% | 91.91% |
| Aider | 22.23% | 81.55% | 84.02% |

## Caveats

- Streaming percentage reflects request count, not token volume. A single non-streaming request may generate more tokens than multiple streaming ones.
- Reasoning intensity is computed as reasoning_tokens / output_tokens. Claude models report 0 reasoning_tokens in our pipeline even when extended thinking is active.
- GitHub Copilot and Codex CLI are excluded due to minimal traffic volume.
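
A small sketch of the two ratios defined above: streaming share by request count, and reasoning intensity as reasoning_tokens / output_tokens. The record layout is hypothetical, and summing tokens across requests before dividing is an assumption; the note does not say whether the ratio is taken per request or in aggregate.

```python
def streaming_and_reasoning(requests):
    """Return streaming share (by request count) and reasoning intensity (by tokens)."""
    total = len(requests)
    streamed = sum(1 for r in requests if r["is_stream"])
    reasoning = sum(r["reasoning_tokens"] for r in requests)
    output = sum(r["output_tokens"] for r in requests)
    return {
        "streaming_pct": 100.0 * streamed / total if total else 0.0,
        "reasoning_intensity_pct": 100.0 * reasoning / output if output else 0.0,
    }

# Hypothetical sample: one large non-streaming reasoning call among three streamed calls.
sample = [
    {"is_stream": True, "reasoning_tokens": 0, "output_tokens": 100},
    {"is_stream": True, "reasoning_tokens": 0, "output_tokens": 100},
    {"is_stream": True, "reasoning_tokens": 50, "output_tokens": 100},
    {"is_stream": False, "reasoning_tokens": 150, "output_tokens": 200},
]
stats = streaming_and_reasoning(sample)
print(stats)
```

The sample shows the first caveat in miniature: the single non-streaming call is a quarter of requests but carries 40% of output tokens.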

## Cite as

**APA.** Requesty (2026). Streaming adoption by coding agent, April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-streaming-adoption-apr-2026

```bibtex
@misc{requesty_coding_agent_streaming_adoption_apr_2026,
  author       = {{Requesty}},
  title        = {Streaming adoption by coding agent, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-streaming-adoption-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-streaming-adoption-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-streaming-adoption-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-streaming-adoption-apr-2026/data.md)

---

## Topic: Latency and performance

---

# Latency leaderboard per provider, April 2026

> Which AI provider has the lowest latency in April 2026? On the Requesty gateway xAI led p50 at 0.6 s, with Novita (0.8 s), Azure (1.0 s) and Mistral (1.4 s) close behind. Vertex (Claude) was the slowest at 13.7 s, 23× the fastest and 2.8× slower than Vertex (Gemini) at 4.9 s on the same Vertex route. Anthropic-direct sat mid-pack at 5.8 s with a 52.6 s p95 long tail.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/provider-latency-leaderboard-april-2026.*

## Why it matters

Total p50 latency is dominated by workload type, not pure provider speed. The 23× spread is partly silicon, partly streaming behaviour, but mostly the size and tool-call complexity of requests being sent. The Vertex-Claude tail is heavy agentic Claude Code traffic, not slow inference. Reading the leaderboard literally without that context will mislead any provider-selection decision.

## Questions this answers

- Which LLM provider has the lowest latency in 2026?
- What is the fastest LLM provider for chat completions?
- Why is Vertex Claude so slow compared to Anthropic direct?
- What is the p95 latency of OpenAI vs Anthropic?

## Key findings

1. p50 spans 23× from fastest to slowest: xAI 0.6 s to Vertex (Claude) 13.7 s.
2. Fast tier: xAI (0.6 s), Novita (0.8 s), Azure (1.0 s), Mistral (1.4 s).
3. Vertex split is striking: Vertex (Gemini) 4.9 s, Vertex (Claude) 13.7 s. Same provider routing, very different workload weight.
4. Frontier-Claude tier: Anthropic at 5.8 s p50, with a long tail (Anthropic p95 52.6 s; DeepSeek p95 74.0 s).
5. TTFT is decoupled from total latency. Azure is fastest to first token (0.6 s) with a 1.0 s total p50.
6. xAI: fast on total but slow to first token (3.27 s TTFT). Suggests buffered or non-streaming upstream behaviour.

## Data

| Provider | p50 latency | p95 latency | p50 TTFT |
| --- | --- | --- | --- |
| xAI | 600 ms | 10.9 s | 3.27 s |
| Novita | 800 ms | 18.5 s | 3.10 s |
| Azure | 1.00 s | 8.80 s | 600 ms |
| Mistral | 1.40 s | 9.80 s | 1.01 s |
| OpenAI | 2.50 s | 17.9 s | 1.84 s |
| Bedrock | 2.80 s | 23.8 s | 1.86 s |
| Vertex (Gemini) | 4.90 s | 27.2 s | 1.28 s |
| Anthropic | 5.80 s | 52.6 s | 2.14 s |
| Moonshot | 5.90 s | 64.1 s | 2.62 s |
| DeepSeek | 9.00 s | 74.0 s | 1.17 s |
| Vertex (Claude) | 13.7 s | 115.2 s | 1.44 s |

## Caveats

- TTFT (first_token_latency_ns) was not populated before 2026, so no year-over-year TTFT comparison is possible.
- p95 is highly sensitive to the tail of long completions; treat it as an upper bound for "what the worst 5% of users feel" rather than a steady-state operating point.
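
To illustrate why p95 is so sensitive to the tail, here is a sketch that computes p50 and p95 from per-request latencies in nanoseconds using the standard library. The input list is hypothetical, and `statistics.quantiles` with its default method is only one of several percentile definitions; the note does not say which one the gateway uses.

```python
from statistics import quantiles

def p50_p95_seconds(latencies_ns):
    """Return (p50, p95) in seconds from per-request latencies in nanoseconds."""
    cuts = quantiles(latencies_ns, n=100)  # 99 cut points; cuts[i - 1] is the i-th percentile
    return cuts[49] / 1e9, cuts[94] / 1e9

# Hypothetical workload: 95 quick calls at 1 s plus 5 long completions at 60 s.
sample_ns = [1_000_000_000] * 95 + [60_000_000_000] * 5
p50, p95 = p50_p95_seconds(sample_ns)
print(f"p50 = {p50:.2f} s, p95 = {p95:.2f} s")
```

In this sample only 5% of requests are slow, yet p95 lands near the 60 s tail while p50 stays at 1 s.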

## Cite as

**APA.** Requesty (2026). Latency leaderboard per provider, April 2026. Requesty Data. https://www.requesty.ai/data/provider-latency-leaderboard-april-2026

```bibtex
@misc{requesty_provider_latency_leaderboard_april_2026,
  author       = {{Requesty}},
  title        = {Latency leaderboard per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/provider-latency-leaderboard-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/provider-latency-leaderboard-april-2026/data.json) · [CSV](https://www.requesty.ai/data/provider-latency-leaderboard-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/provider-latency-leaderboard-april-2026/data.md)

---

# Provider throughput density, April 2026

> How many tokens per second can each LLM provider sustain? In April 2026 on the Requesty gateway Groq led at 320 output tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon. Vertex (Gemini) was second at 130 tok/sec, Mistral 120 tok/sec; OSS aggregator routes (Nebius, Minimaxi, DeepInfra) clustered at 23-26 tok/sec; Bedrock was slowest at 15 tok/sec, 21× behind Groq.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-10. Permanent URL: https://www.requesty.ai/data/provider-throughput-density-april-2026.*

## Why it matters

Throughput density (output tokens per second of total wall-clock latency) is the right number to optimise streaming UX, not raw p50 latency. Two providers with identical p50 totals can deliver wildly different perceived speed depending on token rate. Vertex (Claude) is actually faster per token than Anthropic-direct; its total latency is higher only because Vertex (Claude) requests emit roughly 3× more output tokens on average.

## Questions this answers

- What is the fastest LLM provider in tokens per second?
- How fast does Groq stream compared to Anthropic?
- Which LLM has the best streaming throughput?
- Is Vertex Claude faster than Anthropic direct in practice?

## Key findings

1. Groq leads at 320 tok/sec, 2.5× the next-fastest provider, attributable to its custom inference silicon.
2. Vertex (Gemini) is second at 130 tok/sec, followed by Mistral at 120 tok/sec.
3. Vertex (Claude) at 56 tok/sec is faster per-token than Anthropic-direct at 46 tok/sec, even though Vertex (Claude)'s total request latency is 2.4× higher (Vertex (Claude) requests emit ~3× more output tokens on average).
4. OSS-aggregator routes (Nebius, Minimaxi, DeepInfra) cluster in the 23-26 tok/sec band.
5. Bedrock is the slowest at 15 tok/sec, 21× behind Groq.

## Data

| Provider | p50 tokens / sec | p50 ms / token (milliseconds) |
| --- | --- | --- |
| Groq | 320 | 3 ms |
| Vertex (Gemini) | 130 | 8 ms |
| Mistral | 120 | 8 ms |
| xAI | 65 | 16 ms |
| OpenAI | 57 | 18 ms |
| Novita | 56 | 18 ms |
| Vertex (Claude) | 56 | 18 ms |
| Anthropic | 46 | 22 ms |
| OpenAI Responses | 44 | 23 ms |
| Azure | 39 | 26 ms |
| DeepSeek | 31 | 32 ms |
| Alibaba | 28 | 36 ms |
| Moonshot | 27 | 37 ms |
| Nebius | 26 | 39 ms |
| Minimaxi | 24 | 41 ms |
| DeepInfra | 24 | 42 ms |
| Bedrock | 15 | 66 ms |

## Caveats

- p50 of a per-request rate, not a global rate. Two providers with the same throughput density can have very different total latencies if their typical output sizes differ (Vertex Claude vs Anthropic-direct is the clearest example).
- Computed on successful completions with output_tokens > 0 and total_latency_ns > 0.
- Apr 2026 only; this is a snapshot, not a trend.
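
A sketch of the metric as described in the caveats: a per-request rate of output tokens per second of total latency, filtered to successful completions with output_tokens > 0 and total_latency_ns > 0, then reduced to its median. The record layout and sample values are hypothetical.

```python
from statistics import median

def throughput_density(requests):
    """Median of per-request output tokens per second, or None if nothing qualifies."""
    rates = [
        r["output_tokens"] / (r["total_latency_ns"] / 1e9)
        for r in requests
        if r["successful"] and r["output_tokens"] > 0 and r["total_latency_ns"] > 0
    ]
    return median(rates) if rates else None

# Hypothetical sample: three qualifying requests and one filtered out (no output tokens).
sample = [
    {"successful": True, "output_tokens": 100, "total_latency_ns": 2_000_000_000},
    {"successful": True, "output_tokens": 300, "total_latency_ns": 3_000_000_000},
    {"successful": True, "output_tokens": 46, "total_latency_ns": 1_000_000_000},
    {"successful": True, "output_tokens": 0, "total_latency_ns": 500_000_000},
]
tok_per_sec = throughput_density(sample)
print(tok_per_sec)
```

Because the rate is taken per request before the median, a provider serving many short completions and one serving a few long ones can report the same density with very different total latencies.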

## Cite as

**APA.** Requesty (2026). Provider throughput density, April 2026. Requesty Data. https://www.requesty.ai/data/provider-throughput-density-april-2026

```bibtex
@misc{requesty_provider_throughput_density_april_2026,
  author       = {{Requesty}},
  title        = {Provider throughput density, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/provider-throughput-density-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/provider-throughput-density-april-2026/data.json) · [CSV](https://www.requesty.ai/data/provider-throughput-density-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/provider-throughput-density-april-2026/data.md)

---

# Streaming TTFT vs total latency, April 2026

> Which AI provider has the fastest time-to-first-token? In April 2026 on streaming-and-successful Requesty requests, Azure led TTFT at 593 ms with a 960 ms p50 total, the streaming-UX winner on both axes. xAI was mid-pack on total latency (5.68 s) but slowest to first token (3.27 s), which suggests buffered upstream behaviour rather than true streaming. Vertex (Gemini) and Vertex (Claude) sit at very different points: Gemini totals 3.05 s, Claude totals 8.03 s on the same Vertex route.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/streaming-ttft-vs-total-april-2026.*

## Why it matters

Time-to-first-token is what users actually feel as latency in chat UIs. A 600 ms TTFT feels instantaneous; a 3 s TTFT feels broken even if total latency is the same. Buffered streaming masquerading as real streaming is a common antipattern in this dataset, and any latency benchmark that only quotes total p50 will miss it entirely.

## Questions this answers

- What is the fastest streaming LLM provider?
- Which LLM has the lowest time to first token in 2026?
- Does xAI actually stream or is it buffered?
- How does streaming affect perceived AI latency?

## Key findings

1. Azure: 593 ms p50 TTFT, 960 ms p50 total. The streaming-UX winner on both axes.
2. Nebius (659 ms TTFT) and OpenAI Responses (731 ms) are also strong on first-token speed.
3. Vertex (Gemini) 1.29 s TTFT vs Vertex (Claude) 1.44 s TTFT. Gemini totals 3.05 s, Claude totals 8.03 s. The Claude variant carries the heavy agentic completions on this route.
4. xAI: 5.68 s p50 total with 3.27 s TTFT. This suggests the upstream buffers responses before flushing rather than truly streaming.
5. Anthropic: 2.14 s TTFT, 5.87 s total. Slowest first byte among the very large providers, but a consistent shape.

## Data

| Provider | p50 TTFT | p50 total | p95 TTFT | p95 total |
| --- | --- | --- | --- | --- |
| Alibaba | 235 ms | 1.03 s | 4.82 s | 13.4 s |
| Azure | 593 ms | 960 ms | 1.32 s | 3.35 s |
| Nebius | 659 ms | 4.14 s | 4.21 s | 41.1 s |
| OpenAI Responses | 731 ms | 6.69 s | 2.59 s | 41.5 s |
| DeepInfra | 769 ms | 2.19 s | 1.26 s | 3.63 s |
| Mistral | 1.01 s | 1.25 s | 5.35 s | 18.0 s |
| DeepSeek | 1.17 s | 5.29 s | 3.04 s | 31.7 s |
| Vertex (Gemini) | 1.29 s | 3.05 s | 19.6 s | 29.0 s |
| Vertex (Claude) | 1.44 s | 8.03 s | 4.89 s | 100.3 s |
| Bedrock | 1.85 s | 5.86 s | 7.72 s | 38.4 s |
| OpenAI | 2.00 s | 6.36 s | 15.2 s | 26.0 s |
| Anthropic | 2.14 s | 5.87 s | 4.46 s | 31.9 s |
| Moonshot | 2.62 s | 7.49 s | 12.6 s | 52.9 s |
| Minimaxi | 2.77 s | 6.14 s | 7.27 s | 24.7 s |
| Novita | 3.13 s | 7.42 s | 9.67 s | 27.9 s |
| xAI | 3.27 s | 5.67 s | 14.8 s | 20.9 s |

## Caveats

- TTFT (first_token_latency_ns) was not populated before 2026, so no year-over-year TTFT comparison is possible.
- Vertex is split into Vertex (Gemini) and Vertex (Claude) by model_used; direct Google traffic is excluded as long-tail.
- A non-streaming response that the gateway reports as is_stream=true (because the SDK was set to stream but the upstream did not) will measure TTFT close to total_latency, biasing the read upward.
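
The last caveat suggests a simple per-request check: if TTFT is nearly equal to total latency, the response was probably delivered in one piece. The sketch below is a hypothetical heuristic, not something the gateway is known to run, and the 0.95 threshold is an arbitrary assumption.

```python
def looks_buffered(first_token_latency_ns, total_latency_ns, threshold=0.95):
    """Flag a request whose first token arrived at almost the same time as the last."""
    if total_latency_ns <= 0:
        return False
    return first_token_latency_ns / total_latency_ns >= threshold

# Hypothetical requests: (first_token_latency_ns, total_latency_ns).
buffered_like = looks_buffered(4_900_000_000, 5_000_000_000)
streamed_like = looks_buffered(600_000_000, 5_000_000_000)
print(buffered_like, streamed_like)
```

This only makes sense per request. Applied to the p50 columns in the table it would mislead, since short completions naturally have a TTFT close to their total latency.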

## Cite as

**APA.** Requesty (2026). Streaming TTFT vs total latency, April 2026. Requesty Data. https://www.requesty.ai/data/streaming-ttft-vs-total-april-2026

```bibtex
@misc{requesty_streaming_ttft_vs_total_april_2026,
  author       = {{Requesty}},
  title        = {Streaming TTFT vs total latency, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/streaming-ttft-vs-total-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/streaming-ttft-vs-total-april-2026/data.json) · [CSV](https://www.requesty.ai/data/streaming-ttft-vs-total-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/streaming-ttft-vs-total-april-2026/data.md)

---

# p50 latency YoY: April 2025 vs April 2026

> Has LLM latency improved over the past year? On the Requesty gateway, open-source aggregator routes compressed dramatically between April 2025 and April 2026. xAI fell 93% (9.1 s to 0.6 s), DeepInfra 91% (15.8 s to 1.4 s), DeepSeek 62% (24.3 s to 9.2 s). Frontier providers barely moved (OpenAI -5%, Anthropic 0%). Vertex (Claude) is the only major route that got slower, +131%, as heavy agentic Claude Code workloads landed on it.

*Topic: Latency and performance. Period: Apr 2025 to Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/provider-latency-yoy-april-2026.*

## Why it matters

The OSS-aggregator tier closed most of the latency gap to frontier providers in 12 months: routing easy work onto a cheap OSS path used to cost 5-25 seconds and now costs sub-second. Workload composition is the dominant force on aggregate latency. Vertex (Claude) getting 2.3× slower while the underlying inference stack barely changed shows that "is provider X fast?" is the wrong question to ask in isolation.

## Questions this answers

- How has LLM latency changed from 2025 to 2026?
- Are open-source LLMs as fast as OpenAI now?
- Which AI providers got faster in 2026?
- Why are some LLM routes getting slower year-over-year?

## Key findings

1. OSS aggregator routes (xAI, DeepInfra, Alibaba, Novita, Nebius) compressed 89-93% YoY.
2. xAI: 9.1 s to 0.6 s (-93%). DeepInfra: 15.8 s to 1.4 s (-91%).
3. DeepSeek: 24.3 s to 9.2 s (-62%). Still slow but dramatically faster.
4. Frontier providers barely moved: OpenAI -5%, Anthropic 0%.
5. Vertex (Claude) is the lone exception: 6.0 s to 13.8 s (+131%). The route stayed put while heavy agentic Claude Code workloads moved onto it, so the work itself got bigger.
6. Practical implication: routing easy work to a cheap OSS path used to cost 5-25 seconds, now costs sub-second.

## Data

| Provider | Apr 2025 p50 | Apr 2026 p50 | YoY delta (percent) |
| --- | --- | --- | --- |
| xAI | 9.10 s | 600 ms | -93.00% |
| DeepInfra | 15.8 s | 1.40 s | -91.00% |
| Alibaba | 5.80 s | 500 ms | -91.00% |
| Novita | 8.80 s | 800 ms | -91.00% |
| Nebius | 22.1 s | 2.30 s | -89.00% |
| DeepSeek | 24.3 s | 9.20 s | -62.00% |
| Coding | 7.90 s | 6.10 s | -23.00% |
| OpenAI | 2.60 s | 2.50 s | -5.00% |
| Anthropic | 5.90 s | 5.90 s | 0.00% |
| Vertex (Claude) | 6.00 s | 13.8 s | 131.00% |

## Caveats

- Vertex (Gemini) had no meaningful 2025 traffic so it is not in this chart. Only Vertex (Claude) is YoY-comparable.
- Vertex (Claude) Apr 2025 sample is small and the workload that lived on it has changed substantially, so the +131% delta is more about workload mix than a true latency regression.
- Customer-base composition changed YoY, so the workload mix hitting these providers is different. Latency YoY remains a robust measurement because it is plain wall-clock duration, but interpret it as "providers behave differently AND the work has shifted", not as a controlled experiment.
- The `successful` flag semantics may have changed between 2025 and 2026, but quantiles over wall-clock duration are not affected.
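
For readers who want to check the YoY column, the delta is the usual relative change between the two p50 values. The helper name is hypothetical. Feeding it the rounded p50s from the table reproduces most rows but can differ by a point or two on others, presumably because the published deltas were computed from unrounded latencies.

```python
def yoy_delta_pct(before_s, after_s):
    """Relative change from before to after, in percent (negative means faster)."""
    return 100.0 * (after_s - before_s) / before_s

# Rounded p50 values taken from the table above.
xai_delta = yoy_delta_pct(9.1, 0.6)
deepseek_delta = yoy_delta_pct(24.3, 9.2)
print(f"xAI: {xai_delta:.0f}%, DeepSeek: {deepseek_delta:.0f}%")
```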

## Cite as

**APA.** Requesty (2026). p50 latency YoY: April 2025 vs April 2026. Requesty Data. https://www.requesty.ai/data/provider-latency-yoy-april-2026

```bibtex
@misc{requesty_provider_latency_yoy_april_2026,
  author       = {{Requesty}},
  title        = {p50 latency YoY: April 2025 vs April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/provider-latency-yoy-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/provider-latency-yoy-april-2026/data.json) · [CSV](https://www.requesty.ai/data/provider-latency-yoy-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/provider-latency-yoy-april-2026/data.md)

---

# Prompt-cache hit rate per provider, April 2026

> Which AI providers have the highest prompt-cache hit rate? In April 2026 Anthropic-direct led the Requesty gateway at 77% (cached_tokens / input_tokens), Bedrock Claude was healthy at 57%, and Vertex (Claude) trailed at 24%. Same Claude model family, 3× lower hit rate. Vertex (Gemini) sat at 10% and Mistral at 4%, the floor among major routes.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/cache-hit-rate-by-provider-april-2026.*

## Why it matters

Prompt caching directly cuts the per-request cost of long, repeated context. The difference between a 77% hit rate and a 24% hit rate on the same model family is roughly a 3× reduction in input tokens billed at full price. The Vertex-Claude gap looks like a configuration issue rather than a platform limitation, which means Claude users on Vertex are leaving substantial savings on the table without a code change.
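
The "roughly 3×" figure follows from comparing the share of input tokens that miss the cache at each hit rate. A short worked check is below; the function name is hypothetical, and the sketch only counts tokens billed at full price, since the discounted price of cached tokens varies by provider and is not stated here.

```python
def full_price_share(cache_hit_rate):
    """Fraction of input tokens that miss the cache and are billed at full price."""
    return 1.0 - cache_hit_rate

vertex_claude = full_price_share(0.24)     # 76% of input tokens at full price
anthropic_direct = full_price_share(0.77)  # 23% of input tokens at full price
ratio = vertex_claude / anthropic_direct
print(f"Full-price input tokens: {ratio:.1f}x more at a 24% hit rate than at 77%")
```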

## Questions this answers

- Which AI providers have the best prompt caching hit rate?
- Why is prompt caching so much worse on Vertex Claude than on Anthropic direct?
- How much does prompt caching reduce LLM inference cost in production?
- Which providers should I avoid if I rely on prompt caching?

## Key findings

1. Anthropic-direct: 77% cache hit, the leader by a wide margin.
2. Bedrock Claude: 57%. OpenAI: 36%. DeepSeek: 48%. Healthy.
3. Vertex (Claude): 24%. Same model as Anthropic-direct (77%) and Bedrock (57%), 3× lower hit rate. Configuration gap.
4. Vertex (Gemini): 10%. Near the floor among major routes.
5. Mistral: 4%. The floor among major routes; prompt caching is not a meaningful lever on that route today.
6. Moonshot reports 88% but it is a measurement artefact at 6% success rate; do not quote it.

## Data

| Provider | Cache hit rate (percent) |
| --- | --- |
| Anthropic | 77.50% |
| Bedrock | 56.90% |
| DeepSeek | 48.30% |
| Azure | 41.00% |
| OpenAI | 36.40% |
| xAI | 35.70% |
| Novita | 31.90% |
| Vertex (Claude) | 23.50% |
| Vertex (Gemini) | 9.60% |
| Mistral | 4.10% |

## Caveats

- Moonshot 88% cache-hit reading is a measurement artefact at 6% success rate. Excluded from the leader panel.
- cached_tokens semantics differ slightly by provider (which tokens count as "cached"). The ratio is meaningful but not strictly apples-to-apples across providers.

## Cite as

**APA.** Requesty (2026). Prompt-cache hit rate per provider, April 2026. Requesty Data. https://www.requesty.ai/data/cache-hit-rate-by-provider-april-2026

```bibtex
@misc{requesty_cache_hit_rate_by_provider_april_2026,
  author       = {{Requesty}},
  title        = {Prompt-cache hit rate per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/cache-hit-rate-by-provider-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.json) · [CSV](https://www.requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/cache-hit-rate-by-provider-april-2026/data.md)

---

# Claude Code median latency by provider and model, April 2026

> How does Claude Code latency vary by cloud provider? In April 2026, Anthropic Haiku is the fastest at 1.8s median provider latency. Opus latency is remarkably consistent across providers (4.5-4.9s). Vertex Sonnet is the slowest at 6.2s, roughly 40% slower than the same model on Anthropic direct.

*Topic: Latency and performance. Period: Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-latency-by-provider-apr-2026.*

## Why it matters

Provider choice affects both latency and reliability for the same model. Anthropic direct offers the lowest latency for Haiku and Sonnet, while Bedrock provides higher cache hit rates on Sonnet and Opus. Vertex delivers the fastest TTFT for Sonnet but the slowest total completion time. These tradeoffs matter for coding agents that make 50-200 API calls per session.

## Questions this answers

- Which cloud provider has the lowest latency for Claude Code?
- How does Bedrock latency compare to Anthropic direct for Opus?
- Does Vertex offer any latency advantage for Claude models?
- What is the P95 latency spread across providers?

## Key findings

1. Anthropic Haiku: 1.8s median, the fastest Claude Code path. Sub-second TTFT at 0.79s.
2. Opus latency is nearly identical across Anthropic (4.9s), Bedrock (4.9s), and Vertex (4.5s).
3. Vertex has the lowest Opus latency (4.5s) and best TTFT for Sonnet (1.4s), but highest Sonnet total latency (6.2s).
4. Bedrock reaches 94-95% cache hit rates on Sonnet and Opus but only 84% on Haiku; Vertex has the highest rates for Haiku (94.8%) and Opus (95.6%).
5. P95 latency ranges from 8s (Anthropic Haiku) to 32s (Vertex Sonnet). Tail latency varies 4× across providers.

## Data

| Provider (Model) | Median latency | P95 latency | Median TTFT | Success rate (percent) | Cache hit rate (percent) |
| --- | --- | --- | --- | --- | --- |
| Anthropic (Haiku) | 1.8s | 8.1s | 0.8s | 95.96% | 90.33% |
| Vertex (Haiku) | 2.1s | 9.3s | 0.9s | 92.82% | 94.75% |
| Bedrock (Haiku) | 2.6s | 17.1s | 1.4s | 83.43% | 84.38% |
| Anthropic (Sonnet) | 4.4s | 24.3s | 1.9s | 97.01% | 91.71% |
| Vertex (Opus) | 4.5s | 15.8s | 1.9s | 96.46% | 95.59% |
| Bedrock (Sonnet) | 4.8s | 24.7s | 2.1s | 97.61% | 94.14% |
| Bedrock (Opus) | 4.9s | 27.4s | 2.3s | 95.99% | 94.64% |
| Anthropic (Opus) | 4.9s | 27.1s | 2.5s | 98.73% | 92.48% |
| Vertex (Sonnet) | 6.2s | 32.1s | 1.4s | 97.08% | 85.93% |

## Caveats

- Latency varies by region, time of day, and prompt length. These are global medians.
- Bedrock and Vertex have lower traffic volume than Anthropic direct, which may affect percentile stability.
- Cache hit rates depend on prompt structure and are not solely a function of the provider.

## Cite as

**APA.** Requesty (2026). Claude Code median latency by provider and model, April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-latency-by-provider-apr-2026

```bibtex
@misc{requesty_coding_agent_latency_by_provider_apr_2026,
  author       = {{Requesty}},
  title        = {Claude Code median latency by provider and model, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-latency-by-provider-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-latency-by-provider-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-latency-by-provider-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-latency-by-provider-apr-2026/data.md)

---

## Topic: Reliability and ops

---

# Operational metrics per provider, April 2026

> How reliable is each LLM provider in production? In April 2026 the top seven providers on the Requesty gateway (OpenAI, Anthropic, Vertex (Gemini), Bedrock, DeepSeek, Novita, xAI) sat at 95-99% success rate. Azure trailed at 78%, Vertex (Claude) at 84%, Mistral at 86%, and Moonshot at 6%, a real reliability outlier. Streaming adoption is bimodal too: Azure 68%, Anthropic 57%, everyone else under 30%.

*Topic: Reliability and ops. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/operational-metrics-by-provider-april-2026.*

## Why it matters

Provider success rate translates directly into user-visible failures unless an application has a managed fallback chain. The 95-99% top tier is comfortably reliable; Vertex (Claude) and Azure visibly failing roughly 1 in 5 calls demands either a routing policy or active provider switching at the application layer to avoid sustained user pain.

## Questions this answers

- Which LLM provider is most reliable in production?
- What is the success rate of OpenAI vs Anthropic vs Vertex?
- Why do some LLM providers fail more often than others?
- How widely is streaming adopted across LLM providers?

## Key findings

1. Success is bimodal: top tier at 95 to 99%, Vertex (Claude) 84%, Azure 78%, Mistral 86%, Moonshot 6%.
2. Streaming adoption is bimodal: Azure 68% and Anthropic 57%. Vertex (Claude) at 28%. Everyone else <10%.
3. Cache hit rate ranges from Anthropic-direct 77% to Vertex (Claude) 24% (same model family, 3× spread).

## Data

| Provider | Success rate (percent) | Streaming (percent) | Cache hit (percent) |
| --- | --- | --- | --- |
| xAI | 99.30% | 1.30% | 35.70% |
| DeepSeek | 98.30% | 2.80% | 48.30% |
| OpenAI | 98.00% | 7.20% | 36.40% |
| Novita | 97.20% | 2.30% | 31.90% |
| Anthropic | 96.00% | 56.90% | 77.50% |
| Vertex (Gemini) | 95.90% | 3.70% | 9.60% |
| Bedrock | 95.60% | 9.70% | 56.90% |
| Mistral | 86.30% | 8.00% | 4.10% |
| Vertex (Claude) | 84.40% | 27.60% | 23.50% |
| Azure | 78.00% | 68.30% | 41.00% |
| Moonshot | 6.20% | 4.80% | 88.20% |

## Caveats

- Apr 2025 success rates are anomalously low (OpenAI 54%, Anthropic 72%) and are likely under-reported because status_code wasn't being captured then. Mar to Apr 2026 success-rate comparisons are reliable; YoY success-rate deltas should be treated with caution.

## Cite as

**APA.** Requesty (2026). Operational metrics per provider, April 2026. Requesty Data. https://www.requesty.ai/data/operational-metrics-by-provider-april-2026

```bibtex
@misc{requesty_operational_metrics_by_provider_april_2026,
  author       = {{Requesty}},
  title        = {Operational metrics per provider, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/operational-metrics-by-provider-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/operational-metrics-by-provider-april-2026/data.json) · [CSV](https://www.requesty.ai/data/operational-metrics-by-provider-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/operational-metrics-by-provider-april-2026/data.md)

---

# Provider error code distribution, April 2026

> Why do LLM provider requests fail? Among April 2026 requests on the Requesty gateway where the upstream provider returned a non-success response, 65.8% were 429 (rate limit), 19.4% were 400 (bad request: schema mismatches, oversized payloads), and 9.4% were 403 (forbidden). 5xx availability incidents (503, 502, 529, 500, 504, 520) summed to ~4.8%. Router- and gateway-level rejections are filtered out so the chart shows only what providers themselves emit when they fail.

*Topic: Reliability and ops. Period: Apr 2026. Last updated 2026-05-09. Permanent URL: https://www.requesty.ai/data/status-code-distribution-april-2026.*

## Why it matters

Provider failures are dominated by rate-limiting under agentic load, not by genuine availability incidents. That changes the right mitigation: backoff plus a managed fallback chain absorbs the ~85% of failures that are 429 + 400 without provider changes; only the ~5% 5xx tail is irreducible. Designing retries on the assumption that "providers go down" misallocates engineering effort.
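
A rough sketch of what "backoff plus a fallback chain" can look like in application code. Everything here is illustrative: `ProviderError`, `call_with_fallback`, the retryable status set and the stand-in providers are hypothetical names, not part of any Requesty or provider SDK.

```python
import random
import time

class ProviderError(Exception):
    """Hypothetical error type carrying the upstream HTTP status code."""
    def __init__(self, status_code):
        super().__init__(f"provider returned {status_code}")
        self.status_code = status_code

# Assumed retryable set: rate limits plus 5xx availability errors.
RETRYABLE = {429, 500, 502, 503, 504, 529}

def call_with_fallback(providers, request, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try each provider in order, backing off exponentially on retryable errors."""
    last_error = None
    for call in providers:
        for attempt in range(max_attempts):
            try:
                return call(request)
            except ProviderError as err:
                last_error = err
                if err.status_code not in RETRYABLE:
                    break  # e.g. 400 or 403: move on to the next provider
                if attempt < max_attempts - 1:
                    # Exponential backoff with jitter before retrying the same provider.
                    sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    if last_error is None:
        raise ValueError("no providers configured")
    raise last_error

# Demo with stand-in providers; delays are recorded instead of slept.
attempts = {"primary": 0}

def primary(request):
    attempts["primary"] += 1
    raise ProviderError(429)  # always rate-limited in this demo

def secondary(request):
    return f"secondary handled: {request}"

delays = []
result = call_with_fallback([primary, secondary], "hello", sleep=delays.append)
print(result, attempts, len(delays))
```

The design choice worth noting is that a 429 is retried on the same provider with growing delays, while a 400 or 403 skips straight to the next provider, since repeating an identical rejected request is unlikely to help.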

## Questions this answers

- Why do LLM API requests fail?
- What is the most common LLM provider error code?
- How often do AI providers rate-limit requests?
- What HTTP errors return from OpenAI and Anthropic?

## Key findings

1. 429 (rate limit) is the dominant provider failure mode at 65.8%. Providers throttle agentic workloads aggressively.
2. 400 (bad request) is second at 19.4%. Schema mismatches, unsupported parameters, oversized payloads.
3. 403 (forbidden) at 9.4%. Provider-side authorization, region, or model-access denials.
4. 5xx total (503, 502, 529, 500, 504, 520) sums to ~4.8%. Real provider availability incidents are uncommon but not zero.
5. Codes that disappear under this filter (404 collapses from 29.8% to 0.2%, 402 from 17.8% to 0.07%) confirm those rejections are router-level model-not-found and billing checks, not provider failures.

## Data

| Status code | Description | Bucket | Share of rejections (percent) |
| --- | --- | --- | --- |
| 429 | Too Many Requests | auth_quota | 65.83% |
| 400 | Bad Request | client_error | 19.40% |
| 403 | Forbidden | auth_quota | 9.41% |
| 503 | Service Unavailable | server_error | 2.19% |
| 502 | Bad Gateway | gateway | 1.81% |
| 529 | Site Overloaded | server_error | 0.52% |
| 422 | Unprocessable | client_error | 0.24% |
| 500 | Internal Server | server_error | 0.21% |
| 404 | Not Found | not_found | 0.21% |
| 402 | Payment Required | auth_quota | 0.07% |
| 504 | Gateway Timeout | gateway | 0.06% |
| 401 | Unauthorized | auth_quota | 0.02% |
| 520 | Cloudflare Unknown | server_error | 0.02% |
| 499 | Client Closed | client_error | 0.01% |
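
As a quick check on the headline figures, the aggregates quoted in this note can be recomputed from the table rows. The snippet below is only a worked-arithmetic sketch; the `shares` dictionary is hand-copied from the table, not fetched from the download endpoints.

```python
# Shares copied from the table above (percent of provider-origin rejections).
shares = {
    429: 65.83, 400: 19.40, 403: 9.41, 503: 2.19, 502: 1.81,
    529: 0.52, 422: 0.24, 500: 0.21, 404: 0.21, 402: 0.07,
    504: 0.06, 401: 0.02, 520: 0.02, 499: 0.01,
}

# 5xx tail: 503 + 502 + 529 + 500 + 504 + 520.
five_xx = round(sum(v for code, v in shares.items() if 500 <= code <= 599), 2)

# Rate limits plus bad requests.
rate_limit_plus_bad_request = round(shares[429] + shares[400], 2)

# Sanity check that the table covers all rejections.
total = round(sum(shares.values()), 2)

print(five_xx, rate_limit_plus_bad_request, total)
```

The 5xx rows sum to 4.81% (the "~4.8%" in the summary), 429 plus 400 gives 85.23% (the "~85%"), and the rows total 100%.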

## Caveats

- Restricted to `status_code_origin = 'provider' AND successful = false`, so router- and gateway-level rejections are excluded by design.
- A failed request can have multiple retries with different status codes; each retry is counted separately, so the shares are per attempt rather than per request.
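
The filter and per-attempt counting described in these caveats could be reproduced along the following lines. This is a sketch under assumptions: `status_code_origin` and `successful` are the field names used in the caveat, while the `status_code` key, the row-as-dict shape, and the `provider_error_shares` helper are hypothetical.

```python
from collections import Counter


def provider_error_shares(attempts):
    """Percent share of each status code among failed, provider-origin attempts.

    Every attempt row is counted on its own, so a request that was retried
    three times contributes three rows.
    """
    counts = Counter(
        a["status_code"]
        for a in attempts
        if a["status_code_origin"] == "provider" and not a["successful"]
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100 * n / total for code, n in counts.items()}
```

Rows with a router or gateway origin are dropped before counting, which matches the exclusion described above.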

## Cite as

**APA.** Requesty (2026). Provider error code distribution, April 2026. Requesty Data. https://www.requesty.ai/data/status-code-distribution-april-2026

```bibtex
@misc{requesty_status_code_distribution_april_2026,
  author       = {{Requesty}},
  title        = {Provider error code distribution, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/status-code-distribution-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/status-code-distribution-april-2026/data.json) · [CSV](https://www.requesty.ai/data/status-code-distribution-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/status-code-distribution-april-2026/data.md)

---

# Policy vs direct eventual success rate, Jan-Apr 2026

> How much does using a routing policy improve LLM reliability? In April 2026 the Requesty managed-fallback policy cohort hit 99.25% eventual success rate, vs 85.01% for users calling a single provider directly. That is a 14.2 pp lift, up from a +3.0 pp gap in January. Policy reliability held a tight 97.5-99.3% band across all four months while the direct cohort swung 12 pp; the widening is driven by direct-cohort regressions, not policy degradation.

*Topic: Reliability and ops. Period: Jan 2026 - Apr 2026. Last updated 2026-05-10. Permanent URL: https://www.requesty.ai/data/policy-eventual-success-trend-jan-april-2026.*

## Why it matters

A managed fallback chain absorbs upstream provider incidents that direct callers experience as user-visible failures. Over the four months of 2026 plotted here, the direct cohort fell about 12 percentage points from its February peak (97.47%) to April (85.01%), while the policy cohort stayed within a band of under 2 points. Same upstream events, opposite outcomes. That is the clearest measurable case for using an LLM gateway over calling provider APIs directly.

## Questions this answers

- How reliable are LLM routing policies vs calling providers directly?
- Does using an LLM gateway actually improve reliability?
- What success rate do AI gateways deliver in 2026?
- How much do managed fallback chains improve LLM uptime?

## Key findings

1. Policy reliability widened its lead over direct from +3.0 pp in January to +14.2 pp in April.
2. April 2026: policy 99.25%, direct 85.01%. Policies eliminated 14 percentage points of failures that direct customers absorbed.
3. Policy rate has held a tight 97.5-99.3% band for four months. Direct rate swings 12 pp (97.5% in Feb to 85.0% in Apr) because direct calls have no fallback to absorb provider-side incidents.
4. The Mar-Apr widening is driven by direct-cohort regressions, not policy degradation. Policies absorbed the same upstream issues through their fallback chain.

## Data

| Month | Policy - eventual success (percent) | Direct - eventual success (percent) |
| --- | --- | --- |
| January | 97.50% | 94.50% |
| February | 98.72% | 97.47% |
| March | 98.55% | 86.72% |
| April | 99.25% | 85.01% |
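
The gap and swing figures quoted in this note follow directly from the table. The snippet below is a worked-arithmetic sketch with the rates hand-copied from the rows above.

```python
# (policy %, direct %) per month, copied from the table above.
rates = {
    "January": (97.50, 94.50),
    "February": (98.72, 97.47),
    "March": (98.55, 86.72),
    "April": (99.25, 85.01),
}

# Gap in percentage points: policy minus direct.
gaps = {month: round(policy - direct, 2) for month, (policy, direct) in rates.items()}

# Peak-to-trough swing of each cohort across the four months.
policy_rates = [policy for policy, _ in rates.values()]
direct_rates = [direct for _, direct in rates.values()]
policy_swing = round(max(policy_rates) - min(policy_rates), 2)
direct_swing = round(max(direct_rates) - min(direct_rates), 2)

print(gaps, policy_swing, direct_swing)
```

The gaps come out at 3.0, 1.25, 11.83 and 14.24 pp for January through April. The policy cohort moves 1.75 pp between its best and worst month, against 12.46 pp for the direct cohort.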

## Caveats

- Eventual success is computed at the `request_id` level: `max(successful)` across all attempts in the fallback chain.
- The direct cohort includes every non-policy `provider_requested` value. Volume is large enough that the headline rates are stable.
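
The eventual-success definition in the first caveat can be illustrated with a small sketch. It assumes attempts arrive as `(request_id, successful)` pairs; that shape and the `eventual_success_rate` helper are hypothetical, and only the `max(successful)` per `request_id` rule comes from the note.

```python
from collections import defaultdict


def eventual_success_rate(attempts):
    """Percent of request_ids with at least one successful attempt.

    `attempts` is an iterable of (request_id, successful) pairs, one per
    attempt in the fallback chain.
    """
    succeeded = defaultdict(bool)
    for request_id, successful in attempts:
        # Equivalent to max(successful) over booleans within one request_id.
        succeeded[request_id] = succeeded[request_id] or bool(successful)
    if not succeeded:
        raise ValueError("no attempts supplied")
    return 100 * sum(succeeded.values()) / len(succeeded)
```

For example, a request that fails on its first provider and succeeds on its second counts as one success, not as one failure plus one success.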

## Cite as

**APA.** Requesty (2026). Policy vs direct eventual success rate, Jan-Apr 2026. Requesty Data. https://www.requesty.ai/data/policy-eventual-success-trend-jan-april-2026

```bibtex
@misc{requesty_policy_eventual_success_trend_jan_april_2026,
  author       = {{Requesty}},
  title        = {Policy vs direct eventual success rate, Jan-Apr 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/policy-eventual-success-trend-jan-april-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/policy-eventual-success-trend-jan-april-2026/data.json) · [CSV](https://www.requesty.ai/data/policy-eventual-success-trend-jan-april-2026/data.csv) · [Markdown](https://www.requesty.ai/data/policy-eventual-success-trend-jan-april-2026/data.md)

---

# Error rate by coding agent, April 2026

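> How reliable are AI coding agents? In April 2026, Roo Code had the lowest error rate among the agents with reported call volumes: 2.5% across 147K calls. Claude Code sits at 7.0% across 494K calls, Kilo Code at 10.0% across 23K calls, and Forge trails at 11.2% across 1.1K calls. Aider shows a lower 0.90% in the data table, but its call volume is not reported in this note.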

*Topic: Reliability and ops. Period: Apr 2026. Last updated 2026-05-16. Permanent URL: https://www.requesty.ai/data/coding-agent-error-rate-apr-2026.*

## Why it matters

Error rates directly impact developer productivity. Roo Code's 2.5% error rate means about 1 in 40 calls fails; Forge's 11.2% means roughly 1 in 9. The spread across agents points to structural differences in how agents construct and retry API requests.

## Questions this answers

- Which coding agent has the lowest error rate?
- How reliable is Claude Code compared to other coding agents?
- What is the typical error rate for AI coding tools?

## Key findings

1. Roo Code: 2.5% error rate. The lowest among the agents highlighted in these findings; only Aider (0.90% in the table below) is lower.
2. OpenCode: 3.1% error rate. Strong reliability despite rapid growth.
3. Claude Code: 7.0% error rate. Acceptable but higher than Roo Code, possibly due to more aggressive retry patterns.
4. Kilo Code: 10.0% error rate.
5. Forge: 11.2% error rate. The highest among agents with sufficient sample size.

## Data

| Agent | Error rate (percent) |
| --- | --- |
| Aider | 0.90% |
| Roo Code | 2.50% |
| OpenCode | 3.10% |
| Zed | 3.70% |
| Cline | 4.90% |
| Claude Code | 7.00% |
| Kilo Code | 10.00% |
| Forge | 11.20% |

## Caveats

- Error rate includes all non-200 HTTP status codes. Some "errors" may be expected behavior (e.g. rate limiting that triggers a retry).
- Agents with very low traffic (Cursor, GitHub Copilot, Codex CLI) are excluded due to insufficient sample size.
- Error rate does not distinguish between client-side errors (4xx) and server-side errors (5xx).
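
Under the definition in the first caveat, the metric reduces to the share of non-200 responses. The sketch below shows that calculation and the conversion from a percentage to a "1 in N" figure; both helper names are hypothetical and the input is assumed to be a plain list of HTTP status codes.

```python
def error_rate(status_codes):
    """Percent of calls whose HTTP status is anything other than 200."""
    codes = list(status_codes)
    if not codes:
        raise ValueError("no calls supplied")
    return 100 * sum(code != 200 for code in codes) / len(codes)


def one_in_n(rate_percent):
    """Convert an error rate in percent to the N in '1 in N calls fails'."""
    return 100 / rate_percent


# 39 successes and one rate-limited call: a 2.5% error rate, i.e. 1 in 40.
sample = [200] * 39 + [429]
print(error_rate(sample), one_in_n(error_rate(sample)))
```

Because a 429 that is later retried successfully still counts as an error here, an agent that retries aggressively can show a higher rate without users seeing more failures.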

## Cite as

**APA.** Requesty (2026). Error rate by coding agent, April 2026. Requesty Data. https://www.requesty.ai/data/coding-agent-error-rate-apr-2026

```bibtex
@misc{requesty_coding_agent_error_rate_apr_2026,
  author       = {{Requesty}},
  title        = {Error rate by coding agent, April 2026},
  year         = {2026},
  howpublished = {\url{https://www.requesty.ai/data/coding-agent-error-rate-apr-2026}},
  note         = {Requesty Data}
}
```

Downloads: [JSON](https://www.requesty.ai/data/coding-agent-error-rate-apr-2026/data.json) · [CSV](https://www.requesty.ai/data/coding-agent-error-rate-apr-2026/data.csv) · [Markdown](https://www.requesty.ai/data/coding-agent-error-rate-apr-2026/data.md)