Requesty

Token-weighted tool_calls share per provider, April 2026


The same finish_reason mix, but weighted by output tokens instead of request count.

Share of a provider's output tokens emitted during tool_calls completions. Successful completions only. Filtered to providers with ≥50,000 successful completions in the window. Gap is in percentage points (token share minus request share).
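As a rough illustration of the definitions above, the sketch below computes both shares from per-completion records. The record fields (`provider`, `finish_reason`, `output_tokens`, `success`) and the helper name `tool_call_shares` are hypothetical and are not Requesty's schema. The 50,000 threshold comes from the note above and is exposed as a parameter so the toy data can use a smaller value.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Completion:
    # Hypothetical record layout, assumed for illustration only.
    provider: str
    finish_reason: str  # e.g. "tool_calls", "stop", "length"
    output_tokens: int
    success: bool


def tool_call_shares(completions, min_completions=50_000):
    """Return {provider: (token_share_pct, request_share_pct, gap_pp)}."""
    # Per provider: [requests, tool_calls requests, tokens, tool_calls tokens]
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for c in completions:
        if not c.success:
            continue  # successful completions only
        t = totals[c.provider]
        t[0] += 1
        t[2] += c.output_tokens
        if c.finish_reason == "tool_calls":
            t[1] += 1
            t[3] += c.output_tokens

    result = {}
    for provider, (reqs, tool_reqs, toks, tool_toks) in totals.items():
        if reqs < min_completions or toks == 0:
            continue  # drop low-volume providers
        token_share = 100 * tool_toks / toks
        request_share = 100 * tool_reqs / reqs
        # Gap is in percentage points: token share minus request share.
        result[provider] = (token_share, request_share, token_share - request_share)
    return result


sample = [
    Completion("a", "tool_calls", 50, True),
    Completion("a", "tool_calls", 50, True),
    Completion("a", "stop", 300, True),
    Completion("a", "stop", 999, False),  # failed, so ignored
    Completion("b", "stop", 10, True),    # below the toy threshold, so dropped
]
shares = tool_call_shares(sample, min_completions=3)
print(shares)
```

In the toy data, provider "a" makes two of its three successful requests as tool calls but spends only 100 of 400 output tokens on them, so the token-weighted share (25%) sits well below the request-weighted share (about 66.7%).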

What share of LLM output tokens is spent on tool calls vs chat? In April 2026 on the Requesty gateway, Anthropic emitted 38.8% of its output tokens on `tool_calls` vs 54.2% of requests, so its agentic completions are roughly 30% smaller than its average completion (38.8 / 54.2 ≈ 0.72). OpenAI Responses showed the opposite: 34.2% of tokens vs 26.4% of requests. Vertex (Claude) had the biggest negative gap in percentage points (6.1% of tokens vs 27.6% of requests, a gap of -21.5).

Why it matters

Counting requests overweights short tool-call payloads; counting tokens overweights long chat replies. Two providers with the same request-level agentic share can have wildly different agentic token shares, which matters for capacity planning, billing reconciliation, and any benchmark that aggregates over tokens rather than calls. Depending on the axis chosen, the same provider can look more than 5× more or less agentic (xAI: 17.2% of tokens vs 2.9% of requests).

Period: Apr 2026
Updated: May 9, 2026
ID: tool-call-token-share-april-2026
§ 01

Key findings

  • 01 Anthropic: 38.8% of output tokens vs 54.2% of requests. Agentic completions are ~30% smaller than the provider's average completion; tool_calls payloads are compact.
  • 02 OpenAI Responses: 34.2% of output tokens vs 26.4% of requests. The opposite shape: agentic completions emit more tokens than chat ones.
  • 03 Vertex (Claude): 6.1% of tokens vs 27.6% of requests. The biggest negative gap on the chart (-21.5 percentage points). Claude on Vertex is dominated by many small tool-call payloads, while chat completions on the same route are heavy.
  • 04 Vertex (Gemini): 1.5% of tokens vs 14.1% of requests. Same shape as Vertex (Claude) but more extreme in relative terms (request share is about 9× the token share, vs about 4.5× for Vertex (Claude)). Gemini chat replies are huge, so agentic completions barely register on the token-weighted view.
  • 05 xAI: 17.2% of tokens vs 2.9% of requests. Few agentic calls, but each one is verbose.
  • 06 OpenAI direct: 2.7% of tokens vs 3.4% of requests. The two views agree: there is barely any agentic load on this route in either framing.
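The size comparisons in these findings follow from the two shares alone. If t is the tool-call share of output tokens and q the tool-call share of requests, the mean tool-call completion is t/q times the provider's overall mean, and (t/q) / ((1-t)/(1-q)) times the mean of all other completions. The sketch below works this through. The function name `size_ratios` is hypothetical, and treating every non-`tool_calls` completion as "chat" is an assumption, not something the data states.

```python
def size_ratios(token_share_pct, request_share_pct):
    """Relative mean size of tool_calls completions, derived from the two shares.

    Returns (vs_average, vs_other):
      vs_average: mean tool-call size / mean size of all completions
      vs_other:   mean tool-call size / mean size of non-tool-call completions
    """
    t = token_share_pct / 100
    q = request_share_pct / 100
    vs_average = t / q
    vs_other = (t / q) / ((1 - t) / (1 - q))
    return vs_average, vs_other


# Shares taken from the findings above.
anthropic = size_ratios(38.8, 54.2)         # approx (0.72, 0.54)
openai_responses = size_ratios(34.2, 26.4)  # approx (1.30, 1.45)

print(f"Anthropic: {anthropic[0]:.2f}x average, {anthropic[1]:.2f}x other completions")
print(f"OpenAI Responses: {openai_responses[0]:.2f}x average, {openai_responses[1]:.2f}x other completions")
```

Under that assumption, Anthropic's tool-call completions are about 28% smaller than its average completion and about 46% smaller than its non-tool-call completions, so the choice of baseline changes the headline figure.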
§ 02

Data

Provider            Tool-call output-token share (%)   Tool-call request share (%)   Gap, token - request (pp)
Moonshot            54.70                              75.00                         -20.30
Minimaxi            52.50                              50.80                           1.70
Anthropic           38.80                              54.20                         -15.40
OpenAI Responses    34.20                              26.40                           7.80
Azure               18.00                              27.90                          -9.90
xAI                 17.20                               2.90                          14.30
Bedrock             14.40                               7.00                           7.40
Alibaba             12.20                               1.70                          10.50
Vertex (Claude)      6.10                              27.60                         -21.50
Novita               3.00                               1.90                           1.10
OpenAI               2.70                               3.40                          -0.70
Vertex (Gemini)      1.50                              14.10                         -12.60
DeepSeek             1.20                               1.50                          -0.30
Mistral              1.00                               1.90                          -0.90
Nebius               0.90                               3.50                          -2.60
Groq                 0.80                               1.00                          -0.20
DeepInfra            0.30                               0.10                           0.20
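The gap column can be rechecked from the two share columns, and the same rows can be ranked by relative skew (token share divided by request share) rather than by percentage-point gap. The sketch below does both. The variable names are illustrative, and "relative skew" is a convenience measure introduced here, not a metric the source defines.

```python
# (provider, tool-call output-token share %, tool-call request share %, published gap in pp)
ROWS = [
    ("Moonshot", 54.7, 75.0, -20.3),
    ("Minimaxi", 52.5, 50.8, 1.7),
    ("Anthropic", 38.8, 54.2, -15.4),
    ("OpenAI Responses", 34.2, 26.4, 7.8),
    ("Azure", 18.0, 27.9, -9.9),
    ("xAI", 17.2, 2.9, 14.3),
    ("Bedrock", 14.4, 7.0, 7.4),
    ("Alibaba", 12.2, 1.7, 10.5),
    ("Vertex (Claude)", 6.1, 27.6, -21.5),
    ("Novita", 3.0, 1.9, 1.1),
    ("OpenAI", 2.7, 3.4, -0.7),
    ("Vertex (Gemini)", 1.5, 14.1, -12.6),
    ("DeepSeek", 1.2, 1.5, -0.3),
    ("Mistral", 1.0, 1.9, -0.9),
    ("Nebius", 0.9, 3.5, -2.6),
    ("Groq", 0.8, 1.0, -0.2),
    ("DeepInfra", 0.3, 0.1, 0.2),
]

# Recompute gap = token share - request share and flag rows that disagree
# with the published value by more than rounding noise.
mismatches = [
    name for name, tok, req, gap in ROWS
    if abs((tok - req) - gap) > 0.05
]

# Rank by relative skew: values above 1 mean tool calls are token-heavy,
# values below 1 mean they are token-light.
by_skew = sorted(ROWS, key=lambda row: row[1] / row[2])
most_token_light = by_skew[0][0]
most_token_heavy = by_skew[-1][0]

print("gap mismatches:", mismatches)
print("most token-light tool calls:", most_token_light)
print("most token-heavy tool calls:", most_token_heavy)
```

On these rows, every published gap matches its recomputed value. The relative ranking differs from the percentage-point one: Vertex (Gemini) is the most token-light (token share about 0.11× its request share) and Alibaba the most token-heavy (about 7.2×), even though Vertex (Claude) and xAI have the largest gaps in percentage points.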
§ 03

Cite as

ID: tool-call-token-share-april-2026 · Updated May 9, 2026 · Period Apr 2026