
Token-weighted tool_calls share per provider, April 2026

Name: Token-weighted tool_calls share per provider, April 2026
Creator: Requesty
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Agentic workloads, LLM, gateway, provider, metrics, What share of AI output tokens is spent on tool calls?, Are tool-call payloads bigger or smaller than chat replies?, Why do request-counts and token-counts disagree on agentic share?, Which providers have the most token-heavy tool calls?

What share of LLM output tokens is spent on tool calls vs chat? In April 2026 on the Requesty gateway, Anthropic emitted 38.8% of its output tokens on `tool_calls` completions vs 54.2% of its requests, so its agentic completions are roughly 30% smaller than its average completion (about half the size of a chat completion). OpenAI Responses showed the opposite: 34.2% of tokens vs 26.4% of requests. Vertex (Claude) had the biggest negative gap (6.1% of tokens vs 27.6% of requests).
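The size comparison follows from the two shares alone. If a fraction t of a provider's output tokens and a fraction r of its requests are tool calls, a tool-call completion averages t / r times the provider's mean completion and a chat completion averages (1 - t) / (1 - r) times it; their quotient is the tool-call-to-chat size ratio. A minimal Python sketch, assuming every request is either a tool call or a chat reply (the function name is illustrative, not part of any Requesty API):

```python
def tool_to_chat_size_ratio(token_share: float, request_share: float) -> float:
    """Mean output tokens per tool-call completion divided by mean output
    tokens per chat completion, derived from the two shares alone.

    Assumes every request is either a tool call or a chat reply.
    """
    if not (0 < request_share < 1) or not (0 <= token_share < 1):
        raise ValueError("request_share must be in (0, 1) and token_share in [0, 1)")
    tool_vs_average = token_share / request_share              # t / r
    chat_vs_average = (1 - token_share) / (1 - request_share)  # (1 - t) / (1 - r)
    return tool_vs_average / chat_vs_average


# Anthropic, April 2026: 38.8% of output tokens, 54.2% of requests
print(f"{tool_to_chat_size_ratio(0.388, 0.542):.2f}")  # 0.54
# OpenAI Responses: 34.2% of output tokens, 26.4% of requests
print(f"{tool_to_chat_size_ratio(0.342, 0.264):.2f}")  # 1.45
```

A ratio below 1 means tool-call completions are smaller than chat ones. For Anthropic it comes out near 0.54 (roughly half the size), while t / r alone, 0.388 / 0.542 ≈ 0.72, is the comparison against the provider's average completion, i.e. about 30% smaller.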

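Why it matters

Counting requests overweights short tool-call payloads; counting tokens overweights long chat replies. Two providers with the same request-level agentic share can have very different agentic token shares: Azure and Vertex (Claude) both have a tool-call request share of about 28% (27.9% and 27.6%), yet tool calls account for 18.0% of Azure's output tokens and only 6.1% of Vertex (Claude)'s. That matters for capacity planning, billing reconciliation, and any benchmark that aggregates over tokens rather than calls. Switch axes and the same provider's agentic share can move by a factor of 5 or more: Vertex (Gemini) goes from 14.1% of requests to 1.5% of tokens, roughly 9×.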
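Both axes can be computed from the same per-request log. The sketch below assumes a hypothetical record with a `finish_reason` field (a request counts as agentic when it equals `"tool_calls"`) and an `output_tokens` count; the field and function names are illustrative, not Requesty's schema. With three 100-token tool calls and two 600-token chat replies, tool calls are 60% of requests but only 20% of output tokens.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RequestRecord:
    # Hypothetical log schema, for illustration only.
    finish_reason: str   # "tool_calls" marks an agentic completion
    output_tokens: int


def tool_call_shares(records: Iterable[RequestRecord]) -> Tuple[float, float]:
    """Return (request_share, token_share) of tool-call completions."""
    total_requests = tool_requests = 0
    total_tokens = tool_tokens = 0
    for rec in records:
        total_requests += 1
        total_tokens += rec.output_tokens
        if rec.finish_reason == "tool_calls":
            tool_requests += 1
            tool_tokens += rec.output_tokens
    if total_requests == 0 or total_tokens == 0:
        raise ValueError("need at least one request with output tokens")
    return tool_requests / total_requests, tool_tokens / total_tokens


# Three short tool calls and two long chat replies.
log = [RequestRecord("tool_calls", 100)] * 3 + [RequestRecord("stop", 600)] * 2
request_share, token_share = tool_call_shares(log)
print(f"request share {request_share:.0%}, token share {token_share:.0%}")
# request share 60%, token share 20%
```

Weighting each request by its token count is the only difference between the two shares, which is why short tool-call payloads pull the token-weighted figure down.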

Period: Apr 2026
Updated: May 9, 2026
ID: tool-call-token-share-april-2026

§ 01

Key findings

01 Anthropic: 38.8% of output tokens vs 54.2% of requests. Agentic completions are ~30% smaller than the provider's average completion and roughly half the size of a chat completion; `tool_calls` payloads are compact.
02 OpenAI Responses: 34.2% of output tokens vs 26.4% of requests. The opposite shape: agentic completions emit more tokens than chat ones.
03 Vertex (Claude): 6.1% of tokens vs 27.6% of requests. The biggest negative gap in the data (-21.5 points). More than a quarter of Claude-on-Vertex requests are tool calls, but those payloads are small, while chat completions on the same route are heavy.
04 Vertex (Gemini): 1.5% of tokens vs 14.1% of requests. Same shape as Vertex (Claude) and more extreme in relative terms (requests outnumber tokens about 9× vs about 4.5×). Gemini chat replies are so large that agentic completions barely register on the token-weighted view.
05 xAI: 17.2% of tokens vs 2.9% of requests. Few agentic calls, but each one is verbose.
06 OpenAI direct: 2.7% of tokens vs 3.4% of requests. The two views agree: there is barely any agentic load on this route in either framing.
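Each finding compares two percentages. Their ratio t / r says how much heavier or lighter a tool-call completion is than the provider's average completion, and the larger of t / r and r / t is the factor by which the two axes disagree. A short Python sketch over the six providers above (values copied from the findings; the helper name is illustrative):

```python
# (token share %, request share %) for the providers in the key findings.
FINDINGS = {
    "Anthropic": (38.8, 54.2),
    "OpenAI Responses": (34.2, 26.4),
    "Vertex (Claude)": (6.1, 27.6),
    "Vertex (Gemini)": (1.5, 14.1),
    "xAI": (17.2, 2.9),
    "OpenAI": (2.7, 3.4),
}


def axis_disagreement(token_pct: float, request_pct: float) -> tuple[str, float]:
    """Return whether tool calls are heavier or lighter than the provider's
    average completion, and the factor by which the two axes disagree."""
    size_vs_avg = token_pct / request_pct       # tool-call size relative to the average completion
    factor = max(size_vs_avg, 1 / size_vs_avg)  # >= 1; 1.0 means the axes agree
    if size_vs_avg > 1:
        direction = "heavier"
    elif size_vs_avg < 1:
        direction = "lighter"
    else:
        direction = "same size"
    return direction, factor


for provider, (token_pct, request_pct) in FINDINGS.items():
    direction, factor = axis_disagreement(token_pct, request_pct)
    print(f"{provider:17s} tool calls {direction} vs. average; axes differ {factor:.1f}x")
```

Vertex (Gemini) comes out at roughly 9.4x and xAI at roughly 5.9x, in opposite directions; OpenAI direct stays near 1.3x, consistent with the two views agreeing there.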

§ 02

Data

Provider	Tool-call output-token share	Tool-call request share	Gap (token share - request share)
Moonshot	54.70%	75.00%	-20.30 pp
Minimaxi	52.50%	50.80%	1.70 pp
Anthropic	38.80%	54.20%	-15.40 pp
OpenAI Responses	34.20%	26.40%	7.80 pp
Azure	18.00%	27.90%	-9.90 pp
xAI	17.20%	2.90%	14.30 pp
Bedrock	14.40%	7.00%	7.40 pp
Alibaba	12.20%	1.70%	10.50 pp
Vertex (Claude)	6.10%	27.60%	-21.50 pp
Novita	3.00%	1.90%	1.10 pp
OpenAI	2.70%	3.40%	-0.70 pp
Vertex (Gemini)	1.50%	14.10%	-12.60 pp
DeepSeek	1.20%	1.50%	-0.30 pp
Mistral	1.00%	1.90%	-0.90 pp
Nebius	0.90%	3.50%	-2.60 pp
Groq	0.80%	1.00%	-0.20 pp
DeepInfra	0.30%	0.10%	0.20 pp
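The gap column is the token share minus the request share, so it is measured in percentage points. A short Python sketch that re-derives it from the first two columns, flags any row whose stated gap disagrees, and finds the extremes; the rows are copied from the table above and the helper name is illustrative.

```python
# (provider, token share %, request share %, stated gap in percentage points)
ROWS = [
    ("Moonshot", 54.7, 75.0, -20.3),
    ("Minimaxi", 52.5, 50.8, 1.7),
    ("Anthropic", 38.8, 54.2, -15.4),
    ("OpenAI Responses", 34.2, 26.4, 7.8),
    ("Azure", 18.0, 27.9, -9.9),
    ("xAI", 17.2, 2.9, 14.3),
    ("Bedrock", 14.4, 7.0, 7.4),
    ("Alibaba", 12.2, 1.7, 10.5),
    ("Vertex (Claude)", 6.1, 27.6, -21.5),
    ("Novita", 3.0, 1.9, 1.1),
    ("OpenAI", 2.7, 3.4, -0.7),
    ("Vertex (Gemini)", 1.5, 14.1, -12.6),
    ("DeepSeek", 1.2, 1.5, -0.3),
    ("Mistral", 1.0, 1.9, -0.9),
    ("Nebius", 0.9, 3.5, -2.6),
    ("Groq", 0.8, 1.0, -0.2),
    ("DeepInfra", 0.3, 0.1, 0.2),
]


def recomputed_gaps(rows):
    """Map provider -> token share minus request share, rounded to 0.1 pp."""
    return {name: round(token - request, 1) for name, token, request, _ in rows}


# Rows whose stated gap disagrees with the recomputed one beyond rounding.
mismatches = [name for name, token, request, gap in ROWS
              if abs((token - request) - gap) > 0.05]
gaps = recomputed_gaps(ROWS)
most_negative = min(gaps, key=gaps.get)
most_positive = max(gaps, key=gaps.get)
print(mismatches)                          # []
print(most_negative, gaps[most_negative])  # Vertex (Claude) -21.5
print(most_positive, gaps[most_positive])  # xAI 14.3
```

Recomputing confirms that Vertex (Claude), not Moonshot (-20.3), has the most negative gap, and that xAI has the largest positive one.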

