LLM Cost Calculator
Compare API costs across 12 popular models. See how much smart routing and caching save at your scale.
Your workload
Optimization levers
Cost per model (300,000 requests/month)
| Model | Provider | Input $/M | Output $/M | Monthly cost | With caching |
|---|---|---|---|---|---|
| Mistral Small | Mistral | $0.10 | $0.30 | $60.00 | $42.00 |
| DeepSeek V3 | DeepSeek | $0.14 | $0.28 | $67.20 | $47.04 |
| Llama 4 Scout | Meta (via Groq) | $0.11 | $0.34 | $67.20 | $47.04 |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | $108 | $75.60 |
| Gemini 2.5 Flash | $0.15 | $0.60 | $108 | $75.60 | |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | $395 | $276 |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | $672 | $470 |
| Mistral Large | Mistral | $2.00 | $6.00 | $1,200 | $840 |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | $1,440 | $1,008 |
| Gemini 2.5 Pro | $1.25 | $10.00 | $1,500 | $1,050 | |
| GPT-4o | OpenAI | $2.50 | $10.00 | $1,800 | $1,260 |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | $2,520 | $1,764 |
Get these savings automatically
Requesty routes your requests to the cheapest model that meets your quality bar, caches repeated prompts, and fails over automatically. One API, 400+ models, zero infrastructure.
Start free with $10 creditsFrequently asked questions
How much does it cost to use LLM APIs?
LLM API costs vary widely by model. GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. DeepSeek V3 costs $0.14 per million input and $0.28 per million output. For a production app doing 50,000 requests per day, monthly costs range from $126 (DeepSeek V3) to $9,000+ (GPT-4.1). Smart routing through an AI gateway can reduce this by 50 to 80%.
How does smart routing reduce LLM costs?
Smart routing analyzes each request and sends simple tasks to cheap models (like DeepSeek V3 or Gemini Flash) and complex tasks to frontier models (like GPT-4 or Claude Sonnet). Since most production traffic is simple, routing 50 to 70% of requests to budget models typically cuts total spend by 50 to 80% with no quality loss on complex tasks.
What is the cheapest LLM API for production use?
As of 2026, DeepSeek V3 ($0.14 per million input, $0.28 per million output), Gemini 2.5 Flash ($0.15 per million input, $0.60 per million output), and Mistral Small ($0.10 per million input, $0.30 per million output) are the cheapest production-quality LLM APIs. For reasoning tasks, DeepSeek R1 offers frontier quality at $0.55 per million input. Using an AI gateway lets you access all of these through one API.
