Best AI models for coding
SWE-Bench Verified measures how often a model can resolve real GitHub issues drawn from 12 popular Python repositories. It is one of the most realistic coding benchmarks available: scores here reflect a model's ability to produce working patches for real codebases, not just solve toy problems.
| Rank | Model | Provider | Price (input / output per 1M tokens) | SWE-Bench Verified |
|------|-------|----------|--------------------------------------|--------------------|
| 🥇 1 | gpt-5.2-codex | OpenAI Responses | $1.75 / $14.00 | 84.7% |
| 🥈 2 | gpt-5.4 | OpenAI Inc. | $2.50 / $15.00 | 82.1% |
| 🥉 3 | gpt-5.2 | OpenAI Inc. | $1.75 / $14.00 | 79.5% |
| 4 | claude-opus-4-7 | Anthropic PBC | $5.00 / $25.00 | 78.6% |
| 5 | claude-sonnet-4-6 | Anthropic PBC | $3.00 / $15.00 | 77.2% |
| 6 | gpt-5.1-chat | OpenAI Inc. | $1.25 / $10.00 | 76.8% |
| 7 | gpt-5 | OpenAI Inc. | $1.25 / $10.00 | 74.9% |
| 8 | claude-opus-4-6 | Anthropic PBC | $5.00 / $25.00 | 74.5% |
| 9 | grok-4 | xAI Corp. | $3.00 / $15.00 | 72.5% |
| 10 | o3 | OpenAI Inc. | $2.00 / $8.00 | 71.7% |
| 11 | claude-opus-4-5 | Anthropic PBC | $5.00 / $25.00 | 71.3% |
| 12 | claude-sonnet-4-5 | Anthropic PBC | $3.00 / $15.00 | 70.8% |
| 13 | MiniMax-M2 | MiniMax | $0.30 / $1.20 | 69.3% |
| 14 | gpt-4.1 | OpenAI Inc. | $2.00 / $8.00 | 68.1% |
| 15 | kimi-k2 | Google LLC (Vertex AI) | $0.60 / $2.50 | 65.8% |
| 16 | claude-sonnet-4 | Anthropic PBC | $3.00 / $15.00 | 65.2% |
| 17 | gemini-2.5-pro | Google LLC (Gemini API) | $1.25 / $10.00 | 63.8% |
| 18 | claude-3-7-sonnet | Google LLC (Vertex AI) | $3.00 / $15.00 | 62.3% |
| 19 | o3-mini | OpenAI Inc. | $1.10 / $4.40 | 61.0% |
| 20 | grok-3 | xAI Corp. | $5.00 / $25.00 | 58.3% |
| 21 | gpt-4.1-mini | OpenAI Inc. | $0.40 / $1.60 | 55.1% |
| 22 | claude-haiku-4-5 | Anthropic PBC | $1.00 / $5.00 | 54.2% |
| 23 | gemini-2.5-flash | Google LLC (Gemini API) | $0.30 / $2.50 | 53.2% |
| 24 | deepseek-ai/DeepSeek-R1 | Together AI Inc. | $3.00 / $7.00 | 49.2% |
| 25 | o1 | OpenAI Inc. | $15.00 / $60.00 | 48.9% |
| 26 | gpt-4.1-nano | OpenAI Inc. | $0.10 / $0.40 | 42.5% |
| 27 | deepseek-ai/DeepSeek-V3 | Together AI Inc. | $1.25 / $1.25 | 42.0% |
| 28 | gpt-4o | OpenAI Inc. | $2.50 / $10.00 | 38.0% |
| 29 | meta-llama/llama-3.3-70b-instruct | Novita AI | $0.39 / $0.39 | 23.3% |
How we rank
Scores for SWE-Bench Verified are sourced from official model cards, Artificial Analysis, and public leaderboards. When a model is available through multiple providers (e.g. Anthropic direct, AWS Bedrock, Google Vertex), we show one canonical entry per model family so the ranking isn't polluted by duplicates. Benchmarks measure specific skills — always validate on your own workload before committing.
One API for every model on this list
Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
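The "one parameter" switch can be sketched with plain request payloads: in an OpenAI-compatible chat API, the request body is the same `model` + `messages` shape regardless of which model handles it, so swapping models means changing a single string. The helper below is a hypothetical illustration (not the Requesty SDK), using model IDs from the ranking above:

```python
import json

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload.

    Hypothetical helper for illustration: only the "model" field
    changes when switching between models in the ranking.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same prompt, two different models from the list above:
a = build_request("gpt-5.2-codex", "Fix the failing test in utils.py")
b = build_request("claude-sonnet-4-6", "Fix the failing test in utils.py")

# Everything except the model name is identical.
assert {k: v for k, v in a.items() if k != "model"} == \
       {k: v for k, v in b.items() if k != "model"}

print(json.dumps(a, indent=2))
```

With an OpenAI SDK client, the same idea applies: point the client's base URL at the router's endpoint and pass any listed model ID as the `model` argument.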
Get started free