AI model rankings
Ranked lists of the best AI models across the dimensions that actually matter — coding ability, reasoning, math, price, and context window. Each list is backed by published benchmarks or live pricing data.
Best for coding
View → Ranked by SWE-Bench Verified
SWE-Bench Verified measures how often a model can resolve real GitHub issues drawn from 12 popular Python repositories. A human-validated subset of SWE-Bench, it counts an issue as resolved only when the model's patch passes the repository's own test suite, which makes it the most realistic coding benchmark in wide use: scores here track a model's ability to ship working patches to real codebases, not just solve isolated toy problems.
Best for reasoning
View → Ranked by GPQA Diamond
GPQA Diamond is a set of graduate-level science questions written by domain experts and filtered so that even skilled non-experts with unrestricted internet access still fail them. It's the most reliable signal we have for "does this model actually reason" versus "is it pattern-matching its training data".
Best at math
View → Ranked by AIME 2024
AIME 2024 draws on the 2024 American Invitational Mathematics Examination, a 15-question competition taken by the top US high-school math students. Models need genuine multi-step symbolic reasoning to succeed; memorization gets you nowhere, since each solution chains together several novel steps.
Cheapest
View → Lowest input + output price per 1M tokens
Ranked by combined input + output price per million tokens (free-tier models excluded). These are production-ready models that punch well above their price point: good defaults when cost matters, provided you validate quality on your own workload.
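As a quick sanity check on how a combined per-million-token price translates into real spend, here's a minimal sketch; the prices and token counts below are hypothetical placeholders, not live data from this ranking:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# Hypothetical model priced at $0.50 input / $1.50 output per 1M tokens.
# The ranking metric here is the combined figure: $2.00 per 1M tokens.
combined_price = 0.50 + 1.50

# A typical request: an 8K-token prompt producing a 1K-token reply.
cost = request_cost(input_tokens=8_000, output_tokens=1_000,
                    in_price_per_m=0.50, out_price_per_m=1.50)
# → $0.0055 for this request
```

Note that prompt-heavy workloads are dominated by the input rate, so two models with the same combined price can differ meaningfully in practice; always weight by your own input/output ratio.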
Longest context
View → Max tokens in a single prompt
A larger context window means more tokens you can fit in a single prompt — useful for whole-codebase analysis, long document Q&A, and agentic workflows. Note: effective quality often degrades past 128K tokens; prompt caching (supported on many models) is usually a better approach for repeated long context than brute-forcing more tokens in every call.
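To gauge whether a document will actually fit a given window, a common rule of thumb is roughly 4 characters per token for English text; this is an assumption (real tokenizers vary by model and by content), but it's good enough for budgeting:

```python
def estimated_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers (BPE variants) differ by model."""
    return int(len(text) / chars_per_token)

def fits_context(text: str, window_tokens: int,
                 reserve_for_output: int = 4_096) -> bool:
    """Budget the prompt against the window, leaving headroom for the reply."""
    return estimated_tokens(text) + reserve_for_output <= window_tokens

# A ~400,000-character codebase dump estimates to ~100,000 tokens:
# it fits a 128K window with headroom, but not a 32K one.
doc = "x" * 400_000
fits_128k = fits_context(doc, 128_000)  # True
fits_32k = fits_context(doc, 32_000)    # False
```

The `reserve_for_output` headroom matters because the model's reply shares the same window; forgetting it is a common cause of truncated responses near the limit.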
