Best AI models for coding
SWE-Bench Verified measures how often a model can resolve real GitHub issues drawn from 12 popular Python repositories. It is one of the most realistic coding benchmarks available: scores here reflect a model's ability to produce working patches for real codebases, not just solve toy problems.
| Rank | Model | Provider | Price (input / output per 1M tokens) | SWE-Bench Verified |
|------|-------|----------|--------------------------------------|--------------------|
| 🥇 1 | gpt-5.2-codex | OpenAI Responses | $1.75 / $14.00 | 84.7% |
| 🥈 2 | gpt-5.4 | OpenAI Inc. | $2.50 / $15.00 | 82.1% |
| 🥉 3 | gpt-5.2 | OpenAI Inc. | $1.75 / $14.00 | 79.5% |
| 4 | claude-opus-4-7 | Anthropic PBC | $5.00 / $25.00 | 78.6% |
| 5 | claude-sonnet-4-6 | Anthropic PBC | $3.00 / $15.00 | 77.2% |
| 6 | gpt-5.1-chat | OpenAI Inc. | $1.25 / $10.00 | 76.8% |
| 7 | gpt-5 | OpenAI Inc. | $1.25 / $10.00 | 74.9% |
| 8 | claude-opus-4-6 | Anthropic PBC | $5.00 / $25.00 | 74.5% |
| 9 | grok-4 | xAI Corp. | $3.00 / $15.00 | 72.5% |
| 10 | o3 | OpenAI Inc. | $2.00 / $8.00 | 71.7% |
| 11 | claude-opus-4-5 | Anthropic PBC | $5.00 / $25.00 | 71.3% |
| 12 | claude-sonnet-4-5 | Anthropic PBC | $3.00 / $15.00 | 70.8% |
| 13 | MiniMax-M2 | MiniMax | $0.30 / $1.20 | 69.3% |
| 14 | gpt-4.1 | OpenAI Inc. | $2.00 / $8.00 | 68.1% |
| 15 | kimi-k2 | Google LLC (Vertex AI) | $0.60 / $2.50 | 65.8% |
| 16 | claude-sonnet-4 | Anthropic PBC | $3.00 / $15.00 | 65.2% |
| 17 | gemini-2.5-pro | Google LLC (Gemini API) | $1.25 / $10.00 | 63.8% |
| 18 | claude-3-7-sonnet | Google LLC (Vertex AI) | $3.00 / $15.00 | 62.3% |
| 19 | o3-mini | OpenAI Inc. | $1.10 / $4.40 | 61.0% |
| 20 | grok-3 | xAI Corp. | $5.00 / $25.00 | 58.3% |
| 21 | gpt-4.1-mini | OpenAI Inc. | $0.40 / $1.60 | 55.1% |
| 22 | claude-haiku-4-5 | Anthropic PBC | $1.00 / $5.00 | 54.2% |
| 23 | gemini-2.5-flash | Google LLC (Gemini API) | $0.30 / $2.50 | 53.2% |
| 24 | deepseek-ai/DeepSeek-R1 | Together AI Inc. | $3.00 / $7.00 | 49.2% |
| 25 | o1 | OpenAI Inc. | $15.00 / $60.00 | 48.9% |
| 26 | gpt-4.1-nano | OpenAI Inc. | $0.10 / $0.40 | 42.5% |
| 27 | deepseek-ai/DeepSeek-V3 | Together AI Inc. | $1.25 / $1.25 | 42.0% |
| 28 | gpt-4o | OpenAI Inc. | $2.50 / $10.00 | 38.0% |
| 29 | meta-llama/llama-3.3-70b-instruct | Novita AI | $0.39 / $0.39 | 23.3% |
How we rank
Scores for SWE-Bench Verified are sourced from official model cards, Artificial Analysis, and public leaderboards. When a model is available through multiple providers (e.g. Anthropic direct, AWS Bedrock, Google Vertex), we show one canonical entry per model family so the ranking isn't polluted by duplicates. Benchmarks measure specific skills — always validate on your own workload before committing.
One API for every model on this list
Requesty is OpenAI-compatible and routes to 400+ models. Switch between any of the models above by changing one parameter in your code.
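The "one parameter" switch can be sketched with plain request payloads: in an OpenAI-compatible chat API, the request body is the same `model` + `messages` shape regardless of which model handles it, so swapping models means changing a single string. The helper below is a hypothetical illustration (not the Requesty SDK), using model IDs from the ranking above:

```python
import json

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload.

    Hypothetical helper for illustration: only the "model" field
    changes when switching between models in the ranking.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same prompt, two different models from the list above:
a = build_request("gpt-5.2-codex", "Fix the failing test in utils.py")
b = build_request("claude-sonnet-4-6", "Fix the failing test in utils.py")

# Everything except the model name is identical.
assert {k: v for k, v in a.items() if k != "model"} == \
       {k: v for k, v in b.items() if k != "model"}

print(json.dumps(a, indent=2))
```

With an OpenAI SDK client, the same idea applies: point the client's base URL at the router's endpoint and pass any listed model ID as the `model` argument.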
Get started free