Updated 18 Nov 2025
Coding LLM Leaderboard
This leaderboard shows the best LLMs for writing and editing code (released after April 2024). Data comes from model providers, open-source contributors, and Vellum's own evaluations. Want to see how these models handle your own repos or workflows? Try Vellum Evals.
Top models across programming benchmarks
Best in LiveCodeBench
Score (%)
Kimi K2 Thinking: 83.1
Gemini 3 Pro: 79.7
Grok 3 [Beta]: 79.4
Grok 4: 79
OpenAI o3-mini: 74.1
Best in Aider Polyglot
Score (%)
Claude Opus 4.5: 89.4
GPT-5: 88
Gemini 2.5 Pro: 82.2
OpenAI o3: 81.3
OpenAI o4-mini: 68.9
Best in Agentic Coding (SWE-bench)
Score (%)
Claude Sonnet 4.5: 82
Claude Opus 4.5: 80.9
GPT 5.2: 80
GPT 5.1: 76.3
Gemini 3 Pro: 76.2
Independent evals
Best in Tool Use (BFCL)
Score (%)
Llama 3.1 405b: 81.1
Llama 3.3 70b: 77.3
GPT-4o: 72.08
GPT-4.5: 69.94
Nova Pro: 68.4
Best in Adaptive Reasoning (GRIND)
Score (%)
Gemini 2.5 Pro: 82.1
Claude 4 Sonnet: 75
Claude 4 Opus: 67.9
Claude 3.7 Sonnet [R]: 60.7
Nemotron Ultra 253B: 57.1
Best Overall (Humanity's Last Exam)
Score (%)
Gemini 3 Pro: 45.8
Kimi K2 Thinking: 44.9
GPT-5: 35.2
Grok 4: 25.4
Gemini 2.5 Pro: 21.6