Coding LLM Leaderboard
This leaderboard shows the best LLMs for writing and editing code (released after April 2024). Data comes from model providers, open-source contributors, and Vellum's own evaluations. Want to see how these models handle your own repos or workflows? Try Vellum Evals.
Top models across programming benchmarks
Best in Live CodeBench
Score (%)
Grok 3 [Beta]: 79.4
OpenAI o3-mini: 74.1
Gemini 2.5 Pro: 69
DeepSeek-R1: 64.3
Nemotron Ultra 253B: 64
Best in Aider Polyglot
Score (%)
Gemini 2.5 Pro: 82.2
OpenAI o3: 81.3
OpenAI o4-mini: 68.9
Claude 3.7 Sonnet [R]: 64.9
DeepSeek-R1: 64
Best in Agentic Coding (SWE Bench)
Score (%)
Claude 4 Sonnet: 72.7
Claude 4 Opus: 72.5
Claude 3.7 Sonnet [R]: 70.3
OpenAI o3: 69.1
OpenAI o4-mini: 68.1
Independent evals
Best in Tool Use (BFCL)
Score (%)
Llama 3.1 405b: 81.1
Llama 3.3 70b: 77.3
GPT-4o: 72.08
GPT-4.5: 69.94
Nova Pro: 68.4
Best in Adaptive Reasoning (GRIND)
Score (%)
Gemini 2.5 Pro: 82.1
Claude 4 Sonnet: 75
Claude 4 Opus: 67.9
Claude 3.7 Sonnet [R]: 60.7
Nemotron Ultra 253B: 57.1
Best Overall (Humanity's Last Exam)
Score (%)
Gemini 2.5 Pro: 21.6
OpenAI o3: 20.32
OpenAI o4-mini: 14.28
OpenAI o3-mini: 14
Gemini 2.5 Flash: 12.1
Compare models
Model | Context size | Cutoff date | I/O cost | Max output | Latency | Speed
Claude 4 Opus | 200,000 | Mar 2025 | $15 / $75 | 32,000 | 1.95 s | n/a
Claude 4 Sonnet | 200,000 | Mar 2025 | $3 / $15 | 64,000 | 1.9 s | n/a
Gemini 2.5 Flash | 1,000,000 | May 2024 | $0.15 / $0.6 | 30,000 | 0.35 s | 200 t/s
OpenAI o3 | 200,000 | May 2024 | $10 / $40 | 100,000 | 28 s | 94 t/s
OpenAI o4-mini | 200,000 | May 2024 | $1.1 / $4.4 | 100,000 | 35.3 s | 135 t/s
Nemotron Ultra 253B | n/a | n/a | n/a | n/a | n/a | n/a
GPT-4.1 nano | 1,000,000 | December 2024 | $0.1 / $0.4 | 32,000 | n/a | n/a
GPT-4.1 mini | 1,000,000 | December 2024 | $0.4 / $1.6 | 16,000 | n/a | n/a
GPT-4.1 | 1,000,000 | December 2024 | $2 / $8 | 16,000 | n/a | n/a
Llama 4 Behemoth | n/a | November 2024 | n/a | n/a | n/a | n/a
Llama 4 Scout | 10,000,000 | November 2024 | $0.11 / $0.34 | 8,000 | 0.33 s | 2600 t/s
Llama 4 Maverick | 10,000,000 | November 2024 | $0.2 / $0.6 | 8,000 | 0.45 s | 126 t/s
Gemma 3 27b | 128,000 | Nov 2024 | $0.07 / $0.07 | 8192 | 0.72 s | 59 t/s
Grok 3 mini [Beta] | n/a | Nov 2024 | n/a | n/a | n/a | n/a
Grok 3 [Beta] | n/a | Nov 2024 | n/a | n/a | n/a | n/a
Gemini 2.5 Pro | 1,000,000 | Nov 2024 | $1.25 / $10 | 65,000 | 30 s | 191 t/s
Claude 3.7 Sonnet | 200,000 | Nov 2024 | $3 / $15 | 128,000 | 0.91 s | 78 t/s
GPT-4.5 | 128,000 | Nov 2024 | $75 / $150 | 16,384 | 1.25 s | 48 t/s
Claude 3.7 Sonnet [R] | 200,000 | Nov 2024 | $3 / $15 | 64,000 | 0.95 s | 78 t/s
DeepSeek-R1 | 128,000 | Dec 2024 | $0.55 / $2.19 | 8,000 | 4 s | 24 t/s
OpenAI o3-mini | 200,000 | Dec 2024 | $1.1 / $4.4 | 8,000 | 14 s | 214 t/s
OpenAI o1-mini | 128,000 | Dec 2024 | $3 / $12 | 8,000 | 11.43 s | 220 t/s
Qwen2.5-VL-32B | 131,000 | Dec 2024 | n/a | 8,000 | n/a | n/a
DeepSeek V3 0324 | 128,000 | Dec 2024 | $0.27 / $1.1 | 8,000 | 4 s | 33 t/s
OpenAI o1 | 200,000 | Oct 2023 | $15 / $60 | 100,000 | 30 s | 100 t/s
Gemini 2.0 Flash | 1,000,000 | Aug 2024 | $0.1 / $0.4 | 8,192 | 0.34 s | 257 t/s
Llama 3.3 70b | 128,000 | July 2024 | $0.59 / $0.7 | 32,768 | 0.52 s | 2500 t/s
Nova Micro | 128,000 | July 2024 | $0.04 / $0.14 | 4096 | 0.3 s | n/a
Nova Lite | 300,000 | July 2024 | n/a | 4096 | 0.4 s | n/a
Nova Pro | 300,000 | July 2024 | $1 / $4 | 4096 | 0.64 s | 128 t/s
Claude 3.5 Haiku | 200,000 | July 2024 | $0.8 / $4 | 4096 | 0.88 s | 66 t/s
Llama 3.1 405b | 128,000 | Dec 2023 | $3.5 / $3.5 | 4096 | 0.73 s | 969 t/s
Llama 3.1 70b | 128,000 | Dec 2023 | n/a | 4096 | n/a | 2100 t/s
Llama 3.1 8b | 128,000 | Dec 2023 | n/a | 4096 | 0.32 s | 1800 t/s
Gemini 1.5 Flash | 1,000,000 | May 2024 | $0.075 / $0.3 | 4096 | 1.06 s | 166 t/s
Gemini 1.5 Pro | 2,000,000 | May 2024 | n/a | 4096 | n/a | 61 t/s
GPT-3.5 Turbo | 16,400 | Sept 2023 | n/a | 4096 | n/a | n/a
GPT-4o mini | 128,000 | Oct 2023 | $0.15 / $0.6 | 4096 | 0.35 s | 65 t/s
GPT-Turbo | 128,000 | Dec 2023 | n/a | 4096 | n/a | n/a
GPT-4o | 128,000 | Oct 2023 | $2.5 / $10 | 4096 | 0.51 s | 143 t/s
Claude 3 Haiku | 200,000 | Apr 2024 | n/a | 4096 | n/a | n/a
Claude 3.5 Sonnet | 200,000 | Apr 2024 | $3 / $15 | 4096 | 1.22 s | 78 t/s
Claude 3 Opus | 200,000 | Aug 2023 | n/a | 4096 | n/a | n/a
GPT-4 | 8192 | Dec 2023 | n/a | 4096 | n/a | n/a
Standard Benchmarks