
Compare LLM costs for your AI work with Vellum


LLM Cost Comparison

Compare the cost of the latest models released after April 2024. We pulled pricing data directly from model providers and tracked how pricing changes with input size, output length, and model version. To compare costs for your own workloads, try Vellum Evals.
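Prices on this page are quoted in USD per 1M tokens, with separate rates for input and output. As a minimal sketch, assuming flat per-token pricing (no prompt caching, batch discounts, or long-context surcharges), the cost of a workload is input tokens times the input rate plus output tokens times the output rate, divided by one million. The function name `workload_cost` is hypothetical and not part of any Vellum API.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of a workload from per-1M-token prices.

    Assumes flat pricing: no caching, batch, or context-length tiers.
    """
    # Multiply first, divide once, to keep floating-point error small.
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


if __name__ == "__main__":
    # 10M input and 2M output tokens per month on GPT-4o
    # ($2.50 input / $10 output per 1M tokens, as listed on this page).
    monthly = workload_cost(10_000_000, 2_000_000, 2.5, 10.0)
    print(f"${monthly:.2f}")  # prints $45.00
```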

Fastest and most affordable models
Fastest Models (tokens per second)
Llama 4 Scout: 2600
Llama 3.3 70b: 2500
Llama 3.1 70b: 2100
Llama 3.1 8b: 1800
Llama 3.1 405b: 969
Lowest Latency (TTFT, seconds to first token)
Nova Micro: 0.30 s
Llama 3.1 8b: 0.32 s
Llama 4 Scout: 0.33 s
Gemini 2.0 Flash: 0.34 s
GPT-4o mini: 0.35 s
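Throughput and TTFT measure different things: TTFT is the wait before the first token arrives, and throughput is how fast tokens stream after that. A rough estimate of total response time, assuming a constant decode rate and ignoring network and queueing delays, is TTFT plus output tokens divided by throughput. The helper `estimate_response_seconds` is hypothetical; the figures passed to it are the ones reported on this page.

```python
def estimate_response_seconds(ttft_s: float, tokens_per_s: float,
                              output_tokens: int) -> float:
    """Rough end-to-end latency: time to first token plus streaming time.

    Assumes a constant decode rate; ignores network and queueing delays.
    """
    if tokens_per_s <= 0:
        raise ValueError("throughput must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Llama 4 Scout (0.33 s TTFT, 2600 t/s) vs. GPT-4o mini (0.35 s TTFT, 65 t/s)
# for a 500-token answer.
for name, ttft, tps in [("Llama 4 Scout", 0.33, 2600), ("GPT-4o mini", 0.35, 65)]:
    print(f"{name}: {estimate_response_seconds(ttft, tps, 500):.2f} s")
# prints:
# Llama 4 Scout: 0.52 s
# GPT-4o mini: 8.04 s
```

With nearly identical TTFT, the two estimates differ by roughly 15x once a 500-token answer is included, so TTFT alone understates the gap for long outputs.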
Cheapest Models (USD per 1M tokens, input / output)
Nova Micro: $0.04 / $0.14
Gemma 3 27b: $0.07 / $0.07
Gemini 1.5 Flash: $0.075 / $0.30
Gemini 2.0 Flash: $0.10 / $0.40
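Which model is cheapest depends on your mix of input and output tokens: Nova Micro has the lowest input price, while Gemma 3 27b charges the same rate for both. A minimal sketch, assuming flat per-token pricing, computes a blended price per 1M tokens for a given input share and the share at which two models cost the same. The names `blended_price` and `break_even_input_share` are hypothetical.

```python
from __future__ import annotations


def blended_price(input_price: float, output_price: float, input_share: float) -> float:
    """Blended USD per 1M tokens when `input_share` (0..1) of all tokens are input."""
    return input_share * input_price + (1 - input_share) * output_price


def break_even_input_share(a: tuple[float, float], b: tuple[float, float]) -> float | None:
    """Input share at which models a and b have the same blended price.

    Each model is (input_price, output_price). Returns None if the blended
    prices never cross (their difference is constant in the input share).
    """
    # Solve s*a_in + (1-s)*a_out == s*b_in + (1-s)*b_out for s.
    denom = (a[0] - a[1]) - (b[0] - b[1])
    if denom == 0:
        return None
    return (b[1] - a[1]) / denom


NOVA_MICRO = (0.04, 0.14)   # (input, output) USD per 1M tokens, from this page
GEMMA_3_27B = (0.07, 0.07)

for share in (0.5, 0.75, 0.9):
    nova = blended_price(*NOVA_MICRO, share)
    gemma = blended_price(*GEMMA_3_27B, share)
    print(f"input share {share:.0%}: Nova Micro ${nova:.3f}, Gemma 3 27b ${gemma:.3f}")
print(f"break-even input share: {break_even_input_share(NOVA_MICRO, GEMMA_3_27B):.2f}")
```

Under these prices, Nova Micro is cheaper once more than about 70% of your tokens are input; below that share, Gemma 3 27b has the lower blended price.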
LLM Cost Comparison
Model                  Input ($/1M)  Output ($/1M)  Speed (t/s)  Latency (TTFT, s)
Claude 4 Opus          $15           $75            n/a          1.95
Claude 4 Sonnet        $3            $15            n/a          1.9
Gemini 2.5 Flash       $0.15         $0.6           200          0.35
OpenAI o3              $10           $40            94           28
OpenAI o4-mini         $1.1          $4.4           135          35.3
GPT-4.1 nano           $0.1          $0.4           n/a          n/a
GPT-4.1 mini           $0.4          $1.6           n/a          n/a
GPT-4.1                $2            $8             n/a          n/a
Llama 4 Scout          $0.11         $0.34          2600         0.33
Llama 4 Maverick       $0.2          $0.6           126          0.45
Gemma 3 27b            $0.07         $0.07          59           0.72
Grok 3 [Beta]          n/a           n/a            n/a          n/a
Gemini 2.5 Pro         $1.25         $10            191          30
Claude 3.7 Sonnet      $3            $15            78           0.91
GPT-4.5                $75           $150           48           1.25
Claude 3.7 Sonnet [R]  $3            $15            78           0.95
DeepSeek-R1            $0.55         $2.19          24           4
OpenAI o3-mini         $1.1          $4.4           214          14
OpenAI o1-mini         $3            $12            220          11.43
Qwen2.5-VL-32B         n/a           n/a            n/a          n/a
DeepSeek V3 0324       $0.27         $1.1           33           4
OpenAI o1              $15           $60            100          30
Gemini 2.0 Flash       $0.1          $0.4           257          0.34
Llama 3.3 70b          $0.59         $0.7           2500         0.52
Nova Pro               $1            $4             128          0.64
Claude 3.5 Haiku       $0.8          $4             66           0.88
Llama 3.1 405b         $3.5          $3.5           969          0.73
GPT-4o mini            $0.15         $0.6           65           0.35
GPT-4o                 $2.5          $10            143          0.51
Claude 3.5 Sonnet      $3            $15            78           1.22
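The table lets you trade price against speed. As a sketch, assuming you know your typical request size and a TTFT budget, you can keep only the rows that meet the budget and rank them by per-request cost. The `ModelRow` type and the `request_cost` and `cheapest_within_budget` functions are hypothetical; rows whose latency is n/a are skipped rather than guessed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    name: str
    input_price: float      # USD per 1M input tokens
    output_price: float     # USD per 1M output tokens
    ttft_s: float | None    # seconds to first token; None where the table says n/a


def request_cost(row: ModelRow, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, assuming flat per-token pricing."""
    return (input_tokens * row.input_price + output_tokens * row.output_price) / 1_000_000


def cheapest_within_budget(rows: list[ModelRow], input_tokens: int,
                           output_tokens: int, max_ttft_s: float) -> list[tuple[str, float]]:
    """Models whose TTFT is within budget, sorted by per-request cost (cheapest first)."""
    eligible = [r for r in rows if r.ttft_s is not None and r.ttft_s <= max_ttft_s]
    ranked = sorted(eligible, key=lambda r: request_cost(r, input_tokens, output_tokens))
    return [(r.name, request_cost(r, input_tokens, output_tokens)) for r in ranked]


# A few rows copied from the table above.
ROWS = [
    ModelRow("GPT-4o", 2.5, 10, 0.51),
    ModelRow("GPT-4o mini", 0.15, 0.6, 0.35),
    ModelRow("Gemini 2.0 Flash", 0.1, 0.4, 0.34),
    ModelRow("Llama 4 Scout", 0.11, 0.34, 0.33),
    ModelRow("Claude 3.5 Haiku", 0.8, 4, 0.88),
    ModelRow("GPT-4.1 nano", 0.1, 0.4, None),
]

# 2,000 input tokens and 500 output tokens per request, TTFT budget of 0.6 s.
for name, cost in cheapest_within_budget(ROWS, 2_000, 500, max_ttft_s=0.6):
    print(f"{name}: ${cost:.6f} per request")
```

The latency filter uses the table's TTFT column only, so a model can pass the budget and still take longer overall if its throughput is low.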