LLM Cost Comparison
Compare the cost of using the latest models, all released after April 2024. We pulled pricing data directly from model providers and track how pricing changes with input size, output length, and model version. To compare costs for your own workloads, try Vellum Evals.
Fastest and most affordable models
Fastest Models (tokens per second)
Llama 4 Scout: 2600
Llama 3.3 70b: 2500
Llama 3.1 70b: 2100
Llama 3.1 8b: 1800
Llama 3.1 405b: 969
Lowest Latency (TTFT, seconds to first token)
Nova Micro: 0.3
Llama 3.1 8b: 0.32
Llama 4 Scout: 0.33
Gemini 2.0 Flash: 0.34
GPT-4o mini: 0.35
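Throughput and time to first token can be combined into a rough end-to-end estimate: the wait for the first token plus the time to stream the remaining output. The sketch below is illustrative only. It assumes a constant generation rate and assumes that the tokens-per-second and TTFT figures listed above were measured under comparable conditions, which this page does not state. The function name `estimated_response_seconds` is hypothetical.

```python
def estimated_response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough response time: time to first token plus streaming time.

    Assumes the generation rate stays constant for the whole response,
    which is a simplification.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Figures taken from the charts above (the two models that appear in both lists).
scout = estimated_response_seconds(ttft_s=0.33, tokens_per_s=2600, output_tokens=500)
llama_8b = estimated_response_seconds(ttft_s=0.32, tokens_per_s=1800, output_tokens=500)

print(f"Llama 4 Scout, 500 output tokens: {scout:.2f}s")
print(f"Llama 3.1 8b, 500 output tokens: {llama_8b:.2f}s")
```

Under these assumptions, Llama 3.1 8b has the slightly lower TTFT, but Llama 4 Scout finishes a 500-token response sooner because of its higher throughput.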

Cheapest Models (USD per 1M tokens)
Nova Micro: $0.04 input, $0.14 output
Gemma 3 27b: $0.07 input, $0.07 output
Gemini 1.5 Flash: $0.075 input, $0.30 output
Gemini 2.0 Flash: $0.10 input, $0.40 output
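Because input and output tokens are priced separately, the cheapest model depends on the shape of the workload. The following is a minimal sketch of that arithmetic using the per-1M-token prices listed above. The names `PRICES`, `request_cost`, and `cheapest` are hypothetical, and the calculation assumes flat per-token pricing with no tiering, caching, or batch discounts.

```python
# (input price, output price) in USD per 1M tokens, copied from the figures above.
PRICES = {
    "Nova Micro": (0.04, 0.14),
    "Gemma 3 27b": (0.07, 0.07),
    "Gemini 1.5 Flash": (0.075, 0.30),
    "Gemini 2.0 Flash": (0.10, 0.40),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, assuming flat per-token pricing."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Name of the listed model with the lowest cost for this request shape."""
    return min(PRICES, key=lambda m: request_cost(m, input_tokens, output_tokens))


# A prompt-heavy request versus a balanced one.
print(cheapest(input_tokens=10_000, output_tokens=1_000))
print(cheapest(input_tokens=1_000, output_tokens=1_000))
```

With these prices, Nova Micro is cheaper than Gemma 3 27b only when a request has more than about 2.3 input tokens per output token; for output-heavy workloads, Gemma 3 27b costs less.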