---
title: "LLM Leaderboard"
description: "Latest public benchmark performance for SOTA LLM model versions. Data from model providers and independently run evaluations."
canonical_url: "https://www.vellum.ai/llm-leaderboard"
md_url: "https://www.vellum.ai/md/llm-leaderboard"
type: "leaderboard"
---

# LLM Leaderboard

Latest public benchmark performance for state-of-the-art LLMs. Data comes from model providers and independently run evaluations.

## Featured Benchmark

### Best Overall (Humanity's Last Exam)

| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude Mythos 5 | 64.5% |
| 2 | Claude Opus 4.8 | 57.9% |
| 3 | Claude Sonnet 5 | 57.4% |
| 4 | GLM 5.2 | 54.7% |
| 5 | Kimi K2.6 | 54% |

## Top Models per Benchmark

### Best in Reasoning (GPQA Diamond)

| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude Sonnet 5 | 96.2% |
| 2 | Claude 3 Opus | 95.4% |
| 3 | Gemini 3.1 Pro | 94.3% |
| 4 | Claude Opus 4.7 | 94.2% |
| 5 | Claude Fable 5 | 94.1% |

### Best in Agentic Coding (SWE Bench)

| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude Mythos 5 | 95.5% |
| 2 | Claude Fable 5 | 95% |
| 3 | Claude Opus 4.8 | 88.6% |
| 4 | Claude Opus 4.7 | 87.6% |
| 5 | Claude Sonnet 5 | 85.2% |

### Best for Work Automations (AutoBench)

| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude Fable 5 | 17.4% |
| 2 | Claude Opus 4.8 | 15.5% |
| 3 | Claude Sonnet 5 | 13.5% |
| 4 | GPT-5.5 | 12.9% |
| 5 | Claude Sonnet 4.6 | 5.3% |

### Best in Computer Use (OSWorld)

| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude Fable 5 | 85% |
| 2 | Claude Opus 4.8 | 83.4% |
| 3 | Claude Sonnet 5 | 81.2% |
| 4 | GPT-5.5 | 78.7% |
| 5 | Claude Sonnet 4.6 | 78.5% |

### Best in Browsing (BrowseComp)

| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude Fable 5 | 88% |
| 2 | DeepSeek V4 Flash | 85.9% |
| 3 | Gemini 3.1 Pro | 85.9% |
| 4 | Claude Sonnet 5 | 84.7% |
| 5 | GPT-5.5 | 84.4% |

### Best in Terminal Use (Terminal-Bench 2.1)

| Rank | Model | Score |
| --- | --- | --- |
| 1 | Claude Mythos 5 | 88% |
| 2 | Claude Fable 5 | 84.3% |
| 3 | GPT-5.5 | 82.7% |
| 4 | GLM 5.2 | 81% |
| 5 | Claude Sonnet 5 | 80.4% |

## Fastest Models (Tokens/sec)



| Rank | Model | Throughput |
| --- | --- | --- |
| 1 | Llama 4 Scout | 2600 t/s |
| 2 | Llama 3.1 405b | 969 t/s |
| 3 | GLM 5.2 | 347 t/s |
| 4 | Kimi K2.6 | 342.6 t/s |
| 5 | Kimi K2.5 | 337.7 t/s |

## Lowest Latency (TTFT)



| Rank | Model | Latency |
| --- | --- | --- |
| 1 | GPT-5.3 Codex | 0.003s |
| 2 | Nova Micro | 0.3s |
| 3 | Llama 4 Scout | 0.33s |
| 4 | Gemini 2.0 Flash | 0.34s |
| 5 | GPT-4o mini | 0.35s |
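
Throughput and time to first token (TTFT) can be combined into a rough end-to-end estimate for a single response. The sketch below assumes a simple model in which total time is TTFT plus output tokens divided by throughput; the function name and the 1,000-token response size are hypothetical, and real timings vary with provider load, prompt length, and network conditions.

```python
def estimate_response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough response time: time to first token plus steady-state generation time."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Llama 4 Scout figures from the tables above: 0.33s TTFT, 2600 t/s.
# The 1,000-token response length is an illustrative assumption.
seconds = estimate_response_seconds(ttft_s=0.33, tokens_per_s=2600, output_tokens=1000)
print(f"Estimated response time: {seconds:.2f}s")
```

For short responses TTFT dominates the estimate, while for long responses throughput matters more, so the two rankings answer different questions.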

## Cheapest Models (per 1M tokens)



| Rank | Model | Input / Output |
| --- | --- | --- |
| 1 | Nova Micro | $0.04 / $0.14 |
| 2 | Gemini 1.5 Flash | $0.075 / $0.3 |
| 3 | Gemini 2.0 Flash | $0.1 / $0.4 |
| 4 | GPT-4.1 nano | $0.1 / $0.4 |
| 5 | Llama 4 Scout | $0.11 / $0.34 |
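
Because input and output tokens are priced separately, the cheapest model for a given workload depends on its input/output mix. The following is a minimal sketch of the per-1M-token arithmetic; the function name and the workload sizes are hypothetical, and it assumes flat pricing with no caching, batching, or tiered discounts.

```python
def workload_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Cost in USD given token counts and prices quoted per 1M tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# Nova Micro prices from the table above: $0.04 input / $0.14 output per 1M tokens.
# The workload (2M input tokens, 500K output tokens) is an illustrative assumption.
cost = workload_cost_usd(2_000_000, 500_000, 0.04, 0.14)
print(f"Estimated cost: ${cost:.2f}")
```

Running the same function over several rows of the table is a quick way to compare models for an output-heavy versus an input-heavy workload.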

## All Models



| Model | Provider | Context Window | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Knowledge Cutoff |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | Anthropic | 128,000 | $5 | $25 | Apr 2026 |
| Claude Opus 4.6 | Anthropic | 128,000 | $5 | $25 | May 2025 |
| Claude Sonnet 4.6 | Anthropic | 64,000 | $3 | $15 | Aug 2025 |
| Kimi K2.5 | Kimi | 33,000 | $0.6 | $2.5 | Apr 2024 |
| GPT-5.3 Codex | OpenAI | 128,000 | $1.75 | $14 | Aug 2025 |
| Claude 3.5 Sonnet | Anthropic | 4,096 | $3 | $15 | Apr 2024 |
| GPT-4o | OpenAI | 4,096 | $2.5 | $10 | Oct 2023 |
| GPT-4o mini | OpenAI | 4,096 | $0.15 | $0.6 | Oct 2023 |
| Llama 3.1 405b | Meta | 4,096 | $3.5 | $3.5 | Dec 2023 |
| Claude 3.5 Haiku | Anthropic | 4,096 | $0.8 | $4 | Jul 2024 |
| Nova Pro | AWS | 4,096 | $1 | $4 | Jul 2024 |
| Llama 3.3 70b | Meta | 32,768 | $0.59 | $0.7 | Jul 2024 |
| Gemini 2.0 Flash | Google | 8,192 | $0.1 | $0.4 | Aug 2024 |
| OpenAI o1 | OpenAI | 100,000 | $15 | $60 | Oct 2023 |
| DeepSeek V3 0324 | DeepSeek | 8,000 | $0.27 | $1.1 | Dec 2024 |
| Qwen2.5-VL-32B | Qwen | 8,000 | - | - | Dec 2024 |
| OpenAI o1-mini | OpenAI | 8,000 | $3 | $12 | Dec 2024 |
| OpenAI o3-mini | OpenAI | 8,000 | $1.1 | $4.4 | Dec 2024 |
| DeepSeek-R1 | DeepSeek | 8,000 | $0.55 | $2.19 | Dec 2024 |
| Claude 3.7 Sonnet [R] | Anthropic | 64,000 | $3 | $15 | Nov 2024 |
| GPT-4.5 | OpenAI | 16,384 | $75 | $150 | Nov 2024 |
| Claude 3.7 Sonnet | Anthropic | 128,000 | $3 | $15 | Nov 2024 |
| Gemini 2.5 Pro | Google | 65,000 | $1.25 | $10 | Nov 2024 |
| Grok 3 [Beta] | xAI | - | - | - | Nov 2024 |
| Gemma 3 27b | Google | 8,192 | $0.07 | $0.07 | Nov 2024 |
| Llama 4 Maverick | Meta | 8,000 | $0.2 | $0.6 | Nov 2024 |
| Llama 4 Scout | Meta | 8,000 | $0.11 | $0.34 | Nov 2024 |
| Llama 4 Behemoth | Meta | - | - | - | Nov 2024 |
| GPT-4.1 | OpenAI | 16,000 | $2 | $8 | Dec 2024 |
| GPT-4.1 mini | OpenAI | 16,000 | $0.4 | $1.6 | Dec 2024 |
| GPT-4.1 nano | OpenAI | 32,000 | $0.1 | $0.4 | Dec 2024 |
| Nemotron Ultra 253B | NVIDIA | - | - | - | - |
| OpenAI o4-mini | OpenAI | 100,000 | $1.1 | $4.4 | May 2024 |
| OpenAI o3 | OpenAI | 100,000 | $10 | $40 | May 2024 |
| Gemini 2.5 Flash | Google | 30,000 | $0.15 | $0.6 | May 2024 |
| Claude 4 Sonnet | Anthropic | 64,000 | $3 | $15 | Mar 2025 |
| Claude 4 Opus | Anthropic | 32,000 | $15 | $75 | Mar 2025 |
| Grok 4 | xAI | 16,000 | - | - | - |
| GPT oss 120b | OpenAI | 131,072 | $0.15 | $0.6 | Apr 2025 |
| GPT oss 20b | OpenAI | 131,072 | $0.08 | $0.35 | Apr 2025 |
| Claude Opus 4.1 | Anthropic | 32,000 | $15 | $75 | Apr 2025 |
| GPT-5 | OpenAI | 128,000 | $1.25 | $10 | Apr 2025 |
| Claude Haiku 4.5 | Anthropic | - | $5 | $5 | - |
| GPT 5.1 | OpenAI | 128,000 | $1.25 | $10 | Apr 2025 |
| Kimi K2 Thinking | Kimi | 16,400 | $0.6 | $2.5 | Apr 2025 |
| Gemini 3 Pro | Google | 650,000 | $2 | $12 | Apr 2025 |
| Claude Sonnet 4.5 | Anthropic | 160,000 | $3 | $15 | Apr 2025 |
| Claude Opus 4.5 | Anthropic | 64,000 | $5 | $25 | Apr 2025 |
| GPT 5.2 | OpenAI | 16,000 | $1.5 | $14 | Aug 2025 |
| Claude Fable 5 | Anthropic | 128,000 | $10 | $50 | Jan 2026 |
| Claude Mythos 5 | Anthropic | 128,000 | $10 | $50 | Jan 2026 |
| Claude Opus 4.8 | Anthropic | 128,000 | $5 | $25 | Jan 2026 |
| Claude Sonnet 5 | Anthropic | 128,000 | $3 | $15 | Jan 2026 |
| DeepSeek V4 Flash | DeepSeek | 384,000 | $0.14 | $0.28 | Jan 2026 |
| DeepSeek V4 Pro | DeepSeek | 384,000 | $0.435 | $0.87 | Jan 2026 |
| Gemini 3.1 Pro | Google | 65,536 | $2 | $12 | Jan 2026 |
| Gemini 3.5 Flash | Google | 65,536 | $1.5 | $9 | Jan 2026 |
| GLM 5.2 | Z-AI | 128,000 | $0.95 | $3 | Mar 2026 |
| GPT-5.5 | OpenAI | 128,000 | $5 | $30 | Apr 2026 |
| GPT-5.5 Pro | OpenAI | 128,000 | $30 | $180 | Apr 2026 |
| Kimi K2.6 | Kimi | 256,000 | $0.95 | $4 | - |
| MiniMax M3 | MiniMax | 512,000 | $0.6 | $2.4 | Mar 2026 |
