3 leaderboards, 54 model rankings. Who's winning the AI race right now?
Human preference rankings from blind A/B comparisons (5.7M+ votes)

| # | Model | Organization | Elo |
|---|---|---|---|
| 🥇 1 | claude-opus-4-7-thinking | Anthropic | 1503 |
| 🥈 2 | claude-opus-4-6-thinking | Anthropic | 1503 |
| 🥉 3 | claude-opus-4-6 | Anthropic | 1496 |
| 4 | claude-opus-4-7 | Anthropic | 1494 |
| 5 | gemini-3.1-pro-preview | | 1493 |
| 6 | muse-spark | Meta | 1492 |
| 7 | gemini-3-pro | | 1486 |
| 8 | grok-4.20-beta1 | xAI | 1482 |
| 9 | gpt-5.4-high | OpenAI | 1481 |
| 10 | grok-4.20-beta-0309-reasoning | xAI | 1479 |
| 11 | gpt-5.2-chat-latest-20260210 | OpenAI | 1476 |
| 12 | grok-4.20-multi-agent-beta-0309 | xAI | 1476 |
| 13 | gemini-3-flash | | 1474 |
| 14 | claude-opus-4-5-20251101-thinking-32k | Anthropic | 1473 |
| 15 | glm-5.1 | Z.ai | 1470 |

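
To give a feel for what the gaps in the Elo column mean, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the conventional logistic Elo formula with a scale of 400; the leaderboard's actual rating method is not stated here and may differ (for example, a Bradley-Terry fit), so treat the result as a rough reading aid. The function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumption: conventional logistic curve with a 400-point scale.
    A tie counts as half a win, so the result lies between 0 and 1.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table above.
first_place = 1503      # claude-opus-4-7-thinking
fifteenth_place = 1470  # glm-5.1

p = elo_expected_score(first_place, fifteenth_place)
print(f"Expected win rate of #1 over #15: {p:.1%}")
```

Under that assumption, the 33-point gap between first and fifteenth place corresponds to an expected win rate of roughly 55% for the top model, which suggests the listed models sit quite close together in blind comparisons.
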
Contamination-free benchmark with monthly-refreshed questions

| # | Model | Organization | Score |
|---|---|---|---|
| 🥇 1 | GPT-5.5 Thinking xHigh Effort | OpenAI | 80.71 |
| 🥈 2 | GPT-5.4 Thinking xHigh Effort | OpenAI | 80.28 |
| 🥉 3 | Gemini 3.1 Pro Preview High* | | 79.93 |
| 4 | Claude 4.7 Opus Thinking xHigh Effort | Anthropic | 76.91 |
| 5 | Claude 4.6 Opus Thinking High Effort | Anthropic | 76.33 |
| 6 | Claude 4.5 Opus Thinking High Effort | Anthropic | 75.96 |
| 7 | Claude 4.6 Sonnet Thinking Medium Effort | Anthropic | 75.47 |
| 8 | GPT-5.2 High | OpenAI | 74.84 |
| 9 | GPT-5.2 Codex | OpenAI | 74.3 |
| 10 | GPT-5.1 Codex Max High | OpenAI | 73.98 |
| 11 | Gemini 3 Pro Preview High | | 73.39 |
| 12 | GPT-5.3 Codex High | OpenAI | 72.76 |
| 13 | Gemini 3 Flash Preview High | | 72.4 |
| 14 | Kimi K2.6 Thinking | Moonshot AI | 72.17 |
| 15 | GPT-5.1 High | OpenAI | 72.04 |

Real-world software engineering task completion

| # | Model | Organization | Score |
|---|---|---|---|
| 🥇 1 | Claude 4.5 Opus (high reasoning) | Anthropic | 76.8 |
| 🥈 2 | Gemini 3 Flash (high reasoning) | | 75.8 |
| 🥉 3 | MiniMax M2.5 (high reasoning) | Minimax | 75.8 |
| 4 | Claude Opus 4.6 | Anthropic | 75.6 |
| 5 | GPT-5-2 Codex | OpenAI | 72.8 |
| 6 | GLM-5 (high reasoning) | Z-AI | 72.8 |
| 7 | GPT-5-2 (high reasoning) | OpenAI | 72.8 |
| 8 | GPT 5.2 Codex | OpenAI | 72.8 |
| 9 | Claude 4.5 Sonnet (high reasoning) | Anthropic | 71.4 |
| 10 | Kimi K2.5 (high reasoning) | KimiAI | 70.8 |
| 11 | DeepSeek V3.2 (high reasoning) | DeepSeek | 70 |
| 12 | Gemini 3 Pro | | 69.6 |
| 13 | Claude 4.5 Haiku (high reasoning) | Anthropic | 66.6 |
| 14 | GPT-5 Mini | OpenAI | 56.2 |