Leaderboard | Koda Intelligence

// 01

The Consensus Board

The cross-board verdict. On each board, a model family earns a family percentile (1 minus its canonical position), and the composite is the mean of those percentiles across every board the family appears on. Only models ranked on two or more boards are shown; a dot (·) marks a board where the model does not appear. koda-leaderboard-composite-v1 · as of AUG 02, 2026 01:58 UTC.

#	Model	Composite	LMArena	LiveBench	SWE-Bench Verified	Artificial Analysis
1	claude-fable-5	98.6	#1	#1	·	#2
2	Claude Opus 5 (max)	92.2	#6	#4	·	#1
3	claude-opus-4-7-thinking	81.9	#3	#9	·	·
4	GPT-5.6 Sol Max Effort	81.5	#14	#2	·	#3
5	GPT-5.6 Terra (max)	81.0	·	#7	·	#5
6	Kimi K3 (max)	79.5	#12	#5	·	#4
7	claude-opus-4-6-thinking	73.0	#2	#16	#4	·
8	Gemini 3.1 Pro Preview High	72.7	#11	#8	·	·
9	GPT-5.5 Thinking xHigh Effort	70.5	#16	#3	·	·
10	Grok 4.5 (high)	70.2	·	#12	·	#6
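
The composite described above can be sketched roughly as follows. The page does not say how a canonical position is normalized into a percentile, so this sketch assumes percentile = 1 - (position - 1) / N, with N the number of families on the board. The function names and the board size of 20 in the usage line are hypothetical, and the result is not expected to reproduce the published composites.

```python
from statistics import mean
from typing import Optional


def family_percentile(position: int, board_size: int) -> float:
    """Assumed normalization: position 1 maps to 1.0, the last position to 1/board_size."""
    if not 1 <= position <= board_size:
        raise ValueError("position must lie within 1..board_size")
    return 1 - (position - 1) / board_size


def composite(positions: dict[str, tuple[int, int]], min_boards: int = 2) -> Optional[float]:
    """Mean percentile (scaled to 0-100) over the boards a family appears on.

    positions maps board name -> (canonical position, board size).
    Returns None when the family is ranked on fewer than min_boards boards.
    """
    if len(positions) < min_boards:
        return None
    return 100 * mean(family_percentile(p, n) for p, n in positions.values())


# Hypothetical family: first on two boards, second on a third, 20 families each.
example = composite({"board_a": (1, 20), "board_b": (1, 20), "board_c": (2, 20)})
```

Averaging percentiles rather than raw scores keeps boards with different scales (ratings near 1500 versus index values near 60) comparable.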

// 02

The Instruments

Every board, read straight from source. Bars scale to each board’s top score, so length is proportional and the exact figure sits in ink at the end.
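
As a minimal sketch of the bar scaling just described, each score can be divided by the board's top score so the leading bar has full length. The function name `bar_fractions` is hypothetical; the scores in the usage line are the first three LMArena figures on this page.

```python
def bar_fractions(scores: list[float]) -> list[float]:
    """Return each score as a fraction of the board's top score (top bar = 1.0)."""
    if not scores:
        return []
    top = max(scores)
    if top <= 0:
        raise ValueError("top score must be positive to scale bars")
    return [s / top for s in scores]


# First three LMArena scores from this page.
fractions = bar_fractions([1509, 1505, 1502])
```

With scores this close, the bars differ in length by less than one percent, which is why the exact figure is printed at the end of each bar.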

LMArena · General

Providers: Anthropic, Meta, Google, Moonshot AI

#	Model	Score	Change
1	claude-fable-5	1509	·
2	claude-opus-4-6-thinking	1505	·
3	claude-opus-4-7-thinking	1502	·
4	claude-opus-4-6	1497	·
5	claude-opus-4-7	1492	▲1
6	claude-opus-5-high	1492	NEW
7	claude-opus-5-max	1490	NEW
8	muse-spark-1.1	1490	▼3
9	muse-spark	1488	▼2
10	gemini-3-pro	1486	▼1
11	gemini-3.1-pro-preview	1485	▼3
12	kimi-k3-max	1485	NEW

SOURCE: arena.ai · SCRAPED AUG 02, 2026

LiveBench · Reasoning

Providers: Anthropic, OpenAI, Moonshot AI, Google, xAI

#	Model	Score	Change
1	Claude Fable 5 Max Effort	83	▲1
2	GPT-5.6 Sol Max Effort	81	▼1
3	GPT-5.5 Thinking xHigh Effort	80.2	▲1
4	Claude 5 Opus Thinking Max Effort	80.1	NEW
5	Kimi K3	79.2	▲2
6	GPT-5.4 Thinking xHigh Effort	78	▲2
7	GPT-5.6 Terra Max Effort	77.9	▼2
8	Gemini 3.1 Pro Preview High	77	▲1
9	Claude 4.7 Opus Thinking xHigh Effort	76.5	▲1
10	Claude 4.8 Opus Thinking Max Effort	76.2	NEW
11	Claude Sonnet 5 xHigh Effort	76	▲2
12	Grok 4.5	75.8	▼1

SOURCE: livebench.ai · SCRAPED AUG 02, 2026

SWE-Bench Verified · Coding

Providers: Anthropic, Google, MiniMax, OpenAI, Z.AI, Moonshot AI, DeepSeek

#	Model	Score	Change
1	Claude 4.5 Opus (high reasoning)	76.8	·
2	Gemini 3 Flash (high reasoning)	75.8	·
3	MiniMax M2.5 (high reasoning)	75.8	·
4	Claude Opus 4.6	75.6	·
5	GPT-5-2 Codex	72.8	·
6	GLM-5 (high reasoning)	72.8	·
7	GPT-5-2 (high reasoning)	72.8	·
8	Claude 4.5 Sonnet (high reasoning)	71.4	▲1
9	Kimi K2.5 (high reasoning)	70.8	▲1
10	DeepSeek V3.2 (high reasoning)	70	▲1
11	Gemini 3 Pro	69.6	▲1
12	Claude 4.5 Haiku (high reasoning)	66.6	▲1

SOURCE: swebench.com · SCRAPED AUG 02, 2026

Artificial Analysis · Intelligence

Providers: Anthropic, OpenAI, Moonshot AI, xAI, Z.AI, Meta, Google, DeepSeek

#	Model	Score	Change
1	Claude Opus 5 (max)	61	·
2	Claude Fable 5 (with fallback)	60	·
3	GPT-5.6 Sol (max)	59	·
4	Kimi K3 (max)	57	NEW
5	GPT-5.6 Terra (max)	55	▲1
6	Grok 4.5 (high)	54	▲1
7	Claude Sonnet 5 (max)	53	▲1
8	GPT-5.6 Luna (max)	51	▲1
9	GLM-5.2 (max)	51	▲1
10	Muse Spark 1.1 (xhigh)	51	▲1
11	Gemini 3.6 Flash	50	▲1
12	DeepSeek V4 Flash 0731 (max)	50	NEW

SOURCE: artificialanalysis.ai · SCRAPED AUG 02, 2026

// 03

Provider Dominance

Share of every top-20 canonical family slot across all 4 boards, by provider. One bar, 73 slots, the whole field at a glance.

Provider	Share	Slots
OpenAI	25%	18
Anthropic	25%	18
Google	16%	12
Meta	5%	4
xAI	4%	3
DeepSeek	4%	3
Other	21%	15
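
The provider shares can be recomputed from the slot counts with a short sketch. Rounding to the nearest whole percent is an assumption about how the page derives its figures, and `provider_shares` is a hypothetical name; the slot counts are the ones listed in this section.

```python
def provider_shares(slots: dict[str, int]) -> dict[str, int]:
    """Convert per-provider slot counts into whole-percent shares of the total."""
    total = sum(slots.values())
    if total == 0:
        raise ValueError("no slots to share out")
    # Assumption: shares are rounded to the nearest integer percent.
    return {provider: round(100 * n / total) for provider, n in slots.items()}


# Slot counts as listed in this section (73 slots in total).
slots = {
    "OpenAI": 18, "Anthropic": 18, "Google": 12, "Meta": 4,
    "xAI": 3, "DeepSeek": 3, "Other": 15,
}
shares = provider_shares(slots)
```

Under that rounding assumption the counts reproduce the published percentages, which also sum to 100.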

// 04

Head to Head

Pick any two canonical model families. We line them up on every board they share and show the exact configuration used. Boards where one is missing are skipped, not faked.
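
The comparison rule above (line two families up on shared boards, skip the rest) can be sketched as a dictionary intersection. The function name `head_to_head` is hypothetical, and the sketch tracks ranks only, not the exact configuration the page also displays; the ranks in the usage lines come from the Consensus Board rows.

```python
def head_to_head(a: dict[str, int], b: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Pair up ranks on boards both families appear on; boards missing either side are skipped."""
    return {board: (a[board], b[board]) for board in a if board in b}


# Ranks taken from the Consensus Board rows on this page.
fable_5 = {"LMArena": 1, "LiveBench": 1, "Artificial Analysis": 2}
opus_4_6_thinking = {"LMArena": 2, "LiveBench": 16, "SWE-Bench Verified": 4}

shared = head_to_head(fable_5, opus_4_6_thinking)
```

Here only LMArena and LiveBench survive, because each family is absent from one of the other two boards.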

// 05

Methodology

What each board measures

Composite koda-leaderboard-composite-v1 · as of AUG 02, 2026 01:58 UTC. Every figure is pulled from the sources listed below and verified before it ships.

Board	Source	Scraped	What it measures
LMArena	arena.ai	AUG 02, 2026	Human preference rankings from blind A/B comparisons (7M+ votes)
LiveBench	livebench.ai	AUG 02, 2026	Contamination-free benchmark with monthly-refreshed questions
SWE-Bench Verified	swebench.com	AUG 02, 2026	Real-world software engineering task completion (Verified, 500 tasks)
Artificial Analysis	artificialanalysis.ai	AUG 02, 2026	Artificial Analysis Intelligence Index: aggregate of 9 evaluations

Different boards measure different things; the consensus view averages canonical-family percentiles, not scores. Each family contributes once per board using its best-ranked configuration.
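
The "once per board, best-ranked configuration" rule can be sketched as a group-by that keeps the lowest rank per family. The mapping from configuration names to families is hypothetical here (the page does not publish its canonical mapping), as is the function name; the ranks are LMArena entries from this page.

```python
def best_per_family(
    board: list[tuple[str, int]],
    family_of: dict[str, str],
) -> dict[str, tuple[int, str]]:
    """Map each family to (best rank, configuration that achieved it) on one board."""
    best: dict[str, tuple[int, str]] = {}
    for config, rank in board:
        # Configurations without a mapping stand as their own family.
        family = family_of.get(config, config)
        if family not in best or rank < best[family][0]:
            best[family] = (rank, config)
    return best


# LMArena entries from this page; the family mapping is an illustrative assumption.
lmarena = [("claude-fable-5", 1), ("claude-opus-5-high", 6), ("claude-opus-5-max", 7)]
family_of = {
    "claude-opus-5-high": "claude-opus-5",
    "claude-opus-5-max": "claude-opus-5",
}
best = best_per_family(lmarena, family_of)
```

Under this mapping the two Opus 5 configurations collapse to a single entry at rank 6, so a provider cannot gain composite weight by listing many variants of one model.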
