LLM Performance Leaderboard

Interactive comparison of large language models across multiple benchmarks

Rank	Model Name	Organization	License	Arena Elo Score	Votes
1	Grok-3-Preview-02-24	xAI	Proprietary	1412	3,364
2	GPT-4.5-Preview	OpenAI	Proprietary	1411	3,242
3	Gemini-2.0-Flash-Thinking-Exp-01-21	Google	Proprietary	1384	17,487
4	Gemini-2.0-Pro-Exp-02-05	Google	Proprietary	1380	15,466
5	ChatGPT-4o-latest (2025-01-29)	OpenAI	Proprietary	1377	17,221
6	DeepSeek-R1	DeepSeek	MIT	1363	8,580
7	Gemini-2.0-Flash-001	Google	Proprietary	1357	13,257
8	o1-2024-12-17	OpenAI	Proprietary	1352	19,785
9	Qwen2.5-Max	Alibaba	Proprietary	1336	11,930
10	o3-mini-high	OpenAI	Proprietary	1329	9,102
11	DeepSeek-V3	DeepSeek	DeepSeek	1318	22,007
12	GLM-4-Plus-0111	Zhipu	Proprietary	1311	6,035
13	Qwen-Plus-0125	Alibaba	Proprietary	1310	6,054
14	Claude 3.7 Sonnet	Anthropic	Proprietary	1309	4,254
15	Gemini-2.0-Flash-Lite-Preview-02-05	Google	Proprietary	1308	12,774
16	Step-2-16K-Exp	StepFun	Proprietary	1305	5,132
17	o1-mini	OpenAI	Proprietary	1304	54,923
18	o3-mini	OpenAI	Proprietary	1304	15,463
19	Gemini-1.5-Pro-002	Google	Proprietary	1302	57,551
20	Grok-2-08-13	xAI	Proprietary	1288	67,038
21	Yi-Lightning	01.AI	Proprietary	1287	28,946
22	Claude 3.5 Sonnet (20241022)	Anthropic	Proprietary	1284	59,139
23	Deepseek-v2.5-1210	DeepSeek	DeepSeek	1279	7,247
24	Athene-v2-Chat-72B	Nexusflow	Athene V2	1275	26,092
25	GPT-4o-mini-2024-07-18	OpenAI	Proprietary	1272	66,710
26	Hunyuan-Large-2025-02-10	Tencent	Proprietary	1271	3,860
27	Gemini-1.5-Flash-002	Google	Proprietary	1271	36,979
28	Llama-3.1-405B-Instruct-bf16	Meta	Llama 3.1	1269	34,228
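
To give a rough sense of what the score gaps in the table mean, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the textbook Elo logistic curve (base 10, scale 400); the page does not state how its Arena Elo scores are computed, so treat the function `expected_score` and its output as an illustration and not as the leaderboard's own method.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve.

    Assumption: base-10 logistic with a 400-point scale. The leaderboard's
    actual rating procedure is not given in the source.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Scores copied from the table above.
grok_3_preview = 1412   # rank 1
gpt_45_preview = 1411   # rank 2
llama_31_405b = 1269    # rank 28

print(f"Rank 1 vs rank 2:  {expected_score(grok_3_preview, gpt_45_preview):.3f}")
print(f"Rank 1 vs rank 28: {expected_score(grok_3_preview, llama_31_405b):.3f}")
```

Under this assumption, the one-point gap between ranks 1 and 2 is close to a coin flip (about 0.50), which is worth keeping in mind given that both models have only a few thousand votes.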

Data aggregated from multiple benchmark sources • Last updated: March 2025