LLM Performance Leaderboard
Interactive comparison of large language models across multiple benchmarks
Rank | Model Name | Organization | License | Arena Elo Score | Votes |
---|---|---|---|---|---|
1 | Grok-3-Preview-02-24 | xAI | Proprietary | 1412 | 3,364 |
2 | GPT-4.5-Preview | OpenAI | Proprietary | 1411 | 3,242 |
3 | Gemini-2.0-Flash-Thinking-Exp-01-21 | Google | Proprietary | 1384 | 17,487 |
4 | Gemini-2.0-Pro-Exp-02-05 | Google | Proprietary | 1380 | 15,466 |
5 | ChatGPT-4o-latest (2025-01-29) | OpenAI | Proprietary | 1377 | 17,221 |
6 | DeepSeek-R1 | DeepSeek | MIT | 1363 | 8,580 |
7 | Gemini-2.0-Flash-001 | Google | Proprietary | 1357 | 13,257 |
8 | o1-2024-12-17 | OpenAI | Proprietary | 1352 | 19,785 |
9 | Qwen2.5-Max | Alibaba | Proprietary | 1336 | 11,930 |
10 | o3-mini-high | OpenAI | Proprietary | 1329 | 9,102 |
11 | DeepSeek-V3 | DeepSeek | DeepSeek | 1318 | 22,007 |
12 | GLM-4-Plus-0111 | Zhipu | Proprietary | 1311 | 6,035 |
13 | Qwen-Plus-0125 | Alibaba | Proprietary | 1310 | 6,054 |
14 | Claude 3.7 Sonnet | Anthropic | Proprietary | 1309 | 4,254 |
15 | Gemini-2.0-Flash-Lite-Preview-02-05 | Google | Proprietary | 1308 | 12,774 |
16 | Step-2-16K-Exp | StepFun | Proprietary | 1305 | 5,132 |
17 | o1-mini | OpenAI | Proprietary | 1304 | 54,923 |
18 | o3-mini | OpenAI | Proprietary | 1304 | 15,463 |
19 | Gemini-1.5-Pro-002 | Google | Proprietary | 1302 | 57,551 |
20 | Grok-2-08-13 | xAI | Proprietary | 1288 | 67,038 |
21 | Yi-Lightning | 01.AI | Proprietary | 1287 | 28,946 |
22 | Claude 3.5 Sonnet (20241022) | Anthropic | Proprietary | 1284 | 59,139 |
23 | Deepseek-v2.5-1210 | DeepSeek | DeepSeek | 1279 | 7,247 |
24 | Athene-v2-Chat-72B | Nexusflow | Athene V2 | 1275 | 26,092 |
25 | GPT-4o-mini-2024-07-18 | OpenAI | Proprietary | 1272 | 66,710 |
26 | Hunyuan-Large-2025-02-10 | Tencent | Proprietary | 1271 | 3,860 |
27 | Gemini-1.5-Flash-002 | Google | Proprietary | 1271 | 36,979 |
28 | Llama-3.1-405B-Instruct-bf16 | Meta | Llama 3.1 | 1269 | 34,228 |
Data aggregated from multiple benchmark sources. Last updated: March 2025