Cloud Benchmarks
I run cron jobs that periodically measure the token generation speed of different cloud LLM providers. The chart visualizes the distribution of those speeds, which vary with provider load. For readability, not all models are shown in the chart, but the full results are in the table below.
Every provider and model now has a dedicated landing page with narrative insights, SEO-friendly metadata, and structured data for search engines. Click any provider or model in the table to explore performance in depth.
I am working daily to add more providers and models, looking anywhere that does not require purchasing dedicated endpoints for hosting (which is why some models may appear to be missing). If you have any suggestions, let me know on GitHub!
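
For anyone curious what a single measurement might look like, here is a minimal sketch of timing a streamed response and converting it to tokens per second. It is not the code behind these benchmarks: `measure_tokens_per_second` and `fake_stream` are hypothetical names, and the fake stream stands in for a real provider's streaming API so the example runs without credentials.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_tokens_per_second(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token stream and report token count, elapsed seconds, and tok/s."""
    start = clock()
    count = 0
    for _ in stream:
        count += 1
    elapsed = clock() - start
    # Guard against a zero-length interval (for example an empty stream).
    rate = count / elapsed if elapsed > 0 else 0.0
    return {"tokens": count, "seconds": elapsed, "tokens_per_second": rate}


def fake_stream(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a provider's streaming response."""
    for i in range(n_tokens):
        time.sleep(delay_s)  # simulate generation latency per token
        yield f"tok{i}"


if __name__ == "__main__":
    result = measure_tokens_per_second(fake_stream(50, 0.002))
    print(f"{result['tokens']} tokens at {result['tokens_per_second']:.1f} tok/s")
```

The clock is passed in as a parameter so the function can be checked with a deterministic timer. A scheduled job (cron or otherwise) would call something like this once per provider and model and store the result.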
Pick A Path In 10 Seconds
Quick recommendations from the latest 7-day benchmark slice. Use one path, jump into full results, then drill into provider/model pages.
Fastest Models Right Now (updated <24h)
| # | Model | Provider | Speed |
|---|---|---|---|
| 1 | llama-3.1-8b | groq | 310 tok/s |
| 2 | qwen-3-32b | groq | 217 tok/s |
| 3 | llama-3.3-70b | groq | 195 tok/s |
| 4 | llama-4-scout | groq | 193 tok/s |
| 5 | llama-3.1-8b | cerebras | 171 tok/s |
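
The list above is consistent with taking the full results, keeping only rows refreshed within the last 24 hours, and sorting by speed. The sketch below reproduces that selection on a few rows copied from the full results table. The selection rule is my assumption about how such a list could be built, and `parse_age` and `fastest_recent` are hypothetical helper names.

```python
import re
from datetime import timedelta

# Rows copied from the full results table: (provider, model, status, last run, tok/s).
ROWS = [
    ("cerebras", "qwen-3-32b", "Active", "29d ago", 366.0),
    ("groq", "llama-3.1-8b", "Active", "56m ago", 310.0),
    ("groq", "qwen-3-32b", "Active", "56m ago", 217.0),
    ("groq", "llama-4-maverick", "Active", "8d ago", 203.0),
    ("groq", "llama-3.3-70b", "Active", "56m ago", 195.0),
    ("groq", "llama-4-scout", "Active", "56m ago", 193.0),
    ("cerebras", "gpt-oss-120b", "Active", "5d ago", 184.0),
    ("cerebras", "llama-3.1-8b", "Active", "57m ago", 171.0),
    ("together", "llama-3.1-8b", "Active", "3d ago", 141.0),
]

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_age(text: str) -> timedelta:
    """Turn a label such as '56m ago' or '29d ago' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd]) ago", text.strip())
    if match is None:
        raise ValueError(f"unrecognized age label: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def fastest_recent(rows, max_age=timedelta(hours=24), limit=5):
    """Keep rows refreshed within max_age, then return the fastest `limit` rows."""
    fresh = [row for row in rows if parse_age(row[3]) < max_age]
    return sorted(fresh, key=lambda row: row[4], reverse=True)[:limit]


if __name__ == "__main__":
    for rank, (provider, model, _status, _age, speed) in enumerate(fastest_recent(ROWS), 1):
        print(f"{rank}. {model} ({provider}): {speed:.0f} tok/s")
```

Under this rule the 366 tok/s cerebras row drops out because its last run was 29 days ago, which matches its absence from the list above.
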
Speed Distribution
Full Results
| Provider | Model | Status | Last Run | Speed (tok/s) | Min | Max | |
|---|---|---|---|---|---|---|---|
| cerebras | qwen-3-32b | Active | 29d ago | 366.00 | 366 | 366 | 140.00 |
| groq | llama-3.1-8b | Active | 56m ago | 310.00 | 87 | 471 | 100.00 |
| groq | qwen-3-32b | Active | 56m ago | 217.00 | 2 | 374 | 310.00 |
| groq | llama-4-maverick | Active | 8d ago | 203.00 | 1 | 307 | 680.00 |
| groq | llama-3.3-70b | Active | 56m ago | 195.00 | 68 | 340 | 140.00 |
| groq | llama-4-scout | Active | 56m ago | 193.00 | 38 | 335 | 250.00 |
| cerebras | gpt-oss-120b | Active | 5d ago | 184.00 | 1 | 380 | 1170.00 |
| cerebras | llama-3.1-8b | Active | 57m ago | 171.00 | 1 | 353 | 1320.00 |
| together | llama-3.1-8b | Active | 3d ago | 141.00 | 3 | 228 | 350.00 |
| groq | kimi-k2 | Active | 56m ago | 138.00 | 12 | 215 | 320.00 |
| bedrock | nova-micro | Active | 35m ago | 121.00 | 65 | 152 | 270.00 |
| openai | o3 Mini | Never Succeeded(Medium) | 55m ago | 109.00 | 8 | 164 | 0.00 |
| bedrock | llama-4-maverick | Active | 35m ago | 108.00 | 3 | 139 | 270.00 |
| bedrock | llama-4-scout | Active | 35m ago | 101.00 | 6 | 130 | 280.00 |
| bedrock | nova-lite | Active | 35m ago | 100.00 | 22 | 132 | 300.00 |
| cerebras | llama-3.3-70b | Active | 29d ago | 97.80 | 97 | 98 | 620.00 |
| bedrock | llama-3.3-70b | Active | 35m ago | 96.50 | 3 | 136 | 300.00 |
| together | qwen-2.5-7b | Active | 54m ago | 92.60 | 1 | 145 | 500.00 |
| bedrock | nova-pro | Active | 35m ago | 86.10 | 19 | 121 | 370.00 |
| openai | GPT-5.1-codex-max | Active | 56m ago | 81.70 | 11 | 118 | 1220.00 |
| deepinfra | mistral-7b | Stale(Medium) | 57m ago | 79.00 | 5 | 148 | 610.00 |
| openai | gpt-3.5-turbo | Active | 54m ago | 75.40 | 13 | 126 | 510.00 |
| deepinfra | devstral-small | Never Succeeded(Medium) | 57m ago | 74.30 | 9 | 140 | 580.00 |
| together | llama-3.1-70b | Active | 20d ago | 73.80 | 15 | 129 | 340.00 |
| | gemini-2.5-flash-lite | Active | 54m ago | 72.10 | 10 | 117 | 550.00 |
| openai | gpt-4.1-nano | Active | 55m ago | 70.30 | 9 | 149 | 480.00 |
| together | mistral-7b | Active | 20d ago | 70.20 | 6 | 91 | 380.00 |
| fireworks | mixtral-8x22b | Active | 56m ago | 69.90 | 29 | 111 | 400.00 |
| openai | gpt-4o | Active | 54m ago | 68.40 | 14 | 173 | 1320.00 |
| | gemini-2.5-flash | Never Succeeded(Medium) | 54m ago | 66.00 | 5 | 105 | 990.00 |
| together | mixtral-8x7b | Active | 54m ago | 60.80 | 14 | 114 | 170.00 |
| deepinfra | mixtral-8x22b | Stale(Medium) | 27d ago | 55.00 | 14 | 66 | 580.00 |
| together | llama-3.2-3b | Active | 11d ago | 54.90 | 5 | 121 | 1510.00 |
| fireworks | llama-3.3-70b | Active | 56m ago | 54.50 | 1 | 108 | 1670.00 |
| together | deepseek-r1 | Active | 54m ago | 53.90 | 1 | 113 | 740.00 |
| together | llama-3.3-70b | Active | 54m ago | 51.70 | 1 | 146 | 1240.00 |
| openai | gpt-4.1-mini | Active | 55m ago | 51.50 | 15 | 109 | 440.00 |
| anthropic | claude-haiku-4.5 | Active | 57m ago | 51.40 | 19 | 73 | 550.00 |
| openai | o4 Mini | Never Succeeded(Medium) | 55m ago | 49.00 | 4 | 76 | 0.00 |
| bedrock | llama-3.2-90b | Active | 35m ago | 46.70 | 2 | 51 | 370.00 |
| deepinfra | llama-3-8b | Stale(Medium) | 56m ago | 45.00 | 18 | 69 | 320.00 |
| deepinfra | llama-3.1-8b | Stale(Medium) | 56m ago | 44.30 | 3 | 85 | 690.00 |
| openai | gpt-4.1 | Active | 55m ago | 40.80 | 10 | 83 | 510.00 |
| bedrock | mistral-large | Active | 35m ago | 40.70 | 2 | 47 | 560.00 |
| | gemini-2.5-pro | Never Succeeded(Medium) | 54m ago | 40.50 | 2 | 72 | 1700.00 |
| deepinfra | llama-3.2-1b | Stale(Medium) | 56m ago | 40.30 | 1 | 100 | 860.00 |
| openai | gpt-4o-mini | Active | 54m ago | 39.70 | 7 | 64 | 390.00 |
| deepinfra | llama-3.2-3b | Stale(Medium) | 56m ago | 39.40 | 2 | 99 | 830.00 |
| bedrock | claude-haiku-4.5 | Active | 36m ago | 39.30 | 3 | 62 | 1200.00 |
| deepinfra | llama-3.2-90b | Stale(Medium) | 57m ago | 34.80 | 3 | 82 | 760.00 |
| deepinfra | llama-2-70b | Stale(Medium) | 56m ago | 34.60 | 3 | 57 | 600.00 |
| deepinfra | llama-3-70b | Stale(Medium) | 56m ago | 33.90 | 2 | 55 | 650.00 |
| bedrock | claude-3-5-sonnet | Active | 36m ago | 32.70 | 2 | 46 | 650.00 |
| deepinfra | qwen-2.5-72b | Stale(Medium) | 57m ago | 32.50 | 1 | 50 | 800.00 |
| bedrock | claude-3-7-sonnet | Active | 36m ago | 32.20 | 2 | 42 | 760.00 |
| openai | gpt-4-turbo | Active | 55m ago | 32.20 | 7 | 52 | 520.00 |
| bedrock | claude-3-5-haiku | Active | 36m ago | 31.70 | 9 | 38 | 650.00 |
| deepinfra | Qwen 2.5 Coder 32B | Never Succeeded(Medium) | 57m ago | 31.40 | 1 | 82 | 3530.00 |
| openai | GPT-5.1 | Active | 55m ago | 29.50 | 2 | 57 | 1100.00 |
| openai | gpt-4 | Active | 54m ago | 27.30 | 8 | 47 | 630.00 |
| openai | GPT-5.4 | Active | 56m ago | 27.20 | 15 | 36 | 1050.00 |
| openai | GPT-5.2 | Active | 56m ago | 27.10 | 4 | 40 | 970.00 |
| openai | GPT-5.1-codex | Active | 55m ago | 26.40 | 1 | 48 | 1240.00 |
| openai | GPT-5.1-codex-mini | Active | 55m ago | 25.30 | 1 | 52 | 1190.00 |
| deepinfra | llama-3.1-405b | Stale(Medium) | 56m ago | 25.10 | 1 | 39 | 790.00 |
| deepinfra | llama-3.1-70b | Stale(Medium) | 56m ago | 23.00 | 1 | 44 | 1100.00 |
| bedrock | claude-sonnet-4.5 | Active | 35m ago | 21.90 | 1 | 28 | 1700.00 |
| openai | GPT-5.3-codex | Active | 56m ago | 21.30 | 7 | 32 | 1230.00 |
| anthropic | claude-opus-4.5 | Active | 57m ago | 20.50 | 2 | 33 | 1810.00 |
| bedrock | claude-3-opus | Active | 26d ago | 19.60 | 8 | 22 | 860.00 |
| anthropic | claude-4-sonnet | Active | 57m ago | 19.60 | 6 | 31 | 1870.00 |
| bedrock | claude-opus-4.5 | Active | 35m ago | 19.10 | 1 | 27 | 1990.00 |
| anthropic | Claude Opus 4.1 | Active | 57m ago | 17.60 | 7 | 27 | 1550.00 |
| deepinfra | llama-3.3-70b | Never Succeeded(Medium) | 57m ago | 17.60 | 1 | 46 | 2590.00 |
| anthropic | claude-4-opus | Active | 57m ago | 17.40 | 5 | 22 | 1320.00 |
| openai | gpt-5.2-codex | Active | 56m ago | 13.60 | 1 | 27 | 1720.00 |
| openai | o1-pro | Likely Deprecated(Medium) | 55m ago | 9.57 | 1 | 18 | 670.00 |
| openai | GPT-5.2-pro | Active | 3h ago | 8.61 | 4 | 14 | 4770.00 |
| deepinfra | qwen-3-235b | Never Succeeded(Medium) | 57m ago | 8.38 | 1 | 53 | 6190.00 |
| deepinfra | llama-3.2-11b | Stale(Medium) | 56m ago | 8.14 | 1 | 61 | 2650.00 |
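
With this many rows, a per-provider summary can be easier to scan than the raw table. As a rough sketch, the snippet below groups the speed column by provider and takes the median, using only the groq and anthropic rows copied from the table above. `median_speed_by_provider` is a hypothetical helper, and the median is just one reasonable summary, not something the site itself reports.

```python
from collections import defaultdict
from statistics import median

# (provider, model, tok/s) copied from the groq and anthropic rows of the table.
SPEEDS = [
    ("groq", "llama-3.1-8b", 310.0),
    ("groq", "qwen-3-32b", 217.0),
    ("groq", "llama-4-maverick", 203.0),
    ("groq", "llama-3.3-70b", 195.0),
    ("groq", "llama-4-scout", 193.0),
    ("groq", "kimi-k2", 138.0),
    ("anthropic", "claude-haiku-4.5", 51.4),
    ("anthropic", "claude-opus-4.5", 20.5),
    ("anthropic", "claude-4-sonnet", 19.6),
    ("anthropic", "Claude Opus 4.1", 17.6),
    ("anthropic", "claude-4-opus", 17.4),
]


def median_speed_by_provider(rows):
    """Group tok/s values by provider and return the median for each provider."""
    grouped = defaultdict(list)
    for provider, _model, speed in rows:
        grouped[provider].append(speed)
    return {provider: median(values) for provider, values in grouped.items()}


if __name__ == "__main__":
    for provider, value in sorted(median_speed_by_provider(SPEEDS).items()):
        print(f"{provider}: {value:.1f} tok/s median")
```

Providers host very different model sizes, so a median like this describes the mix of models listed as much as the provider's hardware.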