Cloud Benchmarks
I run cron jobs to periodically test the token generation speed of different cloud LLM providers. The chart helps visualize the distribution of speeds for each one, since they can vary somewhat depending on load. For readability, not all models are shown in the chart, but you can see the full results in the table below.
Every provider and model now has a dedicated landing page with narrative insights, SEO-friendly metadata, and structured data for search engines. Click any provider or model in the table to explore performance in depth.
I am working daily to add more providers and models, looking anywhere that does not require purchasing dedicated endpoints for hosting (which is why some models may appear to be missing). If you have any suggestions, let me know on GitHub!
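As a rough illustration of what one such periodic speed test might measure, here is a minimal Python sketch. It is not the actual benchmark code: the names `SpeedSample` and `measure_stream` are hypothetical, the stream is assumed to be any iterable that yields one item per generated token, and counting yielded items as tokens is a simplification (a real provider SDK may report token usage differently).

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class SpeedSample:
    """One measurement of a streamed response (hypothetical structure)."""
    tokens: int                      # number of items received from the stream
    first_token_s: Optional[float]   # seconds until the first item, None if empty
    total_s: float                   # seconds until the stream was exhausted
    tokens_per_s: float              # tokens / total_s, or 0.0 if not measurable


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> SpeedSample:
    """Consume a token stream and time it.

    Assumption: each item yielded by `stream` counts as one token.
    `clock` is injectable so the function can be tested without waiting.
    """
    start = clock()
    first_token_s = None
    tokens = 0
    for _ in stream:
        if first_token_s is None:
            # Record the delay before the first token arrived.
            first_token_s = clock() - start
        tokens += 1
    total_s = clock() - start
    rate = tokens / total_s if tokens > 0 and total_s > 0 else 0.0
    return SpeedSample(tokens, first_token_s, total_s, rate)
```

A cron job could call something like `measure_stream` once per provider and model on each run and store the resulting sample, which is what would make the speed distributions visible over time.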
Pick A Path In 10 Seconds
Quick recommendations from the latest 7-day benchmark slice. Use one path, jump into full results, then drill into provider/model pages.
Fastest Models Right Now (updated <24h)
| # | Model | Provider | Speed |
|---|---|---|---|
| 1 | llama-3.1-8b | groq | 296 tok/s |
| 2 | qwen-3-32b | groq | 205 tok/s |
| 3 | llama-4-scout | groq | 185 tok/s |
| 4 | llama-3.3-70b | groq | 183 tok/s |
| 5 | llama-3.1-8b | cerebras | 169 tok/s |
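A ranking like the one above could be derived from raw benchmark samples roughly as follows. This is only a sketch under stated assumptions: the tuple layout `(provider, model, tok_per_s, timestamp)`, the function name `fastest_models`, and the choice of the mean as the summary statistic are all hypothetical, since the page does not say how the 7-day slice is aggregated.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def fastest_models(samples, now, days=7, top_n=5):
    """Rank (model, provider) pairs by mean speed over a recent window.

    `samples` is an iterable of (provider, model, tok_per_s, timestamp)
    tuples; this layout is an assumption made for illustration.
    Returns a list of (model, provider, rounded_mean_tok_per_s).
    """
    cutoff = now - timedelta(days=days)

    # Group speeds by (provider, model), keeping only recent samples.
    speeds = defaultdict(list)
    for provider, model, tok_per_s, timestamp in samples:
        if timestamp >= cutoff:
            speeds[(provider, model)].append(tok_per_s)

    # Sort by mean speed, fastest first.
    ranked = sorted(
        ((mean(values), provider, model)
         for (provider, model), values in speeds.items()),
        reverse=True,
    )
    return [(model, provider, round(avg)) for avg, provider, model in ranked[:top_n]]
```

Using a median instead of the mean would make the ranking less sensitive to occasional very slow or very fast runs; which one fits better depends on how noisy the samples are.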
Speed Distribution
Full Results
| Provider | Model | Status | | Speed (tok/s) | | | |
|---|---|---|---|---|---|---|---|
| groq | llama-3.1-8b | Active | 25m ago | 296.00 | 130 | 450 | 100.00 |
| groq | llama-4-maverick | Active | 16d ago | 213.00 | 1 | 307 | 850.00 |
| groq | qwen-3-32b | Active | 25m ago | 205.00 | 15 | 287 | 200.00 |
| groq | llama-4-scout | Active | 25m ago | 185.00 | 51 | 333 | 260.00 |
| groq | llama-3.3-70b | Active | 25m ago | 183.00 | 40 | 340 | 160.00 |
| cerebras | gpt-oss-120b | Active | 13d ago | 180.00 | 1 | 380 | 1320.00 |
| cerebras | llama-3.1-8b | Active | 29m ago | 169.00 | 1 | 353 | 1370.00 |
| together | llama-3.1-8b | Active | 11d ago | 144.00 | 3 | 228 | 380.00 |
| groq | kimi-k2 | Active | 25m ago | 127.00 | 12 | 215 | 390.00 |
| bedrock | nova-micro | Active | 21m ago | 123.00 | 65 | 152 | 260.00 |
| openai | o3 Mini | Never Succeeded(Medium) | 22m ago | 109.00 | 8 | 169 | 0.00 |
| openai | o3-mini-2025-01-31 | Active | 23m ago | 108.00 | 15 | 160 | 0.00 |
| bedrock | llama-4-maverick | Active | 21m ago | 105.00 | 1 | 145 | 520.00 |
| bedrock | nova-lite | Active | 21m ago | 100.00 | 20 | 132 | 300.00 |
| bedrock | llama-4-scout | Active | 21m ago | 99.30 | 3 | 130 | 300.00 |
| bedrock | llama-3.3-70b | Active | 21m ago | 95.80 | 3 | 134 | 310.00 |
| openai | GPT-5.4-nano | Active | 23m ago | 92.10 | 43 | 129 | 410.00 |
| together | qwen-2.5-7b | Active | 21m ago | 90.50 | 1 | 145 | 530.00 |
| deepinfra | mistral-7b | Stale(Medium) | 25m ago | 88.00 | 5 | 148 | 540.00 |
| openai | GPT-5.4-nano-2026-03-17 | Active | 23m ago | 86.90 | 36 | 125 | 490.00 |
| openai | o1 | Active | 23m ago | 86.40 | 21 | 147 | 0.00 |
| openai | GPT-5.1-codex-max | Active | 22m ago | 85.30 | 14 | 118 | 1150.00 |
| deepinfra | devstral-small | Never Succeeded(Medium) | 3h ago | 82.00 | 9 | 140 | 540.00 |
| bedrock | nova-pro | Active | 21m ago | 81.00 | 19 | 121 | 380.00 |
| together | llama-3.1-70b | Active | 28d ago | 80.20 | 15 | 116 | 310.00 |
| openai | GPT-5.4-mini | Active | 23m ago | 78.50 | 16 | 111 | 530.00 |
| openai | GPT-5.4-mini-2026-03-17 | Active | 23m ago | 75.70 | 9 | 119 | 590.00 |
| together | mistral-7b | Active | 28d ago | 74.00 | 35 | 89 | 230.00 |
| | gemini-2.5-flash-lite | Active | 21m ago | 73.70 | 10 | 117 | 540.00 |
| fireworks | mixtral-8x22b | Active | 25m ago | 73.60 | 29 | 111 | 340.00 |
| openai | gpt-3.5-turbo | Active | 22m ago | 73.60 | 4 | 126 | 530.00 |
| openai | gpt-4.1-nano | Active | 22m ago | 70.70 | 18 | 149 | 450.00 |
| | gemini-2.5-flash | Never Succeeded(Medium) | 21m ago | 64.30 | 6 | 105 | 1040.00 |
| openai | gpt-4o | Active | 21m ago | 63.40 | 8 | 142 | 1580.00 |
| together | mixtral-8x7b | Active | 21m ago | 60.10 | 8 | 114 | 200.00 |
| together | deepseek-r1 | Active | 21m ago | 57.20 | 1 | 113 | 740.00 |
| fireworks | llama-3.3-70b | Active | 25m ago | 54.70 | 1 | 108 | 1600.00 |
| openai | o4-mini-2025-04-16 | Active | 23m ago | 52.50 | 28 | 74 | 0.00 |
| openai | gpt-4.1-mini | Active | 22m ago | 51.80 | 15 | 109 | 430.00 |
| together | llama-3.3-70b | Active | 21m ago | 51.80 | 1 | 121 | 1190.00 |
| openai | GPT-5-chat-latest | Active | 24m ago | 51.60 | 13 | 82 | 540.00 |
| together | llama-3.2-3b | Active | 19d ago | 51.10 | 5 | 121 | 1690.00 |
| anthropic | claude-haiku-4.5 | Active | 30m ago | 49.80 | 3 | 73 | 630.00 |
| openai | o4 Mini | Never Succeeded(Medium) | 22m ago | 49.60 | 4 | 77 | 0.00 |
| bedrock | llama-3.2-90b | Active | 21m ago | 46.60 | 2 | 50 | 380.00 |
| deepinfra | llama-3-8b | Stale(Medium) | 25m ago | 45.10 | 18 | 69 | 320.00 |
| deepinfra | llama-3.1-8b | Stale(Medium) | 25m ago | 41.20 | 3 | 78 | 670.00 |
| openai | gpt-4.1 | Active | 22m ago | 41.20 | 15 | 83 | 540.00 |
| bedrock | mistral-large | Active | 21m ago | 40.70 | 2 | 47 | 540.00 |
| openai | gpt-4o-mini | Active | 22m ago | 39.70 | 7 | 64 | 400.00 |
| bedrock | claude-haiku-4.5 | Active | 21m ago | 39.60 | 3 | 62 | 1150.00 |
| | gemini-2.5-pro | Never Succeeded(Medium) | 21m ago | 39.60 | 2 | 72 | 1720.00 |
| deepinfra | llama-3.2-1b | Stale(Medium) | 25m ago | 39.50 | 3 | 100 | 790.00 |
| openai | o3-2025-04-16 | Active | 23m ago | 38.40 | 13 | 68 | 0.00 |
| deepinfra | llama-3.2-3b | Stale(Medium) | 25m ago | 38.30 | 2 | 99 | 850.00 |
| deepinfra | Qwen 2.5 Coder 32B | Never Succeeded(Medium) | 29m ago | 38.00 | 1 | 84 | 3510.00 |
| openai | GPT-5.1-2025-11-13 | Active | 24m ago | 36.80 | 12 | 62 | 760.00 |
| openai | o3 | Active | 23m ago | 35.80 | 12 | 63 | 0.00 |
| deepinfra | llama-3.2-90b | Stale(Medium) | 25m ago | 34.50 | 4 | 82 | 850.00 |
| deepinfra | llama-2-70b | Stale(Medium) | 25m ago | 33.70 | 3 | 57 | 620.00 |
| deepinfra | llama-3-70b | Stale(Medium) | 25m ago | 33.20 | 2 | 55 | 640.00 |
| openai | gpt-4-turbo | Active | 3h ago | 32.50 | 1 | 52 | 520.00 |
| openai | GPT-5.1 | Active | 22m ago | 32.20 | 2 | 64 | 1040.00 |
| bedrock | claude-3-5-haiku | Active | 21m ago | 32.10 | 5 | 38 | 650.00 |
| bedrock | claude-3-5-sonnet | Active | 21m ago | 32.10 | 1 | 46 | 770.00 |
| bedrock | claude-3-7-sonnet | Active | 21m ago | 31.80 | 2 | 42 | 780.00 |
| deepinfra | qwen-2.5-72b | Stale(Medium) | 3h ago | 31.10 | 1 | 46 | 1160.00 |
| openai | GPT-5.4-2026-03-05 | Active | 23m ago | 30.40 | 18 | 42 | 680.00 |
| openai | GPT-5.2-2025-12-11 | Active | 24m ago | 30.00 | 18 | 40 | 660.00 |
| openai | GPT-5.1-chat-latest | Active | 24m ago | 29.60 | 13 | 47 | 920.00 |
| openai | GPT-5.4 | Active | 22m ago | 29.50 | 15 | 41 | 810.00 |
| openai | GPT-5.2 | Active | 22m ago | 28.00 | 4 | 47 | 930.00 |
| openai | GPT-5.1-codex | Active | 22m ago | 26.80 | 1 | 52 | 1250.00 |
| openai | gpt-4 | Active | 21m ago | 26.60 | 4 | 47 | 640.00 |
| openai | GPT-5.1-codex-mini | Active | 6h ago | 25.60 | 1 | 50 | 1170.00 |
| openai | GPT-5.3-codex | Active | 23m ago | 23.90 | 7 | 37 | 950.00 |
| deepinfra | llama-3.1-405b | Stale(Medium) | 25m ago | 23.00 | 1 | 39 | 980.00 |
| deepinfra | llama-3.1-70b | Stale(Medium) | 25m ago | 23.00 | 1 | 42 | 1080.00 |
| bedrock | claude-sonnet-4.5 | Active | 21m ago | 21.30 | 1 | 28 | 1760.00 |
| anthropic | claude-opus-4.5 | Active | 30m ago | 20.50 | 2 | 33 | 1800.00 |
| bedrock | claude-opus-4.5 | Active | 21m ago | 19.40 | 1 | 27 | 2110.00 |
| anthropic | claude-4-sonnet | Active | 30m ago | 19.30 | 6 | 32 | 1920.00 |
| anthropic | Claude Opus 4.1 | Active | 30m ago | 17.60 | 7 | 27 | 1510.00 |
| deepinfra | llama-3.3-70b | Never Succeeded(Medium) | 29m ago | 17.60 | 1 | 43 | 2760.00 |
| anthropic | claude-4-opus | Active | 29m ago | 17.20 | 5 | 22 | 1350.00 |
| openai | gpt-5.2-codex | Active | 9h ago | 15.30 | 1 | 35 | 1610.00 |
| deepinfra | llama-3.2-11b | Stale(Medium) | 25m ago | 14.40 | 1 | 61 | 1930.00 |
| openai | GPT-5.2-chat-latest | Active | 24m ago | 10.80 | 1 | 24 | 1580.00 |
| openai | o1-pro | Likely Deprecated(Medium) | 22m ago | 9.91 | 1 | 18 | 430.00 |
| openai | GPT-5.2-pro | Active | 22m ago | 8.84 | 4 | 14 | 4770.00 |
| openai | GPT-5-codex | Active | 24m ago | 8.57 | 1 | 17 | 1750.00 |
| deepinfra | qwen-3-235b | Never Succeeded(Medium) | 29m ago | 7.22 | 1 | 53 | 5230.00 |
| openai | o3-pro | Active | 23m ago | 6.63 | 1 | 13 | 490.00 |
| openai | o3-pro-2025-06-10 | Active | 23m ago | 6.36 | 2 | 11 | 950.00 |
| openai | GPT-5-pro | Active | 24m ago | 3.77 | 1 | 6 | 0.00 |
| openai | GPT-5.2-pro-2025-12-11 | Active | 3h ago | 1.91 | 1 | 4 | 8020.00 |