I run cron jobs to periodically test the token generation speed of different cloud LLM providers. The chart helps visualize the distribution of speeds for each model, since they can vary somewhat with provider load. For readability, not all models are shown in the chart, but you can see the full results in the table below.
Every provider and model now has a dedicated landing page with narrative insights, SEO-friendly metadata, and structured data for search engines. Click any provider or model in the table to explore performance in depth.
I am working daily to add more providers and models, looking anywhere that does not require purchasing dedicated endpoints for hosting (which is why some models may appear to be missing). If you have any suggestions, let me know on GitHub! 😊
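
The periodic speed test described above can be sketched roughly as follows. This is a minimal illustration, not the code behind this site: the names `SpeedSample`, `measure_stream`, and `summarize` are hypothetical, the token stream is assumed to be any iterable that yields one token at a time, and the choice of median, min, and max as summary statistics is an assumption.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SpeedSample:
    tokens: int            # number of tokens received
    first_token_s: float   # seconds from request start to the first token
    tokens_per_s: float    # generation speed measured after the first token


def measure_stream(stream: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> SpeedSample:
    """Time a token stream (hypothetical helper; `stream` yields one token per item)."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in stream:
        last = clock()
        if first is None:
            first = last  # arrival time of the first token
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    gen_time = last - first
    # The first token marks the start of generation, so only the remaining
    # count - 1 tokens arrive during gen_time.
    rate = (count - 1) / gen_time if gen_time > 0 else 0.0
    return SpeedSample(count, first - start, rate)


def summarize(speeds: list[float]) -> dict[str, float]:
    """Summarize repeated runs (assumed statistics: median, min, max)."""
    return {
        "median": statistics.median(speeds),
        "min": min(speeds),
        "max": max(speeds),
    }
```

In a scheduled job, `stream` would be whatever streaming response the provider's client returns, and each run would append one `SpeedSample` to the history for that provider and model. Separating time to first token from generation speed keeps network and queueing latency from skewing the tok/s figure.
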
| # | Model | Provider | Speed |
|---|---|---|---|
| 1 | llama-3.1-8b | groq | 297 tok/s |
| 2 | qwen-3-32b | groq | 249 tok/s |
| 3 | llama-3.1-8b | cerebras | 226 tok/s |
| 4 | llama-4-scout | groq | 210 tok/s |
| 5 | gpt-oss-120b | cerebras | 207 tok/s |

| Provider | Model | Status | Last Run | Speed (tok/s) | | | |
|---|---|---|---|---|---|---|---|
| groq | llama-3.1-8b | Active | 57m ago | 297.00 | 87 | 471 | 120.00 |
| cerebras | qwen-3-32b | Active | 1d ago | 256.00 | 4 | 444 | 290.00 |
| groq | qwen-3-32b | Active | 57m ago | 249.00 | 46 | 391 | 140.00 |
| cerebras | llama-3.1-8b | Active | 58m ago | 226.00 | 6 | 365 | 480.00 |
| groq | llama-4-scout | Active | 57m ago | 210.00 | 23 | 331 | 220.00 |
| cerebras | gpt-oss-120b | Active | 58m ago | 207.00 | 4 | 346 | 720.00 |
| cerebras | llama-3.3-70b | Active | 1d ago | 207.00 | 17 | 338 | 360.00 |
| groq | llama-3.3-70b | Active | 57m ago | 206.00 | 94 | 322 | 120.00 |
| groq | llama-4-maverick | Active | 57m ago | 168.00 | 19 | 310 | 500.00 |
| cerebras | qwen-3-235b-instruct | Active | 14d ago | 160.00 | 2 | 264 | 1000.00 |
| together | llama-3.1-8b | Active | 55m ago | 153.00 | 2 | 232 | 540.00 |
| groq | kimi-k2 | Active | 57m ago | 144.00 | 21 | 201 | 220.00 |
| bedrock | nova-micro | Active | 38m ago | 126.00 | 69 | 151 | 260.00 |
| bedrock | llama-4-maverick | Active | 38m ago | 109.00 | 36 | 142 | 250.00 |
| openai | o3 Mini | Never Succeeded(Medium) | 57m ago | 106.00 | 21 | 159 | 0.00 |
| bedrock | llama-4-scout | Active | 38m ago | 102.00 | 1 | 140 | 350.00 |
| bedrock | nova-lite | Active | 38m ago | 101.00 | 39 | 132 | 290.00 |
| bedrock | llama-3.3-70b | Active | 38m ago | 101.00 | 15 | 137 | 240.00 |
| together | qwen-2.5-7b | Active | 56m ago | 92.00 | 3 | 146 | 300.00 |
| together | mistral-7b | Active | 55m ago | 91.00 | 2 | 165 | 480.00 |
| bedrock | nova-pro | Active | 38m ago | 88.40 | 10 | 124 | 370.00 |
| | | Active | 55m ago | 80.10 | 34 | 132 | 500.00 |
| openai | gpt-3.5-turbo | Active | 56m ago | 79.00 | 12 | 125 | 440.00 |
| openai | gpt-4.1-nano | Active | 57m ago | 73.10 | 27 | 131 | 380.00 |
| | claude-3-haiku | Active | 29d ago | 72.00 | 63 | 76 | 460.00 |
| together | llama-3.1-70b | Active | 55m ago | 71.60 | 7 | 144 | 400.00 |
| together | llama-3.2-3b | Active | 55m ago | 69.00 | 5 | 145 | 1220.00 |
| | gemini-2.5-flash | Never Succeeded(Medium) | 55m ago | 65.40 | 5 | 100 | 1130.00 |
| openai | gpt-4o | Active | 56m ago | 65.10 | 7 | 154 | 1380.00 |
| fireworks | mixtral-8x22b | Active | 57m ago | 63.80 | 37 | 112 | 530.00 |
| openai | GPT-5.1-codex-max | Active | 57m ago | 62.70 | 1 | 106 | 1840.00 |
| | gemini-2.0-flash | Active | 13d ago | 60.40 | 15 | 88 | 600.00 |
| | gemini-2.0-flash-lite | Active | 13d ago | 58.40 | 11 | 80 | 690.00 |
| deepinfra | mixtral-8x22b | Stale(Medium) | 58m ago | 55.40 | 14 | 80 | 330.00 |
| together | llama-3.3-70b | Active | 56m ago | 54.70 | 2 | 136 | 1450.00 |
| together | mixtral-8x7b | Active | 56m ago | 52.80 | 13 | 110 | 210.00 |
| together | qwen-2.5-72b | Active | 11d ago | 52.40 | 4 | 71 | 390.00 |
| deepinfra | mistral-7b | Stale(Medium) | 58m ago | 50.70 | 3 | 124 | 560.00 |
| openai | o4 Mini | Never Succeeded(Medium) | 57m ago | 50.00 | 15 | 76 | 0.00 |
| anthropic | claude-haiku-4.5 | Active | 58m ago | 49.50 | 15 | 80 | 670.00 |
| openai | gpt-4.1-mini | Active | 57m ago | 48.30 | 15 | 97 | 500.00 |
| bedrock | llama-3.2-90b | Active | 38m ago | 47.60 | 29 | 51 | 340.00 |
| deepinfra | llama-3.1-8b | Stale(Medium) | 58m ago | 47.00 | 1 | 102 | 1080.00 |
| deepinfra | devstral-small | Never Succeeded(Medium) | 58m ago | 44.90 | 3 | 131 | 550.00 |
| deepinfra | llama-3-8b | Stale(Medium) | 58m ago | 44.60 | 7 | 71 | 300.00 |
| fireworks | llama-3.3-70b | Active | 57m ago | 44.00 | 7 | 90 | 1190.00 |
| bedrock | mistral-large | Active | 38m ago | 43.70 | 7 | 47 | 250.00 |
| bedrock | claude-haiku-4.5 | Active | 38m ago | 43.60 | 4 | 63 | 1000.00 |
| | gemini-2.5-pro | Never Succeeded(Medium) | 55m ago | 42.30 | 11 | 72 | 1530.00 |
| openai | gpt-4o-mini | Active | 56m ago | 41.90 | 8 | 95 | 400.00 |
| together | deepseek-r1 | Active | 56m ago | 38.70 | 1 | 69 | 1860.00 |
| deepinfra | llama-3.2-90b | Stale(Medium) | 58m ago | 36.30 | 2 | 88 | 950.00 |
| | claude-3-5-sonnet | Active | 29d ago | 34.30 | 33 | 35 | 680.00 |
| openai | gpt-4.1 | Active | 57m ago | 34.00 | 6 | 70 | 470.00 |
| bedrock | claude-3-7-sonnet | Active | 39m ago | 33.60 | 6 | 44 | 730.00 |
| openai | gpt-4-turbo | Active | 56m ago | 32.90 | 2 | 51 | 570.00 |
| deepinfra | llama-3-70b | Stale(Medium) | 58m ago | 32.40 | 4 | 48 | 530.00 |
| bedrock | claude-3-5-sonnet | Active | 39m ago | 32.30 | 6 | 43 | 580.00 |
| deepinfra | llama-2-70b | Stale(Medium) | 58m ago | 32.30 | 3 | 49 | 430.00 |
| deepinfra | qwen-2.5-72b | Stale(Medium) | 58m ago | 32.10 | 2 | 47 | 700.00 |
| deepinfra | Qwen 2.5 Coder 32B | Never Succeeded(Medium) | 58m ago | 31.20 | 1 | 67 | 3460.00 |
| bedrock | claude-3-5-haiku | Active | 39m ago | 29.20 | 1 | 38 | 900.00 |
| openai | GPT-5.1 | Active | 57m ago | 29.20 | 3 | 56 | 1030.00 |
| deepinfra | llama-3.2-3b | Stale(Medium) | 58m ago | 27.70 | 1 | 90 | 730.00 |
| deepinfra | llama-3.2-1b | Stale(Medium) | 58m ago | 27.60 | 2 | 90 | 520.00 |
| openai | GPT-5.2 | Active | 57m ago | 27.00 | 9 | 42 | 980.00 |
| openai | gpt-4 | Active | 56m ago | 26.20 | 2 | 49 | 670.00 |
| openai | GPT-5.1-codex | Active | 57m ago | 25.70 | 3 | 49 | 1300.00 |
| deepinfra | llama-3.3-70b | Never Succeeded(Medium) | 58m ago | 22.50 | 1 | 59 | 1110.00 |
| together | deepseek-v3 | Active | 25d ago | 22.50 | 1 | 59 | 2610.00 |
| openai | GPT-5.1-codex-mini | Active | 57m ago | 21.80 | 1 | 55 | 1400.00 |
| bedrock | claude-sonnet-4.5 | Active | 38m ago | 21.70 | 2 | 29 | 1730.00 |
| together | llama-3.1-405b | Active | 11d ago | 21.60 | 2 | 30 | 1270.00 |
| deepinfra | llama-3.1-405b | Stale(Medium) | 58m ago | 20.60 | 1 | 34 | 1640.00 |
| deepinfra | llama-3.1-70b | Stale(Medium) | 58m ago | 20.60 | 3 | 46 | 620.00 |
| anthropic | claude-4-sonnet | Active | 58m ago | 19.60 | 1 | 30 | 1940.00 |
| anthropic | claude-opus-4.5 | Active | 58m ago | 19.50 | 7 | 27 | 1760.00 |
| bedrock | claude-3-opus | Active | 39m ago | 19.10 | 4 | 22 | 850.00 |
| deepinfra | qwen-3-235b | Never Succeeded(Medium) | 58m ago | 18.90 | 1 | 43 | 2390.00 |
| anthropic | Claude Opus 4.1 | Active | 58m ago | 18.50 | 5 | 25 | 1400.00 |
| anthropic | claude-4-opus | Active | 58m ago | 18.10 | 9 | 24 | 1210.00 |
| bedrock | claude-opus-4.5 | Active | 38m ago | 17.90 | 3 | 23 | 2120.00 |
| deepinfra | llama-3.2-11b | Stale(Medium) | 58m ago | 17.90 | 1 | 68 | 2050.00 |
| openai | | Active | 57m ago | 12.10 | 2 | 22 | 1990.00 |
| openai | o1-pro | Likely Deprecated(Medium) | 57m ago | 9.64 | 1 | 19 | 140.00 |
| openai | GPT-5.2-pro | Active | 57m ago | 6.58 | 1 | 13 | 6240.00 |