I run cron jobs that periodically test the token generation speed of different cloud LLM providers. The chart visualizes the distribution of speeds for each model, since they can vary somewhat with provider load. For readability, not all models are shown in the chart, but you can see the full results in the table below.
Every provider and model now has a dedicated landing page with narrative insights, SEO-friendly metadata, and structured data for search engines. Click any provider or model in the table to explore performance in depth.
I am working daily to add more providers and models, looking at any that do not require purchasing dedicated endpoints for hosting (which is why some models may appear to be missing). If you have any suggestions, let me know on GitHub! 😊
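
The periodic speed test described above could be sketched roughly as follows. This is a minimal illustration and not the code that produces these results: `measure_stream` and `summarize` are hypothetical names, the token stream is assumed to be any iterable that yields one item per generated token, and the mean/min/max summary is just one possible way to describe the spread between runs.

```python
import statistics
import time
from typing import Callable, Iterable


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Consume a token stream and return its speed in tokens per second.

    `tokens` stands in for a provider's streaming response (an assumption);
    `clock` is injectable so the function can be exercised without waiting.
    """
    start = clock()
    count = sum(1 for _ in tokens)  # drain the stream, counting tokens
    elapsed = clock() - start
    if count == 0:
        raise ValueError("stream produced no tokens")
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return count / elapsed


def summarize(speeds: list[float]) -> dict:
    """Summarize speeds collected over several runs of the same model."""
    return {
        "mean": statistics.fmean(speeds),
        "min": min(speeds),
        "max": max(speeds),
    }
```

A scheduled job could call `measure_stream` once per provider and model on each run, append the result to a history, and pass that history to `summarize` to see how much the speed moves under different loads.
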
| # | Model | Provider | Speed |
|---|---|---|---|
| 1 | llama-3.1-8b | groq | 316 tok/s |
| 2 | qwen-3-32b | groq | 248 tok/s |
| 3 | llama-4-scout | groq | 215 tok/s |
| 4 | llama-3.1-8b | cerebras | 212 tok/s |
| 5 | llama-3.3-70b | groq | 206 tok/s |

| Provider | Model | Status | | Speed (tok/s) | | | |
|---|---|---|---|---|---|---|---|
| groq | llama-3.1-8b | Active | 47m ago | 316.00 | 87 | 471 | 100.00 |
| cerebras | qwen-3-32b | Active | 9d ago | 249.00 | 4 | 444 | 330.00 |
| groq | qwen-3-32b | Active | 47m ago | 248.00 | 2 | 391 | 270.00 |
| groq | llama-4-scout | Active | 47m ago | 215.00 | 23 | 335 | 200.00 |
| cerebras | llama-3.1-8b | Active | 50m ago | 212.00 | 6 | 350 | 490.00 |
| groq | llama-3.3-70b | Active | 47m ago | 206.00 | 79 | 322 | 120.00 |
| cerebras | gpt-oss-120b | Active | 50m ago | 197.00 | 4 | 346 | 840.00 |
| cerebras | llama-3.3-70b | Active | 9d ago | 194.00 | 17 | 338 | 400.00 |
| groq | llama-4-maverick | Active | 47m ago | 165.00 | 25 | 310 | 500.00 |
| together | llama-3.1-8b | Active | 45m ago | 151.00 | 2 | 232 | 470.00 |
| groq | kimi-k2 | Active | 47m ago | 146.00 | 19 | 212 | 230.00 |
| cerebras | qwen-3-235b-instruct | Active | 22d ago | 138.00 | 5 | 252 | 1180.00 |
| bedrock | nova-micro | Active | 33m ago | 123.00 | 69 | 151 | 270.00 |
| bedrock | llama-4-maverick | Active | 33m ago | 109.00 | 3 | 142 | 260.00 |
| openai | o3 Mini | Never Succeeded(Medium) | 47m ago | 109.00 | 21 | 160 | 0.00 |
| bedrock | llama-4-scout | Active | 33m ago | 102.00 | 4 | 132 | 280.00 |
| bedrock | llama-3.3-70b | Active | 33m ago | 101.00 | 11 | 137 | 240.00 |
| bedrock | nova-lite | Active | 33m ago | 100.00 | 39 | 131 | 300.00 |
| together | qwen-2.5-7b | Active | 45m ago | 92.50 | 3 | 145 | 300.00 |
| bedrock | nova-pro | Active | 33m ago | 91.30 | 13 | 124 | 360.00 |
| openai | gpt-3.5-turbo | Active | 46m ago | 80.00 | 12 | 124 | 440.00 |
| | | Active | 45m ago | 75.90 | 34 | 132 | 520.00 |
| together | mistral-7b | Active | 45m ago | 75.40 | 2 | 157 | 520.00 |
| together | llama-3.1-70b | Active | 45m ago | 72.50 | 7 | 144 | 390.00 |
| openai | GPT-5.1-codex-max | Active | 47m ago | 72.20 | 1 | 118 | 1650.00 |
| openai | gpt-4.1-nano | Active | 47m ago | 69.60 | 9 | 126 | 420.00 |
| openai | gpt-4o | Active | 46m ago | 67.50 | 7 | 173 | 1340.00 |
| | gemini-2.5-flash | Never Succeeded(Medium) | 45m ago | 66.80 | 5 | 101 | 1100.00 |
| fireworks | mixtral-8x22b | Active | 47m ago | 66.00 | 35 | 112 | 480.00 |
| together | llama-3.2-3b | Active | 45m ago | 63.50 | 5 | 121 | 1320.00 |
| | gemini-2.0-flash | Active | 21d ago | 60.10 | 16 | 88 | 590.00 |
| deepinfra | mixtral-8x22b | Stale(Medium) | 7d ago | 57.30 | 14 | 79 | 360.00 |
| deepinfra | mistral-7b | Stale(Medium) | 49m ago | 56.10 | 5 | 136 | 580.00 |
| | gemini-2.0-flash-lite | Active | 21d ago | 56.10 | 11 | 73 | 790.00 |
| together | mixtral-8x7b | Active | 45m ago | 55.70 | 13 | 114 | 180.00 |
| deepinfra | llama-3.1-8b | Stale(Medium) | 47m ago | 55.20 | 7 | 102 | 400.00 |
| together | qwen-2.5-72b | Active | 19d ago | 53.60 | 4 | 71 | 400.00 |
| together | llama-3.3-70b | Active | 46m ago | 53.40 | 2 | 146 | 1360.00 |
| deepinfra | devstral-small | Never Succeeded(Medium) | 50m ago | 50.30 | 3 | 131 | 600.00 |
| openai | gpt-4.1-mini | Active | 47m ago | 49.80 | 15 | 97 | 490.00 |
| openai | o4 Mini | Never Succeeded(Medium) | 47m ago | 49.60 | 15 | 76 | 0.00 |
| anthropic | claude-haiku-4.5 | Active | 50m ago | 49.20 | 16 | 74 | 650.00 |
| bedrock | llama-3.2-90b | Active | 33m ago | 47.40 | 6 | 51 | 340.00 |
| fireworks | llama-3.3-70b | Active | 47m ago | 46.20 | 4 | 94 | 1280.00 |
| deepinfra | llama-3-8b | Stale(Medium) | 47m ago | 44.80 | 7 | 71 | 300.00 |
| bedrock | mistral-large | Active | 33m ago | 43.00 | 3 | 47 | 300.00 |
| openai | gpt-4o-mini | Active | 46m ago | 42.60 | 8 | 95 | 400.00 |
| bedrock | claude-haiku-4.5 | Active | 34m ago | 42.50 | 4 | 63 | 1060.00 |
| together | deepseek-r1 | Active | 45m ago | 42.20 | 1 | 69 | 1520.00 |
| | gemini-2.5-pro | Never Succeeded(Medium) | 45m ago | 42.00 | 11 | 72 | 1530.00 |
| deepinfra | llama-3.2-90b | Stale(Medium) | 49m ago | 36.80 | 3 | 88 | 770.00 |
| openai | gpt-4.1 | Active | 47m ago | 36.60 | 6 | 82 | 470.00 |
| deepinfra | llama-3.2-3b | Stale(Medium) | 48m ago | 33.80 | 1 | 90 | 780.00 |
| deepinfra | llama-2-70b | Stale(Medium) | 47m ago | 33.60 | 3 | 57 | 410.00 |
| bedrock | claude-3-7-sonnet | Active | 34m ago | 33.50 | 6 | 44 | 730.00 |
| deepinfra | llama-3.2-1b | Stale(Medium) | 48m ago | 33.50 | 1 | 90 | 630.00 |
| deepinfra | llama-3-70b | Stale(Medium) | 47m ago | 33.30 | 4 | 53 | 540.00 |
| deepinfra | Qwen 2.5 Coder 32B | Never Succeeded(Medium) | 50m ago | 33.20 | 1 | 67 | 2240.00 |
| openai | gpt-4-turbo | Active | 46m ago | 32.50 | 2 | 52 | 570.00 |
| deepinfra | qwen-2.5-72b | Stale(Medium) | 49m ago | 32.30 | 2 | 50 | 740.00 |
| bedrock | claude-3-5-sonnet | Active | 34m ago | 31.60 | 2 | 44 | 650.00 |
| openai | GPT-5.1 | Active | 47m ago | 30.20 | 4 | 56 | 940.00 |
| bedrock | claude-3-5-haiku | Active | 34m ago | 29.50 | 1 | 38 | 870.00 |
| openai | GPT-5.2 | Active | 47m ago | 26.80 | 9 | 40 | 940.00 |
| openai | gpt-4 | Active | 46m ago | 26.50 | 2 | 47 | 670.00 |
| openai | GPT-5.1-codex | Active | 47m ago | 25.60 | 3 | 49 | 1310.00 |
| openai | GPT-5.1-codex-mini | Active | 47m ago | 23.80 | 1 | 55 | 1330.00 |
| bedrock | claude-sonnet-4.5 | Active | 33m ago | 22.40 | 2 | 29 | 1630.00 |
| deepinfra | llama-3.1-405b | Stale(Medium) | 48m ago | 22.40 | 1 | 34 | 1210.00 |
| together | llama-3.1-405b | Active | 19d ago | 21.80 | 2 | 29 | 1110.00 |
| deepinfra | llama-3.1-70b | Stale(Medium) | 48m ago | 21.30 | 3 | 46 | 670.00 |
| deepinfra | llama-3.3-70b | Never Succeeded(Medium) | 50m ago | 21.10 | 1 | 49 | 1770.00 |
| anthropic | claude-opus-4.5 | Active | 50m ago | 19.60 | 6 | 30 | 1790.00 |
| anthropic | claude-4-sonnet | Active | 50m ago | 19.60 | 1 | 30 | 2000.00 |
| bedrock | claude-3-opus | Active | 6d ago | 19.00 | 4 | 22 | 860.00 |
| bedrock | claude-opus-4.5 | Active | 34m ago | 18.20 | 4 | 23 | 2010.00 |
| anthropic | Claude Opus 4.1 | Active | 50m ago | 18.10 | 8 | 25 | 1450.00 |
| anthropic | claude-4-opus | Active | 50m ago | 17.80 | 9 | 24 | 1250.00 |
| deepinfra | llama-3.2-11b | Stale(Medium) | 49m ago | 17.20 | 1 | 62 | 2540.00 |
| deepinfra | qwen-3-235b | Never Succeeded(Medium) | 49m ago | 15.50 | 1 | 38 | 3970.00 |
| openai | | Active | 47m ago | 12.70 | 2 | 22 | 1850.00 |
| openai | o1-pro | Likely Deprecated(Medium) | 47m ago | 8.93 | 1 | 19 | 580.00 |
| openai | GPT-5.2-pro | Active | 47m ago | 7.65 | 1 | 13 | 5580.00 |