☁️ Cloud Benchmarks ☁️

I run cron jobs to periodically test the token generation speed of different cloud LLM providers. The chart shows the distribution of speeds for each model, since throughput varies with provider load. For readability, not all models are shown in the chart, but the full results are in the table below.
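As a rough illustration of what one such measurement might look like, here is a minimal sketch: it times a single streamed generation, converts it to tokens per second, and summarizes repeated runs so the spread is visible. The names `measure_tokens_per_second`, `summarize`, and `fake_stream` are hypothetical and not taken from the actual benchmark code; `fake_stream` stands in for a real provider's streaming API.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def measure_tokens_per_second(stream_tokens: Callable[[], Iterable[str]]) -> float:
    """Time one streamed generation and return tokens per second.

    `stream_tokens` is a hypothetical callable that yields tokens one by one;
    a real run would wrap a provider's streaming API here.
    """
    start = time.perf_counter()
    count = sum(1 for _ in stream_tokens())
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else 0.0


def summarize(speeds: List[float]) -> Dict[str, float]:
    """Summarize repeated runs so the spread is visible, not just the mean."""
    return {
        "runs": len(speeds),
        "mean": statistics.mean(speeds),
        "median": statistics.median(speeds),
        "min": min(speeds),
        "max": max(speeds),
    }


def fake_stream() -> Iterable[str]:
    """Stand-in for a provider call: yields 50 tokens with a small delay."""
    for i in range(50):
        time.sleep(0.001)
        yield f"tok{i}"


if __name__ == "__main__":
    # Three repeated runs; a cron job would instead append one run per tick.
    speeds = [measure_tokens_per_second(fake_stream) for _ in range(3)]
    print(summarize(speeds))
```

Reporting the median and the min/max alongside the mean is one simple way to reflect how much a provider's speed moves with load.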

Every provider and model now has a dedicated landing page with narrative insights, SEO-friendly metadata, and structured data for search engines. Click any provider or model in the table to explore performance in depth.

I am working daily to add more providers and models, covering any that do not require purchasing dedicated endpoints for hosting (which is why some models may appear to be missing). If you have suggestions, let me know on GitHub! 😊

Fastest Models Right Now (updated <24h)

#  Model          Provider  Speed
1  llama-3.1-8b   groq      282 tok/s
2  qwen-3-32b     cerebras  254 tok/s
3  qwen-3-32b     groq      241 tok/s
4  llama-3.1-8b   cerebras  227 tok/s
5  llama-3.3-70b  cerebras  216 tok/s
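A leaderboard like this can be sketched as "keep results updated within the last 24 hours, then sort by speed". The snippet below is only an illustration under that assumption: the `Result` type and `fastest_recent` function are hypothetical names, and the sample rows are copied from the results on this page.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class Result:
    provider: str
    model: str
    tokens_per_second: float
    age: timedelta  # time since the last run


def fastest_recent(
    results: List[Result],
    max_age: timedelta = timedelta(hours=24),
    top_n: int = 5,
) -> List[Result]:
    """Drop results older than `max_age`, then return the `top_n` fastest."""
    fresh = [r for r in results if r.age < max_age]
    return sorted(fresh, key=lambda r: r.tokens_per_second, reverse=True)[:top_n]


# A few rows copied from the results on this page.
ROWS = [
    Result("cerebras", "qwen-3-235b-instruct", 163.0, timedelta(days=5)),
    Result("groq", "llama-3.1-8b", 282.0, timedelta(hours=2)),
    Result("together", "llama-3.1-8b", 155.0, timedelta(hours=2)),
    Result("cerebras", "qwen-3-32b", 254.0, timedelta(hours=2)),
    Result("bedrock", "nova-micro", 127.0, timedelta(minutes=39)),
]

if __name__ == "__main__":
    for rank, r in enumerate(fastest_recent(ROWS, top_n=3), start=1):
        print(f"{rank} {r.model} {r.provider} {r.tokens_per_second:.0f} tok/s")
```

In this sample, the qwen-3-235b-instruct row is skipped despite its speed because its last run was 5 days ago.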

📊 Speed Distribution 📊

📚 Full Results 📚

Showing 86 of 86 models
Flagged statuses: likely_deprecated, deprecated, failing, stale, never_succeeded, disabled
Provider   Model                  Status                      Last run  Speed (tok/s)  Other values (unsplit)
groq       llama-3.1-8b           Active                      2h ago    282.00         95447130.00
cerebras   qwen-3-32b             Active                      2h ago    254.00         4417360.00
groq       qwen-3-32b             Active                      2h ago    241.00         46391150.00
cerebras   llama-3.1-8b           Active                      2h ago    227.00         6365520.00
cerebras   llama-3.3-70b          Active                      2h ago    216.00         16338360.00
cerebras   gpt-oss-120b           Active                      2h ago    214.00         4346760.00
groq       llama-3.3-70b          Active                      2h ago    205.00         79280120.00
groq       llama-4-scout          Active                      2h ago    196.00         23316250.00
groq       llama-4-maverick       Active                      2h ago    166.00         19310510.00
cerebras   qwen-3-235b-instruct   Active                      5d ago    163.00         2264810.00
together   llama-3.1-8b           Active                      2h ago    155.00         2230580.00
groq       kimi-k2                Active                      2h ago    140.00         21203250.00
bedrock    nova-micro             Active                      39m ago   127.00         69154260.00
bedrock    llama-4-maverick       Active                      39m ago   109.00         28149250.00
together   mistral-7b             Active                      2h ago    107.00         3166410.00
openai     o3 Mini                Never Succeeded (Medium)    2h ago    105.00         201580.00
bedrock    nova-lite              Active                      39m ago   102.00         42135290.00
bedrock    llama-4-scout          Active                      39m ago   102.00         1140340.00
bedrock    llama-3.3-70b          Active                      39m ago   101.00         15137230.00
together   qwen-2.5-7b            Active                      2h ago    99.10          3146280.00
bedrock    nova-pro               Active                      39m ago   83.10          10124410.00
openai     gpt-3.5-turbo          Active                      2h ago    77.50          12129460.00
google     -                      Active                      2h ago    77.40          34113560.00
openai     gpt-4.1-nano           Active                      2h ago    76.30          13138400.00
together   llama-3.2-3b           Active                      2h ago    73.00          51451130.00
together   llama-3.1-70b          Active                      2h ago    73.00          4147480.00
google     claude-3-haiku         Active                      21d ago   67.50          2779530.00
openai     gpt-4o                 Active                      2h ago    63.80          71411410.00
fireworks  mixtral-8x22b          Active                      2h ago    61.70          37112570.00
google     gemini-2.0-flash       Active                      5d ago    61.10          888600.00
google     gemini-2.0-flash-lite  Active                      5d ago    60.90          1181610.00
google     gemini-2.5-flash       Never Succeeded (Medium)    2h ago    60.10          7781200.00
together   llama-3.3-70b          Active                      2h ago    51.60          21361600.00
anthropic  claude-haiku-4.5       Active                      2h ago    51.10          1580630.00
together   qwen-2.5-72b           Active                      2d ago    50.80          371500.00
deepinfra  mixtral-8x22b          Stale (Medium)              2h ago    50.60          2680300.00
together   mixtral-8x7b           Active                      2h ago    49.80          7111290.00
openai     o4 Mini                Never Succeeded (Medium)    2h ago    49.70          15750.00
bedrock    llama-3.2-90b          Active                      39m ago   47.50          2951340.00
openai     gpt-4.1-mini           Active                      2h ago    47.40          1897470.00
deepinfra  llama-3-8b             Stale (Medium)              2h ago    45.40          771300.00
bedrock    claude-haiku-4.5       Active                      39m ago   45.20          865910.00
deepinfra  mistral-7b             Stale (Medium)              2h ago    44.60          380540.00
bedrock    mistral-large          Active                      39m ago   44.00          847240.00
openai     GPT-5.1-codex-max      Active                      2h ago    42.50          11041540.00
google     gemini-2.5-pro         Never Succeeded (Medium)    2h ago    42.20          15631530.00
fireworks  llama-3.3-70b          Active                      2h ago    41.80          6791260.00
deepinfra  devstral-small         Never Succeeded (Medium)    2h ago    41.50          384510.00
openai     gpt-4o-mini            Active                      2h ago    40.80          895420.00
deepinfra  llama-3.1-8b           Stale (Medium)              2h ago    38.40          11021250.00
deepinfra  llama-3.2-90b          Stale (Medium)              2h ago    38.30          193910.00
together   deepseek-r1            Active                      2h ago    37.90          1681780.00
google     claude-3-5-sonnet      Active                      21d ago   34.70          2446720.00
openai     gpt-4.1                Active                      2h ago    34.50          767450.00
openai     gpt-4-turbo            Active                      2h ago    33.80          251560.00
bedrock    claude-3-7-sonnet      Active                      39m ago   33.00          142790.00
deepinfra  llama-2-70b            Stale (Medium)              2h ago    33.00          844390.00
deepinfra  llama-3-70b            Stale (Medium)              2h ago    32.70          244530.00
bedrock    claude-3-5-sonnet      Active                      39m ago   32.60          742570.00
deepinfra  qwen-2.5-72b           Stale (Medium)              2h ago    31.80          247870.00
deepinfra  Qwen 2.5 Coder 32B     Never Succeeded (Medium)    2h ago    31.30          1674660.00
bedrock    claude-3-5-haiku       Active                      39m ago   28.90          1381270.00
together   deepseek-v3            Active                      16d ago   28.40          1671460.00
openai     GPT-5.1                Active                      2h ago    27.90          3561100.00
openai     GPT-5.2                Active                      2h ago    27.50          9441000.00
deepinfra  llama-3.2-3b           Stale (Medium)              2h ago    26.90          190650.00
deepinfra  llama-3.2-1b           Stale (Medium)              2h ago    26.50          483370.00
openai     gpt-4                  Active                      2h ago    26.10          252600.00
openai     GPT-5.1-codex          Active                      2h ago    25.80          449920.00
deepinfra  llama-3.3-70b          Never Succeeded (Medium)    2h ago    23.20          159860.00
deepinfra  qwen-3-235b            Never Succeeded (Medium)    2h ago    22.10          143530.00
bedrock    claude-sonnet-4.5      Active                      39m ago   21.60          1291810.00
together   llama-3.1-405b         Active                      3d ago    21.40          1301440.00
openai     GPT-5.1-codex-mini     Active                      2h ago    21.20          1451070.00
deepinfra  llama-3.1-70b          Stale (Medium)              2h ago    21.20          339540.00
anthropic  claude-4-sonnet        Active                      2h ago    20.20          8311610.00
anthropic  claude-opus-4.5        Active                      2h ago    19.70          11301710.00
bedrock    claude-3-opus          Active                      39m ago   19.00          422840.00
anthropic  Claude Opus 4.1        Active                      2h ago    18.90          5251330.00
anthropic  claude-4-opus          Active                      2h ago    18.10          4241200.00
bedrock    claude-opus-4.5        Active                      39m ago   18.00          1232230.00
deepinfra  llama-3.1-405b         Stale (Medium)              2h ago    17.60          1312560.00
deepinfra  llama-3.2-11b          Stale (Medium)              2h ago    14.90          1681940.00
openai     -                      Active                      2h ago    12.40          6182030.00
openai     o1-pro                 Likely Deprecated (Medium)  2h ago    9.83           119100.00
openai     GPT-5.2-pro            Active                      2h ago    4.66           1125680.00

📈 Time Series 📈

[Time-series charts, one panel per model: llama-3.3-70b, llama-3.1-8b, claude-3-5-sonnet, claude-haiku-4.5, claude-opus-4.5, llama-3.1-405b, llama-3.1-70b, llama-3.2-3b, llama-3.2-90b, llama-4-maverick, llama-4-scout, mistral-7b, mixtral-8x22b, qwen-2.5-72b, qwen-3-32b, Claude Opus 4.1, claude-3-5-haiku, claude-3-7-sonnet, claude-3-opus, claude-4-opus, claude-4-sonnet, claude-sonnet-4.5, deepseek-r1, devstral-small, gemini-2.0-flash, gemini-2.0-flash-lite, gemini-2.5-flash, gemini-2.5-pro, gpt-3.5-turbo, gpt-4, gpt-4-turbo, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, GPT-5.1, GPT-5.1-codex, GPT-5.1-codex-max, GPT-5.1-codex-mini, GPT-5.2, GPT-5.2-pro, gpt-oss-120b, kimi-k2, llama-2-70b, llama-3-70b, llama-3-8b, llama-3.2-11b, llama-3.2-1b, mistral-large, mixtral-8x7b, nova-lite, nova-micro, nova-pro, o1-pro, o3 Mini, o4 Mini, Qwen 2.5 Coder 32B, qwen-2.5-7b, qwen-3-235b, qwen-3-235b-instruct.]