cerebras LLM Benchmarks – Performance & Latency

Provider Snapshot

Models Tracked

Median Tokens / Second

221.00

Median Time to First Token (ms)

0.61

Last Updated

Dec 26, 2025

5 cerebras models are actively benchmarked with 4860 total measurements across 2598 benchmark runs.
qwen-3-32b leads the fleet with 266.00 tokens/second, while qwen-3-235b-instruct delivers 160.00 tok/s.
Performance varies by 66.3% across the cerebras model lineup, indicating diverse optimization strategies for different use cases.
Median time to first token across the fleet is 0.61 ms, showing excellent responsiveness for interactive applications.
The cerebras model fleet shows consistent performance characteristics (16.8% variation coefficient), indicating standardized infrastructure.

Provider	Model	Avg Toks/Sec	Min	Max	Avg TTF (ms)
cerebras	qwen-3-32b	266.00	1.25	373.00	0.52
cerebras	gpt-oss-120b	244.00	4.92	363.00	0.45
cerebras	llama-3.3-70b	235.00	4.33	340.00	0.35
cerebras	llama-3.1-8b	200.00	3.26	374.00	0.68
cerebras	qwen-3-235b-instruct	160.00	1.67	275.00	1.04

Complete list of all cerebras models tracked in the benchmark system. Click any model name to view detailed performance data.

Provider	Model	Avg Toks/Sec	Min	Max	Avg TTF (ms)
cerebras	llama-3.1-8b	200.00	3.26	374.00	0.68
cerebras	llama-3.3-70b	235.00	4.33	340.00	0.35
cerebras	gpt-oss-120b	244.00	4.92	363.00	0.45
cerebras	qwen-3-32b	266.00	1.25	373.00	0.52
cerebras	qwen-3-235b-instruct	160.00	1.67	275.00	1.04