OpenAI LLM Benchmarks – Performance & Latency

Provider Snapshot

Models Tracked: 10
Mean Tokens / Second: 48.16
Mean Time to First Token (ms): 0.54
Last Updated: Dec 26, 2025

Ten OpenAI models are actively benchmarked, with 26,358 total measurements across 25,295 benchmark runs.
gpt-4.1-nano leads the fleet at 81.00 tokens/second, while GPT-5.1 delivers 39.10 tokens/second.
gpt-4.1-nano is therefore 107.2% faster than GPT-5.1 ((81.00 − 39.10) / 39.10). Measured against gpt-4, the slowest tracked model at 25.60 tokens/second, the gap widens to 216.4%.
The fleet-wide time to first token is 0.54 ms (the mean of the per-model averages), which is low enough for interactive applications.
Average throughput across the ten models has a coefficient of variation of 35.9%, which likely reflects differences in model size and architecture.
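
The page does not say how these summary figures are computed. As a rough check, the sketch below assumes they are derived from the per-model average tokens/second values in the complete model list. Under that assumption, the arithmetic mean reproduces the 48.16 figure, the relative gap between gpt-4.1-nano and GPT-5.1 reproduces 107.2%, and the population coefficient of variation reproduces 35.9%. The helper names `relative_gap` and `coefficient_of_variation` are illustrative only and are not part of any benchmark API.

```python
from statistics import mean, pstdev

# Average tokens/second per model, copied from the complete model list.
avg_tps = {
    "gpt-4": 25.60,
    "gpt-4o": 53.40,
    "gpt-4o-mini": 36.90,
    "gpt-3.5-turbo": 75.50,
    "gpt-4-turbo": 34.90,
    "gpt-4.1": 41.50,
    "gpt-4.1-mini": 56.90,
    "gpt-4.1-nano": 81.00,
    "GPT-5.1": 39.10,
    "GPT-5.2": 36.80,
}


def relative_gap(fast: float, slow: float) -> float:
    """Percentage by which `fast` exceeds `slow`."""
    return (fast - slow) / slow * 100


def coefficient_of_variation(values) -> float:
    """Population standard deviation as a percentage of the mean.

    Assumption: the population form (pstdev) is used; the sample form
    would give a larger value.
    """
    values = list(values)
    return pstdev(values) / mean(values) * 100


fleet_mean = mean(avg_tps.values())
gap = relative_gap(avg_tps["gpt-4.1-nano"], avg_tps["GPT-5.1"])
cv = coefficient_of_variation(avg_tps.values())

print(f"mean tokens/second: {fleet_mean:.2f}")      # 48.16
print(f"gpt-4.1-nano vs GPT-5.1: {gap:.1f}%")       # 107.2%
print(f"coefficient of variation: {cv:.1f}%")       # 35.9%
```

The median of the same ten averages is 40.30 tokens/second, so the 48.16 figure is consistent with a mean rather than a median of the per-model values.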

Provider	Model	Avg Tokens/Sec	Min Tokens/Sec	Max Tokens/Sec	Avg Time to First Token (ms)
openai	gpt-4.1-nano	81.00	9.20	169.00	0.36
openai	gpt-3.5-turbo	75.50	1.36	137.00	0.53
openai	gpt-4.1-mini	56.90	9.62	129.00	0.33
openai	gpt-4o	53.40	3.69	128.00	0.58
openai	gpt-4.1	41.50	6.91	94.60	0.38
openai	GPT-5.1	39.10	7.31	76.80	0.77

Complete list of all openai models tracked in the benchmark system.

Provider	Model	Avg Tokens/Sec	Min Tokens/Sec	Max Tokens/Sec	Avg Time to First Token (ms)
openai	gpt-4	25.60	4.71	49.20	0.83
openai	gpt-4o	53.40	3.69	128.00	0.58
openai	gpt-4o-mini	36.90	3.80	71.70	0.58
openai	gpt-3.5-turbo	75.50	1.36	137.00	0.53
openai	gpt-4-turbo	34.90	4.03	54.10	0.53
openai	gpt-4.1	41.50	6.91	94.60	0.38
openai	gpt-4.1-mini	56.90	9.62	129.00	0.33
openai	gpt-4.1-nano	81.00	9.20	169.00	0.36
openai	GPT-5.1	39.10	7.31	76.80	0.77
openai	GPT-5.2	36.80	12.90	58.90	0.53