Fastest LLM API 2026

Speed Rankings (Top 20 Active Models)

Sorted by tokens/second (descending). TTFT = time to first token in milliseconds.

#	Provider	Model	Tok/s	TTFT (ms)
1	groq	llama-3.1-8b	230	~0
2	groq	qwen3.6-27b	190	~0
3	google	Google: Nano Banana (Gemini 2.5 Flash Image)	161	830
4	groq	llama-3.3-70b	159	~0
5	bedrock	nova-micro	121	290
6	bedrock	llama-4-maverick	108	290
7	fireworks	GPT-oss-120b	105	~0
8	bedrock	llama-3.1-8b	96	360
9	bedrock	llama-4-scout	95	290
10	bedrock	nova-lite	91	330
11	bedrock	nova-pro	89	390
12	bedrock	llama-3.3-70b	87	350
13	openai	GPT-5 Nano	86	1810
14	bedrock	mistral-7b	82	190
15	openai	GPT-5.1-codex-mini	81	1150
16	bedrock	llama-3-8b	78	210
17	bedrock	mixtral-8x7b	74	230
18	google	gemini-2.5-flash-lite	73	560
19	fireworks	minimax-m2p7	73	~0
20	openai	GPT-5.1-codex	67	1090

Key Insights

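Inference-optimized providers (Groq, DeepInfra, Fireworks) consistently deliver 100–300+ tokens/second, 3–10× faster than direct OpenAI/Anthropic APIs. The snapshot above shows a narrower gap: Groq's three entries run at 159–230 tok/s against 67–86 tok/s for OpenAI's.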
High throughput does not equal low latency: time-to-first-token (TTFT) is what matters for streaming UX. Groq and DeepInfra often have near-zero TTFT.
AWS Bedrock Nova Micro reaches 121 tok/s with a 290 ms TTFT in the table above, making it the fastest option if you need AWS-native data residency and compliance.
For most coding assistants and chatbots, 60–80 tok/s is imperceptibly fast in streaming mode. Chase lower TTFT before chasing higher throughput.
Benchmarks reflect rolling averages from automated runs. Provider speeds change week-to-week; check the live data before committing to a provider.
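
The trade-off between TTFT and throughput is easier to see with a little arithmetic. The sketch below assumes a deliberately simple model in which total response time is TTFT plus output tokens divided by throughput. The function name and the 200-token reply length are illustrative choices; the speed figures are taken from the table above.

```python
def response_time_s(ttft_ms: float, tok_per_s: float, output_tokens: int) -> float:
    """Estimated seconds until the last token arrives (simplified model)."""
    return ttft_ms / 1000.0 + output_tokens / tok_per_s


# Snapshot figures from the rankings table; 200 tokens is an assumed reply length.
gpt5_nano = response_time_s(ttft_ms=1810, tok_per_s=86, output_tokens=200)
mistral_7b = response_time_s(ttft_ms=190, tok_per_s=82, output_tokens=200)

print(f"openai GPT-5 Nano:  {gpt5_nano:.2f} s")
print(f"bedrock mistral-7b: {mistral_7b:.2f} s")
```

Under this model the Bedrock mistral-7b entry finishes a 200-token reply in about 2.6 seconds against about 4.1 seconds for GPT-5 Nano, even though its throughput is slightly lower, because the TTFT difference dominates at short reply lengths.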

How to Choose

Need the absolute fastest response?

Use Groq or DeepInfra. Both consistently top the throughput charts with near-zero TTFT. Best for real-time voice, gaming, or low-latency chat.

Need speed + frontier model quality?

Fireworks runs many of the same open-weight models as Groq with competitive speeds. For proprietary frontier models, OpenAI GPT-5 Nano reaches 86 tok/s in the table above with high quality, though its TTFT is about 1.8 seconds.

Need AWS-native compliance?

AWS Bedrock Nova Micro hits 121 tok/s in the table above, the fastest Bedrock entry, which makes it the pick for VPC-native, SOC2-compliant workloads within the AWS ecosystem.

Batch processing / cost-sensitive?

High-throughput providers are often also cheaper per token. Compare pricing on each provider page alongside speed.
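
The selection questions above can be expressed as a filter over the rankings. This is a minimal sketch, not part of the site's tooling: `Row`, `SNAPSHOT`, and `shortlist` are hypothetical names, only a handful of rows are copied from the table, and a TTFT shown as "~0" is recorded as 0.0. The default limits reuse the 500 ms TTFT and 60 tok/s figures quoted elsewhere on this page.

```python
from typing import NamedTuple


class Row(NamedTuple):
    provider: str
    model: str
    tok_s: float
    ttft_ms: float  # "~0" in the table is recorded as 0.0 (assumption)


# A few rows copied from the rankings table above.
SNAPSHOT = [
    Row("groq", "llama-3.1-8b", 230, 0.0),
    Row("groq", "llama-3.3-70b", 159, 0.0),
    Row("bedrock", "nova-micro", 121, 290),
    Row("fireworks", "GPT-oss-120b", 105, 0.0),
    Row("openai", "GPT-5 Nano", 86, 1810),
    Row("bedrock", "mistral-7b", 82, 190),
]


def shortlist(rows, max_ttft_ms=500, min_tok_s=60, providers=None):
    """Return rows that meet the limits, highest throughput first."""
    picked = [
        r for r in rows
        if r.ttft_ms <= max_ttft_ms
        and r.tok_s >= min_tok_s
        and (providers is None or r.provider in providers)
    ]
    return sorted(picked, key=lambda r: r.tok_s, reverse=True)


# Example: an AWS-only shortlist.
aws_only = shortlist(SNAPSHOT, providers={"bedrock"})
print([r.model for r in aws_only])
```

Filtering on TTFT first and sorting by throughput second mirrors the advice to chase lower TTFT before higher throughput. Price is left out because this page does not list it.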

Frequently Asked Questions

Which LLM API is fastest in 2026?

Groq and DeepInfra lead raw throughput benchmarks, regularly exceeding 150 tokens/second on mid-sized models. For the highest speed on a capable model, Groq's Llama-3.3-70B at 159 tok/s in the table above is a strong choice as of June 2026.

What is a good tokens-per-second rate for a production API?

For streaming output, 60–80 tok/s is generally fast enough that users do not notice the generation speed. Above 100 tok/s is excellent. Below 30 tok/s starts to feel slow in interactive chat. For batch processing, throughput matters more than TTFT.

Is Groq faster than OpenAI?

Yes. Groq's inference hardware consistently delivers 3–10× higher throughput than OpenAI's API for equivalent model sizes. The tradeoff is model selection: Groq offers fewer frontier models than OpenAI.

How often do these benchmarks update?

The underlying data refreshes automatically from live API calls. The table on this page reflects a snapshot from the build date. Visit the live /cloud page for real-time rankings.

What is time to first token (TTFT)?

TTFT is the delay between sending your API request and receiving the first token of the response. It determines how quickly a streaming response starts appearing. For interactive UX, a TTFT under 500 ms is preferred.

Methodology

All benchmarks are collected by automated scripts that send standardized prompts to each provider's production API and measure wall-clock time from request dispatch to final token. Tokens per second is calculated from the completion length and total generation time. TTFT is measured as the gap between request start and first streaming chunk. Runs are aggregated over a rolling window; the table above shows means from the most recent 7-day window. Providers are not notified before benchmark runs. View the API status page for current provider health.
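
The measurements described above can be sketched roughly as follows. This is not the site's benchmark script: `measure_stream`, `rolling_mean`, and `fake_stream` are hypothetical names, the stream is simulated rather than fetched from a provider, and the sketch assumes each streamed chunk reports its own token count. It also assumes "total generation time" means the time from request dispatch to the final token.

```python
import time
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable, Iterable, Iterator, Optional


def measure_stream(chunks: Iterable[int],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume a stream of per-chunk token counts and time it.

    The clock is read once on entry (standing in for request dispatch)
    and once for every chunk received.
    """
    start = clock()
    first: Optional[float] = None
    last = start
    tokens = 0
    for n in chunks:
        last = clock()
        if first is None:
            first = last  # first streaming chunk has arrived
        tokens += n
    if first is None:
        raise ValueError("stream produced no chunks")
    total_s = last - start  # dispatch to final token (assumed definition)
    return {
        "ttft_ms": (first - start) * 1000.0,
        "tok_per_s": tokens / total_s if total_s > 0 else float("inf"),
        "tokens": tokens,
    }


def rolling_mean(samples, now: datetime, window: timedelta = timedelta(days=7)):
    """Mean of (timestamp, value) samples inside the window ending at `now`."""
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return mean(recent) if recent else None


def fake_stream(n_chunks: int = 5, tokens_per_chunk: int = 4,
                delay_s: float = 0.02) -> Iterator[int]:
    """Stand-in for a provider's streaming response (hypothetical)."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield tokens_per_chunk


result = measure_stream(fake_stream())
print(f"TTFT {result['ttft_ms']:.0f} ms, {result['tok_per_s']:.0f} tok/s")
```

The clock is injectable so the timing logic can be checked with fixed timestamps instead of real delays. In a real client the per-chunk token count would come from the provider's streaming response or a tokenizer, which varies by provider.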
