Provider Snapshot
Models tracked: 81
Average tokens/second: 35.05
Average TTF (ms): 0.00
Jun 10, 2026
Key Takeaways
81 deepinfra models are actively benchmarked with 5507 total measurements across 4706 benchmark runs.
qwen-3.5-2b leads the fleet with an average of 194.00 tokens/second, while MiMo-V2.5, the sixth-fastest model, averages 92.20 tokens/second.
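Performance varies by 110.4% among the fastest deepinfra models: this is the gap between qwen-3.5-2b (194.00 tokens/second) and MiMo-V2.5 (92.20 tokens/second), measured relative to the slower model. The spread may reflect different optimization targets for different use cases.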
Average throughput across the full deepinfra fleet has a coefficient of variation of 85.2% (standard deviation relative to the 35.05 tokens/second fleet mean), which likely reflects the range of model sizes and architectures on offer.
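The two variation figures above can be reproduced from the numbers published on this page. The following is a minimal Python sketch, not the benchmark system's own code. It assumes that 110.4% is the gap between the fastest model and MiMo-V2.5 relative to the slower one, and that 85.2% is the population standard deviation of the per-model averages divided by their mean. The names `AVG_TOK_S`, `relative_gap_pct`, and `coefficient_of_variation_pct` are illustrative.

```python
from statistics import fmean, pstdev

# Per-model average tokens/second, copied from the All Models table in row order.
AVG_TOK_S = [
    0.45, 2.10, 14.20, 11.40, 17.40, 48.10, 32.70, 13.90, 6.03,
    49.50, 43.30, 21.10, 18.90, 17.00, 48.00, 106.00, 73.80, 39.70,
    194.00, 98.00, 56.10, 46.70, 117.00, 2.18, 92.20, 38.20, 29.30,
    29.40, 26.80, 15.60, 29.90, 17.90, 7.04, 20.40, 9.59, 13.50,
    24.50, 35.10, 21.80, 24.80, 21.10, 37.00, 28.30, 12.50, 17.80,
    25.90, 40.30, 39.60, 40.30, 27.40, 18.30, 14.70, 19.90, 3.31,
    27.50, 40.40, 16.60, 26.50, 29.50, 55.80, 35.80, 36.50, 25.20,
    44.10, 35.30, 8.44, 20.80, 38.50, 20.00, 46.40, 52.50, 113.00,
    18.60, 80.10, 18.50, 27.40, 34.10, 29.10, 50.60, 20.30, 27.90,
]


def relative_gap_pct(fast: float, slow: float) -> float:
    """Percentage by which `fast` exceeds `slow` (assumed definition)."""
    return (fast - slow) / slow * 100.0


def coefficient_of_variation_pct(values: list[float]) -> float:
    """Population standard deviation as a percentage of the mean (assumed definition)."""
    return pstdev(values) / fmean(values) * 100.0


print(f"models tracked:  {len(AVG_TOK_S)}")
print(f"fleet mean:      {fmean(AVG_TOK_S):.2f} tok/s")
print(f"top-six gap:     {relative_gap_pct(194.00, 92.20):.1f}%")
print(f"coeff. of var.:  {coefficient_of_variation_pct(AVG_TOK_S):.1f}%")
```

Under these assumptions the sketch gives roughly 110.4% and 85.2%, which is consistent with the quoted figures. Using the sample standard deviation (`statistics.stdev`) instead would give a slightly higher coefficient.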
Fastest Models
| Provider | Model | Avg Tok/s | Min Tok/s | Max Tok/s | Avg TTF (ms) |
|---|---|---|---|---|---|
| deepinfra | qwen-3.5-2b | 194.00 | 6.88 | 242.00 | 0.00 |
| deepinfra | Qwen3.6-35B-A3B | 117.00 | 70.80 | 160.00 | 0.00 |
| deepinfra | Nemotron-3-Nano-Omni-30B-A3B-Reasoning | 113.00 | 56.40 | 180.00 | 0.00 |
| deepinfra | qwen-3.5-0.8b | 106.00 | 24.70 | 229.00 | 0.00 |
| deepinfra | qwen-3.5-35b-a3b | 98.00 | 7.96 | 161.00 | 0.00 |
| deepinfra | MiMo-V2.5 | 92.20 | 42.20 | 138.00 | 0.00 |
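A ranking like this one is a top-N selection by average throughput. The sketch below shows one way to derive it in Python, assuming each model is held as a small record. The `Row` type, the `fastest` helper, and the nine-row sample (taken from the tables on this page) are illustrative and not part of the benchmark system.

```python
from heapq import nlargest
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    avg_tok_s: float
    min_tok_s: float
    max_tok_s: float


# A sample of rows from this page; the full fleet has 81 entries.
ROWS = [
    Row("phi-4", 55.80, 53.10, 58.50),
    Row("qwen-3.5-0.8b", 106.00, 24.70, 229.00),
    Row("MiMo-V2.5", 92.20, 42.20, 138.00),
    Row("GPT-oss-120b-Turbo", 80.10, 23.10, 184.00),
    Row("qwen-3.5-2b", 194.00, 6.88, 242.00),
    Row("qwen-3.5-122b-a10b", 73.80, 22.70, 115.00),
    Row("Qwen3.6-35B-A3B", 117.00, 70.80, 160.00),
    Row("qwen-3.5-35b-a3b", 98.00, 7.96, 161.00),
    Row("Nemotron-3-Nano-Omni-30B-A3B-Reasoning", 113.00, 56.40, 180.00),
]


def fastest(rows: list[Row], n: int = 6) -> list[Row]:
    """Return the n rows with the highest average tokens/second, fastest first."""
    return nlargest(n, rows, key=lambda r: r.avg_tok_s)


for rank, row in enumerate(fastest(ROWS), start=1):
    print(f"{rank}. {row.model}: {row.avg_tok_s:.2f} tok/s")
```

`heapq.nlargest` avoids sorting the whole list when only a handful of leaders are needed, although with 81 models a plain `sorted(..., reverse=True)[:n]` would work equally well.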
All Models
Complete list of all deepinfra models tracked in the benchmark system.
| Provider | Model | Avg Tok/s | Min Tok/s | Max Tok/s | Avg TTF (ms) |
|---|---|---|---|---|---|
| deepinfra | Seed-2.0-code | 0.45 | 0.45 | 0.45 | 0.00 |
| deepinfra | Seed-2.0-pro | 2.10 | 0.93 | 3.18 | 0.00 |
| deepinfra | MiniMax-M2.5 | 14.20 | 4.76 | 27.00 | 0.00 |
| deepinfra | MiniMax-M2.7 | 11.40 | 4.82 | 24.30 | 0.00 |
| deepinfra | qwen-2.5-72b | 17.40 | 1.25 | 35.70 | 0.00 |
| deepinfra | Qwen 2.5 Coder 32B | 48.10 | 11.00 | 70.80 | 0.00 |
| deepinfra | qwen-3-14b | 32.70 | 1.43 | 59.10 | 0.00 |
| deepinfra | qwen-3-235b | 13.90 | 1.01 | 46.90 | 0.00 |
| deepinfra | Qwen3-235B-A22B-Thinking-2507 | 6.03 | 0.83 | 16.10 | 0.00 |
| deepinfra | Qwen3-30B-A3B | 49.50 | 18.60 | 70.60 | 0.00 |
| deepinfra | Qwen3-32B | 43.30 | 14.10 | 65.80 | 0.00 |
| deepinfra | Qwen3-Coder-480B-A35B-Instruct-Turbo | 21.10 | 2.39 | 57.40 | 0.00 |
| deepinfra | Qwen3-Max | 18.90 | 11.80 | 26.40 | 0.00 |
| deepinfra | Qwen3-Max-Thinking | 17.00 | 12.60 | 20.40 | 0.00 |
| deepinfra | Qwen3-Next-80B-A3B-Instruct | 48.00 | 2.65 | 86.30 | 0.00 |
| deepinfra | qwen-3.5-0.8b | 106.00 | 24.70 | 229.00 | 0.00 |
| deepinfra | qwen-3.5-122b-a10b | 73.80 | 22.70 | 115.00 | 0.00 |
| deepinfra | qwen-3.5-27b | 39.70 | 2.87 | 82.70 | 0.00 |
| deepinfra | qwen-3.5-2b | 194.00 | 6.88 | 242.00 | 0.00 |
| deepinfra | qwen-3.5-35b-a3b | 98.00 | 7.96 | 161.00 | 0.00 |
| deepinfra | qwen-3.5-397b-a17b | 56.10 | 7.86 | 102.00 | 0.00 |
| deepinfra | Qwen3.6-27B | 46.70 | 7.08 | 81.80 | 0.00 |
| deepinfra | Qwen3.6-35B-A3B | 117.00 | 70.80 | 160.00 | 0.00 |
| deepinfra | Qwen3.7-Max | 2.18 | 1.48 | 3.97 | 0.00 |
| deepinfra | MiMo-V2.5 | 92.20 | 42.20 | 138.00 | 0.00 |
| deepinfra | MiMo-V2.5-Pro | 38.20 | 2.86 | 69.40 | 0.00 |
| deepinfra | claude-haiku-4-5 | 29.30 | 22.00 | 34.10 | 0.00 |
| deepinfra | claude-opus-4-7 | 29.40 | 22.40 | 34.50 | 0.00 |
| deepinfra | claude-opus-4-8 | 26.80 | 20.40 | 30.10 | 0.00 |
| deepinfra | claude-sonnet-4-6 | 15.60 | 9.65 | 18.90 | 0.00 |
| deepinfra | DeepSeek-R1-0528 | 29.90 | 9.81 | 63.80 | 0.00 |
| deepinfra | DeepSeek-V3 | 17.90 | 11.20 | 30.60 | 0.00 |
| deepinfra | DeepSeek-V3.1 | 7.04 | 1.34 | 12.50 | 0.00 |
| deepinfra | DeepSeek-V3.1-Terminus | 20.40 | 9.46 | 44.80 | 0.00 |
| deepinfra | deepseek-v3.2 | 9.59 | 1.02 | 22.90 | 0.00 |
| deepinfra | DeepSeek-V4-Flash | 13.50 | 2.09 | 22.20 | 0.00 |
| deepinfra | DeepSeek-V4-Pro | 24.50 | 1.88 | 49.60 | 0.00 |
| deepinfra | gemini-2.5-flash | 35.10 | 9.10 | 45.70 | 0.00 |
| deepinfra | gemini-3.1-pro | 21.80 | 17.30 | 25.10 | 0.00 |
| deepinfra | gemma-3-12b-it | 24.80 | 11.80 | 37.80 | 0.00 |
| deepinfra | gemma-3-27b-it | 21.10 | 9.72 | 37.40 | 0.00 |
| deepinfra | gemma-3-4b-it | 37.00 | 4.50 | 56.40 | 0.00 |
| deepinfra | gemma-4-26B-A4B-it | 28.30 | 9.95 | 46.10 | 0.00 |
| deepinfra | gemma-4-31B-it | 12.50 | 5.07 | 27.10 | 0.00 |
| deepinfra | gemma-4-31B-it-turbo | 17.80 | 2.32 | 46.50 | 0.00 |
| deepinfra | llama-2-70b | 25.90 | 3.08 | 37.00 | 0.00 |
| deepinfra | llama-3.2-11b | 40.30 | 1.06 | 52.50 | 0.00 |
| deepinfra | llama-3.2-1b | 39.60 | 7.79 | 52.60 | 0.00 |
| deepinfra | llama-3.2-3b | 40.30 | 3.15 | 52.70 | 0.00 |
| deepinfra | llama-3.2-90b | 27.40 | 2.22 | 60.00 | 0.00 |
| deepinfra | llama-3.3-70b | 18.30 | 1.49 | 40.10 | 0.00 |
| deepinfra | Llama-3.3-70B-Instruct-Turbo | 14.70 | 1.59 | 27.00 | 0.00 |
| deepinfra | Llama-4-Maverick-17B-128E-Instruct-FP8 | 19.90 | 2.53 | 37.40 | 0.00 |
| deepinfra | Llama-Guard-4-12B | 3.31 | 2.09 | 4.21 | 0.00 |
| deepinfra | llama-3-70b | 27.50 | 3.03 | 37.30 | 0.00 |
| deepinfra | llama-3-8b | 40.40 | 15.70 | 75.60 | 0.00 |
| deepinfra | llama-3.1-405b | 16.60 | 2.12 | 27.40 | 0.00 |
| deepinfra | llama-3.1-70b | 26.50 | 1.08 | 65.90 | 0.00 |
| deepinfra | llama-3.1-8b | 29.50 | 8.79 | 66.30 | 0.00 |
| deepinfra | phi-4 | 55.80 | 53.10 | 58.50 | 0.00 |
| deepinfra | devstral-small | 35.80 | 1.77 | 67.10 | 0.00 |
| deepinfra | mistral-7b | 36.50 | 1.81 | 67.70 | 0.00 |
| deepinfra | Mistral-Nemo-Instruct-2407 | 25.20 | 9.18 | 40.30 | 0.00 |
| deepinfra | Mistral-Small-24B-Instruct-2501 | 44.10 | 6.64 | 56.60 | 0.00 |
| deepinfra | Mistral-Small-3.2-24B-Instruct-2506 | 35.30 | 11.20 | 66.00 | 0.00 |
| deepinfra | Kimi-K2.5 | 8.44 | 3.39 | 27.10 | 0.00 |
| deepinfra | Kimi-K2.6 | 20.80 | 2.26 | 62.80 | 0.00 |
| deepinfra | Llama-3.3-Nemotron-Super-49B-v1.5 | 38.50 | 4.94 | 50.70 | 0.00 |
| deepinfra | NVIDIA-Nemotron-3-Super-120B-A12B | 20.00 | 1.73 | 62.50 | 0.00 |
| deepinfra | NVIDIA-Nemotron-3-Ultra-550B-A55B | 46.40 | 11.50 | 73.50 | 0.00 |
| deepinfra | Nemotron-3-Nano-30B-A3B | 52.50 | 9.98 | 74.70 | 0.00 |
| deepinfra | Nemotron-3-Nano-Omni-30B-A3B-Reasoning | 113.00 | 56.40 | 180.00 | 0.00 |
| deepinfra | GPT-oss-120b | 18.60 | 7.51 | 38.60 | 0.00 |
| deepinfra | GPT-oss-120b-Turbo | 80.10 | 23.10 | 184.00 | 0.00 |
| deepinfra | GPT-oss-20b | 18.50 | 1.26 | 53.90 | 0.00 |
| deepinfra | Step-3.5-Flash | 27.40 | 18.70 | 35.60 | 0.00 |
| deepinfra | GLM-4.6 | 34.10 | 9.62 | 67.10 | 0.00 |
| deepinfra | GLM-4.7 | 29.10 | 5.27 | 47.50 | 0.00 |
| deepinfra | GLM-4.7-Flash | 50.60 | 8.91 | 81.30 | 0.00 |
| deepinfra | GLM-5 | 20.30 | 7.07 | 49.20 | 0.00 |
| deepinfra | GLM-5.1 | 27.90 | 5.75 | 47.10 | 0.00 |
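The minimum and maximum columns show that an average can hide large run-to-run swings: phi-4 stays between 53.10 and 58.50 tokens/second, while qwen-3.5-2b ranges from 6.88 to 242.00. The sketch below parses pipe-delimited rows like the ones above and computes a max/min ratio per model as a rough stability indicator. The `BenchRow` type, `parse_table`, and `spread_ratio` are hypothetical helpers, and the ratio is an illustrative metric rather than one the benchmark system reports.

```python
from dataclasses import dataclass

SAMPLE = """
| Provider | Model | Avg | Min | Max | Avg TTF (ms) |
|---|---|---|---|---|---|
| deepinfra | phi-4 | 55.80 | 53.10 | 58.50 | 0.00 |
| deepinfra | qwen-3.5-2b | 194.00 | 6.88 | 242.00 | 0.00 |
| deepinfra | llama-3.2-11b | 40.30 | 1.06 | 52.50 | 0.00 |
"""


@dataclass
class BenchRow:
    provider: str
    model: str
    avg: float
    min: float
    max: float
    ttf_ms: float


def parse_table(text: str) -> list[BenchRow]:
    """Parse data rows of a pipe table, skipping header and separator lines."""
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 6:
            continue  # blank or malformed line
        try:
            numbers = [float(c) for c in cells[2:]]
        except ValueError:
            continue  # header or |---| separator row
        rows.append(BenchRow(cells[0], cells[1], *numbers))
    return rows


def spread_ratio(row: BenchRow) -> float:
    """Max divided by min throughput; values near 1.0 mean consistent runs."""
    return float("inf") if row.min == 0 else row.max / row.min


for row in sorted(parse_table(SAMPLE), key=spread_ratio):
    print(f"{row.model}: max/min = {spread_ratio(row):.2f}")
```

A single slow outlier run is enough to inflate this ratio, so it is best read alongside the average rather than on its own.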
Frequently Asked Questions
Based on recent tests, qwen-3.5-2b shows the highest average throughput among tracked deepinfra models.
This provider summary aggregates 5507 individual prompts measured across 4706 monitoring runs over the past month.