
llama-3.1-405b Benchmarks

Provider: together

Explore real-world latency and throughput results for llama-3.1-405b. These measurements come from automated benchmarking runs against the provider APIs using the same harness that powers the public cloud dashboard.
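Both headline metrics come from timing a streamed response: time to first token (TTFT) is the delay until the first chunk arrives, and throughput is tokens delivered divided by elapsed time. A minimal sketch of that measurement, using a simulated token stream in place of a real provider API (the function name and delay parameter are illustrative, not part of the actual harness):

```python
import time

def benchmark_stream(chunks, delay_s=0.01):
    """Measure time-to-first-token (ms) and throughput (tokens/sec)
    for an iterable of streamed tokens.

    `chunks` stands in for tokens streamed back by a provider API;
    each chunk is delayed artificially to simulate network latency.
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _chunk in chunks:
        time.sleep(delay_s)  # simulated network/provider delay
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    elapsed = time.perf_counter() - start
    ttft_ms = (first_token_at - start) * 1000
    tokens_per_sec = count / elapsed if elapsed > 0 else 0.0
    return ttft_ms, tokens_per_sec
```

Running this against a list of, say, 100 dummy tokens yields numbers in the same shape as the table below: one TTFT value and one tokens/second value per run.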

Want a broader view of this vendor? Visit the together provider hub to compare every tracked model side-by-side.


Benchmark Overview

Avg Tokens / Second

22.70

Avg Time to First Token (ms)

690.00

Runs Analysed

8

Last Updated

Feb 6, 2026, 06:04 PM

Key Insights

  • llama-3.1-405b streams at 22.70 tokens/second on average across the last 8 benchmark runs.

  • Throughput varied by up to 14.60 tokens/second between runs (min 13.00, max 27.60), a spread equal to 64.3% of the mean, indicating variable behavior across benchmark runs.

  • Average time to first token is 690.00 ms, low enough for latency-sensitive workloads.

  • Latest measurements completed on Feb 6, 2026, 06:04 PM based on 8 total samples.
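The spread figure in the insights above follows directly from the min and max columns in the samples table. A quick check using only the aggregates reported on this page:

```python
# Aggregates reported on this page for llama-3.1-405b on together
mean_tps = 22.70  # average tokens/second over the last 8 runs
min_tps = 13.00   # slowest run
max_tps = 27.60   # fastest run

spread = max_tps - min_tps            # min-max range in tokens/second
spread_pct = 100 * spread / mean_tps  # spread as a share of the mean

print(f"spread: {spread:.2f} tok/s, {spread_pct:.1f}% of the mean")
# → spread: 14.60 tok/s, 64.3% of the mean
```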

Performance Distribution

Distribution of throughput measurements showing performance consistency across benchmark runs.

Performance Over Time

Historical performance trends showing how throughput has changed over the benchmarking period.

Benchmark Samples

| Provider | Model           | Avg Toks/Sec | Min   | Max   | Avg TTFT (ms) |
|----------|-----------------|--------------|-------|-------|---------------|
| together | llama-3.1-405b  | 22.70        | 13.00 | 27.60 | 690.00        |

Frequently Asked Questions

How fast is llama-3.1-405b on together right now?

The latest rolling average throughput is 22.70 tokens per second, with an average time to first token of 690.00 ms across 8 recent runs.

How often are these benchmarks updated?

Benchmarks refresh automatically whenever the monitoring cron runs. The most recent run completed on Feb 6, 2026, 06:04 PM.
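The rolling average described above can be maintained with a fixed-size window over the most recent runs. A minimal sketch, assuming a window of 8 to match the run count on this page (the class and method names are illustrative, not the dashboard's actual code):

```python
from collections import deque

class RollingThroughput:
    """Rolling average throughput over the last `window` benchmark runs."""

    def __init__(self, window=8):
        # deque with maxlen automatically evicts the oldest sample
        self.samples = deque(maxlen=window)

    def add(self, tokens_per_sec):
        """Record the throughput of one completed benchmark run."""
        self.samples.append(tokens_per_sec)

    @property
    def average(self):
        """Mean tokens/second over the retained runs (0.0 if empty)."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)
```

Each new cron run appends one sample; once more than `window` runs have been recorded, the oldest run drops out of the average.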
