
The KairosRoute LLM Cost Index, Q2 2026

Every quarter we sit down with the routing logs and try to answer one question: what did a million tokens of useful LLM work actually cost in the last 90 days, and where did that number come from? This is the first public edition of what we're calling the KairosRoute LLM Cost Index — a quarterly read on provider pricing, task-adjusted costs, and the pace at which tokens are getting cheaper.

Headline stats

Four numbers tell most of the story for Q2 2026.

text
Q2 2026 KairosRoute LLM Cost Index — headline figures

Blended median cost per 1M tokens (all models):         $1.83
  — down from $2.41 in Q1 2026    (-24% QoQ)
  — down from $4.12 in Q2 2025    (-56% YoY)

Task-adjusted median (cheapest model that clears
the kr-auto quality bar per task):                      $0.31
  — down from $0.47 in Q1 2026    (-34% QoQ)

Frontier-tier median ($/1M output tokens,
top 6 models by quality score):                        $11.40
  — down from $12.90 in Q1 2026   (-12% QoQ)

Token deflation rate (rolling 12-month, blended):       -42%

Tokens are getting cheaper faster than almost any input in the history of software. A 42% annual decline is not a trend you can build multi-year budgets around. It is the reason every AI-first company we talk to is re-forecasting its COGS every 90 days.

Median cost per 1M tokens, by provider

Below is the median blended cost per million tokens seen on our routing fabric in Q2 2026, computed across whichever of each provider's models actually received traffic from us. Providers with under ~10M tokens of quarterly volume on our platform are excluded; the sample is too small to be meaningful.

text
Provider            Blended $/1M    Share of KR traffic    QoQ change
─────────────────────────────────────────────────────────────────────
Google (Gemini)           $0.92              23%              -31%
DeepSeek                  $0.18              15%              -28%
Groq                      $0.27              11%              -19%
Anthropic                 $3.41              14%              -11%
OpenAI                    $2.88              12%              -14%
Together AI               $0.41               8%              -22%
Mistral                   $0.71               6%              -17%
Fireworks                 $0.38               5%              -20%
xAI                       $2.10               4%              -10%
Cohere                    $0.68               2%               -8%

A few things to note. The blended number mixes cheap small models with pricey frontier ones, so Anthropic looking "expensive" next to DeepSeek is mostly a reflection of mix: most of our Anthropic traffic is Opus and Sonnet, while most of our DeepSeek traffic is Chat and Coder at roughly a twentieth of the per-token rate. Provider rankings are not quality rankings.

The QoQ column is where the story gets loud. Google and DeepSeek dropped effective rates by nearly a third in 90 days. OpenAI and Anthropic softened at half that pace — reflecting a more mature pricing posture at the frontier, where competition is slower to bite. Cheap got cheaper faster than smart got cheaper.
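
For teams who want to reproduce the blended figure against their own logs, here is a minimal sketch. The per-request log schema (provider, tokens, cost_usd) is a hypothetical stand-in, and the method shown (median of per-request effective rates) is one reasonable reading of "blended", not necessarily our exact pipeline.

python
# Minimal sketch: blended $/1M per provider from per-request logs.
# The schema (provider, tokens, cost_usd) is hypothetical.
import pandas as pd

logs = pd.DataFrame([
    {"provider": "anthropic", "tokens": 12_000, "cost_usd": 0.0410},
    {"provider": "deepseek",  "tokens": 30_000, "cost_usd": 0.0054},
    {"provider": "google",    "tokens": 18_000, "cost_usd": 0.0166},
])

# Effective rate per request, normalized to dollars per million tokens.
logs["usd_per_1m"] = logs["cost_usd"] / logs["tokens"] * 1_000_000

# Median of per-request effective rates, grouped by provider.
print(logs.groupby("provider")["usd_per_1m"].median())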

Median cost per 1M tokens, by model tier

We bucket models into four internal tiers for routing: frontier (best-quality, highest-cost), balanced (workhorse mid-tier), fast (lower-quality, latency-optimized), and specialist (code, vision, embedding). Here's the quarter.

text
Tier         Examples                                Median $/1M    QoQ
────────────────────────────────────────────────────────────────────────
Frontier     GPT-5, Opus 5, Gemini 3 Ultra              $11.40    -12%
Balanced     Sonnet 4.7, GPT-5 mini, Gemini 3 Pro        $2.10    -21%
Fast         Haiku 4, Gemini 3 Flash, Llama 4 70B        $0.38    -29%
Specialist   DeepSeek-Coder, Qwen3-VL, Voyage-3          $0.41    -18%

The balanced tier is where most real production traffic now lives. A year ago, "I'll just use GPT-4" was a defensible default. Today, the balanced tier delivers ~92% of frontier quality on the task categories we measure at roughly 18% of the cost. That arbitrage is why kr-auto sends ~71% of its routing decisions to balanced-tier models.

Median cost by task type

Cost-per-token is a vendor pricing number. Cost-per-useful-task is what your CFO actually cares about. Below is the median cost of completing a task in each of the categories kr-auto tracks — where "completing" means reaching the quality bar enforced for that category. Numbers shown are for the cheapest model that cleared the bar, averaged across the quarter.

text
Task type              Median $/task    Dominant model         % of KR volume
────────────────────────────────────────────────────────────────────────────
Simple extraction          $0.00021     Haiku 4                      28%
Classification             $0.00014     Gemini 3 Flash               19%
Summarization              $0.00083     Sonnet 4.7 / Haiku 4         17%
Conversational             $0.00061     Gemini 3 Flash / Haiku 4     14%
Structured generation      $0.00240     Sonnet 4.7                   13%
Complex reasoning          $0.01820     Opus 5 / GPT-5 / Gemini 3 U   9%

The interesting row is the last one. Complex reasoning is roughly 130x more expensive per task than classification, and if you're not consciously isolating it, a small amount of hard-reasoning traffic can eat a huge share of your bill. Every team we onboard sees their first cost-per-task chart and has the same reaction: most of our budget went to tasks that didn't need to be that expensive.
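
The selection logic behind "cheapest model that cleared the bar" is conceptually simple. A minimal sketch, with hypothetical quality scores and per-task costs:

python
# Sketch: pick the lowest-cost model whose quality clears the bar.
# Quality scores and costs below are illustrative placeholders.
def cheapest_clearing_bar(candidates, quality_bar):
    passing = [c for c in candidates if c["quality"] >= quality_bar]
    if not passing:
        return None  # nothing clears the bar; one option is to escalate
    return min(passing, key=lambda c: c["cost_per_task"])

candidates = [
    {"model": "gemini-3-flash", "quality": 0.91, "cost_per_task": 0.00014},
    {"model": "haiku-4",        "quality": 0.93, "cost_per_task": 0.00021},
    {"model": "sonnet-4.7",     "quality": 0.97, "cost_per_task": 0.00240},
]
print(cheapest_clearing_bar(candidates, quality_bar=0.92))  # -> haiku-4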

Cost compression within task types

The compression is even steeper when you normalize by task difficulty. Take classification: a year ago the median was $0.00071/task, because most teams defaulted to GPT-4-class models for safety. Today's median of $0.00014 is an 80% compression at the task level, well beyond the 56% year-over-year drop in raw token prices. Routing and distillation do a lot of work that raw pricing sheets hide.

The token deflation rate

We're introducing a number we'll track every quarter going forward: the token deflation rate, or TDR. It's the trailing 12-month change in the blended median cost of 1M tokens on our fabric, task-mix-weighted to a fixed basket from one year prior.

Q2 2026 TDR: -42%.

Translation: a task basket that cost $1.00 in Q2 2025 costs about 58 cents today. Applied to operating budgets, that is a genuinely new category of financial dynamic. If your pricing model does not assume tokens will be 40% cheaper next year, you're planning for the wrong world.
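
Mechanically, the TDR is a fixed-basket index: hold last year's task mix constant and reprice it at today's rates. A minimal sketch, with illustrative weights and prices rather than our actual data:

python
# Sketch: token deflation rate as a fixed-basket index.
# Weights and prices are illustrative, not our measured figures.
def token_deflation_rate(basket, price_then, price_now):
    # basket: task type -> token share one year ago (sums to 1.0)
    # price_*: task type -> blended $/1M tokens at each point in time
    cost_then = sum(basket[t] * price_then[t] for t in basket)
    cost_now = sum(basket[t] * price_now[t] for t in basket)
    return cost_now / cost_then - 1.0  # negative means deflation

basket = {"classification": 0.4, "summarization": 0.4, "reasoning": 0.2}
price_then = {"classification": 0.9, "summarization": 2.8, "reasoning": 14.0}
price_now = {"classification": 0.4, "summarization": 1.6, "reasoning": 11.4}
print(f"TDR: {token_deflation_rate(basket, price_then, price_now):+.0%}")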

Where the money actually went

Aggregate cost share by provider on our fabric for Q2 2026. Cost share, not volume share — a heavy Opus customer can spend more on Anthropic in dollars while sending fewer tokens there.

text
Provider          % of Q2 spend       % of Q2 tokens
─────────────────────────────────────────────────────
Anthropic              31%                  14%
OpenAI                 24%                  12%
Google                 19%                  23%
xAI                     8%                   4%
DeepSeek                4%                  15%
Groq                    4%                  11%
Together AI             3%                   8%
Mistral                 3%                   6%
Fireworks               2%                   5%
Other                   2%                   2%

Anthropic and OpenAI absorb 55% of aggregate dollar spend despite processing just 26% of tokens. That gap is the frontier premium. It is also why routing matters — sending the right tokens to the right tier is the single biggest lever on LLM COGS in 2026.
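
A quick way to see the premium: divide each provider's spend share by its token share. Using the figures in the table above:

python
# Dollars captured per token processed, relative to the fabric average.
# A ratio above 1.0 means the provider earns a premium per token.
shares = {
    "Anthropic": (31, 14),  # (% of Q2 spend, % of Q2 tokens)
    "OpenAI":    (24, 12),
    "Google":    (19, 23),
    "DeepSeek":  (4, 15),
}
for provider, (spend, tokens) in shares.items():
    print(f"{provider:<10} {spend / tokens:.1f}x")
# Anthropic 2.2x, OpenAI 2.0x, Google 0.8x, DeepSeek 0.3x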

Three predictions we're willing to timestamp

1. Frontier pricing will decouple from token costs by end of Q4 2026. Anthropic, OpenAI, and Google are already experimenting with outcome-based, compute-reservation, and tier-subscription pricing models. The "$X per million tokens" unit is too coarse to survive another year at the frontier.

2. The balanced tier will consolidate to 3 winners. There are currently eleven credible mid-tier models on our fabric. By Q2 2027 we expect three of them to hold 80% of balanced-tier traffic. Routing data already shows concentration accelerating.

3. Specialist models will crack 10% of total inference spend. Today they're around 6%. Cheaper, task-narrow models — coder, vision, embedding, reranker — are the fastest-growing tier on our fabric, and we don't see the slope flattening.

What this means if you're building

  • Re-forecast every quarter. Any cost model more than 90 days old is lying to you. Tokens are moving too fast.
  • Classify before you route. Blended cost averages hide the roughly 100x spread between simple extraction and complex reasoning. If you can't see that spread, you can't manage it. A sketch of the pattern follows this list.
  • Don't price on today's COGS. Price on a 12-month forward view of COGS. Your customers will, even if they can't articulate it.
  • Bet on the balanced tier. It's where the quality-to-cost slope is steepest right now, and it's where kr-auto sends the majority of our routed traffic.
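
On the second point, the classify-then-route pattern can start as a two-stage dispatch: bucket each request with a cheap classifier, then map buckets to tiers. A minimal sketch; the tier mapping and toy classifier are illustrative, not kr-auto's actual policy:

python
# Sketch: classify each request, then route by task bucket.
# The mapping and the keyword heuristic are illustrative placeholders.
TIER_BY_TASK = {
    "extraction":     "fast",
    "classification": "fast",
    "summarization":  "balanced",
    "reasoning":      "frontier",
}

def route(request, classify):
    task = classify(request)                   # any cheap model or heuristic
    return TIER_BY_TASK.get(task, "balanced")  # default to the mid-tier

def toy_classifier(request):
    # Stand-in for a real classifier model.
    return "reasoning" if "prove" in request.lower() else "classification"

print(route("Prove this invariant holds under retries.", toy_classifier))
# -> frontier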

Methodology & caveats

Numbers in this index are derived from KairosRoute's routing telemetry: requests that crossed our gateway between April 1 and June 30, 2026. This data has real advantages over scraping published rate cards: it captures actual mix, actual model choices, and actual task outcomes. It also has real limits.

  • Sample bias. KairosRoute customers skew toward cost-sensitive, routing-curious teams. Our blended medians are almost certainly lower than the true market median, because our customers route more aggressively than the average team.
  • Model mix bias. We report blended figures, but provider-level blends depend on which of that provider's models our customers use. A provider's "blended $/1M" can shift substantially as customers change model preferences, independent of pricing.
  • Task classification is imperfect. Our task-type medians depend on the classifier we use to bucket requests. Classifier accuracy is in the mid-90s on labeled sets — good, not perfect. Edge cases bleed between categories.
  • Quality bar is our bar. Task-adjusted costs use the quality threshold kr-auto enforces for that task type. A team with stricter quality needs would see higher task-adjusted costs; a team with looser needs, lower. The number is a reference point, not a universal figure.
  • Volume coverage. Roughly 1.2B billed tokens in the quarter. Large in absolute terms; still a rounding error next to the public token economy. Trends we see here are indicative, not definitive.

We'll publish this index quarterly. Over time, the numbers get more useful — both because our sample grows and because the QoQ series becomes long enough to tell real stories instead of anecdotes. If you want the raw methodology document or want to sanity-check a claim against your own data, reach out.

If you'd rather stop guessing at your own costs and start measuring them, try the playground or read Agent Observability is the New APM.

Ready to route smarter?

KairosRoute gives you a single OpenAI-compatible endpoint that routes every request to the cheapest model meeting your quality bar — plus the observability, A/B testing, and cost analytics that turn cheaper infrastructure into a durable margin.
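
In practice, pointing an existing OpenAI client at a compatible gateway is a one-line change. The base URL and model alias below are hypothetical placeholders, not documented KairosRoute values:

python
# Hypothetical example: the endpoint URL and "kr-auto" alias are
# placeholders, not documented values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.kairosroute.example/v1",  # gateway instead of api.openai.com
    api_key="YOUR_KAIROSROUTE_KEY",
)

resp = client.chat.completions.create(
    model="kr-auto",  # let the router pick the cheapest model clearing the bar
    messages=[{"role": "user", "content": "Classify: 'refund not received'"}],
)
print(resp.choices[0].message.content)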

Related Reading

Provider Latency Leaderboard — April 2026 Update

p50/p95/p99 time-to-first-token across 10 providers, regional variation, outage minutes, and a new latency-adjusted cost metric. Sourced from KairosRoute routing telemetry.

The State of Agent Infrastructure, 2026

An annual industry report on what AI teams are actually running in production — model mix, observability adoption, cost-per-outcome improvements, and our best predictions for 2027. Based on KairosRoute routing telemetry and onboarding interviews.

You're Flying Blind on LLM Costs (And It's Expensive)

The OpenAI invoice tells you what you spent. It does not tell you what it was spent on. Here is the observability gap that costs AI teams 30–50% of their margin, and the minimum stack to close it.