Platform · Observability

No more
black box.

Every routing decision leaves a receipt. Which model won. Why. What it actually cost. In dollars, not estimates.

Quality-gated routing.

Every request is read for what it actually needs. We pick the cheapest model that clears the bar — never a downgrade to save a penny.

See the router

02 Observe · You are here

A receipt for every call.

Structured record of what the router saw, who it picked, what it cost. Streamed to the OTel stack you already run.

03 Improve

A router that gets smarter.

A daily learning pass tunes the router on your own traffic. Eval-gated before promotion. One flip to roll back.

See the loop

See the decision.
Not just the output.

One structured record per request. What the router classified the task as, which models were considered, who won, what it cost, and how long it took. Every field joinable to your traces.

Full schema

receipt.json

{
  "receipt_id": "rcpt_01JKEZ4F9T...",
  "classification": {
    "category": "reasoning",
    "confidence": 0.91
  },
  "candidates": [
    { "model_id": "claude-sonnet-4-6", "score": 0.88,
      "est_cost_usd": 0.0142, "reasons": ["in-budget"] },
    { "model_id": "gpt-4.1", "score": 0.83,
      "filtered_out_by": "budget" }
  ],
  "decision": {
    "model_id": "claude-sonnet-4-6",
    "strategy": "single",
    "fallback_chain": ["gpt-4.1"],
    "prompt_rewritten": true
  },
  "execution": {
    "actual_cost_usd": 0.0131,
    "actual_latency_ms": 692,
    "status": "success"
  }
}
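
As a rough illustration of reading a receipt, the Python sketch below pulls the winner, the filtered-out candidates, and the gap between estimated and actual cost. The field names come from the sample receipt.json above; the `summarize` helper is hypothetical and not part of any published SDK.

```python
import json

# Trimmed copy of the sample receipt above (field names as shown in receipt.json).
RECEIPT = json.loads("""
{
  "candidates": [
    {"model_id": "claude-sonnet-4-6", "score": 0.88,
     "est_cost_usd": 0.0142, "reasons": ["in-budget"]},
    {"model_id": "gpt-4.1", "score": 0.83, "filtered_out_by": "budget"}
  ],
  "decision": {"model_id": "claude-sonnet-4-6", "fallback_chain": ["gpt-4.1"]},
  "execution": {"actual_cost_usd": 0.0131, "actual_latency_ms": 692,
                "status": "success"}
}
""")


def summarize(receipt: dict) -> dict:
    """Hypothetical helper: who won, who was filtered out, and estimate vs. actual."""
    winner_id = receipt["decision"]["model_id"]
    winner = next(c for c in receipt["candidates"] if c["model_id"] == winner_id)
    # Candidates that never ran, keyed by model, with the filter that removed them.
    rejected = {
        c["model_id"]: c["filtered_out_by"]
        for c in receipt["candidates"]
        if "filtered_out_by" in c
    }
    actual = receipt["execution"]["actual_cost_usd"]
    return {
        "winner": winner_id,
        "rejected": rejected,
        # Negative means the call came in under the estimate.
        "cost_delta_usd": round(actual - winner["est_cost_usd"], 6),
    }


summary = summarize(RECEIPT)
print(summary)
```

In the sample, the actual cost (0.0131) is lower than the estimate (0.0142), so the delta is negative.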

Spend analytics

A ledger finance can actually read.

Spend by tenant, workflow, and category. Budget burn, trend deltas, and the alerts that surface before anyone has to ask “why did we spend $40k last month?”

kairosroute.com/dashboard/spend

MTD spend · 5 tenants

$6,690 · −38% vs. last month

April 2026

All workloads

Spend by tenant

MTD · budget burn and trend in the right-hand columns

Tenant             MTD spend   Budget burn   Trend
acme-prod          $2,840      71%           +8%
acme-staging       $612        31%           -4%
nova-labs-prod     $1,980      89%           +14%
internal-copilot   $918        42%           +2%
research-sandbox   $340        28%           -1%

Spend by category

code 34%
reasoning 22%
analysis 18%
extraction 12%
summarization 8%
creative 6%

Alerts & events

2 warnings · 3 info

warn · nova-labs-prod at 89% of monthly budget, 6 days left · 12m ago

info · gpt-5.4 promoted on reasoning after eval +2.1pp · 2h ago

ok · p95 latency under 800ms for 48h straight · 4h ago

info · cache hit rate crossed 40% on summarization · 9h ago

warn · privacy filter dropped 14 candidates (tenant acme-prod) · 1d ago

Sample data. Every number is joinable to the underlying receipts.
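
To make "joinable to the underlying receipts" concrete, here is a minimal Python sketch of rolling receipts up into spend per tenant and flagging budget burn. It rests on assumptions: the `tenant` field, the budget figures, and the 85% threshold are all hypothetical, and the flattened rows are invented for illustration rather than taken from the dashboard.

```python
from collections import defaultdict

# Hypothetical flattened receipts. "tenant" is an assumed field; it does not
# appear in the sample receipt.json.
receipts = [
    {"tenant": "acme-prod", "category": "code", "actual_cost_usd": 0.0131},
    {"tenant": "acme-prod", "category": "reasoning", "actual_cost_usd": 0.0200},
    {"tenant": "nova-labs-prod", "category": "code", "actual_cost_usd": 0.0450},
]

# Hypothetical monthly budgets in USD, scaled down to match the toy data.
budgets = {"acme-prod": 0.05, "nova-labs-prod": 0.05}


def spend_by(rows, key):
    """Sum actual cost grouped by any receipt field (tenant, category, ...)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["actual_cost_usd"]
    return dict(totals)


def budget_burn(spend, budgets):
    """Fraction of each tenant's budget consumed so far."""
    return {tenant: spend.get(tenant, 0.0) / cap for tenant, cap in budgets.items()}


def over_threshold(burn, threshold=0.85):
    """Tenants whose burn has reached the (assumed) warning threshold."""
    return sorted(t for t, frac in burn.items() if frac >= threshold)


by_tenant = spend_by(receipts, "tenant")
burn = budget_burn(by_tenant, budgets)
flagged = over_threshold(burn)
print(by_tenant, burn, flagged)
```

The same `spend_by` call with `"category"` as the key would give the category split.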

The dashboard

Answer “why did it do that?”
before anyone asks.

Nine views, all joined on the same receipt. Every click deep-links to the decision on the record.

Savings

What the router saved you

Rolling 7/30/90-day view of "what you would have paid on your old single-model setup" vs. the actual router bill. Broken out by category and tenant.

Model mix

Which models did the work

Stacked chart of spend and request share by model, by category. Spot when a new release shifts traffic on its own.

Latency

p50 / p95 / p99 by route

Per-model, per-category, per-region. Fallback hops are split out so a slow primary doesn’t hide behind a fast secondary.

Live feed

Decisions as they happen

Tail of every decision with category, winner, fallback chain, and cost. Click any row to jump to the full receipt.

Receipt search

Find the one that matters

Filter by tenant, model, category, cost, latency, or status. Saved views for on-call, finance, and security. Export any view as CSV.

Replay

Re-run any request

Re-issue a receipt against the same model or a different one. Compare outputs side-by-side. No re-deploy.

Spend

Where the money went

Spend by tenant, workflow, or model. Hard caps that enforce, not warn. Exportable ledger in the same shape your finance team already uses.

Alerts

Tell you before they ask

Budget burn, p95 drift, fallback rate, cache hit rate, classifier confidence. Webhook, email, Slack, or PagerDuty.

Tenants

Per-customer breakdown

Everything above, pivoted by tenant. For anyone reselling the router to their own users or running multi-workspace billing.
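
For a feel of what the Receipt search view does, the sketch below filters receipt rows and exports the result as CSV using only the Python standard library. It is an offline approximation under stated assumptions: the rows are invented, the flat row shape is hypothetical, and `search` and `to_csv` are illustrative helpers rather than a documented API.

```python
import csv
import io

# Invented, flattened receipt rows; the keys mirror fields in receipt.json.
rows = [
    {"receipt_id": "rcpt_a", "model_id": "claude-sonnet-4-6", "category": "reasoning",
     "actual_cost_usd": 0.0131, "actual_latency_ms": 692, "status": "success"},
    {"receipt_id": "rcpt_b", "model_id": "gpt-4.1", "category": "code",
     "actual_cost_usd": 0.0210, "actual_latency_ms": 1240, "status": "success"},
    {"receipt_id": "rcpt_c", "model_id": "gpt-4.1", "category": "code",
     "actual_cost_usd": 0.0040, "actual_latency_ms": 310, "status": "error"},
]


def search(rows, *, model_id=None, category=None, status=None,
           min_cost_usd=None, min_latency_ms=None):
    """Keep rows that match every filter that was supplied."""
    def keep(r):
        return (
            (model_id is None or r["model_id"] == model_id)
            and (category is None or r["category"] == category)
            and (status is None or r["status"] == status)
            and (min_cost_usd is None or r["actual_cost_usd"] >= min_cost_usd)
            and (min_latency_ms is None or r["actual_latency_ms"] >= min_latency_ms)
        )
    return [r for r in rows if keep(r)]


def to_csv(rows):
    """Serialize rows to CSV text, with a header taken from the first row's keys."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Example saved view: slow "code" requests, e.g. for an on-call rotation.
slow_code = search(rows, category="code", min_latency_ms=800)
export = to_csv(slow_code)
print(export)
```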

OpenTelemetry

Ship receipts wherever
your traces already go.

Every receipt emits as an OTLP span with a trace_id matched to your app. Point the exporter at whatever you already run. No proprietary agent, no custom SDK, no vendor lock-in.

If it speaks OTLP, it works. Datadog. Honeycomb. Tempo. New Relic. Jaeger. Grafana Cloud. Dynatrace. Chronosphere. Your own collector. Anything built on the open spec is fair game.

otel-collector.yaml

receivers:
  otlp:
    protocols:
      http: { endpoint: 0.0.0.0:4318 }

# Point these at whatever you already run —
# Datadog, Honeycomb, Tempo, New Relic, Jaeger,
# Grafana Cloud, Dynatrace, Chronosphere, or
# your own OTel collector.
exporters:
  otlphttp/your-backend:
    endpoint: ${OTLP_ENDPOINT}
    # Block style here: "${...}" is not valid inside a YAML flow mapping.
    headers:
      authorization: ${OTLP_TOKEN}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/your-backend]
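
OpenTelemetry span attributes are flat key/value pairs whose values are primitives or arrays of primitives, so a nested receipt has to be flattened somewhere along the way. The Python sketch below shows one plausible mapping. The `receipt.` prefix, the dotted attribute names, and the `flatten` helper are assumptions for illustration; the attribute names the router actually emits are not specified here.

```python
import json

PRIMITIVES = (str, bool, int, float)


def flatten(obj: dict, prefix: str = "receipt") -> dict:
    """Flatten a nested receipt into dotted keys usable as span attributes."""
    attrs = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Nested objects become dotted paths, e.g. receipt.execution.status.
            attrs.update(flatten(value, name))
        elif isinstance(value, list) and all(isinstance(v, PRIMITIVES) for v in value):
            # Lists of primitives can stay as arrays (OTel expects one type per array).
            attrs[name] = value
        elif isinstance(value, list):
            # Lists of objects are not valid attribute values; keep them as JSON text.
            attrs[name] = json.dumps(value)
        else:
            attrs[name] = value
    return attrs


sample = {
    "classification": {"category": "reasoning", "confidence": 0.91},
    "candidates": [{"model_id": "gpt-4.1", "filtered_out_by": "budget"}],
    "decision": {"model_id": "claude-sonnet-4-6", "fallback_chain": ["gpt-4.1"]},
    "execution": {"actual_cost_usd": 0.0131, "actual_latency_ms": 692,
                  "status": "success"},
}

attrs = flatten(sample)
for name in sorted(attrs):
    print(name, "=", attrs[name])
```

Flat dotted keys like these are what makes fields filterable in most tracing backends.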

Re-run any request.
Any model. Any time.

Every receipt is replayable. Compare models, tune prompts, debug bad outputs without rewriting your code.

Start free · See how the router learns →
