No more
black box.
Every routing decision leaves a receipt. Which model won. Why. What it actually cost. In dollars, not estimates.
Quality-gated routing.
Every request is read for what it actually needs. We pick the cheapest model that clears the bar — never a downgrade to save a penny.
See the routerA receipt for every call.
Structured record of what the router saw, who it picked, what it cost. Streamed to the OTel stack you already run.
Currently viewingA router that gets smarter.
A daily learning pass tunes the router on your own traffic. Eval-gated before promotion. One flip to roll back.
See the loopSee the decision.
Not just the output.
One structured record per request. What the router classified the task as, which models were considered, who won, what it cost, and how long it took. Every field joinable to your traces.
Full schema{
"receipt_id": "rcpt_01JKEZ4F9T...",
"classification": {
"category": "reasoning",
"confidence": 0.91
},
"candidates": [
{ "model_id": "claude-sonnet-4-6", "score": 0.88,
"est_cost_usd": 0.0142, "reasons": ["in-budget"] },
{ "model_id": "gpt-4.1", "score": 0.83,
"filtered_out_by": "budget" }
],
"decision": {
"model_id": "claude-sonnet-4-6",
"strategy": "single",
"fallback_chain": ["gpt-4.1"],
"prompt_rewritten": true
},
"execution": {
"actual_cost_usd": 0.0131,
"actual_latency_ms": 692,
"status": "success"
}
}A ledger finance can actually read.
Spend by tenant, workflow, and category. Budget burn, trend deltas, and the alerts that surface before anyone has to ask “why did we spend $40k last month?”
acme-prodacme-stagingnova-labs-prodinternal-copilotresearch-sandbox- code34%
- reasoning22%
- analysis18%
- extraction12%
- summarization8%
- creative6%
Sample data. Every number is joinable to the underlying receipts.
Answer “why did it do that?”
before anyone asks.
Nine views, all joined on the same receipt. Every click deep-links to the decision on the record.
What the router saved you
Rolling 7/30/90-day view of "what you would have paid on your old single-model setup" vs. the actual router bill. Broken out by category and tenant.
Which models did the work
Stack chart of spend and request share by model, by category. Spot when a new release shifts traffic on its own.
p50 / p95 / p99 by route
Per-model, per-category, per-region. Fallback hops are split out so a slow primary doesn’t hide behind a fast secondary.
Decisions as they happen
Tail of every decision with category, winner, fallback chain, and cost. Click any row to jump to the full receipt.
Find the one that matters
Filter by tenant, model, category, cost, latency, or status. Saved views for on-call, finance, and security. Export any view as CSV.
Re-run any request
Re-issue a receipt against the same model or a different one. Compare outputs side-by-side. No re-deploy.
Where the money went
Spend by tenant, workflow, or model. Hard caps that enforce, not warn. Exportable ledger in the same shape your finance team already uses.
Tell you before they ask
Budget burn, p95 drift, fallback rate, cache hit rate, classifier confidence. Webhook, email, Slack, or PagerDuty.
Per-customer breakdown
Everything above, pivoted by tenant. For anyone reselling the router to their own users or running multi-workspace billing.
Ship receipts wherever
your traces already go.
Every receipt emits as an OTLP span with a trace_id matched to your app. Point the exporter at whatever you already run. No proprietary agent, no custom SDK, no vendor lock-in.
If it speaks OTLP, it works. Datadog. Honeycomb. Tempo. New Relic. Jaeger. Grafana Cloud. Dynatrace. Chronosphere. Your own collector. Anything built on the open spec is fair game.
receivers:
otlp:
protocols:
http: { endpoint: 0.0.0.0:4318 }
# Point these at whatever you already run —
# Datadog, Honeycomb, Tempo, New Relic, Jaeger,
# Grafana Cloud, Dynatrace, Chronosphere, or
# your own OTel collector.
exporters:
otlphttp/your-backend:
endpoint: ${OTLP_ENDPOINT}
headers: { authorization: ${OTLP_TOKEN} }
service:
pipelines:
traces:
receivers: [otlp]
exporters: [otlphttp/your-backend]Re-run any request.
Any model. Any time.
Every receipt is replayable. Compare models, tune prompts, debug bad outputs without rewriting your code.