What kr-auto Does (and Why It Beats Hand-Rolled Routing)
Every team running real AI traffic has the same conversation in month three: most of these calls didn't need a frontier model. The bill says so. The eval says so. The PM with the spreadsheet says so. The problem is what to do about it. Writing the "which model?" logic yourself is a project that eats a senior engineer for a quarter and never quite reaches steady state.
kr-auto is the answer we sell. Set model="auto" on a KairosRoute request and the right model gets picked for that specific request — under your quality bar, under your latency budget, on your spend ceiling — with a receipt explaining the decision. This post is about what that means in practice, and why the teams who try to build it themselves end up calling us.
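As a sketch, here is what the one-line change amounts to. The payload follows the OpenAI chat-completions shape that KairosRoute is compatible with; everything beyond model="auto" — the helper name, the pinned model string — is illustrative, not the exact KairosRoute schema:

```python
# Minimal sketch of the model="auto" switch. The request follows the
# OpenAI chat-completions format; helper and model names are placeholders.

def build_request(prompt, pinned_model=None):
    """Build a chat-completions payload. Omitting a pinned model
    opts the request into kr-auto routing via model="auto"."""
    return {
        "model": pinned_model or "auto",  # the one-line change
        "messages": [{"role": "user", "content": prompt}],
    }

before = build_request("Summarize this ticket.", pinned_model="gpt-4.1")
after = build_request("Summarize this ticket.")  # routed per request

print(before["model"], "->", after["model"])
```

Everything else about the request stays exactly as it was; the routing decision happens server-side.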
What you get on every call
kr-auto turns a one-line API change into three things you didn't have before:
- The cheapest model that meets your quality floor — per request, not per workload. Most cost wins die because someone hardcoded "use the cheap model for support tickets" and forgot that 12% of support tickets are actually adversarial complaints that need the expensive model. Per-request routing fixes that without you maintaining the rule.
- A defensible answer when someone asks why. Every routed call comes with a receipt: which model, why this one beat the alternatives, what the alternatives would have cost, what the latency was. PM, Finance, Security, your CEO — the question shows up, the receipt is already there.
- A router that gets sharper on your specific traffic. Across the customer base we see what works for which task shapes. On your account we see how those general patterns hold against your users. Both signals fold back into the routing decision. You get better at month three than you were at month one without changing a line of code.
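To make the receipt concrete, here is a hypothetical shape for one. The field names are assumptions for illustration, not KairosRoute's actual schema; what the post guarantees is the content — chosen model, reason, what the alternatives would have cost, and latency:

```python
# Hypothetical routing receipt. Field names and values are illustrative
# placeholders, not KairosRoute's real schema; the four guaranteed facts
# are the chosen model, the reason, alternative costs, and latency.

receipt = {
    "chosen_model": "small-model-v2",
    "reason": "cheapest model above the 0.90 quality floor",
    "latency_ms": 412,
    "alternatives": {            # what each candidate would have cost (USD)
        "small-model-v2": 0.0004,
        "frontier-model": 0.0031,
    },
}

def savings_vs_most_expensive(r):
    """Dollar savings of the chosen model against the priciest alternative."""
    costs = r["alternatives"]
    return max(costs.values()) - costs[r["chosen_model"]]

print(f"saved ${savings_vs_most_expensive(receipt):.4f} on this call")
```

Because the receipt travels with every response, answering "why this model?" is a lookup, not an investigation.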
The benchmark numbers
We publish a public eval suite that you can read end to end: the same 240 prompts across five task categories, scored by an independent grader, rerun weekly. Today's headline: kr-auto matches or beats GPT-4.1 on accuracy at roughly a third of the cost. Every line item, every per-case judgment, every model version, every dollar — on the benchmarks page. We did not cherry-pick.
On real production traffic — averaged across the customer base — savings run 50–85% versus pinning every call to a frontier model, with no measurable accuracy loss on the categories where routing applies. Your number depends on your traffic mix; the savings calculator gives you a workload-specific estimate in two minutes.
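The arithmetic behind a workload-specific estimate is simple blended-cost math. This sketch uses invented prices and an invented traffic mix purely for illustration — the real calculator works from your actual traffic:

```python
# Back-of-envelope version of a routing savings estimate. Every number
# here is made up for illustration; only the arithmetic is the point.

FRONTIER_COST = 3.00   # $ per 1M tokens if every call is pinned to frontier
ROUTED_MIX = {         # traffic share -> $ per 1M tokens after routing
    0.70: 0.40,        # 70% of calls land on a cheap model
    0.20: 1.10,        # 20% on a mid-tier model
    0.10: 3.00,        # 10% still genuinely need the frontier model
}

blended = sum(share * cost for share, cost in ROUTED_MIX.items())
savings_pct = 100 * (1 - blended / FRONTIER_COST)
print(f"blended ${blended:.2f}/M tokens, {savings_pct:.0f}% below pinned-frontier")
```

With this made-up mix the blended rate comes out around $0.80 per million tokens, roughly 73% below pinning everything to the frontier model — inside the 50–85% range above. Shift the mix toward harder traffic and the number drops accordingly.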
Why the "I'll just build this myself" plan fails
We have watched this movie maybe a hundred times. It always plays out the same way:
- Month 1. An engineer writes hand-coded rules. If prompt has "summarize", use Haiku. Tests pass. Bill drops 30%. Standup wins.
- Month 2. Quality complaints start. The rules don't generalize. The engineer adds more rules. The rules start contradicting each other. Nobody owns the cases where a rule fires wrong.
- Month 3. A provider ships a new model. The rule file has no concept of a new entrant. Either someone manually adds branches, or the system silently keeps using the old, now-dominated model and the bill never finds the new lower price.
- Month 4. Quality regresses for a slice of traffic. Nobody can answer which slice or why. The dashboard does not exist because the dashboard was never the project — the project was "save money on LLM calls."
- Month 6. The team rolls back to pinned-frontier-for-everything because the routing layer became unauditable. The bill goes back up. The engineer moves teams.
The deeper issue: a rule file can't learn. It can't notice that 12% of the "use the cheap model" bucket is actually adversarial complaints. It can't notice that the new model that shipped last Tuesday is now the right pick for half your traffic. It can't tell you why a regression happened. The whole point of routing is to keep getting better as your traffic and the model market change. A rule file is frozen the moment you write it.
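The month-one rule file from the timeline above fits in a few lines, which is exactly the problem. The model names and the keyword rule are illustrative, but the failure mode is the one described: the adversarial complaint that happens to contain the trigger word gets the cheap model, and nothing in the file can ever learn otherwise:

```python
# Month-one hand-rolled router: one keyword rule, frozen the day it ships.
# Model names are placeholders; the failure mode is the point.

def route(prompt):
    if "summarize" in prompt.lower():
        return "cheap-model"      # the rule that made standup happy
    return "frontier-model"

# The 12% case: an adversarial complaint that contains the keyword.
complaint = "Summarize why your product deleted my data before I escalate."

print(route(complaint))  # misrouted to the cheap model; no rule can notice
```

Every fix is another branch, every new model is another rewrite, and no branch ever reports why it fired.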
Try it before you trust it
Three doors, all free. The playground runs a prompt of your choosing and shows the routing decision client-side, no signup. The public benchmarks are the head-to-head against GPT-4.1, Claude Sonnet, and the popular cheaper models on the same 240-prompt eval. The API key signup gives you 100K tokens and a $5 trial credit with no card. Two lines to swap in. If the receipts and the savings hold up on your own traffic, you'll know before the trial credit runs out.
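The "two lines" are the base URL and the model name. This sketch shows them as plain client settings rather than a live call; the KairosRoute URL is a placeholder, not the real endpoint:

```python
# The two-line swap, shown as configuration rather than a live request.
# The router base URL is a placeholder; the rest stays OpenAI-compatible.

before = {
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4.1",
}

after = dict(before)
after["base_url"] = "https://api.kairosroute.example/v1"  # line 1: point at the router
after["model"] = "auto"                                   # line 2: opt into kr-auto

changed = {k for k in after if after[k] != before[k]}
print(sorted(changed))  # exactly the two swapped settings
```

Nothing else in the integration changes, which is what makes the trial cheap to run and cheap to abandon.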
Ready to route smarter?
KairosRoute gives you a single OpenAI-compatible endpoint that routes every request to the cheapest model meeting your quality bar — plus the observability, A/B testing, and cost analytics that turn cheaper infrastructure into a durable margin.
Related Reading
Your model bill went down 20%. Nobody complained. Three weeks later, your agent's resolution rate has quietly dropped 12%. This is silent quality regression — and it is the single most dangerous failure mode in LLM ops.
Our Business tier is $499/month. Our Scale tier is $1,499/month. Our Enterprise tier starts at $25K ACV. Are those prices fair for what you get? This post is the real accounting — including a fully transparent 4% managed-key gateway fee.