Advanced Routing Controls

Fine-tune how KairosRoute selects providers and handles your requests with privacy controls, performance thresholds, quantization filtering, context compression, and response healing.

Data Privacy Controls

Control how your data is handled by setting privacy parameters on each request. KairosRoute filters out providers that don't meet your requirements.

Parameters

data_collection. Set to "deny" to exclude providers that may use your data for training. Default: "allow".

zdr. Set to true for Zero Data Retention. Requests are then routed only to providers with formal ZDR guarantees (OpenAI, Anthropic, Google, Mistral).

curl https://api.kairosroute.com/v1/chat/completions \
  -H "Authorization: Bearer kr-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Analyze this PII data"}],
    "data_collection": "deny",
    "zdr": true
  }'

Provider      May Train   ZDR
OpenAI        No          Yes
Anthropic     No          Yes
Google        No          Yes
Mistral       No          Yes
Groq          No          No
Together AI   No          No
Fireworks     No          No
DeepSeek      Yes         No
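
The filtering described above can be sketched in a few lines of Python. This is only an illustration of the rule, not KairosRoute's implementation: the Provider class and the eligible_providers function are hypothetical names, and the data is copied from the provider table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    may_train: bool  # provider may use request data for training
    zdr: bool        # provider offers a formal Zero Data Retention guarantee


# Values copied from the provider table in this section.
PROVIDERS = [
    Provider("OpenAI", False, True),
    Provider("Anthropic", False, True),
    Provider("Google", False, True),
    Provider("Mistral", False, True),
    Provider("Groq", False, False),
    Provider("Together AI", False, False),
    Provider("Fireworks", False, False),
    Provider("DeepSeek", True, False),
]


def eligible_providers(data_collection="allow", zdr=False):
    """Return the names of providers that satisfy the privacy parameters."""
    names = []
    for p in PROVIDERS:
        if data_collection == "deny" and p.may_train:
            continue  # excluded: may use data for training
        if zdr and not p.zdr:
            continue  # excluded: no formal ZDR guarantee
        names.append(p.name)
    return names


print(eligible_providers(data_collection="deny", zdr=True))
# prints ['OpenAI', 'Anthropic', 'Google', 'Mistral']
```

With data_collection set to "deny" alone, only DeepSeek is filtered out; adding zdr narrows the pool to the four providers with ZDR guarantees.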

Performance Thresholds

KairosRoute tracks real-time latency and throughput per provider over a rolling 5-minute window. Set thresholds to automatically deprioritize underperforming providers.

Parameters

preferred_max_latency. Max acceptable P75 latency in milliseconds. Providers above this are deprioritized.

preferred_min_throughput. Min acceptable P50 throughput in tokens/sec. Providers below this are deprioritized.

// Prefer fast providers (P75 latency under 1 second, P50 throughput of at least 50 tokens/sec)
{
  "model": "auto",
  "messages": [...],
  "preferred_max_latency": 1000,
  "preferred_min_throughput": 50
}

Performance data is available at GET /api/v1/performance. Providers with fewer than 3 recent samples are given the benefit of the doubt and are not deprioritized.
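
As a rough sketch of how such thresholds could be evaluated, the Python below computes P75 latency and P50 throughput from recent samples and moves providers that miss a threshold to the back of the list. The function names, the sample format, and the nearest-rank percentile method are all assumptions made for illustration; the rolling 5-minute window is not modeled, and the source does not say how KairosRoute computes percentiles.

```python
import math

MIN_SAMPLES = 3  # from the text: with fewer samples the provider is not penalized


def percentile(values, pct):
    """Nearest-rank percentile. The method is an assumption, not KairosRoute's."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_thresholds(samples, preferred_max_latency=None, preferred_min_throughput=None):
    """samples: list of (latency_ms, tokens_per_sec) pairs from the recent window."""
    if len(samples) < MIN_SAMPLES:
        return True  # benefit of the doubt
    latencies = [s[0] for s in samples]
    throughputs = [s[1] for s in samples]
    if preferred_max_latency is not None and percentile(latencies, 75) > preferred_max_latency:
        return False
    if preferred_min_throughput is not None and percentile(throughputs, 50) < preferred_min_throughput:
        return False
    return True


def order_providers(stats, **thresholds):
    """Stable sort that moves providers missing a threshold to the back."""
    return sorted(stats, key=lambda name: not meets_thresholds(stats[name], **thresholds))


# Hypothetical provider names and samples.
stats = {
    "slow": [(1500, 80), (1600, 70), (1400, 90), (1700, 60)],
    "fast": [(400, 120), (500, 110), (450, 100)],
    "new": [(5000, 5)],  # only one sample, so it is not penalized
}
print(order_providers(stats, preferred_max_latency=1000, preferred_min_throughput=50))
# prints ['fast', 'new', 'slow']
```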

Quantization Filtering

Some providers serve quantized (compressed) versions of models for faster/cheaper inference. Control the precision level of models you're routed to.

Precision Levels (high → low)

fp32 / fp16 / bf16. Full precision (frontier APIs)

fp8. 8-bit floating point (minimal quality loss)

int8. 8-bit integer (moderate quality trade-off)

int4. 4-bit integer (significant quality trade-off)

// Only accept FP8 or higher precision models
{
  "model": "auto",
  "messages": [...],
  "min_quantization": "fp8"
}

// Prefer cheaper quantized models (int8 or lower)
{
  "model": "auto",
  "messages": [...],
  "max_quantization": "int8"
}
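
One way to picture the min_quantization and max_quantization filters is as bounds on a precision ranking. The sketch below is illustrative only: the numeric ranks and the precision_allowed function are hypothetical, and treating fp32, fp16, and bf16 as a single tier follows the grouping in the list above rather than any documented behavior.

```python
# Higher rank means higher precision. fp32, fp16, and bf16 share the
# "full precision" tier, mirroring how the list above groups them.
PRECISION_RANK = {"int4": 0, "int8": 1, "fp8": 2, "bf16": 3, "fp16": 3, "fp32": 3}


def precision_allowed(precision, min_quantization=None, max_quantization=None):
    """Return True if a model served at `precision` passes both bounds."""
    rank = PRECISION_RANK[precision]
    if min_quantization is not None and rank < PRECISION_RANK[min_quantization]:
        return False  # below the requested precision floor
    if max_quantization is not None and rank > PRECISION_RANK[max_quantization]:
        return False  # above the requested precision ceiling
    return True


print(precision_allowed("int8", min_quantization="fp8"))   # prints False
print(precision_allowed("int4", max_quantization="int8"))  # prints True
```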

Context Compression

When a conversation exceeds a model's context window, KairosRoute applies middle-out truncation instead of returning an error. System messages and recent messages are preserved; older middle messages are dropped with a truncation notice.

Behavior

Auto-enabled for models with ≤ 8K context windows.

Opt-in for larger models via header: X-KairosRoute-Compress: true

When compression occurs, the response includes the headers X-KairosRoute-Compressed: true and X-KairosRoute-Messages-Dropped: N, where N is the number of messages dropped.

Preservation priority: System messages → last 2 messages (current turn) → first message (context setup) → recent messages (newest first) → older messages (dropped).
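
The preservation priority can be sketched as a small Python function. This is a simplified illustration under stated assumptions, not the gateway's code: count_tokens is a crude word-count stand-in for a real tokenizer, the wording and role of the truncation notice are invented, and the choice to always keep system messages and the last two messages even when they exceed the budget is an assumption.

```python
def count_tokens(message):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(message["content"].split())


def middle_out(messages, budget):
    """Return (kept_messages, dropped_count) following the priority order above."""
    indices = range(len(messages))
    system = [i for i in indices if messages[i]["role"] == "system"]
    chat = [i for i in indices if messages[i]["role"] != "system"]

    # Priorities 1 and 2: system messages and the last two messages are always kept.
    kept = set(system) | set(chat[-2:])
    used = sum(count_tokens(messages[i]) for i in kept)

    # Priority 3: the first message (context setup), if it fits.
    if chat and chat[0] not in kept:
        cost = count_tokens(messages[chat[0]])
        if used + cost <= budget:
            kept.add(chat[0])
            used += cost

    # Priority 4: remaining messages, newest first, until one no longer fits.
    for i in reversed(chat[1:-2]):
        cost = count_tokens(messages[i])
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.add(i)
        used += cost

    dropped = len(messages) - len(kept)
    result, notice_added = [], False
    for i in indices:
        if i in kept:
            result.append(messages[i])
        elif not notice_added:
            # Hypothetical notice format; the real wording is not documented here.
            result.append({"role": "system", "content": f"[{dropped} earlier messages truncated]"})
            notice_added = True
    return result, dropped
```

Because the loop stops at the first message that does not fit, the dropped messages form one contiguous block in the middle of the conversation, which is where the notice is placed.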

Bypass the classifier

kr-auto runs a task classifier on every request to pick the cheapest model that clears the per-task quality bar. If you'd rather route deterministically on your own guardrails, you can turn it off — per key, or per request.

How it works

Per key: toggle Run kr-auto classifier off in the API key settings on the dashboard.

Per request: send the header X-KR-Classifier: off. Use on to force it back on for a single call.

When bypassed, the gateway routes to the cheapest active model that clears your minQualityThreshold (default 0.7), respects allowedProviders / excludedProviders, and stays under maxCostPerRequestUsd.

No category-based capability filter is applied, so every chat-capable model qualifies. Set a higher minQualityThreshold if you want to raise the quality floor.

When to use it: you trust your own min-quality + provider lists more than the classifier; you want fully deterministic routing across similar prompts; you have a workload that doesn't fit the 6-category taxonomy.
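
The bypass rule above (cheapest active model that clears the quality threshold, respects the provider lists, and stays under the cost cap) might look roughly like this in Python. Everything here is a hypothetical sketch: the Model class, its quality and cost_usd fields, and the pick_model function are invented for illustration, and treating a cost equal to the cap as acceptable is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    quality: float    # hypothetical quality score in [0, 1]
    cost_usd: float   # hypothetical estimated cost for this request
    active: bool = True


def pick_model(models, min_quality_threshold=0.7, allowed_providers=None,
               excluded_providers=(), max_cost_per_request_usd=None):
    """Return the cheapest model passing every guardrail, or None if none does."""
    candidates = [
        m for m in models
        if m.active
        and m.quality >= min_quality_threshold
        and (allowed_providers is None or m.provider in allowed_providers)
        and m.provider not in excluded_providers
        and (max_cost_per_request_usd is None or m.cost_usd <= max_cost_per_request_usd)
    ]
    return min(candidates, key=lambda m: m.cost_usd, default=None)


# Hypothetical catalog; names, scores, and prices are made up.
catalog = [
    Model("model-a", "provider-1", 0.65, 0.001),
    Model("model-b", "provider-2", 0.75, 0.004),
    Model("model-c", "provider-3", 0.90, 0.010),
    Model("model-d", "provider-2", 0.80, 0.002, active=False),
]
print(pick_model(catalog).name)  # prints model-b
```

The same inputs always yield the same choice, which is the deterministic behavior the bypass is meant to provide.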