Advanced Routing Controls
Fine-tune how KairosRoute selects providers and handles your requests with privacy controls, performance thresholds, quantization filtering, context compression, and response healing.
Data Privacy Controls
Control how your data is handled by setting privacy parameters on each request. KairosRoute filters out providers that don't meet your requirements.
Parameters
data_collection — Set to "deny" to exclude providers that may use your data for training. Default: "allow".
zdr — Set to true for Zero Data Retention. Only routes to providers with formal ZDR guarantees (OpenAI, Anthropic, Google, Mistral).
curl https://api.kairosroute.com/api/v1/chat/completions \
-H "Authorization: Bearer kr-..." \
-H "Content-Type: application/json" \
-d '{
"model": "kr-auto",
"messages": [{"role": "user", "content": "Analyze this PII data"}],
"data_collection": "deny",
"zdr": true
}'
| Provider | May Train | ZDR |
|---|---|---|
| OpenAI | No | Yes |
| Anthropic | No | Yes |
| Google | No | Yes |
| Mistral | No | Yes |
| Groq | No | No |
| Together AI | No | No |
| Fireworks | No | No |
| DeepSeek | Yes | No |
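The filtering described in this section can be sketched in a few lines of Python. This is an illustration only, not KairosRoute's implementation: the `Provider` class and the `eligible_providers` function are hypothetical names, and the data is copied from the provider table and the ZDR list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    may_train: bool  # "May Train" column
    zdr: bool        # "ZDR" column


# Values taken from the provider table and the ZDR list above.
PROVIDERS = [
    Provider("OpenAI", False, True),
    Provider("Anthropic", False, True),
    Provider("Google", False, True),
    Provider("Mistral", False, True),
    Provider("Groq", False, False),
    Provider("Together AI", False, False),
    Provider("Fireworks", False, False),
    Provider("DeepSeek", True, False),
]


def eligible_providers(data_collection="allow", zdr=False):
    """Return the names of providers that satisfy the privacy parameters."""
    names = []
    for provider in PROVIDERS:
        if data_collection == "deny" and provider.may_train:
            continue  # excluded: may use request data for training
        if zdr and not provider.zdr:
            continue  # excluded: no formal ZDR guarantee
        names.append(provider.name)
    return names
```

With the table as given, `eligible_providers("deny")` leaves out only DeepSeek, while `eligible_providers(zdr=True)` narrows the pool to the four providers with ZDR guarantees.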
Performance Thresholds
KairosRoute tracks real-time latency and throughput per provider over a rolling 5-minute window. Set thresholds to steer routing away from underperforming providers.
Parameters
preferred_max_latency — Max acceptable P75 latency in milliseconds. Providers above this are deprioritized.
preferred_min_throughput — Min acceptable P50 throughput in tokens/sec. Providers below this are deprioritized.
// Route to fast providers only (< 1 second P75 latency)
{
"model": "kr-auto",
"messages": [...],
"preferred_max_latency": 1000,
"preferred_min_throughput": 50
}
Performance data is available at GET /api/v1/performance. Providers with fewer than 3 recent samples are given the benefit of the doubt.
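As a rough sketch of how the two thresholds and the 3-sample rule might combine, the Python below ranks providers so that those failing a threshold sort last. It is an assumption-laden illustration: the function names, the nearest-rank percentile method, and the shape of the `stats` dictionary are all hypothetical, and the rolling 5-minute window is not modelled.

```python
import math

MIN_SAMPLES = 3  # below this, a provider gets the benefit of the doubt


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_thresholds(latencies_ms, throughputs_tps,
                     max_latency=None, min_throughput=None):
    """True if a provider passes both thresholds or has too few samples."""
    if len(latencies_ms) < MIN_SAMPLES or len(throughputs_tps) < MIN_SAMPLES:
        return True
    if max_latency is not None and percentile(latencies_ms, 75) > max_latency:
        return False
    if min_throughput is not None and percentile(throughputs_tps, 50) < min_throughput:
        return False
    return True


def order_providers(stats, max_latency=None, min_throughput=None):
    """Stable sort: passing providers first, failing ones last (not removed)."""
    return sorted(
        stats,
        key=lambda name: not meets_thresholds(*stats[name], max_latency, min_throughput),
    )
```

Here `stats` maps a provider name to a `(latencies_ms, throughputs_tps)` pair of sample lists. Failing providers stay in the result, which matches the "deprioritized" wording in the parameter descriptions.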
Quantization Filtering
Some providers serve quantized (compressed) versions of models for faster/cheaper inference. Control the precision level of models you're routed to.
Precision Levels (high → low)
fp32 / fp16 / bf16 — Full precision (frontier APIs)
fp8 — 8-bit floating point (minimal quality loss)
int8 — 8-bit integer (moderate quality trade-off)
int4 — 4-bit integer (significant quality trade-off)
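One way to reason about the min_quantization and max_quantization request parameters is as bounds on a rank derived from this ordering. The sketch below is hypothetical: `PRECISION_RANK` and `precision_allowed` are illustrative names, and placing fp32, fp16, and bf16 in a single tier is an assumption based on how the list groups them.

```python
# Higher rank means higher precision. The three full-precision formats
# share one tier (an assumption drawn from the grouping in the list above).
PRECISION_RANK = {
    "int4": 0,
    "int8": 1,
    "fp8": 2,
    "bf16": 3,
    "fp16": 3,
    "fp32": 3,
}


def precision_allowed(precision, min_quantization=None, max_quantization=None):
    """Check a model's precision against optional lower and upper bounds."""
    rank = PRECISION_RANK[precision]
    if min_quantization is not None and rank < PRECISION_RANK[min_quantization]:
        return False  # below the requested minimum precision
    if max_quantization is not None and rank > PRECISION_RANK[max_quantization]:
        return False  # above the requested maximum precision
    return True
```

Under these assumptions, a minimum of "fp8" admits fp8 and the full-precision tier, and a maximum of "int8" admits only int8 and int4.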
// Only accept FP8 or higher precision models
{
"model": "kr-auto",
"messages": [...],
"min_quantization": "fp8"
}
// Prefer cheaper quantized models (int8 or lower)
{
"model": "kr-auto",
"messages": [...],
"max_quantization": "int8"
}
Context Compression
When a conversation exceeds a model's context window, KairosRoute applies middle-out truncation instead of returning an error. System messages and recent messages are preserved; older middle messages are dropped with a truncation notice.
Behavior
Auto-enabled for models with ≤ 8K context windows.
Opt-in for larger models via header: X-KairosRoute-Compress: true
When compression occurs, response includes X-KairosRoute-Compressed: true and X-KairosRoute-Messages-Dropped: N headers.
Preservation priority: System messages → last 2 messages (current turn) → first message (context setup) → recent messages (newest first) → older messages (dropped).
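The preservation priority can be illustrated with a small Python sketch. It is not the service's actual algorithm: `count_tokens` is a hypothetical stand-in for a real tokenizer, the notice text and its `system` role are invented for illustration, and the notice itself is not counted against the budget.

```python
def count_tokens(message):
    # Hypothetical stand-in for a tokenizer: one token per whitespace-separated word.
    return len(message["content"].split())


def compress(messages, budget):
    """Return (kept_messages, dropped_count) for a given token budget."""
    system = [i for i, m in enumerate(messages) if m["role"] == "system"]
    others = [i for i, m in enumerate(messages) if m["role"] != "system"]

    # Priority: system messages, last two messages, first message,
    # then the remaining messages from newest to oldest.
    priority = system + others[-2:]
    if others and others[0] not in priority:
        priority.append(others[0])
    priority += [i for i in reversed(others) if i not in priority]

    keep, used = set(), 0
    for i in priority:
        cost = count_tokens(messages[i])
        if used + cost > budget:
            break  # this message and everything of lower priority is dropped
        keep.add(i)
        used += cost

    dropped = len(messages) - len(keep)
    result, notice_added = [], False
    for i, message in enumerate(messages):
        if i in keep:
            result.append(message)
        elif not notice_added:
            # Illustrative truncation notice placed where the gap begins.
            result.append({"role": "system",
                           "content": f"[{dropped} earlier messages truncated]"})
            notice_added = True
    return result, dropped
```

Stopping at the first message that does not fit keeps the dropped span contiguous in the middle of the conversation, which is the behavior "middle-out" suggests. A greedy variant that keeps scanning for smaller messages would retain more text but leave scattered gaps.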
Response Healing
When you request structured JSON output via response_format, KairosRoute automatically attempts to fix common JSON issues in the model's response.
Auto-fixed Issues
Markdown code fences wrapping JSON (```json ... ```)
Trailing commas in objects/arrays
Single quotes instead of double quotes
Truncated JSON (unclosed braces/brackets)
Surrounding text before/after JSON
JavaScript-style comments
// Request structured output — healing is automatic
{
"model": "kr-auto",
"messages": [{"role": "user", "content": "List 3 colors as JSON"}],
"response_format": { "type": "json_object" }
}
// Response headers when healing was applied:
// X-KairosRoute-Healed: true
// X-KairosRoute-Healing: stripped_markdown_fences,removed_trailing_commas
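To make the healing step concrete, here is a minimal Python sketch covering two of the listed fixes: markdown fences and trailing commas. The `heal_json` function is a hypothetical name and this is not KairosRoute's implementation. The fix labels reuse the two values shown in the X-KairosRoute-Healing header example, and the other fixes (quotes, truncation, surrounding text, comments) are omitted.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def heal_json(text):
    """Apply simple repairs, then parse; return (parsed_value, applied_fixes)."""
    fixes = []
    candidate = text.strip()

    # Unwrap a markdown code fence around the JSON, if present.
    match = FENCED_BLOCK.search(candidate)
    if match:
        candidate = match.group(1)
        fixes.append("stripped_markdown_fences")

    # Remove commas that directly precede a closing brace or bracket.
    # Caveat: this regex also rewrites such sequences inside string values.
    without_commas = TRAILING_COMMA.sub(r"\1", candidate)
    if without_commas != candidate:
        candidate = without_commas
        fixes.append("removed_trailing_commas")

    return json.loads(candidate), fixes
```

A production-grade healer would more likely tokenize the input rather than rely on regular expressions, so that string contents are never altered. The sketch raises `json.JSONDecodeError` when the repaired text still does not parse.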