# Model Variants
Append a suffix to any model ID to modify its behavior. Get the cheapest version, enable reasoning mode, or use extended context — all with a simple naming convention.
## Available Variants
| Suffix | What It Does | Example |
|---|---|---|
| `:free` | Routes to the cheapest model from the same provider | `gpt-4o:free` → `gpt-4o-mini` |
| `:thinking` | Enables extended reasoning / chain-of-thought | `claude-3.5-sonnet:thinking` |
| `:extended` | Uses the version of the model with the extended context window | `gemini-2.5-pro:extended` |
| `:online` | Augments responses with real-time web search (coming soon) | `gpt-4o:online` |
## `:free` — Cheapest Model

Append `:free` to any model ID to route to the cheapest option from the same provider. Use `kr-auto:free` for the absolute cheapest model across all providers.
```python
import openai

client = openai.OpenAI(
    api_key="kr_live_...",
    base_url="https://api.kairosroute.com/v1"
)

# Cheapest OpenAI model
response = client.chat.completions.create(
    model="gpt-4o:free",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Cheapest model across ALL providers
response = client.chat.completions.create(
    model="kr-auto:free",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## `:thinking` — Extended Reasoning
Append `:thinking` to enable chain-of-thought reasoning. The router automatically selects the best reasoning-capable model and configures it for deep thinking. Great for math, logic, and complex analysis.
```python
# Deep reasoning with Anthropic
response = client.chat.completions.create(
    model="claude-3.5-sonnet:thinking",
    messages=[{
        "role": "user",
        "content": "Prove that there are infinitely many primes."
    }]
)

# Or with OpenAI (routes to o3/o4-mini)
response = client.chat.completions.create(
    model="gpt-4o:thinking",
    messages=[{
        "role": "user",
        "content": "What is 127 * 389 + 42^3?"
    }]
)
```

## `:extended` — Large Context
Append `:extended` to prioritize the version of a model with the largest context window. Ideal for processing long documents, codebases, or transcripts.
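Whether the larger window is actually needed depends on input size. As a rough client-side check, a sketch using the common ~4 characters-per-token heuristic (the `needs_extended` helper and the 128k default window are illustrative assumptions, not part of any SDK):

```python
def needs_extended(text: str, standard_window: int = 128_000) -> bool:
    """Return True if text likely exceeds a standard context window.

    Uses the rough ~4 characters per token heuristic for English text;
    the 128k default window is an illustrative assumption.
    """
    estimated_tokens = len(text) // 4
    return estimated_tokens > standard_window
```

For example, a 600,000-character transcript estimates to roughly 150,000 tokens, so this check would suggest requesting `:extended` against a 128k standard window.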
```python
# Process a long document with Gemini (1M+ tokens)
response = client.chat.completions.create(
    model="gemini-2.5-pro:extended",
    messages=[{
        "role": "user",
        "content": long_document + "\n\nSummarize the key themes."
    }]
)
```

## Combine with `kr-auto`
Variants work with `kr-auto` too. The router picks the best model and applies the variant behavior.
| Model | Behavior |
|---|---|
| `kr-auto` | Best model for the task (default) |
| `kr-auto:free` | Cheapest model that can handle the task |
| `kr-auto:thinking` | Best reasoning model for the task |
| `kr-auto:extended` | Best model with the largest context window |
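Because a variant is just a suffix on the model string, it can also be composed programmatically. A minimal sketch (the `with_variant` helper and its validation set are illustrative, not part of any SDK):

```python
# Variant suffixes from the table above; ":online" is not yet live.
VALID_VARIANTS = {"free", "thinking", "extended", "online"}

def with_variant(model: str, variant: str) -> str:
    """Append a variant suffix to a model ID, replacing any existing one."""
    if variant not in VALID_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    base = model.split(":", 1)[0]  # strip any existing suffix
    return f"{base}:{variant}"
```

For example, `with_variant("kr-auto", "thinking")` yields `"kr-auto:thinking"`, which can then be passed as the `model` argument in the calls above.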