# Model Variants
Append a suffix to any model ID to modify its behavior. Get the cheapest version, enable reasoning mode, or use extended context — all with a simple naming convention.
## Available Variants
| Suffix | What It Does | Example |
|---|---|---|
| `:free` | Routes to the cheapest model from the same provider | `gpt-4o:free` → `gpt-4o-mini` |
| `:thinking` | Enables extended reasoning / chain-of-thought | `claude-3.5-sonnet:thinking` |
| `:extended` | Uses the version of the model with the extended context window | `gemini-2.5-pro:extended` |
| `:online` | Augments responses with real-time web search (coming soon) | `gpt-4o:online` |
## `:free` — Cheapest Model

Append `:free` to any model ID to route to the cheapest option from the same provider. Use `kr-auto:free` for the absolute cheapest model across all providers.
```python
import openai

client = openai.OpenAI(
    api_key="kr_live_...",
    base_url="https://api.kairosroute.com/v1"
)

# Cheapest OpenAI model
response = client.chat.completions.create(
    model="gpt-4o:free",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Cheapest model across ALL providers
response = client.chat.completions.create(
    model="kr-auto:free",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## `:thinking` — Extended Reasoning
Append `:thinking` to enable chain-of-thought reasoning. The router automatically selects the best reasoning-capable model and configures it for deep thinking. Great for math, logic, and complex analysis.
```python
# Deep reasoning with Anthropic
response = client.chat.completions.create(
    model="claude-3.5-sonnet:thinking",
    messages=[{
        "role": "user",
        "content": "Prove that there are infinitely many primes."
    }]
)

# Or with OpenAI (routes to o3/o4-mini)
response = client.chat.completions.create(
    model="gpt-4o:thinking",
    messages=[{
        "role": "user",
        "content": "What is 127 * 389 + 42^3?"
    }]
)
```

## `:extended` — Large Context
Append `:extended` to prioritize the version of a model with the largest context window. Ideal for processing long documents, codebases, or transcripts.
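Whether the larger window is actually needed depends on input size. As a rough client-side check, a sketch using the common ~4 characters-per-token heuristic (the `needs_extended` helper and the 128k default window are illustrative assumptions, not part of any SDK):

```python
def needs_extended(text: str, standard_window: int = 128_000) -> bool:
    """Return True if text likely exceeds a standard context window.

    Uses the rough ~4 characters per token heuristic for English text;
    the 128k default window is an illustrative assumption.
    """
    estimated_tokens = len(text) // 4
    return estimated_tokens > standard_window
```

For example, a 600,000-character transcript estimates to roughly 150,000 tokens, so this check would suggest requesting `:extended` against a 128k standard window.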
```python
# Process a long document with Gemini (1M+ tokens)
response = client.chat.completions.create(
    model="gemini-2.5-pro:extended",
    messages=[{
        "role": "user",
        "content": long_document + "\n\nSummarize the key themes."
    }]
)
```

## Combine with `kr-auto`
Variants work with `kr-auto` too. The router picks the best model and applies the variant behavior.
| Model | Behavior |
|---|---|
| `kr-auto` | Best model for the task (default) |
| `kr-auto:free` | Cheapest model that can handle the task |
| `kr-auto:thinking` | Best reasoning model for the task |
| `kr-auto:extended` | Best model with the largest context window |
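Because a variant is just a suffix on the model string, it can also be composed programmatically. A minimal sketch (the `with_variant` helper and its validation set are illustrative, not part of any SDK):

```python
# Variant suffixes from the table above; ":online" is not yet live.
VALID_VARIANTS = {"free", "thinking", "extended", "online"}

def with_variant(model: str, variant: str) -> str:
    """Append a variant suffix to a model ID, replacing any existing one."""
    if variant not in VALID_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    base = model.split(":", 1)[0]  # strip any existing suffix
    return f"{base}:{variant}"
```

For example, `with_variant("kr-auto", "thinking")` yields `"kr-auto:thinking"`, which can then be passed as the `model` argument in the calls above.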