
Providers and routing

Kheish distinguishes between a configured route and the underlying provider driver. A configured route has:
  • a stable route_id such as openai, anthropic, openrouter, or research
  • one underlying provider driver such as openai, anthropic, openrouter, google, or xai
  • one current model
  • versioned capabilities exposed to the daemon, such as multimodal input, native web search, image generation, image editing, audio generation, and transcription
This distinction still matters because one daemon can expose multiple named routes that share the same driver. For example, a route named research can use the openrouter driver without being the same route as openrouter.
Kheish currently supports these first-class drivers:
  • anthropic
  • google
  • openai
  • openrouter
  • xai

Daemon route inventory

One daemon can expose multiple routes at once. The runtime keeps a daemon-owned route inventory that includes:
  • a default route used when no more specific policy applies
  • one or more named routes such as openai, anthropic, openrouter, or research
  • the current model and underlying driver for each route
  • route capabilities such as multimodal input, native web search, image generation support, image editing support, audio generation support, and transcription support
This is why one daemon can run:
  • one session on Anthropic
  • another session on OpenAI
  • another session on a custom research route backed by OpenRouter
  • one child sidechain on a cheaper route
  • one image-editing run on one named route while text orchestration remains on another
without changing the daemon process itself.

Routes file

The recommended multi-route startup path is serve --routes-file .... The file format is TOML:
version = 1
default_route = "openrouter"

[routes.openrouter]
driver = "openrouter"
default_model = "openai/gpt-5.4-mini"
model_support = "any"
auth_ref = "openrouter.primary"

[routes.openai]
driver = "openai"
default_model = "gpt-5.4"
auth_ref = "openai.prod"
Populate those refs through the daemon-managed secret store before startup:
export KHEISH_AUTH_STORE_MASTER_KEY="$(./target/debug/kheish-daemon secrets generate)"

./target/debug/kheish-daemon secrets set openrouter.primary \
  --offline \
  --state-root .kheish-daemon-data \
  --provider openrouter \
  --from-env OPENROUTER_API_KEY

./target/debug/kheish-daemon secrets set openai.prod \
  --offline \
  --state-root .kheish-daemon-data \
  --provider openai \
  --from-env OPENAI_API_KEY
Generate the KHEISH_AUTH_STORE_MASTER_KEY value once per persistent state_root and keep reusing it. Replacing it later makes existing encrypted secret slots unreadable.
model_support = "any" is the normal choice for OpenRouter routes because OpenRouter model identifiers are typically vendor-prefixed, such as openai/gpt-5.4-mini or anthropic/claude-sonnet-4.
The repository also ships routes.default.toml at the repository root as a daemon-managed baseline for the built-in anthropic and openai routes.
Rules enforced by the daemon:
  • version must currently be 1
  • unknown top-level fields and unknown per-route fields are rejected instead of ignored
  • each route lives under [routes.<route_id>]
  • route_id must be non-empty, must not contain /, and must not contain whitespace
  • each route must define driver and default_model
  • base_url, when set, must be an absolute http:// or https:// URL with a host and no userinfo, query, or fragment
  • if multiple routes are configured, default_route must be set in the file
  • serve --default-route ... can override a valid default_route from the file, but it does not currently make a missing multi-route default_route valid
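The rules above can be sketched as a small static check over an already-parsed routes document. This is an illustrative sketch, not the daemon's validator: the function names, the problem strings, and the set of allowed top-level keys (inferred from the example file) are assumptions, and per-route unknown-field checks are left out.

```python
from urllib.parse import urlsplit

# Assumption: these are the only top-level keys, inferred from the example file.
TOP_LEVEL_KEYS = {"version", "default_route", "routes"}


def base_url_ok(url: str) -> bool:
    """Absolute http(s) URL with a host and no userinfo, query, or fragment."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    return (
        parts.scheme in ("http", "https")
        and bool(host)
        and parts.username is None
        and parts.password is None
        and not parts.query
        and not parts.fragment
    )


def validate_routes(doc: dict) -> list:
    """Return human-readable problems; an empty list means the sketch found none."""
    problems = []
    if doc.get("version") != 1:
        problems.append("version must be 1")
    for key in sorted(set(doc) - TOP_LEVEL_KEYS):
        problems.append(f"unknown top-level field: {key}")

    routes = doc.get("routes", {})
    for route_id, route in routes.items():
        if not route_id or "/" in route_id or any(c.isspace() for c in route_id):
            problems.append(f"invalid route_id: {route_id!r}")
        for required in ("driver", "default_model"):
            if required not in route:
                problems.append(f"{route_id}: missing {required}")
        if "base_url" in route and not base_url_ok(route["base_url"]):
            problems.append(f"{route_id}: invalid base_url")

    default_route = doc.get("default_route")
    if len(routes) > 1 and default_route is None:
        problems.append("default_route must be set when multiple routes are configured")
    if default_route is not None and default_route not in routes:
        # Assumption: a default_route has to name a configured route.
        problems.append(f"default_route {default_route!r} is not a configured route")
    return problems
```

On Python 3.11 or later, the input dictionary can come from tomllib.load on the routes file. For real deployments, doctor routes remains the authoritative check.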
doctor routes --routes-file ./routes.toml --default-route <route_id> validates the same override semantics before you restart the daemon. Add --check-auth when the file uses auth_ref, openai_auth_source = "codex", or anthropic_auth_source = "claude_code"; it checks the running daemon auth slots and verifies whether account-auth routes already have their route.<route_id> slot or a source credentials file that can be imported on startup.
After the daemon is running, use doctor routes --route <route_id> --canary for an opt-in live canary. The canary creates a disposable session, submits a normal run with that route id and model pinned, waits for a bounded response, and reports passed, failed, or timeout with the session/run ids plus a route diagnostic on failure. It does not mutate the daemon default route. Use this for provider/model/base-URL confidence; keep --routes-file for static linting.
For an operator live matrix with a fresh daemon and a real OpenAI key from OPENAI_API_KEY, run:
scripts/e2e/routes_canary_live_matrix.sh
The script seeds openai.prod through the daemon auth store, checks doctor routes --check-auth, then verifies one passing route, one provider-rejected invalid model, and one dead base_url route without printing the raw key.
Use doctor routes --check-references after restarting with a changed route file. It inspects persisted session route policies and active schedules and reports route ids that no longer exist in the running inventory. This is the operator-facing quarantine check for stale references; new direct submissions and scheduled dispatches with missing route ids are still rejected before run persistence.
When a run is submitted or a scheduled input dispatches, the daemon checks the effective route after applying request overrides, session route policy, and default route. Routes with readiness error are rejected before a run is persisted, using problem code route_not_ready; routes with readiness warning still run. The same pinned route is rechecked immediately before run execution, so a queued or waiting run restored after restart fails on its original missing route instead of falling back to the active route. Persisted schedules pointing at a route removed after restart fail before run persistence and surface the missing route in last_scheduler_error. This keeps missing or revoked auth_ref slots and stale route ids from turning into queued runs that can only fail later at provider time.
The supported per-route fields are:
  • core route fields: driver, default_model
  • recommended secret reference: auth_ref
  • direct auth inputs: api_key, api_key_env
  • model compatibility policy: model_support = "family" or model_support = "any"
  • transport: base_url
  • OpenAI-only metadata: organization, organization_env, project, project_env
  • OpenAI-only auth fallback: openai_auth_source, openai_auth_file
  • Anthropic-only auth fallback and request tuning: anthropic_auth_source, anthropic_credentials_file, anthropic_version, anthropic_beta_headers
  • capability overrides: multimodal_input, native_web_search, image_generation, image_edit, audio_generation, transcription
Capabilities are route-level data. Two routes that use the same driver can still expose different capability flags.
The daemon currently exposes route capability matrix version 2. Each runtime get route entry and each status.provider_readiness.routes[] entry includes:
  • matrix_version
  • multimodal_input
  • native_web_search
  • image_generation
  • image_edit
  • audio_generation
  • transcription
Capability overrides must describe a backend that actually exists. Startup and doctor routes reject a route file that advertises unsupported audio_generation = true or transcription = true; disabling a supported capability remains valid when you want a named route to be narrower than the driver default.
Route-auth behavior:
  • auth_ref, when present, must point at an existing daemon-managed secret slot
  • the daemon validates all configured auth_ref values at startup
  • runtime get returns auth_ref on default_route and each entry in routes when one is configured, but never the underlying secret bytes
  • one secret slot can be shared by multiple routes without duplicating credentials in the route file
  • OpenRouter routes are API-key-backed routes; they do not use OpenAI account auth imports such as Codex
Route selection and route authorization are related but different:
  • session or run routing decides which route_id should be used
  • the broker still resolves the actual credential material at request time
  • the execution’s effective CredentialScope can therefore deny one route even when the route exists in the daemon inventory and was selected successfully
Use CredentialScope.route_allow and CredentialScope.route_deny when you need to delegate route access explicitly to one session or one child sidechain.
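To make the difference between "the route exists" and "this execution may use it" concrete, here is a minimal sketch of an allow/deny check. Only the field names route_allow and route_deny come from the text above. The class name, the permits method, and both evaluation rules (deny wins over allow, and an empty allow list means no allow-list restriction) are assumptions for illustration and may not match the broker's actual semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CredentialScopeSketch:
    """Hypothetical stand-in for the route part of a CredentialScope."""

    route_allow: frozenset = frozenset()
    route_deny: frozenset = frozenset()

    def permits(self, route_id: str) -> bool:
        # Assumption: an explicit deny always wins.
        if route_id in self.route_deny:
            return False
        # Assumption: an empty allow list places no allow-list restriction.
        return not self.route_allow or route_id in self.route_allow


# A child sidechain limited to one cheaper route.
child_scope = CredentialScopeSketch(route_allow=frozenset({"research"}))
```

With this scope, a run that selects the openai route would resolve a valid route id from the inventory and still be refused credentials, which is the case the list above describes.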

Selector grammar

The daemon CLI normalizes route-aware model selectors on these commands:
  • runtime set-model
  • sessions input
  • sessions set-route
  • agents spawn-sidechain
  • schedules create
The selector grammar is:
  • <model>, for example gpt-5.4
  • <route_id>/<model>
The second form is route-aware. The first slash is the separator, so the model part may itself contain additional slashes. This makes selectors such as openrouter/openai/gpt-5.4-mini valid when the configured route id is openrouter.
Important normalization rules:
  • the route prefix is recognized only when the segment before the first / matches a known daemon route id
  • otherwise the full value is treated as a raw model string
  • <route_id>/ without a model suffix is rejected
  • --fallback-model follows the same grammar as --model
  • if --provider and a selector prefix point at different routes, the CLI fails instead of guessing
  • when the selector contains a route prefix, that prefix is removed before the backend request is sent
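The normalization rules above fit in a few lines. The sketch below is not the CLI's code: normalize_selector, its parameters, and the use of ValueError are hypothetical, and the provider argument stands in for the --provider flag.

```python
from typing import Optional, Set, Tuple


def normalize_selector(
    selector: str,
    known_routes: Set[str],
    provider: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Split a selector into (route_id or None, model string sent to the backend)."""
    # Only the first slash can act as the route separator.
    prefix, slash, rest = selector.partition("/")
    if slash and prefix in known_routes:
        if not rest:
            raise ValueError(f"selector {selector!r} has a route prefix but no model")
        if provider is not None and provider != prefix:
            raise ValueError(
                f"--provider {provider!r} conflicts with selector route {prefix!r}"
            )
        # The route prefix is stripped before the backend sees the model.
        return prefix, rest
    # No known route prefix: the whole value is a raw model string.
    return provider, selector
```

For example, with known routes openrouter and research, openrouter/openai/gpt-5.4-mini resolves to route openrouter and model openai/gpt-5.4-mini, while openai/gpt-5.4-mini stays a raw model string because openai is not a configured route id in that inventory.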

Route precedence

Kheish resolves the effective route in this order:
  1. explicit run override
  2. persisted session route policy
  3. daemon default route
Once selected, the route is pinned for that run.
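As a rough sketch of this precedence, assuming each source is either a route id or None: the first source that names a route wins, and a missing route is an error rather than a reason to try the next source. The function name, the returned source labels, and LookupError are illustrative choices, not daemon API.

```python
from typing import Iterable, Optional, Tuple


def resolve_route(
    run_override: Optional[str],
    session_policy: Optional[str],
    default_route: Optional[str],
    inventory: Iterable[str],
) -> Tuple[str, str]:
    """Return (route_id, source) for the first source that names a route."""
    known = set(inventory)
    candidates = (
        ("run_override", run_override),
        ("session_policy", session_policy),
        ("default_route", default_route),
    )
    for source, route_id in candidates:
        if route_id is None:
            continue
        if route_id not in known:
            # A stale route id fails the request; it does not fall through
            # to a lower-precedence source.
            raise LookupError(f"{source} names unknown route {route_id!r}")
        return route_id, source
    raise LookupError("no route could be resolved")
```

The returned route id is what stays pinned for the run; later changes to the session policy or the daemon default do not affect it.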

Session route policy

Sessions can now persist a route policy instead of relying only on per-run overrides. Use this when one session should keep targeting a specific route across later runs, resumes, schedules, or mailbox execution.
Child sidechain sessions do not automatically inherit the parent session’s stored route policy. They only persist a route policy when the spawn request carries explicit route fields or an explicit route_policy. This behavior is intentionally different from session personas. A child sidechain session does inherit the parent session’s bound persona snapshot at spawn time.
The control plane exposes this as a real session resource:
  • POST /v1/sessions/{session_id}/route-policy
  • PUT /v1/sessions/{session_id}/route-policy
  • DELETE /v1/sessions/{session_id}/route-policy
The CLI also exposes sessions set-route.

Run-scoped generation controls

Generation settings can be supplied with individual requests. Common controls include:
  • route id carried in the provider request field
  • model
  • fallback model
  • temperature
  • max output tokens
  • tool choice
  • response format
On a named-route daemon, the provider request field carries the selected route id after normalization. This keeps route selection explicit without forcing every session to change global defaults.

Capability-sensitive routing

Not every multimodal input requires the same model capability.
  • image inputs require a vision-capable route
  • supported document inputs can still execute on non-vision routes because the daemon renders bounded document text for the model
In the current route-capability surface, multimodal_input mainly covers image inputs and document-derived text or previews. Audio generation and speech-to-text have their own route flags.
This distinction matters operationally. A route that is correct for text and PDF summaries may still be wrong for PNG or JPEG inspection. Route capabilities are surfaced through GET /v1/runtime and GET /v1/status, so operators and SDKs can inspect the daemon inventory before submitting work.
Driver defaults today are:
  • Anthropic: multimodal input and native web search, but no daemon image, audio, or transcription backend
  • Google: multimodal input plus daemon image generation and image editing, but no native web_search, audio generation, or transcription backend
  • OpenAI: multimodal input, native web search, daemon image generation, daemon image editing, audio generation, and transcription
  • OpenRouter: multimodal input, daemon image generation, daemon image editing, audio generation, and transcription, but no native web_search
  • xAI: multimodal input, native web search, daemon image generation, and daemon image editing, but no audio generation or transcription backend
These are defaults, not a hard-coded public matrix. A routes file can override the exposed capabilities for one named route without changing another route that uses the same driver.
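The driver defaults listed above, plus per-route overrides, can be modeled as a lookup table and a merge step. This is a sketch under stated assumptions: the table only restates the list above, route_capabilities is a hypothetical helper, and the only rejection it models is the documented one (audio_generation = true or transcription = true on a driver without that backend). The daemon may enforce more.

```python
CAPABILITIES = (
    "multimodal_input",
    "native_web_search",
    "image_generation",
    "image_edit",
    "audio_generation",
    "transcription",
)

# Capabilities enabled by default per driver, as listed in this section.
DRIVER_DEFAULTS = {
    "anthropic": {"multimodal_input", "native_web_search"},
    "google": {"multimodal_input", "image_generation", "image_edit"},
    "openai": set(CAPABILITIES),
    "openrouter": set(CAPABILITIES) - {"native_web_search"},
    "xai": {"multimodal_input", "native_web_search", "image_generation", "image_edit"},
}


def route_capabilities(driver: str, overrides=None) -> dict:
    """Merge route-level overrides over the driver defaults."""
    caps = {name: name in DRIVER_DEFAULTS[driver] for name in CAPABILITIES}
    for name, value in (overrides or {}).items():
        if name not in caps:
            raise KeyError(f"unknown capability: {name}")
        if value and name in ("audio_generation", "transcription") and not caps[name]:
            # Documented rule: a route cannot advertise an audio or
            # transcription backend that its driver does not have.
            raise ValueError(f"{driver} has no {name} backend")
        caps[name] = bool(value)
    return caps
```

Narrowing stays valid in this model: an OpenAI-backed route can set image_edit to false and expose less than the driver default, while another route on the same driver keeps the full set.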

Anthropic reasoning and tools

Anthropic extended thinking is supported with tools when generation.tool_choice is auto or omitted. The provider returns signed thinking blocks that are not part of the visible transcript; the daemon stores them as provider-private message context and replays them before the matching historical tool use/results on later turns.
Anthropic rejects extended thinking when tool_choice forces tool use (required/specific). Kheish fails that combination locally with a non-retryable provider error instead of sending a request that Anthropic will reject.
web_search remains one logical tool, but the daemon now prefers a provider-native backend when the effective route supports it. Current behavior:
  • Anthropic routes can use native provider web search
  • Google routes currently do not expose native provider web search through web_search
  • OpenAI routes can use native provider web search
  • OpenRouter routes currently do not expose native provider web search through web_search
  • xAI routes can use native provider web search
  • unsupported routes or unsupported request shapes fall back to the local DuckDuckGo HTML implementation
web_fetch remains daemon-local.
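Two decisions from this section are simple enough to sketch: which web_search backend a route gets, and whether extended thinking can be combined with the requested tool choice. Both function names and the returned labels are hypothetical. The second function assumes tool_choice arrives as None (omitted) or a string, and treats every value other than "auto" as forcing tool use, which is an assumption beyond the required/specific cases named above.

```python
from typing import Optional


def web_search_backend(
    route_native_web_search: bool,
    request_shape_supported: bool = True,
) -> str:
    """Prefer the provider-native backend, else the daemon-local fallback."""
    if route_native_web_search and request_shape_supported:
        return "provider_native"
    return "local_duckduckgo_html"


def thinking_allowed_with_tools(tool_choice: Optional[str]) -> bool:
    """Extended thinking is compatible only with auto or omitted tool_choice."""
    # Assumption: anything else forces tool use and is failed locally.
    return tool_choice is None or tool_choice == "auto"
```

In this model, an OpenRouter or Google route (native_web_search false by default) always lands on the local fallback, and an Anthropic run with tool_choice set to required would be failed before any provider request is sent.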

Image routing

The image tools are now selected by image route id, not only by provider family. Current behavior:
  • generate_image resolves the backend in this order: explicit tool route override, current run route when that route has an image backend, then the image service default route
  • edit_image resolves the backend in this order: explicit tool route override, current run route when that route supports editing, then the default image route when it supports editing, then the first configured edit-capable backend
  • the tool request field is still called provider for compatibility, but on a named-route daemon it means one configured image route id
  • the tool response field provider still reports the underlying backend provider such as openai or google
  • route-level image_generation and image_edit are independent capabilities; a route can expose generation without exposing editing
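The two resolution orders above differ only in their last step, which a short sketch makes visible. The names below (ImageRoute, resolve_generate_image, resolve_edit_image) are hypothetical. The sketch assumes dictionary insertion order stands in for configuration order and returns an explicit override as-is, since the text does not say how overrides are validated.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ImageRoute:
    route_id: str
    image_generation: bool
    image_edit: bool


def resolve_generate_image(
    routes: Dict[str, ImageRoute],
    default_image_route: str,
    run_route: str,
    override: Optional[str] = None,
) -> str:
    if override is not None:
        return override  # explicit tool route override wins
    current = routes.get(run_route)
    if current is not None and current.image_generation:
        return run_route  # current run route has an image backend
    return default_image_route  # image service default route


def resolve_edit_image(
    routes: Dict[str, ImageRoute],
    default_image_route: str,
    run_route: str,
    override: Optional[str] = None,
) -> str:
    if override is not None:
        return override
    for candidate in (run_route, default_image_route):
        route = routes.get(candidate)
        if route is not None and route.image_edit:
            return candidate
    # Last step, edit only: first configured edit-capable backend.
    for route in routes.values():
        if route.image_edit:
            return route.route_id
    raise LookupError("no edit-capable image route is configured")
```

Because image_generation and image_edit are independent flags, the same run can resolve generate_image to one route and edit_image to another.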

Audio routing

generate_audio follows the same route-first pattern as the image tools. Current behavior:
  • generate_audio resolves the backend in this order: explicit tool route override, current run route when that route has an audio backend, then the audio service default route
  • the tool request field is still called provider for compatibility, but on a named-route daemon it means one configured audio route id
  • the tool response field provider reports the underlying backend provider
  • the built-in audio-generation backends today are OpenAI and OpenRouter
  • route-level audio_generation is the capability flag that tells clients whether generate_audio can target that route; routes with audio_generation = false are not registered as audio backends and do not make the tool visible
OpenAI media routes use the OpenAI audio APIs directly:
  • speech synthesis targets /v1/audio/speech and maps ordinary OpenAI text models to the default gpt-4o-mini-tts TTS model unless a TTS model is supplied explicitly
  • speech synthesis accepts optional style/tone instructions, validates voice/format/speed before upload, caps request text and provider response bytes, and persists supported generated formats as daemon assets only after daemon-side audio sniffing
  • speech-to-text targets /v1/audio/transcriptions and maps ordinary OpenAI text models to gpt-4o-transcribe unless a transcription model is supplied explicitly
  • OpenAI speech-to-text rejects unsupported audio media types and payloads above the 25 MiB transcription request limit before provider upload
  • OpenAI audio requests use a bounded provider request timeout instead of inheriting unbounded HTTP defaults, and speech-to-text requests honor run cancellation before/during provider I/O
  • OpenAI Codex account-backed Responses endpoints are explicit text/tool routes only; their route capabilities disable daemon image, audio-generation, and transcription backends, and route files that try to re-enable those media capabilities are rejected
  • OpenAI media provider debug artifacts store request/response metadata, redacted headers, byte counts, media types, and checksums, but not raw audio bytes or transcript text
OpenRouter media routes use OpenRouter’s dedicated media APIs:
  • speech synthesis targets /v1/audio/speech, maps ordinary chat models to openai/gpt-4o-mini-tts-2025-12-15, and accepts only the OpenRouter mp3 and pcm output formats
  • speech-to-text targets /v1/audio/transcriptions, maps ordinary chat models to openai/gpt-4o-mini-transcribe, uses top-level input_audio, rejects unsupported formats and payloads above 25 MiB before provider upload, and honors run cancellation before/during provider I/O
  • OpenRouter model discovery narrows text, tool, structured-output, vision, and image capabilities per concrete model; audio-generation and transcription remain route-level daemon backends because they can use dedicated media models even when the active chat model is text-only
  • OpenRouter media provider debug artifacts store request/response metadata and byte counts without raw audio bytes or transcript text

Fallback behavior

Kheish can be configured with a primary route and an optional fallback model on that same resolved route. The routing layer does not pretend that all routes are interchangeable. --fallback-model uses the same selector grammar as --model, but after the primary route is resolved the fallback is validated against that route and stored as a normalized model string.
Changing the daemon default route does not rewrite:
  • persisted session route policies
  • queued runs
  • active runs
  • restored suspended runs

Context-sensitive prompt recovery

Recovered run memory is packed against prompt budget before provider submission. In practice this means Kheish does not append recovered memory blindly. It estimates the current prompt size, reserves output tokens, applies a safety buffer, and then injects only the recovered-memory entries that still fit. If they do not fit, it omits them rather than forcing a context overflow. Read Recovered run memory for the daemon-side storage and retention model.
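The packing step described here can be sketched as a greedy fit against the remaining budget. Everything specific in the sketch is an assumption: the function and parameter names, the crude characters-divided-by-four token estimate, and the choice to skip an entry that does not fit while still considering later, smaller ones.

```python
from typing import Callable, List


def rough_token_estimate(text: str) -> int:
    # Assumption: a crude heuristic, not the daemon's tokenizer.
    return len(text) // 4 + 1


def pack_recovered_memory(
    entries: List[str],
    context_window: int,
    prompt_tokens: int,
    reserved_output_tokens: int,
    safety_buffer: int,
    estimate: Callable[[str], int] = rough_token_estimate,
) -> List[str]:
    """Return the recovered-memory entries that still fit the prompt budget."""
    budget = context_window - prompt_tokens - reserved_output_tokens - safety_buffer
    kept = []
    for entry in entries:
        cost = estimate(entry)
        if cost <= budget:
            kept.append(entry)
            budget -= cost
        # Entries that do not fit are omitted instead of overflowing the context.
    return kept
```

When the budget is already zero or negative before any entry is considered, the result is an empty list and the run proceeds without recovered memory.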

Operational guidance

Use explicit route and model settings in live validation and production-sensitive flows. This is especially useful when:
  • comparing route behavior on the same task
  • validating provider-native web behavior against the local fallback
  • running sidechains with lower-cost or faster models
  • ensuring that one run does not inherit the wrong session or daemon default route unexpectedly