Providers and routing
Kheish distinguishes between a configured route and the underlying provider driver. A configured route has:
- a stable route_id such as openai, anthropic, openrouter, or research
- one underlying provider driver such as openai, anthropic, openrouter, google, or xai
- one current model
- versioned capabilities exposed to the daemon, such as multimodal input, native web search, image generation, image editing, audio generation, and transcription
For example, a route named research can use the openrouter driver without being the same route as openrouter.
Kheish currently supports these first-class drivers:
- anthropic
- google
- openai
- openrouter
- xai
Daemon route inventory
One daemon can expose multiple routes at once. The runtime keeps a daemon-owned route inventory that includes:
- a default route used when no more specific policy applies
- one or more named routes such as openai, anthropic, openrouter, or research
- the current model and underlying driver for each route
- route capabilities such as multimodal input, native web search, image generation support, image editing support, audio generation support, and transcription support
A single daemon can therefore serve, at the same time:
- one session on Anthropic
- another session on OpenAI
- another session on a custom research route backed by OpenRouter
- one child sidechain on a cheaper route
- one image-editing run on one named route while text orchestration remains on another
Routes file
The recommended multi-route startup path is serve --routes-file ....
The file format is TOML.
Pick one state_root and keep reusing it. Replacing it later makes existing encrypted secret slots unreadable.
model_support = "any" is the normal choice for OpenRouter routes because OpenRouter model identifiers are typically vendor-prefixed, such as openai/gpt-5.4-mini or anthropic/claude-sonnet-4.
The repository also ships routes.default.toml at the repository root as a daemon-managed baseline for the built-in anthropic and openai routes.
Rules enforced by the daemon:
- version must currently be 1
- unknown top-level fields and unknown per-route fields are rejected instead of ignored
- each route lives under [routes.<route_id>]
- route_id must be non-empty, must not contain /, and must not contain whitespace
- each route must define driver and default_model
- base_url, when set, must be an absolute http:// or https:// URL with a host and no userinfo, query, or fragment
- if multiple routes are configured, default_route must be set in the file
- serve --default-route ... can override a valid default_route from the file, but it does not currently make a missing multi-route default_route valid
doctor routes --routes-file ./routes.toml --default-route <route_id> validates the same override semantics before you restart the daemon. Add --check-auth when the file uses auth_ref, openai_auth_source = "codex", or anthropic_auth_source = "claude_code"; it checks the running daemon auth slots and verifies whether account-auth routes already have their route.<route_id> slot or a source credentials file that can be imported on startup.
After the daemon is running, use doctor routes --route <route_id> --canary for an opt-in live canary. The canary creates a disposable session, submits a normal run with that route id and model pinned, waits for a bounded response, and reports passed, failed, or timeout with the session/run ids plus a route diagnostic on failure. It does not mutate the daemon default route. Use this for provider/model/base-URL confidence; keep --routes-file for static linting.
For an operator live matrix with a fresh daemon and a real OpenAI key from OPENAI_API_KEY, run:
openai.prod through the daemon auth store, checks doctor routes --check-auth, then verifies one passing route, one provider-rejected invalid model, and one dead base_url route without printing the raw key.
Use doctor routes --check-references after restarting with a changed route file. It inspects persisted session route policies and active schedules and reports route ids that no longer exist in the running inventory. This is the operator-facing quarantine check for stale references; new direct submissions and scheduled dispatches with missing route ids are still rejected before run persistence.
When a run is submitted or a scheduled input dispatches, the daemon checks the effective route after applying request overrides, session route policy, and default route. Routes with readiness error are rejected before a run is persisted, using problem code route_not_ready; routes with readiness warning still run. The same pinned route is rechecked immediately before run execution, so a queued or waiting run restored after restart fails on its original missing route instead of falling back to the active route. Persisted schedules pointing at a route removed after restart fail before run persistence and surface the missing route in last_scheduler_error. This keeps missing or revoked auth_ref slots and stale route ids from turning into queued runs that can only fail later at provider time.
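The admission check described above can be sketched as follows. This is an illustration under stated assumptions: the Decision type, the function name, and the readiness labels "ok", "warning", and "error" are hypothetical; only the problem code route_not_ready comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Decision:
    accepted: bool
    problem_code: Optional[str] = None
    detail: str = ""


def admit_run(route_id: str, readiness_by_route: Dict[str, str]) -> Decision:
    """Decide whether a run pinned to route_id may be persisted or executed.

    readiness_by_route maps each route id in the running inventory to an
    assumed readiness label: "ok", "warning", or "error".
    """
    if route_id not in readiness_by_route:
        # Missing routes are rejected before run persistence. The problem code
        # for this case is not named in the text, so none is invented here.
        return Decision(False, None, f"route {route_id!r} is not in the running inventory")
    if readiness_by_route[route_id] == "error":
        return Decision(False, "route_not_ready", f"route {route_id!r} has readiness error")
    # Readiness warning still runs.
    return Decision(True)
```

Calling the same function once at submission and once immediately before execution mirrors the recheck behavior: a restored run pinned to a removed route fails on that route instead of silently moving to another one.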
The supported per-route fields are:
- core route fields: driver, default_model
- recommended secret reference: auth_ref
- direct auth inputs: api_key, api_key_env
- model compatibility policy: model_support = "family" or model_support = "any"
- transport: base_url
- OpenAI-only metadata: organization, organization_env, project, project_env
- OpenAI-only auth fallback: openai_auth_source, openai_auth_file
- Anthropic-only auth fallback and request tuning: anthropic_auth_source, anthropic_credentials_file, anthropic_version, anthropic_beta_headers
- capability overrides: multimodal_input, native_web_search, image_generation, image_edit, audio_generation, transcription
2. Each runtime get route entry and each status.provider_readiness.routes[] entry includes:
- matrix_version
- multimodal_input
- native_web_search
- image_generation
- image_edit
- audio_generation
- transcription
doctor routes rejects a route file that advertises unsupported audio_generation = true or transcription = true; disabling a supported capability remains valid when you want a named route to be narrower than the driver default.
Route-auth behavior:
- auth_ref, when present, must point at an existing daemon-managed secret slot
- the daemon validates all configured auth_ref values at startup
- runtime get returns auth_ref on default_route and each entry in routes when one is configured, but never the underlying secret bytes
- one secret slot can be shared by multiple routes without duplicating credentials in the route file
- OpenRouter routes are API-key-backed routes; they do not use OpenAI account auth imports such as Codex
- session or run routing decides which route_id should be used
- the broker still resolves the actual credential material at request time
- the execution’s effective CredentialScope can therefore deny one route even when the route exists in the daemon inventory and was selected successfully
Use CredentialScope.route_allow and CredentialScope.route_deny when you need to delegate route access explicitly to one session or one child sidechain.
Selector grammar
The daemon CLI normalizes route-aware model selectors on these commands:
- runtime set-model
- sessions input
- sessions set-route
- agents spawn-sidechain
- schedules create
A selector is either a bare model string or a route-prefixed model:
- gpt-5.4
- <route_id>/<model>
For example, openrouter/openai/gpt-5.4-mini is valid when the configured route id is openrouter.
Important normalization rules:
- the route prefix is recognized only when the segment before the first / matches a known daemon route id
- otherwise the full value is treated as a raw model string
- <route_id>/ without a model suffix is rejected
- --fallback-model follows the same grammar as --model
- if --provider and a selector prefix point at different routes, the CLI fails instead of guessing
- when the selector contains a route prefix, that prefix is removed before the backend request is sent
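The normalization rules above fit in one small function. The sketch below is an illustration only: the function name and signature are hypothetical, and the provider argument stands in for the value of the --provider flag.

```python
from typing import Optional, Set, Tuple


def normalize_selector(
    selector: str,
    known_routes: Set[str],
    provider: Optional[str] = None,
) -> Tuple[Optional[str], str]:
    """Split a selector into (route_id, model) following the rules above.

    route_id is None when neither a recognized prefix nor provider names a route.
    """
    prefix, sep, rest = selector.partition("/")
    if sep and prefix in known_routes:
        if not rest:
            raise ValueError(f"{selector!r}: route prefix without a model suffix")
        if provider is not None and provider != prefix:
            raise ValueError(
                f"provider {provider!r} and selector prefix {prefix!r} name different routes"
            )
        # The route prefix is stripped; only the model string goes to the backend.
        return prefix, rest
    # No recognized prefix: the whole value is a raw model string.
    return provider, selector
```

With known routes openai and openrouter, the selector openrouter/openai/gpt-5.4-mini resolves to route openrouter and model openai/gpt-5.4-mini, while vendor/model with an unknown vendor segment stays a raw model string.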
Route precedence
Kheish resolves the effective route in this order:
- explicit run override
- persisted session route policy
- daemon default route
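As a minimal sketch of this ordering (the function and parameter names are hypothetical, and None stands for "not set"):

```python
from typing import Optional


def effective_route(
    run_override: Optional[str],
    session_policy: Optional[str],
    daemon_default: str,
) -> str:
    """Pick the first route that is set, in precedence order."""
    for candidate in (run_override, session_policy):
        if candidate is not None:
            return candidate
    return daemon_default
```

For example, a session pinned to research still runs on openai when a single run carries an explicit openai override.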
Session route policy
Sessions can now persist a route policy instead of relying only on per-run overrides. Use this when one session should keep targeting a specific route across later runs, resumes, schedules, or mailbox execution.
Child sidechain sessions do not automatically inherit the parent session’s stored route policy. They only persist a route policy when the spawn request carries explicit route fields or an explicit route_policy.
This behavior is intentionally different from session personas. A child sidechain session does inherit the parent session’s bound persona snapshot at spawn time.
The control plane exposes this as a real session resource:
- POST /v1/sessions/{session_id}/route-policy
- PUT /v1/sessions/{session_id}/route-policy
- DELETE /v1/sessions/{session_id}/route-policy
The CLI equivalent is sessions set-route.
Run-scoped generation controls
Generation settings can be supplied with individual requests. Common controls include:
- route id carried in the provider request field
- model
- fallback model
- temperature
- max output tokens
- tool choice
- response format
The provider request field carries the selected route id after normalization. This keeps route selection explicit without forcing every session to change global defaults.
Capability-sensitive routing
Not every multimodal input requires the same model capability.
- image inputs require a vision-capable route
- supported document inputs can still execute on non-vision routes because the daemon renders bounded document text for the model
multimodal_input mainly covers image inputs and document-derived text or previews. Audio generation and speech-to-text have their own route flags.
This distinction matters operationally. A route that is correct for text and PDF summaries may still be wrong for PNG or JPEG inspection.
Route capabilities are surfaced through GET /v1/runtime and GET /v1/status, so operators and SDKs can inspect the daemon inventory before submitting work.
Driver defaults today are:
- Anthropic: multimodal input and native web search, but no daemon image, audio, or transcription backend
- Google: multimodal input plus daemon image generation and image editing, but no native web_search, audio generation, or transcription backend
- OpenAI: multimodal input, native web search, daemon image generation, daemon image editing, audio generation, and transcription
- OpenRouter: multimodal input, daemon image generation, daemon image editing, audio generation, and transcription, but no native web_search
- xAI: multimodal input, native web search, daemon image generation, and daemon image editing, but no audio generation or transcription backend
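The driver defaults above can be written down as a lookup table, which makes it easy to reason about per-route overrides. This is a sketch for illustration: the table mirrors the list above, while the function name and the handling of overrides other than audio_generation and transcription are assumptions.

```python
CAPABILITIES = (
    "multimodal_input", "native_web_search", "image_generation",
    "image_edit", "audio_generation", "transcription",
)


def _caps(*enabled: str) -> dict:
    """Build a full capability map with only the named flags enabled."""
    return {name: name in enabled for name in CAPABILITIES}


DRIVER_DEFAULTS = {
    "anthropic": _caps("multimodal_input", "native_web_search"),
    "google": _caps("multimodal_input", "image_generation", "image_edit"),
    "openai": _caps(*CAPABILITIES),
    "openrouter": _caps("multimodal_input", "image_generation", "image_edit",
                        "audio_generation", "transcription"),
    "xai": _caps("multimodal_input", "native_web_search",
                 "image_generation", "image_edit"),
}


def effective_capabilities(driver: str, overrides: dict = None) -> dict:
    """Apply route-file capability overrides on top of the driver defaults."""
    caps = dict(DRIVER_DEFAULTS[driver])
    for name, value in (overrides or {}).items():
        if value and not caps[name] and name in ("audio_generation", "transcription"):
            # Documented: advertising unsupported audio or transcription is rejected.
            raise ValueError(f"{driver} does not support {name}")
        # Assumption: every other override is applied as given. The text only
        # documents rejection for the two flags above and allows disabling.
        caps[name] = value
    return caps
```

Disabling a supported flag, such as transcription on an OpenAI route, narrows the route without error.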
Anthropic reasoning and tools
Anthropic extended thinking is supported with tools when generation.tool_choice is auto or omitted. The provider returns signed thinking blocks that are not part of the visible transcript; the daemon stores them as provider-private message context and replays them before the matching historical tool use/results on later turns.
Anthropic rejects extended thinking when tool_choice forces tool use (required/specific). Kheish fails that combination locally with a non-retryable provider error instead of sending a request that Anthropic will reject.
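The local check can be sketched as a guard that runs before any request is built. The names below are hypothetical, and the representation of tool_choice is an assumption: "auto" or None for the permitted cases, "required" for forced tool use, and a dict naming one tool for the specific case.

```python
from typing import Optional, Union


class NonRetryableProviderError(Exception):
    """Hypothetical error type for requests that must not be retried."""


def forces_tool_use(tool_choice: Union[None, str, dict]) -> bool:
    # Assumed shapes: "required" forces any tool; a dict pins one specific tool.
    return tool_choice == "required" or isinstance(tool_choice, dict)


def check_thinking_with_tools(
    thinking_enabled: bool,
    tool_choice: Union[None, str, dict],
) -> None:
    """Fail locally instead of sending a request Anthropic would reject."""
    if thinking_enabled and forces_tool_use(tool_choice):
        raise NonRetryableProviderError(
            "extended thinking cannot be combined with a tool_choice that forces tool use"
        )
```

Failing before the request leaves the process avoids spending a provider round trip on a combination that cannot succeed.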
Provider-native web search
web_search remains one logical tool, but the daemon now prefers a provider-native backend when the effective route supports it.
Current behavior:
- Anthropic routes can use native provider web search
- Google routes currently do not expose native provider web search through web_search
- OpenAI routes can use native provider web search
- OpenRouter routes currently do not expose native provider web search through web_search
- xAI routes can use native provider web search
- unsupported routes or unsupported request shapes fall back to the local DuckDuckGo HTML implementation
web_fetch remains daemon-local.
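The backend choice for web_search reduces to one capability test plus a fallback. In this sketch the function name, the backend labels, and the request_shape_supported flag are hypothetical; only the native_web_search capability name comes from the route capability list.

```python
def pick_web_search_backend(route_caps: dict, request_shape_supported: bool = True) -> str:
    """Prefer the provider-native backend, else the local DuckDuckGo HTML one."""
    if route_caps.get("native_web_search", False) and request_shape_supported:
        return "provider_native"
    return "local_duckduckgo_html"
```

A route without the capability, or a request shape the native backend cannot express, ends up on the local implementation without changing the logical tool the model sees.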
Image routing
The image tools are now selected by image route id, not only by provider family. Current behavior:
- generate_image resolves the backend in this order: explicit tool route override, current run route when that route has an image backend, then the image service default route
- edit_image resolves the backend in this order: explicit tool route override, current run route when that route supports editing, then the default image route when it supports editing, then the first configured edit-capable backend
- the tool request field is still called provider for compatibility, but on a named-route daemon it means one configured image route id
- the tool response field provider still reports the underlying backend provider such as openai or google
- route-level image_generation and image_edit are independent capabilities; a route can expose generation without exposing editing
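The two resolution orders above differ only in their final steps, which a short sketch makes visible. The function names are hypothetical, and backends is assumed to map each configured image route id to its image_generation and image_edit flags in configuration order. The sketch returns an explicit override unvalidated because the text does not say how an unsupported override is handled.

```python
from typing import Dict, Optional


def resolve_generate_backend(
    tool_route: Optional[str],
    run_route: Optional[str],
    default_route: str,
    backends: Dict[str, dict],
) -> str:
    """Order: explicit override, run route with an image backend, default route."""
    if tool_route is not None:
        return tool_route
    if run_route in backends and backends[run_route].get("image_generation", False):
        return run_route
    return default_route


def resolve_edit_backend(
    tool_route: Optional[str],
    run_route: Optional[str],
    default_route: Optional[str],
    backends: Dict[str, dict],
) -> Optional[str]:
    """Order: override, run route, default route, first edit-capable backend."""
    if tool_route is not None:
        return tool_route

    def can_edit(route_id: Optional[str]) -> bool:
        return route_id is not None and backends.get(route_id, {}).get("image_edit", False)

    if can_edit(run_route):
        return run_route
    if can_edit(default_route):
        return default_route
    for route_id in backends:  # dicts preserve insertion (configuration) order
        if can_edit(route_id):
            return route_id
    return None  # no edit-capable backend is configured
```

Because generation and editing are independent flags, the same run can generate on one route and edit on another.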
Audio routing
generate_audio follows the same route-first pattern as the image tools.
Current behavior:
- generate_audio resolves the backend in this order: explicit tool route override, current run route when that route has an audio backend, then the audio service default route
- the tool request field is still called provider for compatibility, but on a named-route daemon it means one configured audio route id
- the tool response field provider reports the underlying backend provider
- the built-in audio-generation backends today are OpenAI and OpenRouter
- route-level audio_generation is the capability flag that tells clients whether generate_audio can target that route; routes with audio_generation = false are not registered as audio backends and do not make the tool visible
For OpenAI routes:
- speech synthesis targets /v1/audio/speech and maps ordinary OpenAI text models to the default gpt-4o-mini-tts TTS model unless a TTS model is supplied explicitly
- speech synthesis accepts optional style/tone instructions, validates voice/format/speed before upload, caps request text and provider response bytes, and persists supported generated formats as daemon assets only after daemon-side audio sniffing
- speech-to-text targets /v1/audio/transcriptions and maps ordinary OpenAI text models to gpt-4o-transcribe unless a transcription model is supplied explicitly
- OpenAI speech-to-text rejects unsupported audio media types and payloads above the 25 MiB transcription request limit before provider upload
- OpenAI audio requests use a bounded provider request timeout instead of inheriting unbounded HTTP defaults, and speech-to-text requests honor run cancellation before/during provider I/O
- OpenAI Codex account-backed Responses endpoints are explicit text/tool routes only; their route capabilities disable daemon image, audio-generation, and transcription backends, and route files that try to re-enable those media capabilities are rejected
- OpenAI media provider debug artifacts store request/response metadata, redacted headers, byte counts, media types, and checksums, but not raw audio bytes or transcript text
For OpenRouter routes:
- speech synthesis targets /v1/audio/speech, maps ordinary chat models to openai/gpt-4o-mini-tts-2025-12-15, and accepts only the OpenRouter mp3 and pcm output formats
- speech-to-text targets /v1/audio/transcriptions, maps ordinary chat models to openai/gpt-4o-mini-transcribe, uses top-level input_audio, rejects unsupported formats and payloads above 25 MiB before provider upload, and honors run cancellation before/during provider I/O
- OpenRouter model discovery narrows text, tool, structured-output, vision, and image capabilities per concrete model; audio-generation and transcription remain route-level daemon backends because they can use dedicated media models even when the active chat model is text-only
- OpenRouter media provider debug artifacts store request/response metadata and byte counts without raw audio bytes or transcript text
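The pre-upload transcription checks mentioned above (media type and the 25 MiB limit) can be sketched as a guard that runs before any bytes leave the daemon. The function name is hypothetical, and the set of accepted media types is passed in as a parameter because the text does not enumerate it.

```python
from typing import Iterable

MAX_TRANSCRIPTION_BYTES = 25 * 1024 * 1024  # 25 MiB request limit


def check_transcription_upload(
    size_bytes: int,
    media_type: str,
    allowed_media_types: Iterable[str],
) -> None:
    """Raise ValueError for payloads that should never reach the provider."""
    if media_type not in set(allowed_media_types):
        raise ValueError(f"unsupported audio media type: {media_type}")
    if size_bytes > MAX_TRANSCRIPTION_BYTES:
        raise ValueError(
            f"payload of {size_bytes} bytes exceeds the {MAX_TRANSCRIPTION_BYTES} byte limit"
        )
```

A payload of exactly 25 MiB passes in this sketch, since only payloads above the limit are described as rejected.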
Fallback behavior
Kheish can be configured with a primary route and an optional fallback model on that same resolved route. The routing layer does not pretend that all routes are interchangeable.
--fallback-model uses the same selector grammar as --model, but after the primary route is resolved the fallback is validated against that route and stored as a normalized model string.
Changing the daemon default route does not rewrite:
- persisted session route policies
- queued runs
- active runs
- restored suspended runs
Context-sensitive prompt recovery
Recovered run memory is packed against prompt budget before provider submission. In practice this means Kheish does not append recovered memory blindly. It estimates the current prompt size, reserves output tokens, applies a safety buffer, and then injects only the recovered-memory entries that still fit. If they do not fit, it omits them rather than forcing a context overflow. Read Recovered run memory for the daemon-side storage and retention model.
Operational guidance
Use explicit route and model settings in live validation and production-sensitive flows. This is especially useful when:
- comparing route behavior on the same task
- validating provider-native web behavior against the local fallback
- running sidechains with lower-cost or faster models
- ensuring that one run does not inherit the wrong session or daemon default route unexpectedly
