
State and recovery

Durability is a core design property of Kheish.

Session journal

Session state is persisted as append-only JSONL records. The journal stores:
  • conversation events
  • checkpoints
  • metadata records
  • permission audit records
  • output records
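As a rough illustration of the append-only JSONL idea, the sketch below writes one JSON object per line and reads the lines back in order. It is not Kheish's actual record schema: the `seq`, `kind`, and `body` fields, the kind tags, and every function name here are assumptions made for the example, and the hand-rolled escaping stands in for a real JSON library.

```rust
use std::fs::OpenOptions;
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Record kinds from the list above. The string tags are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecordKind {
    ConversationEvent,
    Checkpoint,
    Metadata,
    PermissionAudit,
    Output,
}

impl RecordKind {
    pub fn tag(self) -> &'static str {
        match self {
            RecordKind::ConversationEvent => "conversation_event",
            RecordKind::Checkpoint => "checkpoint",
            RecordKind::Metadata => "metadata",
            RecordKind::PermissionAudit => "permission_audit",
            RecordKind::Output => "output",
        }
    }
}

/// Escape a string so it can sit inside a JSON string literal on one line.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Encode one record as a single JSONL line (without the trailing newline).
pub fn encode_record(seq: u64, kind: RecordKind, body: &str) -> String {
    format!(
        "{{\"seq\":{},\"kind\":\"{}\",\"body\":\"{}\"}}",
        seq,
        kind.tag(),
        escape_json(body)
    )
}

/// Append one line to the journal; existing content is never rewritten.
pub fn append_record(path: &Path, line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(line.as_bytes())?;
    file.write_all(b"\n")?;
    file.sync_data() // flush to disk before reporting success
}

/// Read the journal back as raw lines, in append order.
pub fn read_records(path: &Path) -> io::Result<Vec<String>> {
    let file = std::fs::File::open(path)?;
    BufReader::new(file).lines().collect()
}
```

Because every record is a complete line, a reader can replay the file from the top and stop at the first line that fails to parse, which is one common way to tolerate a torn final write.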
Session metadata also carries:
  • the optional bound persona snapshot for that session, including the bound capability baseline and resolved persona default inline skills
  • the optional persisted session capability scope override
  • the optional persisted session credential scope override
  • the optional persisted session reply-target defaults
Channels use their own daemon-owned storage rather than piggybacking on session journals. That storage now keeps:
  • channel records
  • public message logs
  • aggregated reactions
  • public turn leases
  • queued channel stimuli
  • canonical thread-work state, including work bindings and progress snapshots
Projects also use their own daemon-owned storage rather than piggybacking on session journals. That storage keeps project records, project members, linked channels, and project tasks.
Playbooks and Flows also use daemon-owned state-root storage rather than session journals. The Playbook catalog stores immutable manifest versions plus mutable release records; the Flow catalog stores correlation records that point back to normal sessions, runs, agents, tasks, approvals, questions, and evidence.
The daemon separately stores the mutable persona records under the state root.
Together, the session journal and these daemon-owned stores allow restore and audit without requiring a separate transactional database for the core execution path.

Checkpoints and compaction

Long-running sessions can accumulate too much context to replay verbatim. Kheish uses checkpoints and compaction to:
  • summarize older context
  • preserve durable runtime metadata
  • restore active control sections such as tasks, hooks, skills, and MCP instructions after compaction
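The three goals above can be sketched as a single compaction pass. This is a minimal illustration under stated assumptions, not Kheish's implementation: the `Entry` and `Compacted` types, the `keep_recent` parameter, and the caller-supplied `summarize` function are all hypothetical. The point it shows is that older events collapse into a summary while the latest version of each control section is carried forward unchanged.

```rust
/// A simplified journal entry: either conversational context or a named
/// control section (for example "tasks", "hooks", "skills"). Hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Event(String),
    Control { section: String, content: String },
}

/// Result of one compaction pass. Hypothetical shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Compacted {
    /// Summary of the events that were dropped, if any were.
    pub summary: Option<String>,
    /// Latest content of each control section, in first-seen order.
    pub control: Vec<(String, String)>,
    /// Most recent events, kept verbatim.
    pub recent: Vec<String>,
}

pub fn compact<F>(entries: &[Entry], keep_recent: usize, summarize: F) -> Compacted
where
    F: Fn(&[&str]) -> String,
{
    // Collect event text in order; control sections are handled separately.
    let events: Vec<&str> = entries
        .iter()
        .filter_map(|e| match e {
            Entry::Event(text) => Some(text.as_str()),
            Entry::Control { .. } => None,
        })
        .collect();

    // Keep only the newest content for each control section.
    let mut control: Vec<(String, String)> = Vec::new();
    for e in entries {
        if let Entry::Control { section, content } = e {
            let existing = control.iter().position(|(name, _)| name == section);
            match existing {
                Some(i) => control[i].1 = content.clone(),
                None => control.push((section.clone(), content.clone())),
            }
        }
    }

    // Everything except the last `keep_recent` events is summarized.
    let split = events.len().saturating_sub(keep_recent);
    let (older, recent) = events.split_at(split);
    let summary = if older.is_empty() {
        None
    } else {
        Some(summarize(older))
    };

    Compacted {
        summary,
        control,
        recent: recent.iter().map(|s| s.to_string()).collect(),
    }
}
```

Passing the summarizer in as a function keeps the sketch independent of how summaries are actually produced.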

Restore model

On restart, the daemon rebuilds runtime state from persisted records. This includes:
  • the root session and agent mapping
  • stored checkpoints and metadata
  • persisted session persona bindings
  • persisted session capability scope overrides
  • persisted session credential scope overrides
  • persisted session reply-target defaults
  • persisted channels, public messages, reactions, turn leases, stimuli, and canonical thread-work state
  • persisted projects, project members, linked channels, and project tasks
  • persisted Playbook catalogs and Flow correlation records
  • pending approvals and structured questions
  • active inline skills and control state
  • run scheduler and delivery queue state
  • daemon-managed runtime connectors
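Rebuilding state from persisted records usually comes down to replaying them in order. The sketch below does this for one item from the list, session reply-target defaults, and also shows a derived cache being replaced when it disagrees with the replayed result. The `MetaRecord` variants and both function names are assumptions for the example, not Kheish types.

```rust
use std::collections::HashMap;

/// Hypothetical metadata records that affect reply-target defaults.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MetaRecord {
    ReplyTargetSet { session: String, target: String },
    ReplyTargetCleared { session: String },
}

/// Replay records in journal order; the last write for a session wins.
pub fn rebuild_reply_targets(records: &[MetaRecord]) -> HashMap<String, String> {
    let mut index = HashMap::new();
    for record in records {
        match record {
            MetaRecord::ReplyTargetSet { session, target } => {
                index.insert(session.clone(), target.clone());
            }
            MetaRecord::ReplyTargetCleared { session } => {
                index.remove(session);
            }
        }
    }
    index
}

/// Replace a derived cache if it disagrees with the authoritative records.
/// Returns true when a repair was needed.
pub fn repair_index(cache: &mut HashMap<String, String>, records: &[MetaRecord]) -> bool {
    let rebuilt = rebuild_reply_targets(records);
    if *cache == rebuilt {
        false
    } else {
        *cache = rebuilt;
        true
    }
}
```

Treating the records as authoritative and the cache as disposable means a stale or missing cache costs only a replay, never data.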
The daemon also repairs its compact persona index from on-disk persona records when needed. Existing session persona bindings stay valid because the authoritative binding lives in session metadata, not only in the mutable persona store.
Likewise, the daemon repairs its compact reply-target cache from session metadata when needed. Session reply-target defaults stay valid because the authoritative copy lives in session metadata, not only in the compact daemon index.
Daemon-managed connectors are restored from their own state-root store and then re-resolved against the current secret store. This lets connector definitions survive restart while still picking up rotated secret slots.
Recovered run memory is part of this restore story, but it is not stored in the session journal itself. Kheish rebuilds it from daemon-owned run-memory records and a session-scoped daemon index. Read Recovered run memory for the exact model.
Channel recovery now also reconciles autonomous public work by consulting both channel-owned state and any referenced ChannelDelivery runs. The daemon can:
  • re-queue claimed stimuli that still need worker attention
  • reuse an already materialized public post instead of duplicating it
  • rebuild the canonical thread-work projection from durable public messages
  • repair or drop stale progress snapshots and work bindings when they no longer match the correct root thread
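The first two behaviors in this list (re-queue claimed stimuli, reuse an already materialized post) can be sketched as one reconciliation pass over stimuli that were claimed when the daemon stopped. Everything here is hypothetical: the `Stimulus` fields, the state names, and the `posts_by_run` map from a delivery run to its public post are assumptions chosen to illustrate the idea, not Kheish's data model.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StimulusState {
    Queued,
    Claimed,
    Completed,
}

/// Hypothetical queued channel stimulus.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Stimulus {
    pub id: u64,
    pub state: StimulusState,
    /// Delivery run that was handling this stimulus, if one was started.
    pub delivery_run: Option<u64>,
    /// Public post produced for this stimulus, once known.
    pub post_id: Option<u64>,
}

/// Reconcile stimuli left in the Claimed state by an interrupted daemon.
/// `posts_by_run` maps a delivery run id to the post it already published.
/// Returns (requeued, reused).
pub fn reconcile(
    stimuli: &mut [Stimulus],
    posts_by_run: &HashMap<u64, u64>,
) -> (usize, usize) {
    let mut requeued = 0;
    let mut reused = 0;
    for s in stimuli.iter_mut() {
        if s.state != StimulusState::Claimed {
            continue; // queued and completed stimuli need no repair
        }
        let existing = s
            .delivery_run
            .and_then(|run| posts_by_run.get(&run).copied());
        match existing {
            // The post already exists: attach it instead of posting again.
            Some(post) => {
                s.post_id = Some(post);
                s.state = StimulusState::Completed;
                reused += 1;
            }
            // No post was published: hand the stimulus back to a worker.
            None => {
                s.state = StimulusState::Queued;
                requeued += 1;
            }
        }
    }
    (requeued, reused)
}
```

A pass written this way is idempotent: running it a second time finds no claimed stimuli and changes nothing, which matters if recovery itself is interrupted.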

State root discipline

Operationally, the state root matters as much as the daemon binary. Many apparent bugs are actually state-root mixups, schema drift between binaries, or stale sessions created by an older daemon build. Treat the active state root as part of the deployment identity.
The state root also contains daemon-managed connectors, the encrypted secret slots they may reference, broker revocation state for issued auth grants and leases, signed external-action audit ledgers, Playbook catalogs, and Flow correlation records. When you migrate or back up a daemon, keep the session journal, project store, channel store, playbook/flow store, connector store, auth store, and external-action audit state together.
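One way to enforce the "keep them together" rule is a pre-flight check that refuses to back up or migrate a state root with stores missing. The sketch below is only an illustration: the directory names in `REQUIRED_STORES` are invented for the example and do not reflect Kheish's real on-disk layout, so substitute the actual paths from your deployment.

```rust
use std::path::Path;

/// Hypothetical sub-paths, one per store named above. These names are
/// placeholders, not Kheish's real layout.
pub const REQUIRED_STORES: &[&str] = &[
    "sessions",
    "projects",
    "channels",
    "playbooks-flows",
    "connectors",
    "auth",
    "external-action-audit",
];

/// Return the required stores that are absent under `root`.
/// An empty result means the state root looks complete.
pub fn missing_stores(root: &Path) -> Vec<&'static str> {
    REQUIRED_STORES
        .iter()
        .copied()
        .filter(|name| !root.join(*name).exists())
        .collect()
}
```

A backup script could call `missing_stores` on both the source and the restored copy and abort if either list is non-empty.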

Evidence Note

  • Code verified: crates/kheish-daemon/src/playbooks.rs, crates/kheish-daemon/src/services/playbook.rs, crates/kheish-daemon/src/state/playbook_workflow.rs, crates/kheish-daemon/src/state/persistence.rs.
  • CLI/API verified: Playbook/Flow persistence and recovery are exposed through playbooks get/list and flows get/list.
  • Daemon live tested for this note: no; deterministic restart/API tests cover Flow record recovery.
  • Provider-specific tested for this note: no; state restoration is provider-neutral.