Debugging and recovery
When a run looks wrong, debug from daemon evidence rather than from assumptions about the prompt.
Evidence-first order
The recommended inspection order is:
- flows get <flow_id> when the issue is Playbook/Flow-scoped
- runs get <run_id>
- approvals list --session-id <session_id>
- tasks list <session_id>
- tasks output <session_id> <task_id>
- sessions get <session_id>
- sessions events <session_id>
- channels get <channel_id> and channels messages list <channel_id> when debugging one public conversation
- channels leases <channel_id> when a thread looks stalled or over-eager
- channels stimuli list <channel_id> and channels thread-work <channel_id> when autonomous wake-ups or recovery look wrong
- runs debug <run_id>
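The inspection order above can be captured in a small helper so that every debugging pass runs the same commands in the same sequence. The following is a minimal sketch in Python and is not part of Kheish: the function name inspection_commands and its parameters are hypothetical, the ./target/debug/kheish-daemon path is taken from the command examples elsewhere in this section, and only the subcommands named in the list above are used. It builds argument lists and does not execute anything.

```python
from typing import List, Optional

# Binary path as used in this section's command examples; adjust for your build.
DAEMON = "./target/debug/kheish-daemon"


def inspection_commands(
    run_id: str,
    session_id: str,
    task_id: Optional[str] = None,
    flow_id: Optional[str] = None,
    channel_id: Optional[str] = None,
) -> List[List[str]]:
    """Return the evidence-first commands, in order, as argv lists (hypothetical helper)."""
    cmds: List[List[str]] = []
    if flow_id:  # only relevant when the issue is Playbook/Flow-scoped
        cmds.append(["flows", "get", flow_id])
    cmds.append(["runs", "get", run_id])
    cmds.append(["approvals", "list", "--session-id", session_id])
    cmds.append(["tasks", "list", session_id])
    if task_id:  # task output needs a concrete task id
        cmds.append(["tasks", "output", session_id, task_id])
    cmds.append(["sessions", "get", session_id])
    cmds.append(["sessions", "events", session_id])
    if channel_id:  # channel checks apply only to public conversations
        cmds.append(["channels", "get", channel_id])
        cmds.append(["channels", "messages", "list", channel_id])
        cmds.append(["channels", "leases", channel_id])
        cmds.append(["channels", "stimuli", "list", channel_id])
        cmds.append(["channels", "thread-work", channel_id])
    cmds.append(["runs", "debug", run_id])
    return [[DAEMON, *c] for c in cmds]


if __name__ == "__main__":
    # Placeholder ids; print the commands instead of running them.
    for argv in inspection_commands("<run_id>", "<session_id>", task_id="<task_id>"):
        print(" ".join(argv))
```

Returning argv lists rather than shell strings keeps the helper safe to pass to a process runner without quoting concerns, and leaves the decision to execute with the operator.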
Common operational pitfalls
The most common causes of confusion are:
- wrong daemon URL
- wrong state root
- stale sessions created by an older daemon build
- CLI and daemon schema drift
- shell-heavy runs that need more than one approval wave
- updating a persona record and expecting an already-bound session to change automatically
- assuming the daemon-global MCP or skill inventory is always fully visible to every session
- assuming visible tools imply delegated credential access, even when the session or child credential_scope has narrowed route, connector, or MCP auth
Restore expectations
Kheish can restore pending approvals, pending structured questions, active inline skills, schedules, and queued deliveries. Recovery is only as good as the state root and binary pair you are actually running. Kheish does not promise that an arbitrary shell process survives a daemon restart. Active daemon-managed shell tasks are settled fail-closed on boot:
- inspect tasks list <session_id> for status: "failed"
- inspect tasks get <session_id> <task_id> for metadata.terminal_reason: "daemon_restarted" and metadata.recovered_on_boot: true
- inspect tasks output <session_id> <task_id> --full for partial output
- retry manually only after checking whether the command is safe to repeat
Do not treat task_output.retrieval_status=success as shell success. It only means the daemon read the persisted output view.
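The restart markers described above can be checked mechanically once a task record is in hand. The sketch below is a minimal illustration in Python, under assumptions: it supposes the task record is available as a JSON-like dictionary with a top-level status field and a nested metadata object, which is inferred from the field names in this section and not from a confirmed schema. The function names settled_on_boot and triage are hypothetical.

```python
from typing import Any, Dict


def settled_on_boot(task: Dict[str, Any]) -> bool:
    """True when a task record carries all three fail-closed restart markers.

    Assumed shape: {"status": ..., "metadata": {"terminal_reason": ..., "recovered_on_boot": ...}}
    """
    meta = task.get("metadata") or {}
    return (
        task.get("status") == "failed"
        and meta.get("terminal_reason") == "daemon_restarted"
        and meta.get("recovered_on_boot") is True
    )


def triage(task: Dict[str, Any]) -> str:
    """Classify a task from its own status and metadata only.

    The output retrieval status is deliberately not an input here: it reports
    whether the persisted output view could be read, not whether the shell
    command succeeded.
    """
    if settled_on_boot(task):
        return "interrupted by daemon restart; check whether the command is safe to repeat"
    if task.get("status") == "failed":
        return "failed for another reason; read the task output"
    return "not failed; no restart recovery needed"


if __name__ == "__main__":
    # Hypothetical record for a shell task that was active when the daemon restarted.
    example = {
        "status": "failed",
        "metadata": {"terminal_reason": "daemon_restarted", "recovered_on_boot": True},
    }
    print(triage(example))
```

Requiring all three markers together keeps ordinary failures from being mistaken for restart casualties, which matters because only the latter are candidates for a manual retry.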
Session metadata is also the authoritative restore source for:
- the bound persona snapshot
- the persisted session capability scope override
- the persisted session credential scope override
- the persisted session reply-target defaults
- the effective active inline skill state derived from persona defaults and session-local changes
- channel records
- public messages and reactions
- public turn leases
- queued channel stimuli
- canonical thread-work state with bindings and progress snapshots
ChannelDelivery run is still active or has already settled.
Debug capture
Use debug capture when you need to inspect:
- the final system prompt sent to the model
- the effective tool surface
- provider wire payloads
- normalized model outputs

- ./target/debug/kheish-daemon sessions get <session_id> for the session’s bound persona summary
- ./target/debug/kheish-daemon personas get <persona_id> for the latest mutable persona record

- ./target/debug/kheish-daemon sessions get <session_id> for capability_scope and effective_capability_scope
- ./target/debug/kheish-daemon sessions get <session_id> for credential_scope and effective_credential_scope
- ./target/debug/kheish-daemon runtime get for the daemon-global inventory that existed before session filtering

- ./target/debug/kheish-daemon runtime auth subject <subject_id> for active route and connector leases
- ./target/debug/kheish-daemon runtime auth lease <lease_id> for one concrete delegated lease
- ./target/debug/kheish-daemon runs external-actions <run_id> for the signed audit trail that ties one run to principals, grants, targets, and request or response digests
Keep audit-signing.key together with the external-action ledgers. Without the matching key, existing signed audit records remain readable on disk, but the daemon cannot continue that ledger safely.
When output routing looks wrong, inspect:
- ./target/debug/kheish-daemon sessions get <session_id> for persisted reply_targets
- ./target/debug/kheish-daemon connectors list for daemon-managed transport definitions
- ./target/debug/kheish-daemon connectors get <kind> <name> for one connector’s redacted secret and routing config

- ./target/debug/kheish-daemon channels get <channel_id> for title, members, autonomy policy, and paused state
- ./target/debug/kheish-daemon channels messages list <channel_id> for the durable public timeline
- ./target/debug/kheish-daemon channels leases <channel_id> for the current public turn holder and queued candidates
- ./target/debug/kheish-daemon channels stimuli list <channel_id> for pending, claimed, superseded, cancelled, or already dispatched wake-ups
- ./target/debug/kheish-daemon channels thread-work <channel_id> for the canonical root-thread work projection, bindings, and progress snapshots
- ./target/debug/kheish-daemon runs list --session-id <session_id> and runs get <run_id> when one lease still points at an active or recently settled ChannelDelivery run
ChannelDelivery runs before a lease is cleared or advanced.
Current channel recovery also does more than just replay leases:
- claimed stimuli can be re-queued instead of being lost
- already materialized public posts can be reused instead of duplicated
- canonical thread-work state can be rebuilt from durable public messages
- stale progress snapshots and stale bindings can be repaired or dropped when they no longer match the correct root thread
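The first recovery behaviour above, re-queuing claimed stimuli, follows a common pattern for crash-safe work queues. The sketch below illustrates that general pattern in Python and is not Kheish's implementation: the Stimulus type, the requeue_claimed function, and the choice to return a claimed stimulus to "pending" are all assumptions made for illustration. Only the status names are taken from this section.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Stimulus:
    """Hypothetical stand-in for a queued channel stimulus."""
    id: str
    status: str  # one of: pending, claimed, superseded, cancelled, dispatched


def requeue_claimed(stimuli: List[Stimulus]) -> List[Stimulus]:
    """Return a new list in which claimed stimuli are pending again.

    A stimulus that was claimed but never dispatched before a restart has no
    worker left to finish it, so it goes back on the queue. Every other status
    is left untouched.
    """
    return [
        replace(s, status="pending") if s.status == "claimed" else s
        for s in stimuli
    ]


if __name__ == "__main__":
    before = [
        Stimulus("s1", "claimed"),
        Stimulus("s2", "dispatched"),
        Stimulus("s3", "cancelled"),
    ]
    for s in requeue_claimed(before):
        print(s.id, s.status)
```

Leaving dispatched stimuli alone is what keeps this kind of recovery from producing duplicate work; only the in-between claimed state is rolled back.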
Evidence Note
- Code verified: crates/kheish-daemon/src/services/run.rs, crates/kheish-daemon/src/services/task.rs, crates/kheish-daemon/src/state/task_workflow.rs, crates/kheish-daemon/src/state/playbook_workflow.rs.
- CLI/API verified: commands named in the evidence-first order exist in crates/kheish-daemon/src/main.rs and route through crates/kheish-daemon/src/api/handlers.rs.
- Daemon live tested for this note: no; deterministic daemon restart tests cover the documented shell-task failure state.
- Provider-specific tested for this note: no; restart recovery is daemon-local and provider-neutral.
