Capture and observations
Kheish models capture as a generic daemon-owned observation flow:
- one external producer uploads data into one daemon-owned observation source
- the daemon stores one durable observation record plus raw asset metadata
- later, one operator or schedule materializes observations into a normal run
Why capture is not a session input shortcut
Session input and capture solve different problems:
- session input is for immediate user- or connector-submitted work
- observations are for durable external state that may be processed later
Core objects
Observation source
An observation source is one stable daemon-owned ingest boundary. Each source carries:
- source_id
- kind
- sensitivity
- retention_seconds
- max_active_observations
- max_active_bytes
- ingest_rate_limit_window_ms
- ingest_rate_limit_burst
- purge_raw_on_retention
- allow_materialization
- allow_output_delivery
Source kinds:
- screen_snapshot
- webcam_snapshot
- microphone_segment
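As a rough illustration, the source fields listed above could be modelled as follows. Only the field names and the three source kinds come from this page; the `ObservationSource` class name, every Python type, and all example values are assumptions made for the sketch.

```python
from dataclasses import asdict, dataclass

# Source kinds named on this page.
SOURCE_KINDS = ("screen_snapshot", "webcam_snapshot", "microphone_segment")


@dataclass(frozen=True)
class ObservationSource:
    """Hypothetical in-memory shape of one observation source.

    Field names follow the documentation; every type is a guess.
    """
    source_id: str
    kind: str
    sensitivity: str
    retention_seconds: int
    max_active_observations: int
    max_active_bytes: int
    ingest_rate_limit_window_ms: int
    ingest_rate_limit_burst: int
    purge_raw_on_retention: bool
    allow_materialization: bool
    allow_output_delivery: bool

    def __post_init__(self) -> None:
        # Reject kinds that the page does not list.
        if self.kind not in SOURCE_KINDS:
            raise ValueError(f"unknown source kind: {self.kind}")


# All values below are illustrative, not documented defaults.
example = ObservationSource(
    source_id="desk-mic",
    kind="microphone_segment",
    sensitivity="high",
    retention_seconds=86_400,
    max_active_observations=500,
    max_active_bytes=1_000_000_000,
    ingest_rate_limit_window_ms=60_000,
    ingest_rate_limit_burst=30,
    purge_raw_on_retention=True,
    allow_materialization=True,
    allow_output_delivery=False,
)

# A plain dict view, for example for JSON encoding.
payload = asdict(example)
```

Keeping the record frozen mirrors the idea of a stable ingest boundary, although whether real sources are mutable after creation is not stated here.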
Capture agent
A capture agent is one host-local producer provisioned by the daemon. Each durable agent record carries:
- machine_id
- batch_id
- os_profile
- owned source_ids
- source token leases with issue, expiry, revocation, and supersession metadata
- last_heartbeat_at_ms
- heartbeat_deadline_ms
- heartbeat_state
- status
An agent is marked missing when its heartbeat deadline passes; GET /v1/capture-alerts exposes the alert, and a later valid heartbeat records recovery.
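The heartbeat behaviour described above can be sketched as a small state machine. This is an illustration only: the `healthy` state name, the alert strings, the `interval_ms` parameter, and both function names are assumptions, and only the `missing` state and the record field names come from this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentHeartbeat:
    """Hypothetical subset of a capture agent record."""
    machine_id: str
    last_heartbeat_at_ms: int
    heartbeat_deadline_ms: int
    heartbeat_state: str = "healthy"  # assumed name for the non-missing state


def check_deadline(agent: AgentHeartbeat, now_ms: int, alerts: List[str]) -> None:
    """Mark the agent missing once its deadline has passed, alerting once."""
    if agent.heartbeat_state != "missing" and now_ms > agent.heartbeat_deadline_ms:
        agent.heartbeat_state = "missing"
        alerts.append(f"missing:{agent.machine_id}")


def record_heartbeat(
    agent: AgentHeartbeat, now_ms: int, interval_ms: int, alerts: List[str]
) -> None:
    """Accept a valid heartbeat, extend the deadline, and note any recovery."""
    if agent.heartbeat_state == "missing":
        alerts.append(f"recovered:{agent.machine_id}")
    agent.heartbeat_state = "healthy"
    agent.last_heartbeat_at_ms = now_ms
    agent.heartbeat_deadline_ms = now_ms + interval_ms
```

In this sketch the alert list stands in for whatever GET /v1/capture-alerts reads from; the real storage and alert format are not described here.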
Observation
Each uploaded observation stores:
- one daemon-owned raw asset_id
- media_type
- sha256
- byte_length
- captured_at_ms
- received_at_ms
- optional canonical_text_asset_id
- optional stream_id
- optional seq_no
- caller-supplied metadata
- one stable idempotency_key plus daemon request fingerprint
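One plausible reading of the idempotency_key plus request fingerprint pairing is that a retried upload with the same key and the same request returns the existing observation, while the same key with a different request is rejected. The sketch below assumes exactly that; the fingerprint inputs, the `IngestStore` class, the observation ID format, and the conflict behaviour are not documented here and are illustrative only.

```python
import hashlib
import json
from typing import Dict, Tuple


def request_fingerprint(media_type: str, payload: bytes, metadata: dict) -> str:
    """Hash the request parts that should match on a retry (assumed inputs)."""
    digest = hashlib.sha256()
    digest.update(media_type.encode("utf-8"))
    digest.update(b"\0")
    digest.update(hashlib.sha256(payload).digest())
    digest.update(json.dumps(metadata, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


class IngestStore:
    """Hypothetical in-memory stand-in for the daemon's observation store."""

    def __init__(self) -> None:
        # idempotency_key -> (fingerprint, observation_id)
        self._by_key: Dict[str, Tuple[str, str]] = {}

    def ingest(
        self, idempotency_key: str, media_type: str, payload: bytes, metadata: dict
    ) -> Tuple[str, bool]:
        """Return (observation_id, created)."""
        fingerprint = request_fingerprint(media_type, payload, metadata)
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            if existing[0] != fingerprint:
                # Assumed behaviour: a reused key with different content is an error.
                raise ValueError("idempotency_key reused with a different request")
            return existing[1], False
        observation_id = f"obs-{len(self._by_key) + 1}"
        self._by_key[idempotency_key] = (fingerprint, observation_id)
        return observation_id, True
```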
Materialization
Materialization converts an observation selection into a normal daemon run. The daemon currently supports these selection shapes:
- explicit observation_ids
- latest_from_source
- latest_from_stream
- observation_group by capture_group_id
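To make the four selection shapes concrete, here is a minimal in-memory resolver. It is a sketch under several assumptions: the `shape` key and the dict layout of a selection are hypothetical, "latest" is taken to mean the single observation with the highest received_at_ms, and a group selection returns every member in arrival order. The daemon's actual request schema and ordering rules may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Obs:
    """Hypothetical subset of a stored observation."""
    observation_id: str
    source_id: str
    received_at_ms: int
    stream_id: Optional[str] = None
    capture_group_id: Optional[str] = None


def _latest(pool: List[Obs]) -> List[Obs]:
    # Assumption: "latest" means the most recently received observation.
    return [max(pool, key=lambda o: o.received_at_ms)] if pool else []


def select(observations: List[Obs], selection: dict) -> List[Obs]:
    """Resolve one selection (hypothetical dict layout) to observations."""
    shape = selection["shape"]
    if shape == "observation_ids":
        by_id = {o.observation_id: o for o in observations}
        # Preserve the caller's order; silently skip unknown ids.
        return [by_id[i] for i in selection["observation_ids"] if i in by_id]
    if shape == "latest_from_source":
        return _latest([o for o in observations
                        if o.source_id == selection["source_id"]])
    if shape == "latest_from_stream":
        # Streams are scoped to one source, so both keys are required.
        return _latest([o for o in observations
                        if o.source_id == selection["source_id"]
                        and o.stream_id == selection["stream_id"]])
    if shape == "observation_group":
        group = [o for o in observations
                 if o.capture_group_id == selection["capture_group_id"]]
        return sorted(group, key=lambda o: o.received_at_ms)
    raise ValueError(f"unsupported selection shape: {shape}")
```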
Accepted media types
Current source-kind media acceptance is strict:
- screen_snapshot: image/png, image/jpeg
- webcam_snapshot: image/png, image/jpeg
- microphone_segment: audio/wav, audio/webm
Outside observation ingest, the daemon transcription service also handles audio/mpeg, audio/mp4, and audio/m4a. That wider media support does not change the stricter observation-source ingest contract.
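The acceptance rules above amount to a lookup from source kind to allowed media types. A minimal sketch, in which the table contents come from this page and the `ACCEPTED_MEDIA_TYPES` and `is_accepted` names are hypothetical:

```python
# Source kind -> media types accepted at observation ingest (from this page).
ACCEPTED_MEDIA_TYPES = {
    "screen_snapshot": {"image/png", "image/jpeg"},
    "webcam_snapshot": {"image/png", "image/jpeg"},
    "microphone_segment": {"audio/wav", "audio/webm"},
}


def is_accepted(kind: str, media_type: str) -> bool:
    """Return True when the source kind accepts this media type at ingest."""
    return media_type in ACCEPTED_MEDIA_TYPES.get(kind, set())
```

Under this table, `is_accepted("microphone_segment", "audio/mpeg")` is false even though the daemon handles that type elsewhere, which matches the stricter ingest contract.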
Current daemon-side behavior
The daemon already provides:
- observations sources create|list|get
- observations ingest
- observations list|get
- observations materialize
- observations schedule
Observation listing supports source_id + stream_id filtering. Stream identifiers are intentionally scoped to one source rather than treated as a global lookup key.
The HTTP surface is exposed under:
- POST /v1/observation-sources
- GET /v1/observation-sources
- GET /v1/observation-sources/{source_id}
- POST /v1/observation-sources/{source_id}/observations
- GET /v1/observations
- GET /v1/observations/{observation_id}
- POST /v1/observation-materializations
- POST /v1/capture-agent-provisions
- GET /v1/capture-agents
- GET /v1/capture-agents/{machine_id}
- POST /v1/capture-agents/{machine_id}/heartbeat
- POST /v1/capture-agents/{machine_id}/revoke
- GET /v1/capture-alerts
Scheduled materialization goes through the /v1/schedules surface with one embedded observation materialization request.
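A client might keep the routes above in one table and build URLs from it. The sketch below only constructs method and URL pairs and sends nothing. The route names, the `build` helper, and the base URL are hypothetical; request bodies and authentication are not covered on this page and are left out.

```python
from typing import Tuple
from urllib.parse import quote

# Method and path templates copied from the documented HTTP surface.
# The dictionary keys are illustrative names, not daemon identifiers.
ROUTES = {
    "create_source": ("POST", "/v1/observation-sources"),
    "list_sources": ("GET", "/v1/observation-sources"),
    "get_source": ("GET", "/v1/observation-sources/{source_id}"),
    "ingest": ("POST", "/v1/observation-sources/{source_id}/observations"),
    "list_observations": ("GET", "/v1/observations"),
    "get_observation": ("GET", "/v1/observations/{observation_id}"),
    "materialize": ("POST", "/v1/observation-materializations"),
    "provision_agent": ("POST", "/v1/capture-agent-provisions"),
    "list_agents": ("GET", "/v1/capture-agents"),
    "get_agent": ("GET", "/v1/capture-agents/{machine_id}"),
    "heartbeat": ("POST", "/v1/capture-agents/{machine_id}/heartbeat"),
    "revoke_agent": ("POST", "/v1/capture-agents/{machine_id}/revoke"),
    "list_alerts": ("GET", "/v1/capture-alerts"),
}


def build(name: str, base_url: str = "http://127.0.0.1:8080", **params: str) -> Tuple[str, str]:
    """Return (method, url) for a named route; base_url is a placeholder."""
    method, template = ROUTES[name]
    # Percent-encode path parameters so ids with spaces or slashes stay in one segment.
    quoted = {key: quote(str(value), safe="") for key, value in params.items()}
    return method, base_url.rstrip("/") + template.format(**quoted)
```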
Capture runtimes
The daemon does not open microphones, webcams, or screens itself. Capture happens in an external producer that speaks the observation ingest contract. The current host-local reference runtime is kheish-capture, developed as a sibling repository to this daemon workspace.
Its current implemented scope is:
- fixture-based uploads for image/png, image/jpeg, audio/wav, and audio/webm
- live screen capture to image/png or image/jpeg
- live webcam capture to image/png or image/jpeg
- live microphone capture to audio/wav
- mixed-audio capture that can emit:
  - one mixed WAV only
  - one raw WAV leg for each input that produced signal
  - raw WAV legs plus one mixed WAV artifact
canonical_text
visual_preview
canonical_text is now backed by a real daemon-owned speech-to-text path when a transcription backend is configured. The same canonical-text behavior is reused across:
- audio observations
- audio assets referenced by normal session input
- connector-delivered audio that is first normalized into daemon-owned assets
Correlated audio today
Kheish can already store correlated audio artifacts, but the current daemon materialization model is still source-centric. The current reliable pattern for correlated audio is:
- keep one daemon source_id
- use distinct stream_id values for each uploaded leg or derived artifact
- keep one shared group identifier in observation metadata when needed
- use latest_from_stream when one stream-local selection is sufficient
- use observation_group when one capture_group_id should select recent related artifacts across sources
- use explicit observation_ids when you need a precise hand-picked selection
The daemon stores stream_id and seq_no, and it can now list, materialize, and schedule by source_id + stream_id or by an explicit capture_group_id. It does not perform arbitrary metadata queries or infer capture-session structure beyond the supported group field.
Those fields are selection and filtering aids today. The default materialization context surfaces selected stream and group metadata, but arbitrary caller metadata is still treated as untrusted observed context.
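The correlated-audio pattern can be sketched from the producer side: one source, one stream per leg, one shared group identifier. The `plan_uploads` helper, the `leg-` and `mixed` stream naming, and the placement of capture_group_id as a top-level key are all assumptions; this page does not specify where the group identifier sits in the ingest request.

```python
from typing import Dict, List


def plan_uploads(
    source_id: str,
    capture_group_id: str,
    legs: List[str],
    seq_no: int,
    include_mix: bool = True,
) -> List[Dict[str, object]]:
    """Describe one upload per raw leg, plus an optional mixed artifact."""
    stream_ids = [f"leg-{leg}" for leg in legs]  # hypothetical naming scheme
    if include_mix:
        stream_ids.append("mixed")
    return [
        {
            "source_id": source_id,                # one daemon source for all legs
            "stream_id": stream_id,                # distinct per leg or artifact
            "seq_no": seq_no,                      # same capture window across legs
            "capture_group_id": capture_group_id,  # shared group identifier
            "media_type": "audio/wav",
        }
        for stream_id in stream_ids
    ]
```

With this layout, a latest_from_stream selection picks one leg, while an observation_group selection on the shared identifier picks the related artifacts together.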
Materialization defaults
Materialization is intentionally conservative:
- image-like sources can include their raw asset by default
- microphone observations remain transcript-first by default
- operators can opt into raw microphone assets during materialization with raw_asset_policy = always
- when canonical text exists for a microphone observation, the daemon materializes that text
- when canonical text does not exist for a microphone observation, the daemon inserts one placeholder text notice; with raw_asset_policy = always, that notice is attached alongside the raw audio asset
- when a transcription backend is configured, raw audio/wav and audio/webm observations can derive canonical_text daemon-side and persist it back onto the observation before materialization
- the same daemon transcription service also handles supported audio assets outside the observation ingest path, such as imported or generated MP3 and M4A files
- without uploader-supplied canonical text and without a configured transcription backend, microphone materialization falls back to the generic placeholder notice
- source sensitivity and allow_output_delivery still constrain reply routing
Explicit observation_ids are currently the best way to control exactly what the model receives.
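The default rules above can be read as a small decision function over one observation. This is a sketch, not the daemon's implementation: the `default_parts` name, the `("text", ...)` and `("asset", ...)` tuple encoding, the placeholder wording, and the `"default"` policy value are invented for illustration. Only raw_asset_policy = always appears on this page.

```python
from typing import List, Optional, Tuple

IMAGE_KINDS = ("screen_snapshot", "webcam_snapshot")
PLACEHOLDER_NOTICE = "[audio observation without transcript]"  # invented wording

# ("text", content) or ("asset", asset_id); this encoding is hypothetical.
Part = Tuple[str, str]


def default_parts(
    kind: str,
    raw_asset_id: str,
    canonical_text: Optional[str] = None,
    raw_asset_policy: str = "default",  # only "always" is documented
) -> List[Part]:
    """Return what one observation would contribute to a materialized run."""
    if kind in IMAGE_KINDS:
        # Image-like sources can include their raw asset by default.
        return [("asset", raw_asset_id)]
    if kind != "microphone_segment":
        raise ValueError(f"unknown source kind: {kind}")
    # Microphone observations are transcript-first: text, or a placeholder notice.
    text = canonical_text if canonical_text is not None else PLACEHOLDER_NOTICE
    parts: List[Part] = [("text", text)]
    if raw_asset_policy == "always":
        # Operators can opt in to the raw audio alongside the text or notice.
        parts.append(("asset", raw_asset_id))
    return parts
```

Sensitivity and allow_output_delivery checks are deliberately left out, since they constrain reply routing rather than the content selected here.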
Operator guidance
Use observations when:
- you want capture data to survive client disconnects and daemon restarts
- capture is produced outside the daemon process
- one workflow may analyze the same capture more than once
- one schedule should trigger future analysis from retained data
Use session input when:
- the caller is sending one immediate instruction plus files
- no separate retention or capture lifecycle is needed
