# Capture and observations
Kheish models capture as a generic daemon-owned observation flow:

- one external producer uploads data into one daemon-owned observation source
- the daemon stores one durable observation record plus raw asset metadata
- later, one operator or schedule materializes observations into a normal run
## Why capture is not a session input shortcut
Session input and capture solve different problems:

- session input is for immediate user- or connector-submitted work
- observations are for durable external state that may be processed later
## Core objects
### Observation source
An observation source is one stable daemon-owned ingest boundary. Each source carries:

- `source_id`
- `kind`
- `sensitivity`
- `retention_seconds`
- `max_active_observations`
- `max_active_bytes`
- `allow_materialization`
- `allow_output_delivery`

Supported `kind` values are:

- `screen_snapshot`
- `webcam_snapshot`
- `microphone_segment`
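A create-source request mirroring these fields might look like the following sketch. The field names come from the list above, but the concrete values, the sensitivity label, and the JSON shape itself are illustrative assumptions, not a published schema:

```python
import json

# Hypothetical create-source payload; every value here is an example choice,
# not a documented default.
create_source_request = {
    "source_id": "desk-mic",                # stable caller-chosen identifier
    "kind": "microphone_segment",           # one of the three supported kinds
    "sensitivity": "private",               # illustrative sensitivity label
    "retention_seconds": 7 * 24 * 3600,     # retain observations for one week
    "max_active_observations": 1000,
    "max_active_bytes": 512 * 1024 * 1024,  # 512 MiB active-ingest budget
    "allow_materialization": True,
    "allow_output_delivery": False,         # replies never route back out
}

print(json.dumps(create_source_request, indent=2))
```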
### Observation
Each uploaded observation stores:

- one daemon-owned raw `asset_id`, plus `media_type`, `sha256`, `byte_length`, `captured_at_ms`, and `received_at_ms`
- optional `canonical_text_asset_id`
- optional `stream_id`
- optional `seq_no`
- caller-supplied `metadata`
- one stable `idempotency_key` plus daemon request fingerprint
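The per-observation bookkeeping above can be sketched client-side. This is an illustrative helper, not daemon code; the dictionary keys mirror the field list, while the helper name and argument defaults are assumptions:

```python
import hashlib
import time

def build_observation_record(raw: bytes, media_type: str, stream_id=None,
                             seq_no=None, metadata=None, idempotency_key=None):
    """Sketch of the per-observation fields listed above (asset_id is
    assigned daemon-side on ingest, so it is omitted here)."""
    now_ms = int(time.time() * 1000)
    return {
        "media_type": media_type,
        "sha256": hashlib.sha256(raw).hexdigest(),
        "byte_length": len(raw),
        "captured_at_ms": now_ms,   # real producers supply their capture time
        "received_at_ms": now_ms,
        "stream_id": stream_id,
        "seq_no": seq_no,
        "metadata": metadata or {},
        "idempotency_key": idempotency_key,
    }

record = build_observation_record(
    b"RIFF...WAVEfake", "audio/wav",
    stream_id="mic-a", seq_no=1, idempotency_key="mic-a-0001",
)
print(record["byte_length"], record["sha256"][:8])
```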
### Materialization
Materialization converts an observation selection into a normal daemon run. The daemon currently supports two selection shapes:

- explicit `observation_ids`
- latest-observation selection via `latest_from_source` or `latest_from_stream`
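Assuming a JSON request body, the two selection shapes might be expressed like this. Only the selector names come from this page; the surrounding structure is an illustrative assumption:

```python
# Hypothetical materialization selections; field nesting is assumed.
explicit_selection = {
    "observation_ids": ["obs-123", "obs-456"],  # precise, caller-controlled
}

latest_from_source = {
    "latest_from_source": {"source_id": "desk-mic"},
}

latest_from_stream = {
    "latest_from_stream": {"source_id": "desk-mic", "stream_id": "mic-a"},
}

print(sorted(explicit_selection["observation_ids"]))
```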
## Accepted media types
Current source-kind media acceptance is strict:

- `screen_snapshot`: `image/png`, `image/jpeg`
- `webcam_snapshot`: `image/png`, `image/jpeg`
- `microphone_segment`: `audio/wav`, `audio/webm`

Outside the observation ingest path, the daemon's transcription service also handles `audio/mpeg`, `audio/mp4`, and `audio/m4a`. That wider media support does not change the stricter observation-source ingest contract.
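The acceptance table can be checked client-side before uploading. The table entries come directly from the list above; the helper itself is a convenience sketch, not daemon code:

```python
# Strict per-kind acceptance, copied from the table above.
ACCEPTED_MEDIA = {
    "screen_snapshot": {"image/png", "image/jpeg"},
    "webcam_snapshot": {"image/png", "image/jpeg"},
    "microphone_segment": {"audio/wav", "audio/webm"},
}

def is_accepted(kind: str, media_type: str) -> bool:
    """Client-side pre-flight check against the ingest contract."""
    return media_type in ACCEPTED_MEDIA.get(kind, set())

print(is_accepted("microphone_segment", "audio/wav"))   # True
# MP3 is handled by the wider transcription path, not observation ingest:
print(is_accepted("microphone_segment", "audio/mpeg"))  # False
```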
## Current daemon-side behavior
The daemon already provides:

- `observations sources create|list|get`
- `observations ingest`
- `observations list|get`
- `observations materialize`
- `observations schedule`

Listing, materialization, and scheduling support `source_id` + `stream_id` filtering. Stream identifiers are intentionally scoped to one source rather than treated as a global lookup key.
The HTTP surface is exposed under:
- `POST /v1/observation-sources`
- `GET /v1/observation-sources`
- `GET /v1/observation-sources/{source_id}`
- `POST /v1/observation-sources/{source_id}/observations`
- `GET /v1/observations`
- `GET /v1/observations/{observation_id}`
- `POST /v1/observation-materializations`

Scheduled materialization reuses the `/v1/schedules` surface with one embedded observation materialization request.
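An ingest call against this surface might be assembled as below. The path comes from the endpoint list above; the base URL, auth, and request-body fields are deployment-specific assumptions, and the request is only built here, never sent:

```python
import json
import urllib.request

# Hypothetical daemon address; real deployments configure their own.
BASE = "http://127.0.0.1:8080"
source_id = "desk-mic"

# Illustrative body; the real ingest schema is defined by the daemon.
body = json.dumps({
    "media_type": "audio/wav",
    "stream_id": "mic-a",
    "idempotency_key": "mic-a-0001",
}).encode()

req = urllib.request.Request(
    f"{BASE}/v1/observation-sources/{source_id}/observations",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)
```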
## Capture runtimes
The daemon does not open microphones, webcams, or screens itself. Capture happens in an external producer that speaks the observation ingest contract. The current host-local reference runtime is `kheish-capture`, developed as a sibling repository to this daemon workspace.
Its current implemented scope is:
- fixture-based uploads for `image/png`, `image/jpeg`, `audio/wav`, and `audio/webm`
- live screen capture to `image/png` or `image/jpeg`
- live webcam capture to `image/png` or `image/jpeg`
- live microphone capture to `audio/wav`
- mixed-audio capture that can emit:
  - one mixed WAV only
  - one raw WAV leg for each input that produced signal
  - raw WAV legs plus one mixed WAV artifact
## `canonical_text` and `visual_preview`
`canonical_text` is now backed by one real daemon-owned speech-to-text path when a transcription backend is configured. The same canonical-text behavior is reused across:
- audio observations
- audio assets referenced by normal session input
- connector-delivered audio that is first normalized into daemon-owned assets
## Correlated audio today
Kheish can already store correlated audio artifacts, but the current daemon materialization model is still source-centric. The current reliable pattern for correlated audio is:

- keep one daemon `source_id`
- use distinct `stream_id` values for each uploaded leg or derived artifact
- keep one shared group identifier in observation metadata when needed
- use `latest_from_stream` when one stream-local selection is sufficient
- have one external client materialize with explicit `observation_ids` when you need a precise grouped selection across multiple related artifacts
The daemon persists `stream_id` and `seq_no`, and it can now list, materialize, and schedule by `source_id` + `stream_id`. It does not yet perform arbitrary metadata grouping or multi-stream capture-session reconstruction on its own. Documentation and integrations should therefore avoid promising automatic grouped call reconstruction on the daemon side.

Those fields are selection and filtering aids today. The default materialization context does not automatically turn `stream_id`, `seq_no`, or arbitrary caller metadata into grouped prompt-visible semantics.
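Because grouping stays client-side, the grouped-selection step looks like this sketch: filter listed observations by one shared metadata key, then materialize with explicit `observation_ids`. The metadata key name `call_group` and the helper are illustrative assumptions:

```python
# Hypothetical listing result; field names mirror this page's vocabulary.
observations = [
    {"observation_id": "obs-1", "stream_id": "mic-a", "metadata": {"call_group": "call-77"}},
    {"observation_id": "obs-2", "stream_id": "mic-b", "metadata": {"call_group": "call-77"}},
    {"observation_id": "obs-3", "stream_id": "mic-a", "metadata": {"call_group": "call-78"}},
]

def select_group(observations, group_id):
    """Client-side grouping: the daemon does not do this for you."""
    return [o["observation_id"] for o in observations
            if o["metadata"].get("call_group") == group_id]

# Precise grouped selection across both legs of one call:
selection = {"observation_ids": select_group(observations, "call-77")}
print(selection)  # {'observation_ids': ['obs-1', 'obs-2']}
```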
## Materialization defaults
Materialization is intentionally conservative:

- image-like sources can include their raw asset by default
- microphone observations remain transcript-first by default
- operators can opt into raw microphone assets during materialization with `raw_asset_policy = always`
- when canonical text exists for a microphone observation, the daemon materializes that text
- when canonical text does not exist for a microphone observation, the daemon inserts one placeholder text notice; with `raw_asset_policy = always`, that notice is attached alongside the raw audio asset
- when a transcription backend is configured, raw `audio/wav` and `audio/webm` observations can derive `canonical_text` daemon-side and persist it back onto the observation before materialization
- the same daemon transcription service also handles supported audio assets outside the observation ingest path, such as imported or generated MP3 and M4A files
- without uploader-supplied canonical text and without a configured transcription backend, microphone materialization falls back to the generic placeholder notice
- source sensitivity and `allow_output_delivery` still constrain reply routing
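The transcript-first microphone defaults above can be condensed into one decision sketch. `raw_asset_policy` is this page's vocabulary; the function, the policy default name, and the placeholder wording are illustrative assumptions:

```python
PLACEHOLDER = "[no transcript available for this audio observation]"

def materialize_microphone(canonical_text, raw_asset_policy="transcript_first"):
    """Sketch of the microphone defaults: transcript-first, placeholder
    fallback, raw audio only when the operator opts in."""
    parts = []
    if canonical_text is not None:
        parts.append(("text", canonical_text))       # materialize the transcript
    else:
        parts.append(("text", PLACEHOLDER))          # generic placeholder notice
    if raw_asset_policy == "always":
        parts.append(("raw_audio_asset", "attached"))  # opt-in raw audio
    return parts

print(materialize_microphone("hello world"))
print(materialize_microphone(None, raw_asset_policy="always"))
```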
Explicit `observation_ids` are currently the best way to control exactly what the model receives.
## Operator guidance
Use observations when:

- you want capture data to survive client disconnects and daemon restarts
- capture is produced outside the daemon process
- one workflow may analyze the same capture more than once
- one schedule should trigger future analysis from retained data

Use session input when:

- the caller is sending one immediate instruction plus files
- no separate retention or capture lifecycle is needed
