Ingestion pipeline
The seven phases between calling /v1/memhq/add and a memory being queryable.
Ingestion pipeline
When you call /v1/memhq/add, the API responds immediately with an
episode_id and queues the payload for the ingestion worker. The
worker runs seven phases to turn raw input into queryable memory.
Phase 1 — Episode persistence
The raw payload is written verbatim to the episode log. This is the source of truth: even if downstream extraction is buggy, the raw bytes are always recoverable. Episodes are immutable.
Phase 2 — Document-date anchoring
If your payload includes message timestamps, MemHQ anchors a document date for every message. This anchor is what lets the extractor resolve relative time phrases ("last summer", "two weeks ago") into concrete dates downstream.
Phase 3 — Extraction
A purpose-tuned LLM extractor reads the episode and emits structured memory candidates: claim text, entities mentioned, the source span, and confidence. The extractor is prompted to capture facts, preferences, decisions, and relationships — not narration.
Phase 4 — Entity resolution
Each entity mention is resolved against the project's existing entity table (matched on canonical name, aliases, and embedding similarity). New entities are minted; existing ones are linked. This is what makes later multi-hop queries work — the second time you mention "Acme", it points at the same node.
Phase 5 — Reconciliation
The extracted candidates are compared against existing memories on the same graph. The reconciler can:
- Accept — the candidate is novel; add it.
- Supersede — the candidate contradicts an existing memory; mark the old one inactive and link forward to the new one.
- Reinforce — the candidate restates an existing memory; bump its confidence and last-seen timestamp.
- Reject — the candidate is a duplicate or low-confidence noise.
See Reconciliation for the full semantics.
Phase 6 — Indexing
Accepted memories are embedded and indexed in both a vector store (for semantic search) and a lexical store (for keyword/BM25 search). Search uses a hybrid retriever that fuses both with reciprocal-rank fusion.
Phase 7 — Notification
The worker writes a completion record to the episode and (optionally) emits a webhook event so downstream consumers can react to "new memories landed for user X". See Webhooks.
Latency
End-to-end ingestion typically completes in 2-6 seconds for a
single-message episode, dominated by the extraction and reconciliation
LLM calls. The API returns from /add in < 100 ms — the worker
runs asynchronously.
If you need read-your-writes consistency (rare; usually only in
evaluation), pass wait_for_processing: true in the add call. The
request will hold open until phases 1–6 complete.