Ingestion pipeline

The seven phases between calling /v1/memhq/add and a memory being queryable.

When you call /v1/memhq/add, the API responds immediately with an episode_id and queues the payload for the ingestion worker. The worker runs seven phases to turn raw input into queryable memory.
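The split between a fast API response and slow background work can be pictured with a minimal in-process sketch. It assumes a plain Python queue and thread stand in for MemHQ's real queue and ingestion worker; `add`, `jobs`, and `processed` are hypothetical names, not part of the MemHQ API.

```python
import queue
import threading
import uuid

# Hypothetical stand-ins for the ingestion queue and the worker's results.
jobs: "queue.Queue[tuple[str, dict]]" = queue.Queue()
processed: dict[str, str] = {}

def add(payload: dict) -> str:
    """Mimic /v1/memhq/add: mint an episode_id, enqueue the payload, return at once."""
    episode_id = str(uuid.uuid4())
    jobs.put((episode_id, payload))
    return episode_id  # the caller does not wait for any processing

def worker() -> None:
    """Drain the queue; a real worker would run the seven phases here."""
    while True:
        episode_id, payload = jobs.get()
        processed[episode_id] = "done"  # placeholder for phases 1-7
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```

Calling `add(...)` returns an id immediately; `jobs.join()` blocks until the worker has drained everything queued so far.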

Phase 1 — Episode persistence

The raw payload is written verbatim to the episode log. This is the source of truth: even if downstream extraction is buggy, the raw bytes are always recoverable. Episodes are immutable.
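An append-only, immutable log is what makes "the raw bytes are always recoverable" hold. The sketch below assumes an in-memory dict as the backing store; `Episode` and `EpisodeLog` are hypothetical names used only to illustrate the write-once contract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    """Immutable record of a raw payload, stored byte-for-byte."""
    episode_id: str
    raw: bytes

class EpisodeLog:
    """Hypothetical append-only log: episodes can be added and read, never changed."""

    def __init__(self) -> None:
        self._episodes: dict[str, Episode] = {}

    def append(self, episode_id: str, raw: bytes) -> Episode:
        # Refuse to overwrite: an episode, once written, is the source of truth.
        if episode_id in self._episodes:
            raise ValueError(f"episode {episode_id} already exists; episodes are immutable")
        episode = Episode(episode_id, raw)
        self._episodes[episode_id] = episode
        return episode

    def get(self, episode_id: str) -> bytes:
        # Return the verbatim payload, regardless of what extraction later did with it.
        return self._episodes[episode_id].raw
```

Because `Episode` is a frozen dataclass, even code holding a reference cannot mutate `raw` in place.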

Phase 2 — Document-date anchoring

If your payload includes message timestamps, MemHQ anchors each message to a document date. This anchor is what lets the extractor downstream resolve relative time phrases ("last summer", "two weeks ago") into concrete dates.
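To see why the anchor matters, consider how the same phrase resolves differently depending on when the message was written. The function below is a hypothetical, rule-based illustration only (MemHQ's extractor is an LLM and is not claimed to use rules like these); it also assumes northern-hemisphere meteorological summer (June 1 to August 31).

```python
import re
from datetime import date, timedelta
from typing import Optional

WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4}

def resolve(phrase: str, anchor: date) -> Optional[tuple[date, date]]:
    """Resolve a relative time phrase to a (start, end) range relative to the anchor date."""
    p = phrase.lower().strip()
    if p == "yesterday":
        d = anchor - timedelta(days=1)
        return d, d
    m = re.fullmatch(r"(\d+|one|two|three|four) (day|week)s? ago", p)
    if m:
        n = int(m.group(1)) if m.group(1).isdigit() else WORD_NUMBERS[m.group(1)]
        days = n * (7 if m.group(2) == "week" else 1)
        d = anchor - timedelta(days=days)
        return d, d
    if p == "last summer":
        # Most recent Jun 1 - Aug 31 window that ended before the anchor.
        year = anchor.year if anchor >= date(anchor.year, 9, 1) else anchor.year - 1
        return date(year, 6, 1), date(year, 8, 31)
    return None  # unrecognized phrase: leave unresolved
```

With an anchor of 2024-10-15, "last summer" resolves to June-August 2024; with an anchor of 2024-07-04 it resolves to June-August 2023. Without a document date, neither answer is recoverable.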

Phase 3 — Extraction

A purpose-tuned LLM extractor reads the episode and emits structured memory candidates: claim text, entities mentioned, the source span, and confidence. The extractor is prompted to capture facts, preferences, decisions, and relationships — not narration.
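The shape of a memory candidate can be sketched as a small record type. The field names below mirror the four items listed above but are hypothetical, as are the assumptions that confidence lies in [0, 1] and that the source span is a half-open pair of character offsets into the episode text.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryCandidate:
    claim: str                                          # the extracted claim text
    entities: list[str] = field(default_factory=list)   # entity mentions as written
    source_span: tuple[int, int] = (0, 0)               # [start, end) offsets into the episode
    confidence: float = 0.0                             # extractor confidence, assumed in [0, 1]

    def __post_init__(self) -> None:
        # Reject malformed candidates early, before they reach reconciliation.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        start, end = self.source_span
        if not 0 <= start <= end:
            raise ValueError("source_span must satisfy 0 <= start <= end")
```

Keeping the span alongside the claim means any memory can be traced back to the exact bytes of the episode it came from.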

Phase 4 — Entity resolution

Each entity mention is resolved against the project's existing entity table (matched on canonical name, aliases, and embedding similarity). New entities are minted; existing ones are linked. This is what makes later multi-hop queries work — the second time you mention "Acme", it points at the same node.
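The three matching signals can be combined in a simple cascade: exact name or alias match first, then nearest neighbour by embedding, and otherwise a new entity. This is a hypothetical sketch; the 0.9 cosine threshold, the toy 2-d embeddings, and the `Entity` / `resolve_entity` names are assumptions, not MemHQ's implementation.

```python
import math
import uuid
from dataclasses import dataclass, field

@dataclass
class Entity:
    entity_id: str
    canonical_name: str
    aliases: set[str] = field(default_factory=set)
    embedding: list[float] = field(default_factory=list)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def resolve_entity(mention: str, embedding: list[float],
                   table: list[Entity], threshold: float = 0.9) -> Entity:
    """Link a mention to an existing entity, or mint and store a new one."""
    key = mention.strip().lower()
    # 1. Case-insensitive match on canonical name or any alias.
    for ent in table:
        if key == ent.canonical_name.lower() or key in {a.lower() for a in ent.aliases}:
            return ent
    # 2. Closest existing entity by embedding, if it clears the threshold.
    best = max(table, key=lambda e: cosine(embedding, e.embedding), default=None)
    if best is not None and cosine(embedding, best.embedding) >= threshold:
        return best
    # 3. No match: mint a new entity.
    ent = Entity(str(uuid.uuid4()), mention, set(), embedding)
    table.append(ent)
    return ent
```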

Phase 5 — Reconciliation

The extracted candidates are compared against existing memories on the same graph. The reconciler can:

  • Accept — the candidate is novel; add it.
  • Supersede — the candidate contradicts an existing memory; mark the old one inactive and link forward to the new one.
  • Reinforce — the candidate restates an existing memory; bump its confidence and last-seen timestamp.
  • Reject — the candidate is a duplicate or low-confidence noise.

See Reconciliation for the full semantics.
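The four outcomes can be illustrated with a deliberately simplified, rule-based reconciler. MemHQ's reconciler makes LLM calls, so treat this only as a model of the outcomes: it assumes each memory occupies a hypothetical (subject, attribute) slot, treats low confidence (below a hypothetical 0.5) as noise, and bumps confidence by a hypothetical 0.1 on reinforcement.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class Action(Enum):
    ACCEPT = "accept"
    SUPERSEDE = "supersede"
    REINFORCE = "reinforce"
    REJECT = "reject"

@dataclass
class Memory:
    subject: str
    attribute: str
    value: str
    confidence: float
    last_seen: datetime
    active: bool = True
    superseded_by: Optional["Memory"] = None

def reconcile(cand: Memory, memories: list[Memory], min_conf: float = 0.5) -> Action:
    if cand.confidence < min_conf:
        return Action.REJECT  # low-confidence noise
    for old in memories:
        if old.active and (old.subject, old.attribute) == (cand.subject, cand.attribute):
            if old.value == cand.value:
                # Restatement: bump confidence (capped at 1.0) and last-seen time.
                old.confidence = min(1.0, old.confidence + 0.1)
                old.last_seen = cand.last_seen
                return Action.REINFORCE
            # Contradiction: mark the old memory inactive and link it forward.
            old.active = False
            old.superseded_by = cand
            memories.append(cand)
            return Action.SUPERSEDE
    memories.append(cand)  # nothing in this slot yet: novel
    return Action.ACCEPT
```

Superseded memories are kept rather than deleted, so the forward link preserves history.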

Phase 6 — Indexing

Accepted memories are embedded and indexed in both a vector store (for semantic search) and a lexical store (for keyword/BM25 search). Search uses a hybrid retriever that queries both stores and merges their ranked results with reciprocal-rank fusion (RRF).
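Reciprocal-rank fusion scores each result by summing 1 / (k + rank) over every ranked list it appears in, so items ranked well by both retrievers rise to the top. A minimal sketch follows; k = 60 is the constant commonly used in the RRF literature, and MemHQ's actual k and any per-retriever weighting are assumptions not stated here.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

For vector results `["m1", "m2", "m3"]` and lexical results `["m3", "m1", "m4"]`, the fused order is `["m1", "m3", "m2", "m4"]`: m1 and m3 appear in both lists and outrank items found by only one retriever. RRF uses only ranks, so the incomparable raw scores of cosine similarity and BM25 never need to be normalized.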

Phase 7 — Notification

The worker writes a completion record to the episode and (optionally) emits a webhook event so downstream consumers can react to "new memories landed for user X". See Webhooks.
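On the receiving side, a webhook consumer typically parses the event body and dispatches on its type. The sketch below is hypothetical throughout: the `type` and `user_id` fields and the `episode.processed` event name are placeholders, so check Webhooks for the actual event schema.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]
handlers: dict[str, list[Handler]] = {}

def on(event_type: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for one (hypothetical) event type."""
    def register(fn: Handler) -> Handler:
        handlers.setdefault(event_type, []).append(fn)
        return fn
    return register

def dispatch(body: bytes) -> int:
    """Parse a webhook body and call every handler for its type; return how many ran."""
    event = json.loads(body)
    fns = handlers.get(event.get("type", ""), [])
    for fn in fns:
        fn(event)
    return len(fns)
```

A consumer might register `@on("episode.processed")` on a function that invalidates a per-user cache, so reads after the webhook see the new memories.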

Latency

End-to-end ingestion typically completes in 2–6 seconds for a single-message episode, dominated by the extraction and reconciliation LLM calls. The API returns from /add in under 100 ms because the worker runs asynchronously.

If you need read-your-writes consistency (rare; usually only in evaluation), pass wait_for_processing: true in the add call. The request will hold open until phases 1–6 complete.
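A request that opts into this behaviour might be built as follows. Only `/v1/memhq/add` and `wait_for_processing` come from this page; the base URL, the bearer-token header, and the `messages` field in the usage example are hypothetical placeholders.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical; substitute your MemHQ host

def build_add_request(payload: dict, api_key: str, wait: bool = False) -> urllib.request.Request:
    body = dict(payload)  # copy so the caller's dict is not modified
    if wait:
        # Hold the request open until phases 1-6 finish (read-your-writes).
        body["wait_for_processing"] = True
    return urllib.request.Request(
        f"{BASE_URL}/v1/memhq/add",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},  # auth scheme assumed
        method="POST",
    )
```

When sending it with `urllib.request.urlopen(req, timeout=...)`, allow a timeout comfortably above the typical 2–6 second processing time, since the call no longer returns in under 100 ms.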