83.2% on LoCoMo · 8 points above Zep · methodology published

Your users update facts. Your agent should too.

MemHQ is the memory API that resolves contradictions, tracks what changed, and answers questions like “What was she working on in March?” — in three API calls, with any LLM, in under fifteen minutes.

Start building free · 5-min quickstart

$ npm i @memhq/sdk · $ pip install memhq

[Live demo: a terminal runs pip install memhq and from memhq import MemoryClient while the knowledge graph for user_42 builds up: u_42 owns Rivian R1S · u_42 lives_in Brooklyn · u_42 married_to Sara]

9 memory types · per-user graph

3 API calls · add · search · ask

auto-resolved conflicts · reconsolidation

Works with

Claude Code · Codex CLI · Cursor · Windsurf · Claude Desktop · Vercel AI SDK · LangChain · LlamaIndex

See how it works →

the problem

AI that forgets is a business problem.

Most AI products reset at the end of every session. That blank slate is expensive — it shows up as churn, repeated questions, and conversions that never happen. Memory is what separates a demo from a product people rely on.

Users re-explain themselves

Every session starts from zero. People churn when a product feels like it never learns who they are.

Generic, one-size answers

Without context, every reply stays shallow. Personalization is what turns usage into engagement and revenue.

Context dies at every handoff

Support conversations restart at every channel, session, or agent handoff. Resolution drags, costs climb, satisfaction drops.

with memhq

Engagement that compounds

Every interaction makes the next one sharper — products people return to because they're remembered.

Resolutions that don't restart

Context follows the customer everywhere. Less time catching up, faster answers, lower support load.

Memory you can govern

See and audit exactly what your AI knows about each user. Built for teams in regulated domains.

the surface

Three calls. Nothing else to wire.

mem.add()

Ingest a fact, chat turn, or document. Async extraction writes typed entities and relations into the user's knowledge graph.

mem.add(
  user_id="u_42",
  content="Drives a Rivian R1S",
)

mem.search()

Hybrid retrieval (BM25 keyword + dense vector + graph traversal) fused with reciprocal rank fusion (RRF). Returns ranked memories with scores and provenance.

mem.search(
  user_id="u_42",
  q="what does the user drive?",
)

mem.ask()

Server-side retrieve → ground → synthesize. One round trip returns a cited answer, not raw chunks.

mem.ask(
  user_id="u_42",
  q="what cars do I own?",
)
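How ask() grounds its answer server-side is not spelled out here. A minimal sketch of what a grounding step can look like, assuming each retrieved memory carries an id (like m_72ad) and a retrieval score: the top memories are tagged with their ids so the model can cite them. Memory and build_grounded_prompt are hypothetical names, not part of the MemHQ SDK.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical retrieved-memory record (not the MemHQ SDK type)."""
    id: str       # e.g. "m_72ad"
    content: str  # the stored fact
    score: float  # retrieval score from the search step


def build_grounded_prompt(question: str, memories: list[Memory], k: int = 5) -> str:
    """Keep the k best-scoring memories and tag each with its id so the model can cite it."""
    top = sorted(memories, key=lambda m: m.score, reverse=True)[:k]
    cited = "\n".join(f"[{m.id}] {m.content}" for m in top)
    return (
        "Answer using only the memories below and cite the [id] of each one you use.\n"
        f"{cited}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "what cars do I own?",
    [Memory("m_72ad", "Drives a Rivian R1S", 0.91), Memory("m_0c1e", "Lives in Brooklyn", 0.12)],
    k=1,
)
print(prompt)
```

Tagging each line with its id is what lets a synthesized answer point back to specific memories instead of to an anonymous block of context.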

the model loop

Wrap memory around any LLM call.

Two lines bracket your model call: search before to recall, add after to remember. MemHQ never sits in the inference path.

1 · Recall

mem.search()

Pull this user's relevant memories for the incoming message.

2 · Ground

system prompt

Inject the memories as context for the model.

3 · Respond

your LLM

Call OpenAI, Anthropic, or any provider as usual.

4 · Remember

mem.add()

Persist the new turn so context compounds over time.

import os
from openai import OpenAI
from memhq import MemoryClient

oai = OpenAI()
mem = MemoryClient(api_key=os.environ["MEMHQ_API_KEY"])

def chat(user_id: str, message: str) -> str:
    # 1 · recall what we already know about this user
    hits = mem.search(user_id=user_id, q=message)
    context = "\n".join(m.content for m in hits.memories)

    # 2 · ground the model with that context, then 3 · respond
    res = oai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"User memory:\n{context}"},
            {"role": "user", "content": message},
        ],
    )
    answer = res.choices[0].message.content

    # 4 · remember the new turn so it compounds
    mem.add(user_id=user_id, content=f"Q: {message}\nA: {answer}")
    return answer
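Because memory only brackets the model call, the loop can be exercised offline. A minimal sketch with hypothetical stand-ins: FakeMemory mimics only the search/add shape used above (naive keyword overlap, not MemHQ's ranking), and fake_llm replaces the OpenAI call by echoing the system prompt it received.

```python
from types import SimpleNamespace


class FakeMemory:
    """Hypothetical in-memory stand-in: append-only add, keyword-overlap search."""

    def __init__(self) -> None:
        self.store: dict[str, list[str]] = {}

    def add(self, user_id: str, content: str) -> None:
        self.store.setdefault(user_id, []).append(content)

    def search(self, user_id: str, q: str) -> SimpleNamespace:
        words = set(q.lower().split())
        hits = [c for c in self.store.get(user_id, []) if words & set(c.lower().split())]
        # Mirror the hits.memories[i].content shape that chat() reads.
        return SimpleNamespace(memories=[SimpleNamespace(content=c) for c in hits])


def fake_llm(system: str, user: str) -> str:
    # Hypothetical model stub: echo the system prompt so a test can inspect the grounding.
    return f"grounded on: {system}"


def chat_with(mem, llm, user_id: str, message: str) -> str:
    hits = mem.search(user_id=user_id, q=message)                   # 1 · recall
    context = "\n".join(m.content for m in hits.memories)
    answer = llm(f"User memory:\n{context}", message)               # 2 · ground, 3 · respond
    mem.add(user_id=user_id, content=f"Q: {message}\nA: {answer}")  # 4 · remember
    return answer


mem = FakeMemory()
mem.add("u_42", "Drives a Rivian R1S")
print(chat_with(mem, fake_llm, "u_42", "which rivian is mine"))
```

Here the question shares the word "rivian" with the stored fact, so the fact reaches the prompt, and the new turn is stored as a second memory for u_42.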

Prefer one round trip?

mem.ask() runs retrieve → ground → synthesize server-side and returns either a cited answer or a drop-in system prompt.

ctx = mem.ask(user_id=user_id, q=message)

the engine

A knowledge graph, not a vector dump.

Most memory tools store embeddings and hope cosine similarity finds the right chunk. MemHQ extracts typed entities and relations into a per-user graph, then fuses graph traversal with keyword and vector search. The result is recall that understands who, when, and how facts connect — the questions real assistants actually get asked.
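A minimal sketch of the graph side of retrieval, assuming facts are stored as typed (subject, relation, object) edges per user. The data and helper names are illustrative; MemHQ's storage schema and traversal strategy are not described here.

```python
from collections import defaultdict, deque
from typing import Optional

# Typed (subject, relation, object) edges for one user's graph; illustrative data only.
EDGES = [
    ("u_42", "owns", "Rivian R1S"),
    ("u_42", "lives_in", "Brooklyn"),
    ("u_42", "married_to", "Sara"),
]

graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
for subject, relation, obj in EDGES:
    graph[subject].append((relation, obj))


def related(node: str, relation: Optional[str] = None) -> list[str]:
    """Objects one hop from `node`, optionally filtered to a single relation type."""
    return [obj for rel, obj in graph.get(node, []) if relation is None or rel == relation]


def within_hops(start: str, max_hops: int) -> set[str]:
    """Breadth-first traversal: every node reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}


print(related("u_42", "owns"))  # → ['Rivian R1S']
```

Filtering by relation type answers "what does u_42 own?" exactly, with no similarity threshold involved; multi-hop traversal is where answers about how facts connect come from.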

Search returns in tens of milliseconds.

Our hybrid retrieval pipeline stays well under the 300 ms that most memory APIs advertise, fast enough to run before every prompt without slowing the model down.

See how extraction works

Hybrid retrieval

BM25 keyword + dense vector + graph traversal, fused with reciprocal rank fusion — not similarity search alone.
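Reciprocal rank fusion combines rankings by position rather than by raw score, which is why BM25, cosine, and graph scores on different scales can be merged. A minimal sketch with three toy rankings; k = 60 is the constant commonly used since the original RRF paper, and MemHQ's actual constant and rankers are not stated here.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank of d), ranks 1-based."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; a document missing from a ranking gets no contribution from it.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy rankings from three retrievers (memory ids are illustrative).
bm25_hits = ["m_1", "m_2", "m_3"]
vector_hits = ["m_2", "m_4", "m_1"]
graph_hits = ["m_2", "m_1"]

fused = rrf([bm25_hits, vector_hits, graph_hits])
print([doc_id for doc_id, _ in fused])  # → ['m_2', 'm_1', 'm_4', 'm_3']
```

m_2 wins because it ranks near the top in all three lists, even though BM25 alone put m_1 first.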

Answers tied to facts, not guesses

Results trace to a specific entity and relation — not a fuzzy chunk that happened to embed nearby.

Ask what was true last March

Every fact carries valid-from / valid-until. New information supersedes the old — your agent answers correctly even after facts change.
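A minimal sketch of how valid-from / valid-until lets an agent answer "what was true last March", assuming updates arrive in chronological order and each fact is keyed by (subject, predicate). It models valid time only; a fully bi-temporal store would also record when each fact was written. The works_on predicate and project names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_until: Optional[date] = None  # None means "still true"


class TemporalStore:
    """Illustrative valid-time store: an update closes the old interval instead of overwriting it."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []
        self.conflicts: list[tuple[Fact, Fact]] = []  # (winner, superseded) pairs

    def assert_fact(self, subject: str, predicate: str, value: str, valid_from: date) -> Fact:
        new = Fact(subject, predicate, value, valid_from)
        for old in self.facts:
            if (old.subject, old.predicate) == (subject, predicate) and old.valid_until is None:
                old.valid_until = valid_from       # supersede: keep the old fact, give it an end date
                self.conflicts.append((new, old))  # record which memory won and which was superseded
        self.facts.append(new)
        return new

    def as_of(self, subject: str, predicate: str, when: date) -> Optional[str]:
        """Return the value that was valid on `when`, or None if nothing was known then."""
        for f in self.facts:
            if (f.subject, f.predicate) != (subject, predicate):
                continue
            if f.valid_from <= when and (f.valid_until is None or when < f.valid_until):
                return f.value
        return None


store = TemporalStore()
store.assert_fact("u_42", "works_on", "Project Atlas", date(2024, 1, 10))
store.assert_fact("u_42", "works_on", "Project Borealis", date(2024, 4, 2))
print(store.as_of("u_42", "works_on", date(2024, 3, 15)))  # → Project Atlas
```

The sketch treats each (subject, predicate) as single-valued, which suits works_on but not a multi-valued relation such as owning several cars.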

Cited by construction

Every memory and answer carries provenance you can trace back to the source turn.

the platform

Everything memory infrastructure should ship.

MCP server

Drop MemHQ into Claude Code, Codex, Cursor, and Claude Desktop with one config block.

REST + SDKs

Python, TypeScript, and a typed REST API. add · search · ask.

User-scoped graphs

Every end-user gets an isolated memory namespace by default.

Shared / global context

Group graphs for org knowledge, queried alongside user memory in one call.

Storage-layer RBAC

Per-graph ACLs as a first-class SQL table, enforced at the storage layer.
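One reading of "per-graph ACLs as a first-class SQL table" is that the permission check lives in the read query itself, so a user's own graph and any shared group graph they are granted come back together and nothing else does. A minimal sketch with Python's built-in sqlite3; the tables, the user:/group: graph ids, and the read permission are assumptions, not MemHQ's schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE memories (id TEXT PRIMARY KEY, graph_id TEXT NOT NULL, content TEXT NOT NULL);
CREATE TABLE graph_acl (
    graph_id TEXT NOT NULL,
    principal TEXT NOT NULL,
    perm TEXT NOT NULL,
    PRIMARY KEY (graph_id, principal, perm)
);
INSERT INTO memories VALUES
    ('m_1', 'user:u_42', 'Drives a Rivian R1S'),
    ('m_2', 'group:acme', 'Refunds over $500 need manager approval'),
    ('m_3', 'user:u_99', 'Prefers email over phone');
INSERT INTO graph_acl VALUES
    ('user:u_42', 'u_42', 'read'),
    ('group:acme', 'u_42', 'read'),
    ('user:u_99', 'u_99', 'read');
""")


def readable_memories(principal: str) -> list[str]:
    # The ACL join is part of the read itself: rows without a matching grant never leave the database.
    rows = db.execute(
        """
        SELECT m.id FROM memories AS m
        JOIN graph_acl AS a ON a.graph_id = m.graph_id
        WHERE a.principal = ? AND a.perm = 'read'
        ORDER BY m.id
        """,
        (principal,),
    )
    return [row[0] for row in rows]


print(readable_memories("u_42"))  # → ['m_1', 'm_2']
```

u_42 sees their own memory plus the shared group:acme memory in one query, while u_99's graph stays invisible to them.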

Tamper-evident audit log

Hash-chained record of every mutation. Verify any range, any time.
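A hash chain makes each log entry commit to the one before it, so editing, dropping, or reordering any entry breaks every later hash. A minimal sketch with SHA-256 from the standard library; the entry layout and operation names are assumptions, not MemHQ's log format.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so equal payloads always hash identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list[dict], start: int = 0, end: Optional[int] = None) -> bool:
    """Recheck log[start:end]. Trusts the stored hash just before `start` as the anchor."""
    end = len(log) if end is None else end
    expected_prev = log[start - 1]["hash"] if start > 0 else GENESIS
    for entry in log[start:end]:
        if entry["prev"] != expected_prev:
            return False  # an entry was removed, inserted, or reordered
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False  # the payload was edited after it was written
        expected_prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"op": "add", "memory": "m_1"})
append_entry(log, {"op": "supersede", "memory": "m_1", "by": "m_2"})
append_entry(log, {"op": "delete", "memory": "m_3"})
print(verify(log))  # → True
log[1]["payload"]["by"] = "m_9"  # tamper with one entry
print(verify(log))  # → False
```

A range check is only as strong as the anchor hash just before the range: after the tampering above, verify(log, 2) still passes because it trusts entry 1's stored hash. A real deployment would typically pin checkpoint hashes somewhere outside the log.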

Facts that know when they changed

Facts carry valid-from / valid-until. Updates supersede — the agent answers correctly even after the user changes their mind.

Private by default

Per-user isolation, encrypted in transit and at rest.

why memhq

Depth you can audit, not just recall.

Plenty of tools retrieve memories. MemHQ also tracks how facts evolve over time, records every conflict it resolves, and enforces access at the storage layer — the parts regulated products actually need.

Capability	Typical memory API	DIY RAG
Per-user knowledge graph	varies
Bi-temporal validity (valid-from/until)	rare
Conflict records + tamper-evident log
Storage-layer RBAC (per-graph ACLs)		build it
Cited answer in one call (ask)	search only	build a pipeline
MCP server for coding agents	some
Open-source SDKs (Apache-2.0)	varies

use cases

Built for memory-hungry products.

One memory layer, many surfaces. If it talks to a user more than once, it gets better with MemHQ.

Chatbots & assistants

Persistent context across sessions — users never reintroduce themselves.

Autonomous agents

Agents that learn from every run, not just within a single trace.

Customer support

History that survives handoffs across channels, sessions, and agents.

RAG, without the pipeline

Skip the orchestration — one ask() returns cited, grounded answers.

AI companions

Relationships that deepen over time instead of resetting each chat.

Coding agents

Recall team conventions, stack choices, and past decisions per repo.

model context protocol

Give your coding agent a memory.

One MCP server, every assistant. Persist context across sessions in Claude Code, Codex, Cursor, and anything that speaks MCP.

Claude Code · Codex CLI · Cursor · Claude Desktop · Windsurf

$ npx -y @memhq/mcp-server

~/.cursor/mcp.json

{
  "mcpServers": {
    "memhq": {
      "command": "npx",
      "args": ["-y", "@memhq/mcp-server"],
      "env": { "MEMHQ_API_KEY": "sk_live_…" }
    }
  }
}

what you build on it

Memory, applied.

Support agents

Context that survives every handoff.

Customer history persists across sessions, channels, and agents. No more 'can you start from the beginning?'

# before replying, pull the user's grounded context
ctx = mem.ask(
    user_id=ticket.customer_id,
    q="summarize this customer's open issues and prefs",
)
reply = llm.generate(system=ctx.answer, messages=thread)

Coding agents

No more permanent first day.

Agents learn team conventions and architecture decisions, then anticipate intent instead of re-asking.

// recall the conventions stored for this developer's project
const { memories } = await mem.search({
  userId: developer.id,
  q: "lint rules, stack choices, naming conventions",
});
agent.run({ context: memories, task });

Companions

Relationships that deepen, not reset.

Emotional arcs, shared history, and preferences compound. Every conversation builds on the last.

mem.add(user_id=user.id, content=transcript)
profile = mem.ask(
    user_id=user.id,
    q="what matters most to this person right now?",
)

drop into your stack

One key. Any framework.

import os
from memhq import MemoryClient

mem = MemoryClient(api_key=os.environ["MEMHQ_API_KEY"])
mem.add(user_id="u_42", content="Drives a Rivian R1S")
print(mem.ask(user_id="u_42", q="What cars do I own?").answer)
# → "You own a Rivian R1S." [m_72ad]

the fine print

Questions, answered plainly.

Does MemHQ sit in my inference path?

No. You call search before your model call and add after it — MemHQ never proxies your LLM traffic. If you want one round trip, ask runs retrieve → ground → synthesize server-side and returns a cited answer.

How fast does ingestion show up?

add() returns immediately. Extraction runs async — entities, relations, and temporal grounding are typically written to the graph within a few seconds.
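"Returns immediately, extracts within seconds" is the classic producer/consumer pattern. A minimal sketch with a standard-library queue and one worker thread; extract is a hypothetical stand-in that only splits sentences, not MemHQ's extractor, and the dict stands in for the graph.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
graph: dict[str, list[str]] = {}  # user_id -> extracted facts (stand-in for the real graph)


def extract(content: str) -> list[str]:
    # Hypothetical placeholder: split into sentences. Real extraction would resolve
    # entities, relations, and temporal grounding.
    return [s.strip() for s in content.split(".") if s.strip()]


def add(user_id: str, content: str) -> None:
    """Enqueue the write and return at once; extraction runs on the worker thread."""
    jobs.put((user_id, content))


def worker() -> None:
    while True:
        user_id, content = jobs.get()
        try:
            graph.setdefault(user_id, []).extend(extract(content))
        finally:
            jobs.task_done()  # mark the job finished even if extraction raised


threading.Thread(target=worker, daemon=True).start()

add("u_42", "Drives a Rivian R1S. Lives in Brooklyn.")  # returns before extraction runs
jobs.join()  # block until the worker has drained the queue (handy in tests)
print(graph["u_42"])  # → ['Drives a Rivian R1S', 'Lives in Brooklyn']
```

The caller never waits on extraction; only code that must see the result, like a test, needs to wait for the backlog to drain.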

Where does my users' data live?

Every end-user gets an isolated graph namespace by default, encrypted in transit and at rest. Shared group graphs are opt-in and governed by per-graph ACLs enforced at the storage layer.

What actually gets extracted?

Atomic, pronoun-resolved facts typed across nine memory types (facts, events, preferences, decisions, and more), plus the entities and relations connecting them — with bi-temporal validity so updates supersede instead of overwrite.

Can I audit what changed?

Yes. Every mutation lands in a tamper-evident, hash-chained audit log, and every automatic conflict resolution is recorded with the winning and superseded memory — built for teams in regulated domains.

start building

Ship an agent that remembers.

Free tier, no card. Three endpoints between you and persistent memory.

Get started free · Read the docs

Apache-2.0 SDKs · SOC 2 in progress · api.memhq.ai