API reference

Memory API

The five endpoints you'll use day-to-day. All are project-scoped — the API key in the Authorization header determines which project's graphs you see.

Authentication

Every call requires a Bearer token. The project's API key is shown once in the create-project modal in the dashboard:

Authorization: Bearer mos_live_xxxxxxxxxxxxxxxxxxxx
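
A minimal sketch of building these headers once and reusing them for every call. The helper name authHeaders is hypothetical; the Content-Type value mirrors the curl examples in this reference.

```typescript
// Hypothetical helper (not part of any official SDK): builds the headers
// that every request in this reference sends.
function authHeaders(apiKey: string): Record<string, string> {
  if (apiKey.trim() === "") {
    throw new Error("API key is empty");
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
}
```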

Metering

Calls to /add count toward your monthly ingest budget. Calls to /search, /ask, /chat, and /users/:id/context count toward retrieval. See pricing for the per-tier limits. Internal listing/get-by-id endpoints are free.
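
If you want to tally usage on the client, the rule above can be sketched as a small lookup. The function name meterBucket is hypothetical, and treating every other endpoint (including job polling) as free is an assumption based on the "get-by-id endpoints are free" sentence.

```typescript
type MeterBucket = "ingest" | "retrieval" | "free";

// Maps an HTTP method + path to the budget it is expected to count against.
function meterBucket(method: string, path: string): MeterBucket {
  const m = method.toUpperCase();
  if (m === "POST" && path === "/v1/memory/add") return "ingest";
  if (m === "POST" && /^\/v1\/memory\/(search|ask|chat)$/.test(path)) return "retrieval";
  if (m === "GET" && /^\/v1\/users\/[^/]+\/context$/.test(path)) return "retrieval";
  // Assumption: anything else (job polling, listing, get-by-id) is unmetered.
  return "free";
}
```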

POST /v1/memory/add

Queue raw content for asynchronous ingestion. Returns immediately with a jobId you can poll. The worker chunks the content, extracts typed memories, writes the entity graph, and reconciles conflicts.

Request

interface AddRequest {
  content: string;                        // up to 500k chars
  sourceType?: "TEXT" | "PDF" | "URL" | "AUDIO" | "IMAGE" | "MANUAL";
  sourceUrl?: string;                     // optional provenance link
  documentDate?: string;                  // ISO date the content describes
  metadata?: Record<string, unknown>;
  group?: string;                         // target a named group graph
  userId?: string;                        // OR target a user's graph
  // group + userId are mutually exclusive
}
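
The two constraints in the comments above (the 500k character cap and group/userId exclusivity) can be checked before sending. This is a sketch only: validateAddRequest is a hypothetical helper, and counting characters with string length may not match how the server counts.

```typescript
interface AddRequest {
  content: string;
  sourceType?: "TEXT" | "PDF" | "URL" | "AUDIO" | "IMAGE" | "MANUAL";
  sourceUrl?: string;
  documentDate?: string;
  metadata?: Record<string, unknown>;
  group?: string;
  userId?: string;
}

const MAX_CONTENT_CHARS = 500000;

// Returns a list of problems; an empty list means the request looks sendable.
function validateAddRequest(req: AddRequest): string[] {
  const problems: string[] = [];
  if (req.content.length > MAX_CONTENT_CHARS) {
    problems.push(`content exceeds ${MAX_CONTENT_CHARS} characters`);
  }
  if (req.group !== undefined && req.userId !== undefined) {
    problems.push("group and userId are mutually exclusive");
  }
  return problems;
}
```

Returning a list rather than throwing lets a caller report every problem at once.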

Response (202)

{
  "jobId": "ij_01HXY...",
  "status": "queued",
  "message": "Content queued for ingestion"
}

Curl

curl -X POST "$MEMOS_URL/v1/memory/add" \
  -H "Authorization: Bearer $MEMOS_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "content": "Alice works on billing.", "userId": "user_clerk_abc" }'

GET /v1/memory/job/:jobId

Poll the status of an ingestion job. Jobs typically finish in under three seconds.

{
  "jobId": "ij_01HXY...",
  "status": "pending" | "running" | "done" | "failed",
  "chunksCreated": 1,
  "memoriesCreated": 3,
  "edgesCreated": 2,
  "errorMessage": null,
  "startedAt": "2026-05-17T12:00:00Z",
  "finishedAt": "2026-05-17T12:00:02Z"
}
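
A polling loop over this response might look like the sketch below. The names waitForJob, GetJob, and Sleep are hypothetical, as are the default interval and attempt count. The HTTP call and the delay are injected so the loop stays testable; any status other than "done" or "failed" (including the "queued" value returned by /add) is treated as still in progress.

```typescript
interface JobStatus {
  jobId: string;
  status: string; // "pending" | "running" | "done" | "failed"
  memoriesCreated?: number;
  errorMessage?: string | null;
}

type GetJob = (jobId: string) => Promise<JobStatus>;
type Sleep = (ms: number) => Promise<void>;

// Polls until the job is done, fails, or the attempt budget runs out.
async function waitForJob(
  jobId: string,
  getJob: GetJob,
  sleep: Sleep,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<JobStatus> {
  const intervalMs = opts.intervalMs ?? 500; // assumed default
  const maxAttempts = opts.maxAttempts ?? 20; // assumed default
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (job.status === "done") return job;
    if (job.status === "failed") {
      throw new Error(`Ingestion job ${jobId} failed: ${job.errorMessage ?? "no error message"}`);
    }
    // Skip the delay after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Ingestion job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

In an application, getJob would wrap a GET to /v1/memory/job/:jobId with the Authorization header, and sleep could be a setTimeout-based promise.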

POST /v1/memory/search

Hybrid search: BM25 over text, pgvector over embeddings, light graph traversal, rerank. Compose across multiple user + group graphs in one call.

Request

interface SearchRequest {
  q: string;                              // natural language
  projectTags: [string, ...string[]];     // dashboard slug(s)
  limit?: number;                         // default 10, max 50
  types?: ("FACT" | "EVENT" | "PREFERENCE" | "DECISION" |
           "STATE" | "RELATIONSHIP" | "BELIEF" | "SKILL" | "GOAL")[];
  timeRange?: { from?: string; to?: string };
  includeGraph?: boolean;                 // include adjacent entities
  groups?: string[];                      // group graphs to include; ["*"] for all
  users?: string[];                       // user graphs to include
}
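
A sketch of assembling a request that respects the non-empty projectTags tuple and the limit ceiling. buildSearchRequest is a hypothetical helper; clamping the limit on the client (and the lower bound of 1) is a design assumption, since the server may reject out-of-range values instead.

```typescript
type MemoryType =
  | "FACT" | "EVENT" | "PREFERENCE" | "DECISION" | "STATE"
  | "RELATIONSHIP" | "BELIEF" | "SKILL" | "GOAL";

interface SearchRequest {
  q: string;
  projectTags: [string, ...string[]];
  limit?: number;
  types?: MemoryType[];
  timeRange?: { from?: string; to?: string };
  includeGraph?: boolean;
  groups?: string[];
  users?: string[];
}

function buildSearchRequest(
  q: string,
  projectTags: string[],
  extra: Omit<SearchRequest, "q" | "projectTags"> = {},
): SearchRequest {
  const [first, ...rest] = projectTags;
  if (first === undefined) {
    throw new Error("projectTags needs at least one slug");
  }
  const req: SearchRequest = { ...extra, q, projectTags: [first, ...rest] };
  if (extra.limit !== undefined) {
    // Documented max is 50; the floor of 1 is an assumption.
    req.limit = Math.min(50, Math.max(1, Math.floor(extra.limit)));
  }
  return req;
}
```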

Response

{
  "results": [
    {
      "id": "mem_...",
      "content": "Alice works on the billing team",
      "type": "FACT",
      "confidence": 0.92,
      "score": 0.91,
      "createdAt": "2026-05-17T12:00:01Z",
      "entities": [ { "name": "Alice", "type": "PERSON" } ]
    }
  ],
  "total": 1,
  "query": "Who works on billing?"
}

Curl

curl -X POST "$MEMOS_URL/v1/memory/search" \
  -H "Authorization: Bearer $MEMOS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "Who works on billing?",
    "projectTags": ["billing-bot"],
    "users": ["user_clerk_abc"],
    "groups": ["policies"]
  }'

POST /v1/memory/ask

Search + LLM synthesis in one call. Returns a cited natural-language answer plus the supporting memories. Use this when you want the agent to speak in the user's voice instead of building the synthesis step yourself.

Request

interface AskRequest {
  question: string;
  projectTag: string;
  limit?: number;                         // default 8, max 20
}

Response

{
  "answer": "Alice is a senior platform engineer on the billing team at Acme.",
  "citations": [
    { "memoryId": "mem_...", "content": "Alice works at Acme on the billing team" }
  ],
  "telemetry": { "input_tokens": 412, "output_tokens": 67, "latency_ms": 920 }
}
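
One way to present this response is to append the citations as a numbered source list. The sketch below assumes only the fields shown above; renderAnswer and the output layout are hypothetical.

```typescript
interface AskResponse {
  answer: string;
  citations: { memoryId: string; content: string }[];
  telemetry?: { input_tokens: number; output_tokens: number; latency_ms: number };
}

// Formats the answer followed by a numbered list of supporting memories.
function renderAnswer(resp: AskResponse): string {
  if (resp.citations.length === 0) return resp.answer;
  const sources = resp.citations.map(
    (c, i) => `[${i + 1}] ${c.content} (${c.memoryId})`,
  );
  return [resp.answer, "", "Sources:", ...sources].join("\n");
}
```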

POST /v1/memory/chat

Conversational mode. Accepts a messages array, retrieves relevant memories, generates an assistant reply, and auto-ingests the last user turn into memory in parallel. Returns a contextMemories array you can render as citations.

Request

interface ChatRequest {
  messages: { role: "user" | "assistant"; content: string }[];
  projectTag: string;
  ingestUserTurn?: boolean;               // default true
}

Response

{
  "reply": "Based on what we know, Alice...",
  "contextMemories": [ { "id": "mem_...", "content": "..." } ],
  "ingestJobId": "ij_...",                 // null if ingestUserTurn=false
  "telemetry": { "input_tokens": 612, "output_tokens": 134, "latency_ms": 1100 }
}
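
A sketch of carrying a conversation across calls. It assumes the endpoint is stateless and expects the full message history on each request, which this reference does not state explicitly; nextMessages is a hypothetical helper.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

interface ChatResponse {
  reply: string;
  contextMemories: { id: string; content: string }[];
  ingestJobId: string | null;
}

// Builds the messages array for the next request without mutating the
// previous one: prior history, then the assistant reply, then the new turn.
function nextMessages(
  history: ChatMessage[],
  resp: ChatResponse,
  userText: string,
): ChatMessage[] {
  return [
    ...history,
    { role: "assistant", content: resp.reply },
    { role: "user", content: userText },
  ];
}
```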

GET /v1/users/:id/context

Returns a paste-ready system-prompt block summarizing the user — subject summary, top facts, recent episodes, key entities. Drop this into your existing agent's system message instead of building your own retrieval layer.

Query params

  • template: chat (default), agent, or support
  • maxTokens: soft ceiling, default 1500
  • groups: comma-separated group graphs to merge in

Response

{
  "text": "## About this user\n- Alice, senior platform engineer at Acme...\n## Recent activity\n- Asked about invoice CSV export on 2026-05-15...",
  "sections": [
    { "kind": "summary", "title": "About this user", "body": "..." },
    { "kind": "facts",   "title": "Top facts",       "body": "..." },
    { "kind": "episode", "title": "Recent activity", "body": "..." }
  ],
  "estimatedTokens": 1187
}
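
The query parameters above can be assembled into a request URL, and the returned text appended to an existing system prompt, roughly as follows. Both helpers are hypothetical, and the base URL in the usage note is a placeholder.

```typescript
interface ContextOptions {
  template?: "chat" | "agent" | "support";
  maxTokens?: number;
  groups?: string[];
}

// Builds the context URL; group names are encoded individually and joined
// with literal commas, matching the "comma-separated" description.
function contextUrl(baseUrl: string, userId: string, opts: ContextOptions = {}): string {
  const params: string[] = [];
  if (opts.template !== undefined) params.push(`template=${opts.template}`);
  if (opts.maxTokens !== undefined) params.push(`maxTokens=${opts.maxTokens}`);
  if (opts.groups !== undefined && opts.groups.length > 0) {
    params.push(`groups=${opts.groups.map((g) => encodeURIComponent(g)).join(",")}`);
  }
  const path = `${baseUrl.replace(/\/+$/, "")}/v1/users/${encodeURIComponent(userId)}/context`;
  return params.length > 0 ? `${path}?${params.join("&")}` : path;
}

// Appends the response's "text" field to an existing system prompt.
function withUserContext(systemPrompt: string, contextText: string): string {
  return `${systemPrompt.trim()}\n\n${contextText.trim()}`;
}
```

For example, contextUrl("https://memos.example.com", "user_clerk_abc", { template: "agent" }) yields a URL ending in /v1/users/user_clerk_abc/context?template=agent.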

Errors

Errors return a JSON body shaped like:

{
  "error": "Group \"policies\" not found in this project",
  "code": "GROUP_NOT_FOUND"
}

Common codes:

  • UNAUTHORIZED — missing or invalid API key
  • USER_NOT_FOUND, GROUP_NOT_FOUND, MEMORY_NOT_FOUND
  • QUOTA_EXCEEDED — 429 with retry hint in the body
  • VALIDATION_ERROR — schema mismatch on the request body
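
A sketch of turning these codes into a client-side decision. The action names and the classifyError helper are hypothetical. The field that carries the retry hint on QUOTA_EXCEEDED is not specified here, so the sketch does not read it.

```typescript
interface ApiError {
  error: string;
  code: string;
}

type ErrorAction =
  | "retry_later"
  | "fix_credentials"
  | "fix_request"
  | "missing_resource"
  | "unknown";

// Type guard for the documented error body shape.
function isApiError(body: unknown): body is ApiError {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return typeof b["error"] === "string" && typeof b["code"] === "string";
}

function classifyError(body: unknown): ErrorAction {
  if (!isApiError(body)) return "unknown";
  switch (body.code) {
    case "QUOTA_EXCEEDED":
      return "retry_later";
    case "UNAUTHORIZED":
      return "fix_credentials";
    case "VALIDATION_ERROR":
      return "fix_request";
    case "USER_NOT_FOUND":
    case "GROUP_NOT_FOUND":
    case "MEMORY_NOT_FOUND":
      return "missing_resource";
    default:
      return "unknown";
  }
}
```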