Memory MCP¶

memory-mcp is a knowledge-graph MCP server backed by Postgres + pgvector (vchord HNSW). It exposes entity / observation / relation tools to every MCP client in the cluster, so Claude Code sessions and Open WebUI tool callers share the same long-term memory substrate. (The langgraph-agents fleet was a third consumer, writing directly via MCPMemoryStore — removed 2026-07-06 along with the rest of the fleet; see Architecture and Provenance below for what that leaves behind.)


Source repo	https://github.com/rwlove/memory-mcp
Image	`ghcr.io/rwlove/memory-mcp` (digest-pinned in the HelmRelease)
Manifests	`kubernetes/apps/mcp-system/memory-mcp/`
Backend	`postgres-langgraph-memory` CNPG cluster, schema `kg`
Tool prefix at the gateway	`memory_`

Architecture¶

              ┌─ Claude Code (laptop) ──┐
              │  + markdown auto-memory │  (per-project, parallel)
              │  + memory_* MCP tools   │
              └──────────┬──────────────┘
                         │
Open WebUI ──────────────┤
                         │
                         ▼
            ┌──────────────────────────────┐
            │  mcp-gateway (Envoy + Istio) │
            └──────────────┬───────────────┘
                           │ memory_* (toolPrefix)
                           ▼
            ┌──────────────────────────────┐
            │  memory-mcp  (FastMCP)       │
            │  streamable-http :8070       │
            └──────────────┬───────────────┘
                           │ psycopg / asyncpg
                           ▼
            ┌──────────────────────────────┐
            │ postgres-langgraph-memory    │
            │ pgvector + vchordrq HNSW     │
            │ database: langgraph_memory   │
            │ schema:   kg                 │
            └──────────────────────────────┘

Removed 2026-07-06: langgraph-agents used to bypass the MCP transport, writing direct SQL to the same kg.* schema via MCPMemoryStore(BaseStore) (that arrow is gone from the diagram above). The rest of the fleet was decommissioned in the same pass. Rows it wrote remain in place under the langgraph/* namespace (see Namespace conventions below) — nothing is deleted, there's just no active writer on that path anymore.

Schema¶

Three tables under the kg schema:

kg.entities      (id, name UNIQUE, type, namespace, source JSONB,
                  created_at, updated_at)
kg.observations  (id, entity_id, content, embedding vector(768),
                  source JSONB, created_at, deleted_at)
kg.relations     (id, from_entity, to_entity, type, source JSONB,
                  created_at, UNIQUE (from_entity, to_entity, type))

Indexes: kg_obs_embedding_hnsw using vchordrq (embedding vector_cosine_ops) for semantic search; per-column indexes on entity type + namespace.

Embedding model is nomic-embed-text (768 dimensions). The model must be resident on Ollama — see Ollama /api/embed reference for the pull procedure (POST /api/pull with {"model": "nomic-embed-text", "stream": false}).

Soft delete only. memory_delete_observation sets deleted_at = now(), never DROPs. The principle is "don't silently delete memories across namespaces" — observation history stays inspectable even after a logical delete.

Tool surface (12 tools, prefix `memory_`)¶

Tool	Purpose
`create_entity`	Create node + initial observations
`add_observation`	Append observation to existing entity
`get_entity`	Fetch by name + (optionally) one-hop relations
`list_entities`	Enumerate, filterable by type / namespace
`update_entity`	Rename / retype / re-namespace; relations survive via id-based join
`update_observation`	Replace content + re-embed in place
`delete_observation`	Soft delete (`deleted_at`)
`create_relation`	Typed directed edge between two entities
`link`	Sugar for `create_relation` (default type `related_to`)
`search`	Hybrid (semantic via vchord cosine, keyword via ILIKE)
`graph_walk`	BFS to depth N (cap 4) from a start entity
`recent`	Most-recent observations, filterable by agent / since / entity
`create_entities`	Transactional bulk-create
`add_observations`	Transactional bulk-append

Namespace conventions¶

Two distinct namespacings live in this schema:

entities.namespace — caller-supplied free-form grouping (e.g. infra, cluster, software, people). Filter via memory_list_entities(namespace_filter=...) or memory_search(namespace_filter=...).
langgraph/* prefix — entities created by MCPMemoryStore (the LangGraph BaseStore adapter) all live under langgraph/{ns0}/{ns1}/..., where the LangGraph namespace: tuple[str, ...] becomes the path. Their type is always langgraph-store-item. This kept fleet writes visually separated from human-seeded entries (host/cnpg/service/etc.) when other agents query the graph. Historical namespace — langgraph-agents (the only writer of this prefix) was removed 2026-07-06; existing langgraph/* entities are still queryable, but the prefix no longer receives new writes.

Provenance¶

Every write carries a source JSONB. Server stamps at: ISO8601 if the caller didn't. Conventions:

Caller	Minimum source
Claude Code	`{"agent": "claude-code", "claude_namespace": "<encoded-cwd>"}`
LangGraph fleet (via MCPMemoryStore) — removed 2026-07-06, historical rows only	`{"agent": "langgraph", "store": "MCPMemoryStore"}` (was set automatically)
Open WebUI tool calls	`{"agent": "open-webui"}`

A memory_recent(agent_filter="claude-code") query shows just one agent's recent contributions.

Observability¶

/metrics exposes Prometheus scrape data:

Process / GC collectors (CPU, RSS, fd count, GC pauses) — prometheus_client defaults.
memory_mcp_tool_calls_total{tool, status} — counter per tool, status ∈ {success, error}. Tool-level negative responses (entity not found etc.) count as success.
memory_mcp_tool_call_duration_seconds{tool} — histogram, buckets 0.01..30s.
memory_mcp_embed_calls_total{status} — embedding requests, status ∈ {success, empty, error}.
memory_mcp_embed_call_duration_seconds — embed latency histogram.

A ServiceMonitor in mcp-system scrapes /metrics every 30s. This is currently the only instrumented MCP backend in the cluster — see project_todo_mcp_layer_observability_gap for fleet-wide rollout context.

Probes:

/healthz — process up + Postgres SELECT 1. Used by liveness + startup probes.
/readyz — /healthz plus Ollama /api/tags reachable. Used by readiness probe; returns 503 if Ollama is degraded, dropping the pod from the Service rotation until embeddings work again.

Schema migrations¶

The server does not apply schema. A one-shot Job (memory-mcp-schema-init) under the same Kustomization runs psql -f /sql/schema.sql on every Flux reconcile. All DDL is idempotent (CREATE IF NOT EXISTS), so re-runs are no-ops.

Job gotcha: Kubernetes Job.spec.template is immutable. Editing the command field (e.g. the diagnostic echo) without bumping the schema-init annotation or deleting the existing Job will deadlock Flux with field is immutable. Recovery: kubectl delete job memory-mcp-schema-init -n mcp-system and let Flux recreate.

Troubleshooting¶

Symptom	First check
Semantic search returns nothing	`kubectl exec memory-mcp-... -- curl http://ollama.ai.svc:11434/api/embed -d '{"model":"nomic-embed-text","input":"x"}'` — 404 = model not pulled
Pod `CrashLoopBackOff` on startup	Pod logs for `kg.<table> missing` — schema-init Job hasn't completed yet (or failed)
Flux Kustomization stuck on `field is immutable`	Job-template change without delete; see Schema migrations above
Embed counter `error` spiking	Ollama unreachable or model unloaded; `/readyz` will already be 503
Search returns 0 for a query that obviously matches	Run with `mode=keyword` to isolate. If keyword finds it but hybrid doesn't, semantic is broken (Ollama).