mm.tech
memory · 6 min

Memory architectures compared

Local SQLite + FTS5, knowledge graph, vector embeddings, hybrid. Which fits which agent shape?

I have built three memory systems in the past year, each for a different agent shape and each with different trade-offs. Here is what I learned.

System one was pure vector embeddings (pgvector + Supabase). Every memory chunk got an OpenAI embedding, and similarity search returned the top-k. It felt smart. It returned irrelevant context constantly. The problem: vector similarity is a fuzzy "vibe" match. If the user asks "what was the bug we fixed in the auth flow last week", the top-k will be authentication topics in general, not the specific bug. Vector retrieval is great for semantic neighborhoods, terrible for specific recall.
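To make the "semantic neighborhood" problem concrete, here is a minimal sketch of top-k cosine retrieval. The 3-d vectors are hand-made stand-ins for real embeddings, and the memory texts are invented for illustration; none of this is the pgvector setup described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, memories, k=3):
    """Return the k memory texts whose vectors are closest to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical memories with toy vectors (assumption: real embeddings
# would place all auth-related chunks close together in a similar way).
MEMORIES = [
    ("Fixed token refresh race in auth flow", [0.90, 0.10, 0.00]),
    ("Auth uses OAuth2 with PKCE",            [0.88, 0.15, 0.00]),
    ("Login page redesign notes",             [0.85, 0.20, 0.05]),
    ("Quarterly invoice export",              [0.00, 0.10, 0.95]),
]

# Toy vector for "what was the bug we fixed in the auth flow last week"
QUERY = [0.87, 0.17, 0.02]

results = top_k(QUERY, MEMORIES, k=3)
```

All three auth-flavored memories land in the top-k with nearly identical scores, so the agent still has to work out which one is the bug fix.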

System two was full-text search (SQLite FTS5). No embeddings, just BM25 ranking on the words in the memories. Faster, no API calls, no embedding bill. For specific recall it crushed vector search. For semantic queries (you don't remember the exact word but you know the concept) it whiffed.
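A minimal sketch of that setup, using Python's built-in sqlite3 module. It assumes the SQLite build bundled with your Python has FTS5 compiled in (common, but not guaranteed), and the table name and sample memories are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One FTS5 virtual table holding the memory text (hypothetical schema).
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
conn.executemany(
    "INSERT INTO memories(content) VALUES (?)",
    [
        ("Fixed bug in auth flow: token refresh race",),
        ("Auth uses OAuth2 with PKCE",),
        ("Invoice export bug in CSV writer",),
    ],
)

def search(conn, query, limit=5):
    """BM25-ranked full-text search.

    bm25() returns lower values for better matches, so ascending order
    puts the best hit first. Bare words in an FTS5 query are ANDed.
    Note: raw user input may need escaping, since FTS5 has its own
    query syntax.
    """
    rows = conn.execute(
        "SELECT content FROM memories "
        "WHERE memories MATCH ? ORDER BY bm25(memories) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]

specific = search(conn, "auth bug")            # exact words: precise hit
semantic = search(conn, "login credentials")   # concept only: no hit
```

The first query returns only the memory that contains both words. The second returns nothing, because none of the stored memories contain those literal terms, which is the semantic miss described above.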

System three is hybrid: FTS5 first pass (fast, specific, free), then a knowledge graph for entity-based recall (people, projects, decisions, learnings as nodes with edges). Vector embeddings only as the third tier when the first two return nothing relevant. Cost dropped 90%, recall quality went up. This is what local-memory-mcp ships and what studiomeyer-memory uses for cloud.
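The tiering can be sketched as a simple fall-through loop. This is one reading of the idea (stop at the first tier that returns anything), not the actual code in local-memory-mcp; the function names and stub retrievers are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Retriever = Callable[[str], List[str]]

def tiered_recall(
    query: str, tiers: Sequence[Tuple[str, Retriever]]
) -> Tuple[str, List[str]]:
    """Try each tier in order and return the first non-empty result.

    Later (more expensive) tiers are never called when an earlier tier
    already produced hits.
    """
    for name, retrieve in tiers:
        hits = retrieve(query)
        if hits:
            return name, hits
    return "none", []

# Stub retrievers standing in for real FTS5, graph, and vector lookups.
def fts_tier(query: str) -> List[str]:
    return ["Fixed bug in auth flow"] if "auth" in query else []

def graph_tier(query: str) -> List[str]:
    return ["Dana owns auth-flow"] if "Dana" in query else []

def vector_tier(query: str) -> List[str]:
    return ["loosely related memory"]  # always returns something fuzzy

TIERS = [("fts", fts_tier), ("graph", graph_tier), ("vector", vector_tier)]

tier_a, _ = tiered_recall("auth bug", TIERS)
tier_b, _ = tiered_recall("what does Dana work on", TIERS)
tier_c, _ = tiered_recall("something conceptual", TIERS)
```

A real version would likely use a relevance threshold instead of "non-empty", since the vector tier is only meant to fire when the first two return nothing relevant.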

The knowledge graph is the unsung hero. The agent doesn't really need vector similarity for most queries. It needs to know "this entity has been mentioned, here are its observations, here are its relationships". A graph answers that in one query. Vector search answers it in a soft probabilistic way that requires the agent to filter the noise.
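Here is a rough sketch of what "one query" can look like when the graph lives in plain SQLite tables. The schema, entity names, and observations are all assumptions for illustration, not the schema either product uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical graph schema: entities are nodes, relations are edges,
# observations are facts attached to a node.
conn.executescript("""
CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT UNIQUE, kind TEXT);
CREATE TABLE observations (entity_id INTEGER REFERENCES entities(id), note TEXT);
CREATE TABLE relations (
    src INTEGER REFERENCES entities(id),
    label TEXT,
    dst INTEGER REFERENCES entities(id)
);
INSERT INTO entities VALUES (1, 'auth-flow', 'project'), (2, 'Dana', 'person');
INSERT INTO observations VALUES
    (1, 'token refresh race fixed'),
    (1, 'uses OAuth2 with PKCE');
INSERT INTO relations VALUES (2, 'owns', 1);
""")

def recall_entity(conn, name):
    """Fetch an entity's observations and relationships in a single query."""
    return conn.execute(
        """
        SELECT 'observation' AS kind, o.note AS detail
        FROM entities e
        JOIN observations o ON o.entity_id = e.id
        WHERE e.name = :name
        UNION ALL
        SELECT 'relation', s.name || ' ' || r.label || ' ' || d.name
        FROM relations r
        JOIN entities s ON s.id = r.src
        JOIN entities d ON d.id = r.dst
        WHERE s.name = :name OR d.name = :name
        """,
        {"name": name},
    ).fetchall()

facts = recall_entity(conn, "auth-flow")
```

Every row that comes back is about the entity that was asked for, so there is no noise for the agent to filter.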

When you should use vectors: your data is mostly unstructured prose, you have no entities to extract, the queries are conceptual. When you should not: you can extract entities, you have specific terminology, you need precise recall.

Honest caveat: the best validated LongMemEval run for our hybrid memory sits at 86% on a 50-question stratified sample (GPT-4o as judge, Anthropic Sonnet 4.6 as answer generator, Run S957b on 1 May 2026). That run was against memory-server v3.16.10, before the v3.16.11 cross-project search-leak fix landed on 2 May 2026, so the 86% may be inflated by 1-3 percentage points; the realistic range after a clean re-run is 78-86%. For comparison: Mem0 Managed Platform jumped to 93.4% in April 2026 (Token-Efficient Memory Algorithm) but collapses to 48.6% aggregate / 16.3% temporal on BEAM-10M production-scale. The benchmark page on studiomeyer.io tracks methodology and the run logs.

Pattern I keep seeing: builders pick vectors first because the marketing says vectors. Then they get bad recall and blame the embedding model. The embedding model is fine. The architecture is wrong. Try FTS5 first. Add a graph if you have entities. Add vectors last.