Give your agent reliable memory

Ch. 01 What it is

An agent that forgets between sessions reruns the same dead ends and contradicts its own past decisions. The fix isn't a bigger context window — it's a written record the agent reads before it acts and updates when it's done. Three ways to hold that record, simplest first.

Ch. 02 The three ways to build it

Simplest path first. Every tier carries its real setup time and its honest trade-off — the cost is the part most write-ups leave out.

1
Tier 1 · simplest path
Plan-file + resumable checklist
Setup~10 min
- plain markdown
One markdown file per task. At the top, the goal and the decisions you've already locked. Below it, a checklist the agent updates as it goes — `[x]` for done, a one-line note where it had to choose. The agent reads the file before it starts and writes back before it stops. That's the whole mechanism. When a session drops or you come back tomorrow, the agent reads the same file and picks up where the checklist left off, decisions intact.
The trade-off
Low-tech — but on small projects it outperforms most tools. There's nothing to index, nothing to retrieve wrong, and you can read the entire memory at a glance. The ceiling is real: one file stops scaling once the work spans many tasks at once.
2
Tier 2
File-memory vault + manual capture
Setup~1 hr
- Obsidian
- markdown
Promote the single plan-file into a small vault — one note per fact, decision, or recurring pattern, cross-linked. A short index note points at the rest. The agent searches the vault before acting and you (or it) write a new note whenever something is worth keeping. Because every note is plain markdown, you own it outright, you can read it without the agent, and nothing is locked inside a proprietary store.
The trade-off
Durable, but it asks for discipline — you have to actually capture. A vault only knows what was written down; the day you skip the note is the day the agent doesn't have the fact. The upkeep is the price of memory you can trust and audit by hand.
3
Tier 3
Auto-capture + inject (or a graph)
Setup~half day
- claude-mem
- a memory MCP
Hand the capture step to a tool. An auto-memory layer (claude-mem, or a memory MCP server) watches the work, writes notes without being asked, and injects the relevant ones back into context at the start of each session. At the far end sits a knowledge graph — entities and the relationships between them — for memory you query instead of read. This earns its complexity only when the work is genuinely large and moving fast, and it's the one tier where the agent never waits on a human to remember.
The trade-off
Graphs decay fast and cost scales — worth it only on large, fast-moving work. Auto-captured notes drift stale, retrieval surfaces the wrong fact at the wrong time, and the bill grows with every entity. Don't reach for this until Tiers 1 and 2 are visibly straining; most projects never get there.

Ch. 03 The detail

An agent that forgets between sessions reruns the same dead ends and contradicts its own past decisions. The fix isn't a bigger context window — it's a written record the agent reads before it acts and updates when it's done. Three ways to hold that record, simplest first.

Category: Agents & workflows
Format: System
Level: intermediate
Provenance: Own-packaged

agentsmemorycontextclaude-codeobsidian

The problem, stated plainly

A language model holds nothing between sessions. Close the window, and the agent that spent an hour learning your project starts the next hour as a stranger — relitigating settled decisions, retrying the path that already failed, asking again for the constraint you gave it yesterday. People reach for a bigger context window to fix this. A bigger window is a longer short-term memory, not a memory at all. The moment the session ends, it’s gone.

Reliable memory is a different shape. It’s a written record that lives outside the model: the agent reads it before it acts, and updates it when it’s done. Read-before, write-after. Everything below is a way to hold that record — they differ only in who does the writing and how the agent gets it back.

Start at the bottom of the ladder

The instinct is to start with the graph database, because it sounds like the serious answer. On a small project it’s the wrong answer — you spend the afternoon wiring retrieval that surfaces the wrong fact at the wrong moment, when a single plan-file the agent reads top-to-bottom would have held everything. Tier 1 wins more often than it has any right to, precisely because you can see the whole memory at once and nothing can retrieve it incorrectly.

Climb only when a tier visibly strains. Tier 2 earns its keep when the work splits across many tasks and one file can no longer hold them. Tier 3 earns its keep when capture-by-hand becomes the bottleneck and the volume is real. Each rung trades a clear benefit for a clear cost — written into the trade-off block on every tier, because the cost is the part most tools won’t tell you.

The honest version

There is no memory that maintains itself for free. Tiers 1 and 2 ask you to write things down; Tier 3 automates the writing but pays for it in drift, mis-retrieval, and a bill that scales with the graph. Pick the lightest tier that survives your actual workload, and move up only when it breaks — not before.