I need to give my agent a knowledge base it can actually retrieve from.
Pasting your whole knowledge base into the prompt stops working the moment it outgrows the window — and a vector store you bolt on too early hands back the wrong three paragraphs with total confidence. The job isn't 'store the documents,' it's 'find the right passage and only the right passage, on demand.' Three ways to make a corpus retrievable, simplest first.
- 3 ways to retrieve
- ~15 min to a working first version
- 0 vector databases needed to start
Ch. 01 What it is
Ch. 02 The three ways to build it
Simplest path first. Every tier carries its real setup time and its honest trade-off — the cost is the part most write-ups leave out.
Tier 1 · simplest path
Markdown corpus + plain search
Keep the knowledge as flat markdown files, one topic per file, with a short index file that lists what exists and where. When the agent needs a fact, it searches the corpus by keyword — ripgrep, a built-in file search, whatever's at hand — reads the handful of files that match, and answers from them. No embeddings, no database, no chunking. The whole knowledge base is text you can open and read yourself, which means when the agent retrieves the wrong file you can see exactly why and fix it by renaming a heading. On a corpus of dozens to low hundreds of files, this beats a vector store outright: keyword search is exact, it never 'almost' matches, and there's nothing to go stale or drift.
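Here is a minimal Python sketch of the Tier 1 lookup. It stands in for ripgrep with a plain case-insensitive substring scan over `*.md` files. The function name `keyword_search`, the file glob, and ranking files by match count are illustrative assumptions. None of them is behavior that ripgrep or any particular search tool gives you.

```python
from pathlib import Path


def keyword_search(corpus_dir: str, term: str, max_files: int = 5) -> list[Path]:
    """Return up to max_files markdown files containing `term` (case-insensitive).

    Hypothetical helper: files are ranked by how often the term appears, most first.
    """
    needle = term.lower()
    hits: list[tuple[int, Path]] = []
    for path in sorted(Path(corpus_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        count = text.count(needle)
        if count:
            hits.append((count, path))
    # Most mentions first; the sort is stable, so ties keep alphabetical path order.
    hits.sort(key=lambda pair: -pair[0])
    return [path for _, path in hits[:max_files]]
```

A call like `keyword_search("kb", "refund")` returns the few files worth reading. Searching for "churn" against a file that only says "cancellations" returns an empty list, which is exactly the ceiling of literal matching.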
Tier 2
Chunk + embed + vector store
Now retrieve by meaning. Split each document into passages a few paragraphs long — chunks small enough to be one idea, large enough to stand alone — and run each through an embedding model that turns it into a vector. Store the vectors. At query time, embed the question the same way and pull back the handful of chunks closest to it in meaning, then hand only those to the agent. Now 'churn' finds the passage about cancellations, because they sit near each other in meaning-space even though they share no words. You're no longer feeding the model the whole corpus and hoping the answer's in there — you're feeding it the three passages that actually bear on the question, which is cheaper, faster, and far less prone to the model latching onto an irrelevant aside.
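The Tier 2 loop can be sketched in Python under a few stated assumptions:

- `embed` is a placeholder for whatever embedding model you call. It must map text to a fixed-length vector.
- Chunks are built by grouping blank-line-separated paragraphs.
- `InMemoryVectorStore` is a hypothetical brute-force stand-in for LanceDB, Chroma, or pgvector.

The shape is what matters here. Embed the chunks once, embed the question the same way, and keep the nearest few by cosine similarity.

```python
import math
from typing import Callable

Vector = list[float]


def chunk_paragraphs(text: str, paras_per_chunk: int = 3) -> list[str]:
    """Split on blank lines, then group every `paras_per_chunk` paragraphs into one chunk."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    return ["\n\n".join(paras[i:i + paras_per_chunk])
            for i in range(0, len(paras), paras_per_chunk)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical toy store: keeps (chunk, vector) pairs and searches them by brute force."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed  # assumption: any function mapping text -> fixed-length vector
        self.rows: list[tuple[str, Vector]] = []

    def add_document(self, text: str) -> None:
        for chunk in chunk_paragraphs(text):
            self.rows.append((chunk, self.embed(chunk)))

    def search(self, question: str, k: int = 3) -> list[str]:
        q = self.embed(question)  # the question is embedded exactly like the chunks
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

Brute-force cosine over every row is fine for a few thousand chunks. A real vector store adds an index so search stays fast as the corpus grows.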
Tier 3
Hybrid retrieval + re-rank + promotion/expiry
Run both retrievers and let each cover the other's blind spot. A keyword pass (BM25) catches the exact strings (names, codes, versions) that embeddings miss. A dense pass catches the meaning that keywords miss. You fuse the two result lists so that a passage scoring on either route surfaces. Then add the step that does the most for accuracy per dollar: a re-ranker. The first pass casts a wide, cheap net, say twenty candidates. The re-ranker then reads the question against each candidate closely and reorders them, so the four passages the agent actually sees are the four most relevant, not merely the four nearest. Finally, give the corpus a clock. Tag entries with a source and a freshness date, promote a passage that keeps getting retrieved and confirmed into a trusted tier, and expire or down-rank what's gone stale. That way retrieval prefers what's current and proven over what's merely present.
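The text doesn't name a fusion method. Reciprocal rank fusion (RRF) is one common choice, and it is swapped in here as an assumption. Each passage earns `1 / (k + rank)` from every list it appears in, so a passage that ranks well on either route surfaces. `k = 60` is a conventional default. `rerank_score` is a hypothetical hook for whatever re-ranking model you use; it scores the question and a passage together. For brevity, passages are identified by plain strings.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: every list adds 1 / (k + rank) to each passage it contains."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, passage in enumerate(ranking, start=1):
            scores[passage] += 1.0 / (k + rank)
    return sorted(scores, key=lambda p: scores[p], reverse=True)


def hybrid_retrieve(
    question: str,
    keyword_ranking: list[str],
    dense_ranking: list[str],
    rerank_score: Callable[[str, str], float],  # hypothetical re-ranker hook
    candidates: int = 20,
    final: int = 4,
) -> list[str]:
    """Fuse both retrievers, keep a wide candidate pool, let the re-ranker choose the final few."""
    pool = reciprocal_rank_fusion([keyword_ranking, dense_ranking])[:candidates]
    # The re-ranker reads (question, passage) pairs directly instead of trusting fused rank.
    return sorted(pool, key=lambda p: rerank_score(question, p), reverse=True)[:final]
```

RRF works on ranks rather than raw scores. That means it doesn't matter that BM25 scores and cosine similarities live on different scales.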
Ch. 03 The detail
- Category: Knowledge-ops · Retrieval & RAG
- Format: System
- Level: advanced
- Provenance: Upgraded third-party
The problem, stated plainly
The first instinct, once your notes outgrow what fits in a prompt, is to paste more — a bigger context window, the whole handbook dumped in at the top of every conversation. It works until it doesn’t, and when it stops it stops badly: the model drowns in the dump, fixates on a paragraph that has nothing to do with the question, and you’re paying to send the same fifty pages on every turn. A knowledge base isn’t a pile you hand over whole. It’s a corpus you can reach into — pull the one passage that answers the question, leave the rest on the shelf.
That’s the distinction this entry turns on, and it’s worth being precise about, because it’s a different job from giving an agent memory. Memory is about continuity — what happened, what you decided, what to remember between sessions; it’s written as you go and read back so the agent doesn’t start every session a stranger. A knowledge base is about retrieval — a body of reference material that already exists, made queryable so the agent can fetch the relevant slice on demand. You can want both, and they overlap, but the question here is narrow: when the agent needs a fact from a corpus you own, how does it find the right one and only the right one?
Retrieval is a precision problem, not a storage problem
Storing documents is trivial — a folder does it. The hard part is that retrieval can fail in two opposite directions, and the tiers below are organized around which failure you’re fighting. It can miss: the answer is in the corpus, but the search didn’t surface it, so the agent answers from nothing or guesses. Or it can mis-hit: it surfaces a passage that’s near the question but doesn’t actually answer it, and the model, being agreeable, builds a confident answer on the wrong foundation. Keyword search misses on meaning. Semantic search mis-hits on exact strings. The advanced tier exists because the two failure modes have opposite cures, and the only way to fix both is to run both and reconcile.
So climb deliberately. Tier 1 (flat markdown and keyword search) is right far longer than the RAG tutorials suggest. On a corpus you can still read by hand, exact matching never “almost” matches, and there’s nothing to maintain. Tier 2 earns its half-day when queries keep missing the right passage because the question and the document phrase the same idea in different words, and you need retrieval by meaning. Tier 3 earns its day or two only when Tier 2 visibly returns the wrong passage on questions where wrong is expensive. That’s when the hybrid pass, the re-ranker, and the freshness clock start paying for their own complexity.
The honest version
There is no retrieval that fetches the right passage every time. Tier 1 is exact but literal — it finds your words, not your meaning. Tier 2 finds meaning but fumbles the literal — names, codes, the strings where the exact characters are the answer. Tier 3 covers both blind spots and re-orders so the best passages reach the agent first, then keeps the corpus honest with a promotion-and-expiry clock — but it pays for that coverage in moving parts, model calls, and the standing work of keeping a freshness signal that means something.
Pick the lightest rung that returns the right passage on the questions you actually ask. A knowledge base the agent can’t retrieve from cleanly is worse than no knowledge base at all — it answers with the same confidence whether it found the truth or the paragraph next to it, and that second case is the one that costs you. Build up the ladder only as the failures you can measure force you to.
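One way to make those failures measurable is to hand-label a set of real questions with the id of the passage that answers each one. Then run your retriever and count how often that passage lands in the top k. The `retrieve(question, k)` signature and the name `hit_rate_at_k` are assumptions. Any tier can sit behind that signature, so the same labeled set tells you whether a climb up the ladder actually helped.

```python
from typing import Callable


def hit_rate_at_k(
    retrieve: Callable[[str, int], list[str]],  # assumption: returns passage ids, best first
    labeled: list[tuple[str, str]],             # (real question, id of the passage that answers it)
    k: int = 4,
) -> tuple[float, list[str]]:
    """Share of questions whose known-correct passage appears in the top k, plus the misses."""
    misses = [q for q, expected in labeled if expected not in retrieve(q, k)]
    rate = 1 - len(misses) / len(labeled) if labeled else 0.0
    return rate, misses
```

The returned misses are the questions worth reading by hand. A mis-hit shows up here too when the near-but-wrong passage crowds the right one out of the top k.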
What it takes to stand each version up, from the lightest path on.
Tier 1 · Markdown corpus + plain search · Setup ~15 min
- plain markdown
- ripgrep or full-text search
Tier 2 · Chunk + embed + vector store · Setup ~half a day
- an embedding model
- a vector store (LanceDB, Chroma, pgvector)
Tier 3 · Hybrid retrieval + re-rank + promotion/expiry · Setup ~1–2 days to wire
- dense (embedding) + keyword (BM25) retrieval
- a re-ranker
- a freshness/promotion policy
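The keyword half of the Tier 3 hybrid pass is usually BM25. In practice you'd lean on a library or your search engine's built-in scorer. This from-scratch Python sketch of the standard Okapi BM25 formula just makes the parts visible, under these assumptions:

- `k1 = 1.5` and `b = 0.75` are commonly used values.
- The IDF uses the `+1` variant that keeps it positive.
- The tokenizer is deliberately crude: it lowercases and keeps hyphens, so a code like `ERR-4012` stays one token.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep hyphens inside tokens, so a code like "ERR-4012" stays whole.
    return re.findall(r"[\w-]+", text.lower())


class BM25:
    """Okapi BM25 over a small in-memory list of passages."""

    def __init__(self, passages: list[str], k1: float = 1.5, b: float = 0.75):
        self.passages = passages
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(p)) for p in passages]  # term frequencies per passage
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs) if self.docs else 0.0
        # Document frequency: how many passages contain each term at least once.
        self.df = Counter(term for d in self.docs for term in d)

    def idf(self, term: str) -> float:
        n, total = self.df[term], len(self.docs)
        return math.log((total - n + 0.5) / (n + 0.5) + 1)  # "+1" keeps IDF positive

    def score(self, query: str, i: int) -> float:
        doc, length = self.docs[i], self.lengths[i]
        result = 0.0
        for term in tokenize(query):
            tf = doc[term]
            if tf:
                # Length normalization: long passages need more hits to score as high.
                norm = self.k1 * (1 - self.b + self.b * length / self.avgdl)
                result += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
        return result

    def rank(self, query: str, k: int = 20) -> list[str]:
        """Top-k passages with a nonzero score, best first."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [self.passages[i] for _, i in scored[:k]]
```

`BM25(passages).rank("ERR-4012")` returns only passages that contain that exact token. That is the literal-string precision embeddings tend to lack, and the output is a ranked list ready for whatever fusion step you use.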
The honest version. Each tier buys you something and costs you something — both are stated plainly, never buried.
Tier 1 · Markdown corpus + plain search
Keyword search only finds the words you typed. Ask for 'churn' when the doc says 'cancellations' and you get nothing — there's no sense of meaning, only of spelling. It also returns whole files, so a long document means the agent reads a lot to find one line. This is the right tier far longer than people expect, but it has a real ceiling: once the corpus is large enough that the matching file is too big to read whole, or the language varies enough that exact words keep missing, keyword search starts handing back silence where there's actually an answer.
Tier 2 · Chunk + embed + vector store
Embeddings retrieve by vibe, and vibe misses on the things keyword search nails: exact names, error codes, SKUs, version numbers, anything where the literal string is the answer. Ask for a specific product code and a semantic search will happily return three chunks that are *about* products and none that contain the code. Chunking is its own trap — split too small and a passage loses the context that made it meaningful; too large and you're back to reading noise. And the store is now a thing you maintain: re-embed when documents change, or it quietly answers from last month's version.
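Here is a minimal sketch of the re-embedding bookkeeping. It assumes you store a SHA-256 hash of each document's text next to its vectors at embedding time. On each sync, anything whose hash changed (or is new) gets re-chunked and re-embedded, and anything that disappeared gets its vectors deleted. `plan_reembed` and the id-to-hash mapping are hypothetical; the hashing uses Python's standard `hashlib`.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reembed(current: dict[str, str], indexed: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare live documents with the hashes recorded when they were last embedded.

    current: doc id -> current text. indexed: doc id -> hash stored alongside its vectors.
    Returns (ids to re-chunk and re-embed, ids whose vectors should be deleted).
    """
    to_embed = sorted(doc_id for doc_id, text in current.items()
                      if indexed.get(doc_id) != content_hash(text))  # changed or brand new
    to_delete = sorted(doc_id for doc_id in indexed if doc_id not in current)  # removed at source
    return to_embed, to_delete
```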
Tier 3 · Hybrid retrieval + re-rank + promotion/expiry
Every part of this earns its keep only at scale, and each part is a thing that can rot. Two retrievers, a fusion step, a re-ranker, and a freshness policy make five moving parts to debug when the agent returns something wrong. And it will, because no retrieval is perfect; the failure just gets rarer and harder to spot. The re-ranker adds a model call to every query, so latency and cost rise. Promotion and expiry need an honest signal of what 'confirmed' means. Without one, you automate the wrong passage into the trusted tier and end up worse off than with a flat store. Don't build this until Tier 2 is visibly returning the wrong passage on questions that matter; most knowledge bases never need the clock.
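The text doesn't prescribe a formula for the clock, so what follows is one hypothetical policy in Python:

- Freshness decays exponentially with a half-life.
- Entries past a hard cutoff are expired.
- An entry gets a fixed boost once its confirmation count crosses a threshold.

Every number here is an assumption to tune. `confirmations` is only as honest as the signal that increments it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    text: str
    source: str               # where the passage came from
    updated: date             # freshness date tagged on the entry
    confirmations: int = 0    # times a retrieval of it was confirmed correct (you define "confirmed")


def adjusted_score(
    entry: Entry,
    relevance: float,
    today: date,
    half_life_days: float = 180.0,   # assumption: freshness halves every ~6 months
    expire_after_days: int = 730,    # assumption: hard cutoff at ~2 years
    promote_at: int = 5,             # assumption: confirmations needed for the trusted tier
    trusted_boost: float = 1.2,      # assumption: multiplier for trusted entries
) -> Optional[float]:
    """Scale a retriever's relevance score by freshness and trust; None means expired."""
    age_days = (today - entry.updated).days
    if age_days > expire_after_days:
        return None  # expired: drop from results (or route to a human for review)
    freshness = 0.5 ** (age_days / half_life_days)  # exponential decay with a half-life
    trusted = entry.confirmations >= promote_at      # promoted into the trusted tier
    return relevance * freshness * (trusted_boost if trusted else 1.0)
```

One way to use it is to apply it after re-ranking, re-sort by the adjusted score, and drop the entries that come back as `None`.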
Edition June 2026 · Updated June 20, 2026