The thesis: retrieval was the easy half
Part one of this case study ended on a quiet, confident line: "The wiki is now structurally complete. Future improvements compose on top." That was true. It was also the smaller half of the problem.
Part one was about retrieval. It argued that for a personal LLM knowledge base of a few hundred Markdown pages, you do not need a vector database. You need three plain ingredients: per-category strict-opening bold-key fact-blocks so the most common question shape resolves in a single grep, a graph.json cross-reference map derived from the wiki's own [[wiki-links]], and a two-step master-index lookup so the model reads the index, routes to three to five pages, and answers. A twelve-query battery validated the design at a mean of about 1.2 tool calls per fact lookup and 10 of 10 determinism. The conclusion held: at this scale, vector RAG's complexity premium is not justified, and a vectorless, no-build, local-Markdown design wins on quality and on auditability.
But a knowledge base that answers well when you ask is still a passive thing. You have to point at it. You have to remember to update it. And you have to trust that the agent reading and writing your private notes will not, on a bad day, follow an instruction buried in some article you saved six months ago and quietly send your files somewhere.
The harder frontier is harder to state and harder to build: a knowledge base that runs itself without being asked, and that cannot corrupt or leak itself while doing so. Autonomy and safety are not two features. They are one problem, because autonomy without safety is just a faster way to destroy your own memory.
This is part two. In June 2026 the brain grew two organs on top of the retrieval substrate. A nervous system, so recall and maintenance happen by default instead of on command. An immune system, so the autonomy is survivable. Then I ran both against real work, in a live session, and the session itself found and fixed two real bugs. The spine of this piece is that pairing, and the meta-thesis it pays off: the same discipline that made retrieval deterministic in part one is exactly what made the autonomy auditable and the hardening real in part two.
I am keeping the honest edges in. A few criteria deliberately stop short of a perfect score, and there is one hard limit I want to state plainly rather than dress up.
A short recap, so this piece stands on its own
The subject is mnzrBrain, my personal LLM knowledge base, my "secondBrain." It is a deliberately vectorless, Karpathy-style Markdown wiki. The indexed content is a large software-engineering knowledge base drawn from real production work, distilled across categories like skills, projects, patterns, decisions, and problems-solved, with a second, strictly walled-off subtree for personal content that is kept fully isolated from the engineering subtree. There is no build step, no package manager, no test runner. The pages are plain files I own. Obsidian renders them, git versions them, and a couple of small deterministic Python scripts derive a graph.json link map from the wiki's own [[wiki-links]].
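As a rough illustration of how small such a link-map script can be, here is a minimal sketch of deriving a page-to-links mapping from [[wiki-links]]. It is not the actual build_graph.py: the function names, the use of the file stem as the page name, and the handling of `|alias` and `#heading` suffixes are all assumptions made for the example.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures Target only.
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_links(markdown: str) -> list[str]:
    """Return link targets in first-seen order, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKI_LINK.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


def build_graph(wiki_root: Path) -> dict[str, list[str]]:
    """Map each page name (assumed here to be the file stem) to its outbound links."""
    graph: dict[str, list[str]] = {}
    for page in sorted(wiki_root.rglob("*.md")):
        graph[page.stem] = extract_links(page.read_text(encoding="utf-8"))
    return graph
```

Because the pages are walked in sorted order and links keep their first-seen order, two runs over the same files produce the same mapping, which is the property a deterministic graph.json depends on.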
That last property, plain files plus small scripts plus git, is not an aesthetic choice. It is what makes everything in part two inspectable. Hold that thought. It is the payoff at the end.
Organ one: a nervous system
The retrieval design from part one was correct but inert. To use it, a session had to know the brain existed, load the master index, and behave like a maintainer. In practice that meant me typing the same setup every time and remembering which command to run. The brain was a library with a card catalog I had to fetch myself, every time. So the first organ I grew was the wiring that makes "this session maintains the brain" the default state of any session, with zero typing.
It loads itself
A SessionStart hook (load-brain.sh) now fires at the start of every session, on startup, resume, and clear. It injects a maintainer directive plus the Quick Routes router from the master index. The detail that matters for a system that prizes determinism is the budget: the injected payload is about 6.5KB, deliberately under the 10KB cap on session-start hook output, and the full master index is read on demand by the query workflow rather than dumped into every session. Bloating context with the entire index would reintroduce exactly the context-rot the wiki was built to defeat. So every session opens already knowing it is the brain's maintainer and already holding the router that maps recognizable question shapes to their canonical pages. The vectorless thesis is preserved end to end: the hook loads an index and a router, not an embedding model.
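The budget discipline can be sketched as a guard that refuses to emit an oversized payload. The real hook is a shell script (load-brain.sh); this Python restatement only illustrates the check. The function name, the way the two pieces are joined, and the reading of "10KB" as 10 * 1024 bytes are assumptions.

```python
MAX_HOOK_OUTPUT_BYTES = 10 * 1024  # assumed interpretation of the 10KB cap


def build_session_payload(directive: str, quick_routes: str,
                          limit: int = MAX_HOOK_OUTPUT_BYTES) -> str:
    """Join the maintainer directive and the router, failing loudly if over budget."""
    payload = directive.rstrip() + "\n\n" + quick_routes.rstrip() + "\n"
    size = len(payload.encode("utf-8"))  # measure bytes, not characters
    if size > limit:
        raise ValueError(f"session payload is {size} bytes, over the {limit}-byte cap")
    return payload
```

Failing loudly matters more than the exact number: a payload that silently grows past the cap would be truncated or dropped, and the session would open without its router.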
Its reflexes fire from language, not commands
The maintenance operations, query, ingest, update, lint, cross-link-audit, page-author, and research, used to require explicit slash commands. I added description frontmatter to all nine command-skills so they auto-fire from natural language. "What did I decide about token standards" routes to query. "Add this to the brain" routes to ingest. "That figure changed" routes to update. The three operations that write to disk, ingest, update, and page-author, auto-fire but pause to confirm before they touch a file, by design. Two operations stay strictly manual, gated off the model-invocation path entirely: a deprecation, because retiring a page is a deliberate act, and a second consequential command tied to the personal subtree. Recall and audit are free to trigger. Writes trigger but ask first.
It has two specialist hands, with sharply different reach
The most important piece of the nervous system is a deliberate split of labor between two subagents with asymmetric privilege.
The librarian is a read-only retrieval specialist. Its tool list is exactly three entries: Read, Glob, Grep. It runs on a small, fast model. It holds the index, executes the two-step lookup, greps the strict-opening fact-blocks first so common shapes resolve in one pass, reads only the three to five pages a query actually needs, and returns the answer plus the exact page paths and discriminators it used. Its own definition names the failure mode it must avoid: "Loading hundreds of pages into context recreates the exact context-rot this wiki was built to defeat. The cache is the disease." It physically cannot write, because it has no write tool. It is not merely instructed not to write; the capability is absent. And its definition carries no memory field, on purpose, so the read-only guarantee is structural and cannot quietly accumulate state across calls.
The scribe is the write-capable archivist. It executes ingests, updates, refinements, and deprecations with the full discipline: cascade across the five to fifteen related pages a single change touches, cross-reference integrity (every new page gets at least three inbound links and a Related section with three resolving outbound links), frontmatter updated: bumps, deprecate-never-delete, and a master-index update. It runs one at a time, because the append-only log and the cascade are not concurrency-safe, and two parallel writers corrupt the graph.
The topology is a star. The main agent orchestrates. The librarian and the scribe never call each other. Subagents cannot spawn subagents, and here that platform constraint is a feature, not a limitation: there is no path for a read agent to escalate into a write agent, and no recursive fan-out that could outrun the human in the loop.
It notices what it should remember
The third nervous-system component is consolidation. A read-only drift-reporter (drift-report.sh) scans recent session logs and lists facts that look wiki-worthy, the candidates for an ingest, writing a report for a human to approve. Its hard boundary is that the only place it can write is a scratch directory. It cannot touch the wiki. It is a reporter, not an autonomous author. This is the deliberate brake: the brain notices what changed, and stops short of silently rewriting itself overnight. I will be precise about its operating cadence in the honest edges, because the difference between "scheduled" and "wired" matters here.
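The "can only write to scratch" boundary can be sketched as a path-containment check in front of the single write the reporter performs. The real reporter is drift-report.sh; the function names, the report layout, and the wording of the empty report below are hypothetical.

```python
from pathlib import Path


def is_inside(target: Path, allowed_root: Path) -> bool:
    """True when target resolves to a location strictly under allowed_root."""
    return allowed_root.resolve() in target.resolve().parents


def write_report(scratch_dir: Path, name: str, candidates: list[str]) -> Path:
    """Write ingest candidates for human review; refuse any path outside scratch."""
    out = scratch_dir / name
    if not is_inside(out, scratch_dir):
        raise PermissionError(f"refusing to write outside scratch: {out}")
    scratch_dir.mkdir(parents=True, exist_ok=True)
    body = "\n".join(f"- {c}" for c in candidates) or "Nothing new worth consolidating."
    out.write_text(body + "\n", encoding="utf-8")
    return out
```

Resolving both paths before comparing them is what defeats a name like `../wiki/page.md`, which would otherwise look as if it sat under the scratch directory.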
The net effect of organ one: recall happens without being asked, audits happen without being asked, and writes auto-trigger but pause for confirmation. The brain leans toward maintaining itself, and stops at the line where it would change itself without a human nod.
Organ two: an immune system
Here is the uncomfortable truth the nervous system created. The moment you give an agent file-write and shell access over your own knowledge base, and then point that agent at content you did not write, you have built an attack surface. Not a hypothetical one. The brain ingests cloned repositories, saved articles, and chat exports, all of which live in the immutable source directory it reads from. That is untrusted text being read by an agent that can edit files and, in its starting configuration, run shell commands. Simon Willison's framing of the "lethal trifecta" is the right lens: private data, plus exposure to untrusted content, plus an exfiltration channel, is the combination that turns a helpful agent into a data-leak vector. Convenience without hardening is just a vulnerability with good ergonomics.
So before trusting any of the autonomy above, I scored the brain against a researched rubric, then hardened the weak spots.
Scoring it honestly first
A sixteen-agent verification study graded the brain across eight axes: agent security, prompt-injection resistance, least-privilege, automation safety, retrieval-at-scale, observability and recovery, graph integrity, and faithfulness. I ran it as sixteen independent passes so the score was an aggregate rather than one optimistic read.
The mean came back at 5.1 out of 10. Security was the weakest axis, sitting at 3 to 4. That number was correct and uncomfortable. The retrieval architecture from part one was genuinely strong and the content graph was healthy, but the gap was in the enforcement and automation layer, not the knowledge. "Never modify anything under the immutable source directory" was a sentence in a rules file, not something the harness enforced. "The log is append-only" was a convention. An agent that got prompt-injected, or that simply made a mistake, was held back by nothing but its own good behavior. For a single-user local vault that is a survivable risk, but a 3-to-4 is an accurate description of "the discipline is real and the enforcement is absent."
Hardening took the mean into the 8-to-9 range, with the residual sub-10s deliberate, which I will defend in the honest edges. Here is what changed, at a glance, then control by control.
| Control | What it now enforces |
|---|---|
| Committed permissions.deny rules | The immutable source-of-truth directory and the rules engine cannot be written by the LLM, and the network-egress commands (push, curl, wget) are blocked. |
| Append-only-log PreToolUse hook | The operation log can only grow; an in-place edit of a past entry is refused at the tool layer. |
| Least-privilege agent definitions | The read-only librarian has no write tool at all; the write-capable scribe has no shell and no network. |
| Untrusted-source trust boundary | Everything the system ingests is treated as data to distill, never as instructions to follow. |
| Second-subtree graph integrity | A walled-off subtree that the link-checker had been blind to is now measured; both graphs report zero broken links. |
The immovable rules went from prose to machine-enforced config
The three rules that govern this wiki, never write to the immutable source-of-truth directory, never modify the rules engine without explicit permission, never edit past entries in the append-only operation log, used to live as instructions I hoped the agent would follow. They are now committed deny-rules in the permission layer:

    {
      "permissions": {
        "deny": [
          "Edit(/raw/**)", "Write(/raw/**)",
          "Edit(/schema.md)", "Write(/schema.md)",
          "Edit(/.obsidian/**)", "Write(/.obsidian/**)",
          "Bash(git push:*)", "Bash(git remote add:*)",
          "Bash(curl:*)", "Bash(wget:*)", "Bash(nc:*)"
        ]
      }
    }

This is the difference between a policy and a control. The immutable source directory cannot be written by the LLM, full stop. The rules engine cannot be edited by the LLM. The editor's workspace metadata is write-denied too. The blast radius of a compromised or confused agent is now bounded by configuration, not by trust.
The append-only log got its own dedicated guard, because "append-only" is a subtler property than "read-only." A PreToolUse hook (append_only_log.py) intercepts every Edit and Write, normalizes the path across Windows and POSIX separators, and if the target is the operation log it blocks the call with exit code 2 and a stderr message fed back to the agent so it self-corrects:
Blocked: log.md is append-only. Append a new entry at the bottom with a shell redirect. In-place Edit/Write of log.md is forbidden, and past entries must never be modified.
The log can only grow. Not because the agent is well-behaved, but because the only mutation the harness will allow on that file is an append.
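A minimal sketch of a guard of this shape follows. It is not the actual append_only_log.py. It assumes the harness passes the pending tool call as JSON on stdin with `tool_name` and `tool_input.file_path` fields, and it matches the log by file name alone, which would also block any other file called log.md.

```python
import json
import sys
from pathlib import PurePosixPath

LOG_NAME = "log.md"  # file name taken from the block message quoted above
BLOCK_MESSAGE = (
    "Blocked: log.md is append-only. Append a new entry at the bottom with a "
    "shell redirect. In-place Edit/Write of log.md is forbidden, and past "
    "entries must never be modified."
)


def is_log_path(file_path: str) -> bool:
    """Normalize Windows separators to POSIX, then compare the final path segment."""
    normalized = file_path.replace("\\", "/")
    return PurePosixPath(normalized).name.lower() == LOG_NAME


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one pending tool call."""
    tool = payload.get("tool_name", "")
    path = payload.get("tool_input", {}).get("file_path", "")
    if tool in {"Edit", "Write"} and is_log_path(path):
        return 2, BLOCK_MESSAGE  # exit code 2 blocks the call
    return 0, ""


def main() -> int:
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # fed back to the agent
    return code

# A real hook script would end with: raise SystemExit(main())
```

Keeping the decision in a pure function, separate from the stdin and exit-code plumbing, means the rule itself can be tested without a harness.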
The lethal trifecta, cut structurally
The most important security move was the cheapest. You cannot easily stop an agent from being prompt-injected by untrusted content. That is a partially open research problem. But you can remove the exfiltration leg of the trifecta so that a fully injected agent still cannot get data out. The deny-list above kills the obvious network egress paths: git push, adding a git remote, curl, wget, nc. Private data and untrusted content can coexist in this system, but the channel that would carry data to an attacker is closed at the permission layer.
The consequence is the one that matters: even an agent that has been fully prompt-injected, that has decided with total conviction to leak the vault, has no channel to send it out. The injection can lie to the model. It cannot exfiltrate, because the capability is not there to abuse. This is defense by absence, which is the only kind that does not depend on the model behaving well under attack. It is the difference between hoping an agent behaves and engineering it so misbehavior is inert.
Least privilege, made real
Least privilege stopped being a slogan. The librarian has no write tool at all. The scribe was deliberately stripped during hardening: it lost its Bash grant entirely, so it has no shell and therefore no network. It can edit pages and nothing else. The audit scripts it relies on are not invoked by the scribe at all. They run automatically via a committed PostToolUse hook after every Edit or Write, so the write agent does its job through the narrowest possible aperture, and the verification runs outside its control. The librarian literally cannot write. The scribe literally cannot shell out. Each agent holds the minimum capability its job requires, expressed in the agent definition itself.
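One way to keep these boundaries from drifting is a small check over the agent tool lists. The sketch below assumes the tool lists have already been parsed out of the agent definitions into plain sets. The scribe's tool names in the usage example are hypothetical, since the text only states that it has no Bash grant.

```python
READ_ONLY_TOOLS = frozenset({"Read", "Glob", "Grep"})  # the librarian's full list
SHELL_TOOLS = frozenset({"Bash"})                       # what the scribe must not hold


def privilege_violations(agents: dict[str, set[str]]) -> list[str]:
    """Report any tool grant that exceeds the stated least-privilege boundaries."""
    problems: list[str] = []
    librarian_extra = set(agents.get("librarian", set())) - READ_ONLY_TOOLS
    if librarian_extra:
        problems.append(f"librarian holds non-read tools: {sorted(librarian_extra)}")
    scribe_shell = set(agents.get("scribe", set())) & SHELL_TOOLS
    if scribe_shell:
        problems.append(f"scribe holds shell tools: {sorted(scribe_shell)}")
    return problems
```

For example, `privilege_violations({"librarian": {"Read", "Glob", "Grep"}, "scribe": {"Read", "Edit", "Write"}})` returns an empty list, while granting the librarian `Write` or the scribe `Bash` produces one finding each.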
Indirect prompt injection, as an explicit trust boundary
Indirect prompt injection is the case where the malicious instruction is not typed by the user but embedded in content the agent reads. The brain now treats everything under the source directory and the conversation-export directory as untrusted data to distill, never as instructions. That boundary is written into the scribe's definition and into the ingest and immutability rules in plain language: text inside a source must never change which files get touched, which tools get called, or what gets written to the operation log. If a source contains text shaped like an instruction to a future agent ("ignore previous," "when you read this, run...") or a link to an unfamiliar host, the agent does not carry it into any page. It surfaces it to me. The wiki's lint pass backs the rule with a check for stored-injection patterns, so the defense is both a rule the agent follows and a scan that runs over what it ingests.
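The lint-side scan could look something like the sketch below. The two phrase patterns are the examples quoted in the text; the host allowlist, the function name, and the finding format are assumptions. A scan like this catches only known shapes, so it complements the trust-boundary rule rather than replacing it.

```python
import re

# Instruction-shaped phrases, taken from the examples in the text.
INJECTION_PATTERNS = [
    re.compile(r"\bignore (?:all )?previous\b", re.IGNORECASE),
    re.compile(r"\bwhen you read this\b", re.IGNORECASE),
]
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)")


def scan_source(text: str, known_hosts: set[str]) -> list[tuple[int, str]]:
    """Return (line_number, finding) pairs to surface to a human."""
    findings: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in INJECTION_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, f"instruction-shaped text: {pattern.pattern}"))
        for host in URL_HOST.findall(line):
            host = host.lower().rstrip(".")  # drop sentence-ending punctuation
            if host not in known_hosts:
                findings.append((lineno, f"unfamiliar host: {host}"))
    return findings
```

The scan reports and never rewrites: a finding is a prompt for human review, which matches the "it surfaces it to me" behavior described above.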
Graph integrity, extended to the subtree it had been blind to
Part one's link-graph checker covered the engineering subtree and reported it clean. But the brain's second, strictly walled-off subtree for personal content was invisible to the checker, and "the graph is clean" is only ever true for the part of the graph your tooling can see. When I extended build_graph.py to emit a separate, walled graph for that subtree (graph-soul.json), the checker immediately surfaced six dead cross-reference edges that had been real and unseen for as long as that subtree existed. They pointed at pages that had been stubbed but never written. I authored the missing pages and resolved every one.
The honest end state, stated precisely: both graphs now report zero broken links. The engineering graph reports zero orphans across its 255 pages. The personal subtree's only remaining orphan is its own hub root, the index page that nothing inbound-links by design, since it is the entry point rather than a destination. A checker that silently skips part of your content is worse than no checker, because it reports "all clear" while the unseen part of the graph rots. Extending the checker to the invisible subtree is what made the integrity claim honest, and it is the clearest example in the whole exercise of a control finding a defect the moment the control existed.
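The two integrity measures, broken links and orphans, reduce to a few lines over a link map. This sketch assumes the graph is a mapping from page name to outbound link targets. The `hub_roots` exemption is an illustrative way to express "the hub root is an orphan by design" and is not necessarily how build_graph.py reports it.

```python
def broken_links(graph: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Edges whose target is not a page in the graph."""
    return [(src, dst) for src, targets in graph.items()
            for dst in targets if dst not in graph]


def orphans(graph: dict[str, list[str]],
            hub_roots: frozenset[str] = frozenset()) -> list[str]:
    """Pages with no inbound link from another page, excluding declared hub roots."""
    linked = {dst for src, targets in graph.items() for dst in targets if dst != src}
    return sorted(page for page in graph if page not in linked and page not in hub_roots)
```

Running both functions once per subtree graph is what keeps a walled-off subtree from hiding behind a clean report on the other one.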
A three-tier index for scale
Retrieval-at-scale got a structural upgrade: a middle tier. The brain now has a three-level index, master index to category indexes to pages. Three category-index nodes that were still missing, for skills, career, and learning, were added so all eight categories sit on the tree. This is the PageIndex tree pattern applied to the wiki's own navigation, and it matters because it keeps the candidate set at every level small enough to stay well under the context-rot threshold, where semantically similar distractors turn page selection into a coin flip. The total page count moved from 252 to 255 in this pass, and to be precise about what that delta is: those three pages are the new category-index navigation nodes themselves, scaffolding, not new knowledge. The point is the structure, which is now built to scale toward 1000-plus pages without re-architecting.
Deterministic freshness gates and read-only audit scripts
The last cluster is observability and recovery, and it is deliberately all deterministic code, no LLM in the loop.
build_graph.py --check is a freshness gate: it exits non-zero if the on-disk graph is stale relative to the pages, so a forgotten regeneration fails loudly instead of silently rotting, and it is the one of these that runs automatically in the post-write hook. audit_sources.py is a source-resolution audit: it verifies that every in-repo path a page cites in its sources: frontmatter actually resolves on disk, treating a citation to a file that does not exist as the unverifiable claim it is, while reporting external references and prose descriptors as unverifiable rather than failing them. freshness_check.py is superseded-figure detection: it holds a small ledger of figures replaced by newer canonical values and flags any that reappear as live claims, exempting lines marked historical and pages marked deprecated so legitimate provenance notes do not false-flag. The source-resolution and superseded-figure audits are read-only scripts run on demand and through the lint pass, not wired into the per-write hook. A verifiable git-bundle backup script and a restore-test script cover the local leg of a real backup discipline. The point of all of these is the same: drift is caught by deterministic code, not by my memory.
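Superseded-figure detection is the least obvious of the three, so here is a minimal sketch of the idea. The ledger format, the `(historical)` marker string, and the function name are assumptions, not the real freshness_check.py. The ledger entry in the usage example borrows the 252-to-255 page count from the previous section purely as an illustration.

```python
import re

HISTORICAL_MARKER = "(historical)"  # assumed marker for provenance lines


def superseded_hits(page_text: str, ledger: dict[str, str],
                    deprecated: bool = False) -> list[tuple[int, str, str]]:
    """Return (line_number, old_figure, canonical_figure) for live stale claims."""
    if deprecated:
        return []  # deprecated pages are exempt
    # Guard against matching inside longer tokens, e.g. "252" inside "1252".
    patterns = {old: re.compile(rf"(?<![\w.]){re.escape(old)}(?!\w)") for old in ledger}
    hits: list[tuple[int, str, str]] = []
    for lineno, line in enumerate(page_text.splitlines(), start=1):
        if HISTORICAL_MARKER in line.lower():
            continue  # legitimate provenance note
        for old, pattern in patterns.items():
            if pattern.search(line):
                hits.append((lineno, old, ledger[old]))
    return hits
```

With a ledger of `{"252 pages": "255 pages"}`, a line reading "The wiki has 252 pages." is flagged, while "It grew from 252 pages (historical) to 255 pages." is skipped.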
The payoff: run against real work, not demoed
A hardened design that has never been used in anger is a hypothesis. So I ran the new organs against real work, in a working session rather than a scripted demo, and it did the thing a demo never does. It broke. The breaks were the point.
The librarian answered a hard, high-distractor query deterministically. The query was cross-cutting by design, a topic that touches many superficially similar pages, the exact shape where naive scanning degrades into a coin flip. The librarian routed through the fixed lookup layer, index first, then Quick Routes to the canonical pages, then the strict-opening greps, then three to five reads, and returned the answer with the exact paths and discriminators it used. It did not scan the vault. It did not guess. The substrate from part one held up under a load designed to defeat it, and it returned an answer that is verifiable rather than merely plausible.
The scribe executed a real maintenance write with full cascade discipline, and in doing so the exercise found its first bug. It was a design bug I had introduced during the hardening itself. In stripping the scribe down to least privilege, I had taken away its shell. But appending to the operation log was a shell operation. So the over-restricted write agent could no longer record its own operations in the very append-only log that the immovable rules require it to update. I had hardened it into a corner: it could change the wiki, but it could not log that it had. The fix was to split the responsibility cleanly. The scribe now ends its response with the exact one-line log entry in canonical format, and the orchestrator appends that line via a shell redirect. The agent that mutates pages proposes the record. The agent that holds the shell writes the record. This is not a workaround, it is the more correct design, because it keeps the write agent inside its narrow aperture while still guaranteeing every operation is logged. Least privilege and append-only enforcement had collided in practice, and the resolution made both stronger.
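The split can be sketched as two small functions: one that extracts and validates the line the scribe proposes, and one that appends it. The log-line format below is hypothetical, since the canonical format is not given here, and the append uses Python's append mode as a stand-in for the orchestrator's shell redirect.

```python
import re
from pathlib import Path

# Hypothetical canonical format: "YYYY-MM-DD | operation | summary".
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \| [a-z-]+ \| \S.*$")


def proposed_log_entry(scribe_response: str) -> str:
    """Return the last non-empty line of the response if it is a valid log entry."""
    lines = [line.strip() for line in scribe_response.splitlines() if line.strip()]
    if not lines or not LOG_LINE.match(lines[-1]):
        raise ValueError("scribe response does not end with a canonical log entry")
    return lines[-1]


def append_entry(log_path: Path, entry: str) -> None:
    """Orchestrator side: append only, never rewrite (the equivalent of >>)."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry + "\n")
```

Validating the proposed line before appending keeps a malformed or missing entry from slipping into a log that can never be edited afterwards.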
The second bug was a data-integrity bug, and the new audit script caught it, not me. A page carried a source citation that pointed at a filename which did not exist on disk, a small mismatch between the cited path and the real file, the kind of thing that is invisible until provenance is actually checked. The source-resolution audit flagged it automatically, and the correction was a one-line fix to the citation. This is precisely what an automated audit is for. A human reviewing two hundred-plus pages would never have caught a single wrong path, and the script caught a live instance on its first real outing. A citation you cannot resolve is a claim you cannot trust.
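A check of this kind can be sketched in a few lines. The heuristic for telling an in-repo path from an external reference or a prose descriptor is an assumption made for the example, as are the function names; the real audit_sources.py may classify entries differently.

```python
from pathlib import Path


def looks_like_repo_path(source: str) -> bool:
    """Heuristic: no URL scheme, no whitespace, and a slash or a dot somewhere."""
    text = source.strip()
    has_space = any(ch.isspace() for ch in text)
    return bool(text) and "://" not in text and not has_space and ("/" in text or "." in text)


def classify_source(source: str, repo_root: Path) -> str:
    """Return 'resolved', 'missing', or 'unverifiable' for one sources: entry."""
    if not looks_like_repo_path(source):
        return "unverifiable"  # external reference or prose descriptor: report, do not fail
    return "resolved" if (repo_root / source.strip()).exists() else "missing"


def audit_sources(pages: dict[str, list[str]], repo_root: Path) -> list[tuple[str, str]]:
    """Return (page, cited_path) for every in-repo citation that does not resolve."""
    return [(page, src) for page, sources in pages.items() for src in sources
            if classify_source(src, repo_root) == "missing"]
```

Only the "missing" class fails the audit. Anything the script cannot check is reported as unverifiable rather than guessed at, which keeps the audit itself free of false claims.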
The drift-reporter ran headless and reported honestly. Run out of band with no human watching, it faithfully reported that there was nothing new worth consolidating, and it wrote nothing into the wiki. That sounds anticlimactic, and that is the point. The most dangerous behavior for an autonomous consolidation pass is to invent work, to hallucinate a fact worth filing so it has something to show. An honest "nothing to do" is a harder thing for an automated writer to produce than a plausible fabrication, and for a system whose entire premise is faithfulness, it is exactly as valuable as a finding, and considerably rarer in practice.
Three components, one query, one write, one consolidation pass. Two bugs found and fixed in flight. That is the difference between a system that is designed to be safe and a system that has been shown to be safe under its own load.
The honest edges
A case study that claims a perfect score is selling something. A few criteria deliberately stop short of 10, landing at 8 or 9, and I want to name why, because the reasons are judgments, not oversights.
| Axis | Where it lands | Why not a perfect 10 |
|---|---|---|
| Agent security | roughly 9 | A literal 10 means OS-level sandboxing, overkill for a single-user local vault whose exfiltration channel is already removed. |
| Prompt-injection resistance | 8 to 9 | A literal 10 means a dual-LLM quarantine pipeline, an enterprise control that buys little here over the deny-rules and the trust boundary. |
| Disaster recovery | 8 to 9 | The local backup is scripted and verified; the off-site immutable copy is a manual step, not code I can commit. |
Security sits at roughly 9 rather than 10 because a literal 10 on that axis would mean OS-level sandboxing of the agents. The study itself judged that overkill for a single-user local vault whose exfiltration channel has already been removed at the permission layer. Prompt-injection resistance sits in the 8-to-9 range because a literal 10 would mean a dual-LLM quarantine pipeline, one model that only ever touches untrusted content and can never act, feeding a second privileged model that never sees the untrusted tokens. For a single-user brain with the trifecta already cut, that is cargo-culting an enterprise control into a context that does not warrant it: real-looking machinery that buys little over what the deny-rules and the trust boundary already provide. Disaster recovery sits in the 8-to-9 range for one concrete reason: the local backup discipline is scripted and verifiable, the 3-2 of a 3-2-1 rule, but the final step, an off-site immutable backup bucket, is a human action I have not yet activated, not a piece of code I can commit. Calling that a 10 would be lying about a thing that is genuinely still on my own to-do list.
I should also be precise about one piece of automation rather than imply a cadence I have not wired. The drift-reporter is a script I can run headless, and have run, and it behaved as described. It is not yet a registered scheduled job. Its out-of-band cadence is designed and the script is committed, but actually wiring it into a scheduler is a manual step I have not taken. I would rather describe the autonomy I built than the autonomy that would sound more complete.
And one limit deserves to be stated without euphemism, because "always-on" invites the wrong mental image. There is no background cognition between turns. "Automatic" in this system means three specific, inspectable mechanisms: session-start hooks that load the brain, per-turn skill triggers that fire reflexes from natural language, and out-of-band jobs like the drift-reporter, which are designed to run on a schedule but today run only when I invoke them. It does not mean a daemon thinking while I sleep. The brain does not ruminate. It loads itself when a session opens, it routes language to the right operation during a turn, and it runs a read-only reporter when something runs it. Between those moments it is exactly what it looks like: a folder of Markdown files sitting still on a disk. The brain is responsive and self-maintaining. It is not awake. I would rather say that clearly than imply a sentience the architecture does not have and does not need.
The meta-thesis: the same discipline that made retrieval deterministic made the autonomy auditable
Here is what ties part two back to part one, and it is the part I want a senior engineer or a careful reviewer to take away.
The same vectorless, no-build, local-Markdown discipline that made retrieval deterministic in part one is exactly what made the autonomy auditable and the hardening real in part two. These are not two separate properties. They are the same property, viewed from three angles.
Because the memory is plain files I own, the autonomy operates on artifacts I can open and read. There is no opaque vector index whose contents I cannot inspect, no embedding store where a poisoned entry hides as a float array. The drift-reporter's candidates are Markdown. The scribe's edits are diffs in git history. The librarian's answer comes with file paths I can open.
Because the security is enforced by configuration and hooks that live in the repository, I can read the policy that protects me. The deny-rules are a committed file. The append-only guard is a short Python script. The least-privilege boundaries are three-line tool lists in two agent definitions. None of it is a black box behind a vendor's API. When the scribe could not log its own operations, I could see why in the few dozen lines of a hook. When a citation pointed at nothing, a small audit script found it. When the second subtree had six invisible dead links, extending the graph builder surfaced all six at once. If you want to know what the brain can and cannot do, you read the same files the brain reads.
And because the autonomy is greppable Markdown, small deterministic scripts, and a linear git log, every claim in this case study is checkable against the record. The sixteen-agent study, the move from a 5.1 mean into the 8-to-9 range, the two bugs found and fixed, the six dead edges resolved, the three-tier index: all of it is in the append-only log, one line per operation, in a format you can grep.
You cannot read a vector store by eye. You can read this. The safety is a committed file, not a promise. The autonomy is a git log you can step through, one line per operation. When I want to know what the brain is allowed to do, I open the deny-rules and read them. When I want to know what it did, I read the log. I am not asking anyone to trust the model. I am showing them the files. That is the whole of it, and it is a fact about how the thing is built, not a claim about how smart it is.
Part one ended by saying future improvements would compose on top of a complete substrate. They did. A nervous system and an immune system both grew out of the same plain Markdown, and both are as inspectable as the pages they protect. The retrieval substrate did not just hold the new organs. It is the reason the new organs are trustworthy. That was always the harder half. It turned out to be the more important one.
