February 2, 2026 · 17 min read

I want to share something I've been working on for the past several months. It's not perfect, and I'm still iterating on it daily, but it has fundamentally changed how I approach software development. I think the lessons might be useful to others exploring this space.

This is my attempt at building a multi-agent orchestration system for Claude Code. Think of it as trying to create a virtual engineering team where different AI agents handle different responsibilities: architects who design, implementers who code, reviewers who validate. It sounds ambitious because it is. I've made plenty of mistakes along the way, and I'm certain there are better approaches I haven't discovered yet.

But first, let me tell you why I even started down this path.

The Problem I Was Trying to Solve

I work on a large enterprise platform: a distributed system with about 12 microservices, two frontend applications, smart contracts, and a Kubernetes infrastructure spanning multiple environments. It's the kind of codebase where a single feature might touch the database schema, three backend services, blockchain logic, and both frontends.

When I first started using Claude Code, I was genuinely excited. For small tasks, it was incredible. Fix this function. Write this component. Debug this error. Brilliant.

But when I tried to use it for larger work (building entire modules, coordinating changes across services), things started breaking down:

The context problem: About halfway through a complex task, I'd notice the AI forgetting decisions we'd made earlier in the conversation. It would suggest approaches we'd already ruled out, or introduce inconsistencies with code it had written an hour ago.

The specialization problem: I'd ask for a database schema design and get something that worked but didn't follow our established patterns. Not because Claude couldn't, but because one prompt can't contain everything about how we do things.

The quality problem: Code would get written, and then I'd spend time finding bugs that a fresh set of eyes would have caught immediately. The same "mind" that wrote the code was reviewing it.

I don't think any of these are failures of the AI itself. They're failures of how I was using it. A single human engineer working alone faces similar problems at scale; that's why we have teams.

So I thought: what if I tried to create a team?

My First Attempts (And Why They Failed)

I want to be honest about this: my first several attempts didn't work well.

Attempt 1: Just spawn more agents

My naive first approach was to just spawn sub-agents for different parts of a task. "You handle the backend, you handle the frontend." The problem? They had no coordination. Agent A would make assumptions that Agent B contradicted. There was no shared understanding of what we were building.

Attempt 2: Detailed upfront specifications

Next, I tried writing extremely detailed specifications before spawning any agents. Every file to create, every function signature, every edge case. This worked better, but it took so long to write the specs that I might as well have just written the code myself. And the specs would inevitably miss something, causing cascading problems.

Attempt 3: Sequential pipeline

Then I tried a strict sequence: design agent → implementation agent → review agent → fix agent. Linear, predictable. The problem was that when the review agent found issues, the fix agent had no context about why the original decisions were made. It would "fix" things by undoing intentional choices.

Each failure taught me something. The key insight that eventually emerged: agents need hierarchy, not just sequence. They need to know who to ask when they're uncertain. They need someone coordinating their work who can hold the bigger picture.

The Architecture I Eventually Landed On

After months of iteration, here's what I'm currently using. I want to emphasize "currently": I change things every few weeks as I learn more.

[Diagram: agent hierarchy, listed top to bottom. ORCHESTRATOR; then ARCHITECT (Opus), RESEARCHER (Haiku), SECURITY (Opus); then backend-impl, contract-impl, prompt-writer, tester, reviewer, supervisor, debugger, work-recorder.]

How This Maps to a Real Project

Here's how these agents relate to a typical enterprise architecture:

[Diagram: example project layout. Nodes, in order: User App :3000, auth-service, wallet-service, Contract; Admin App :3001, docs-service, gov-service, Contract; KUBERNETES.]

Want to skip ahead? The complete architecture template is available on GitHub.

The Breakthrough: The Prompt-Writer Agent

If there's one thing I've learned that I wish I'd understood earlier, it's this: the quality of the prompt determines everything.

For months, I was spawning agents with prompts like:

"Fix the login bug in the auth service"

And wondering why the results were inconsistent. The agent would spend half its context just figuring out what files to look at, what our conventions were, what had been tried before.

Then I tried something different. I created a lightweight agent whose only job is to write prompts for other agents.

How It Works

Instead of:

Task(prompt="Fix the login bug in the auth service", agent="debugger")

I now do:

Step 1: Task(
  prompt="Generate a debugging prompt for the login issue in the auth service.
         Include relevant file paths, our port conventions, recent
         changes, and verification steps.",
  agent="prompt-writer"
)

Step 2: Task(
  prompt=[the detailed prompt from step 1],
  agent="debugger"
)
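
The two-step pattern above can be sketched in Python. This is only an illustration: `run_agent` is a hypothetical callable standing in for however an agent actually gets spawned, and `delegate_with_prompt_writer` is a name I made up for the sketch, not part of Claude Code.

```python
from typing import Callable

# Hypothetical signature: (agent_name, prompt) -> the agent's text output.
AgentRunner = Callable[[str, str], str]


def delegate_with_prompt_writer(
    run_agent: AgentRunner, target_agent: str, request: str
) -> str:
    """Ask the prompt-writer for a detailed prompt, then hand it to the target agent."""
    meta_prompt = (
        f"Generate a prompt for the '{target_agent}' agent for this request: {request}\n"
        "Include relevant file paths, our port conventions, recent changes, "
        "and verification steps."
    )
    # Step 1: the prompt-writer turns a vague request into a detailed prompt.
    detailed_prompt = run_agent("prompt-writer", meta_prompt)
    if not detailed_prompt.strip():
        # Fail loudly rather than passing an empty prompt downstream.
        raise ValueError("prompt-writer returned an empty prompt")
    # Step 2: the target agent receives only the detailed prompt.
    return run_agent(target_agent, detailed_prompt)
```

The point of routing everything through one function is that no agent ever receives the raw one-line request.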

What the Prompt-Writer Produces

The prompt-writer reads the relevant files, understands our conventions, and produces something like:

## Debugging Task: AUTH-001

### Context
Users report 401 errors on login despite valid credentials.
This started after commit abc123 (JWT refactor, 2026-01-28).

### Your Task
Diagnose why JwtAuthGuard is rejecting valid tokens.

### Files to Examine
| File | Why | Key Lines |
|------|-----|-----------|
| src/guards/jwt-auth.guard.ts | Token validation | 45-78 |
| src/services/token.service.ts | Token generation | 112-145 |

### Project Context
- Auth service runs on port 3041
- JWT secret is in K8s secret, not hardcoded
- We use Clean Architecture: guard is presentation layer

### What Success Looks Like
- Root cause identified with file:line reference
- Explanation of why this causes the symptom
- Specific fix recommendation

### How to Verify Your Fix

curl -X POST http://localhost:3041/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "test@example.com", "password": "testpass"}'

Should return 200 with JWT token

The difference in output quality is dramatic. I don't have hard metrics, but subjectively I'd say tasks that used to take 3-4 attempts now usually succeed on the first try.

What I still don't know: Is there an optimal prompt structure? I've been iterating on templates, but I'm not sure I've found the best format. If you try this and find improvements, I'd genuinely love to hear about them.
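
If you want to experiment with prompt structure, it may help to treat the template as data. Below is a minimal sketch that renders the sections from the example above; the `AgentPrompt` class and its field names are my own invention for illustration, not the format the prompt-writer actually uses internally.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPrompt:
    """Hypothetical container for the sections of a generated prompt."""

    task_id: str
    title: str
    context: str
    task: str
    files: list[tuple[str, str, str]] = field(default_factory=list)  # (path, why, key lines)
    project_context: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the prompt as markdown, mirroring the section order shown above."""
        lines = [
            f"## {self.title}: {self.task_id}",
            "",
            "### Context",
            self.context,
            "",
            "### Your Task",
            self.task,
            "",
            "### Files to Examine",
            "| File | Why | Key Lines |",
            "|------|-----|-----------|",
        ]
        lines += [f"| {path} | {why} | {key} |" for path, why, key in self.files]
        lines += ["", "### Project Context"]
        lines += [f"- {item}" for item in self.project_context]
        lines += ["", "### What Success Looks Like"]
        lines += [f"- {item}" for item in self.success_criteria]
        return "\n".join(lines)
```

With the structure in one place, trying a different section order or dropping a section is a one-line change.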

The Supervisor Pattern: When Things Get Complex

For really complex tasks, like migrating a deprecated model across multiple services, a single orchestrator managing many agents gets overwhelming. There's too much to track.

I introduced "supervisor" agents for these cases. They own a track of work and manage their own sub-agents.

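[Diagram: supervisor hierarchy, listed top to bottom. ORCHESTRATOR; then SUPERVISOR-A (Schema), SUPERVISOR-B (Migration), SUPERVISOR-C (Integration); then their sub-agents: prompt-writer, backend-impl, tester, prompt-writer, backend-impl, debugger, tester, reviewer.]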
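
To make the idea of "owning a track" concrete, here is a rough sketch of the bookkeeping only. The `Supervisor` class is hypothetical, nothing is actually executed, and which sub-agents belong to which track in the usage example is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Supervisor:
    """Hypothetical model of a supervisor that owns one track of work."""

    track: str
    sub_agents: list[str]
    log: list[str] = field(default_factory=list)

    def run(self, steps: list[tuple[str, str]]) -> str:
        """Record (agent, task) steps for this track and return a one-line summary."""
        for agent, task in steps:
            if agent not in self.sub_agents:
                raise ValueError(f"{agent} is not part of the {self.track} track")
            # Detail stays inside the track; the orchestrator never sees it.
            self.log.append(f"{agent}: {task}")
        return f"{self.track}: {len(steps)} step(s) completed"


# Example usage (agent assignment is assumed, not taken from the real setup).
schema = Supervisor("Schema", ["prompt-writer", "backend-impl", "tester"])
summary = schema.run([("backend-impl", "update the model"), ("tester", "run schema tests")])
```

The orchestrator only receives `summary`, which is what keeps its own context small.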

The Error Recovery Flow

One thing I'm proud of is the error recovery pattern. When something fails:

FAILURE → Supervisor → Debugger → Root Cause → Fix Prompt → Apply Fix → VERIFIED

The key insight: never retry blindly. When something fails, understand why before trying again. The debugger agent exists specifically for this: diagnosing problems, not fixing them.
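
As a sketch, the recovery flow looks roughly like the loop below. Every callable is a hypothetical stand-in for an agent (debugger, prompt-writer, implementer, tester), and the `max_rounds` cap is my own assumption, not something the flow above prescribes.

```python
from typing import Callable


def recover(
    failure: str,
    diagnose: Callable[[str], str],          # debugger: failure -> root cause
    write_fix_prompt: Callable[[str], str],  # prompt-writer: root cause -> fix prompt
    apply_fix: Callable[[str], None],        # implementer: applies the fix
    verify: Callable[[], bool],              # tester: did the fix work?
    max_rounds: int = 3,                     # assumed cap to avoid looping forever
) -> bool:
    """Diagnose before every retry; never re-run the same step blindly."""
    for _ in range(max_rounds):
        root_cause = diagnose(failure)
        fix_prompt = write_fix_prompt(root_cause)
        apply_fix(fix_prompt)
        if verify():
            return True
        # Feed what we learned back into the next diagnosis.
        failure = f"fix for '{root_cause}' did not verify"
    return False
```

Note that a failed verification goes back to diagnosis rather than straight back to the implementer.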

What I'm still figuring out: How deep should the hierarchy go? I've tried four levels and it gets confusing. Three seems to be my practical limit, but I'm not sure if that's a fundamental constraint or just my current skill level.

Skills: Codifying What I've Learned

Over time, I noticed I was repeating certain patterns. "When committing, always stage files individually." "When reviewing, check security first." So I started writing these down as "skills": reusable workflows that any agent can reference.

My Current Skills

Development skills:

/frontend - Next.js patterns, component library conventions

/backend - Clean Architecture layers, ORM patterns, framework conventions

/contracts - Blockchain determinism rules, access control patterns

/infra - Kubernetes manifests, namespace conventions, resource limits

Process skills:

/commit - How we do git commits (staged, per-file, conventional format)

/review - Multi-pass code review (logic → security → performance)

/debug - Systematic debugging protocol

Orchestration skills:

/multi-agent-orchestration - The full hierarchical pattern

/systematic-debugging - Step-by-step diagnosis

/verification-before-completion - Checklist before declaring done

Example: The Commit Skill

# /commit skill

## Protocol
1. Stage files individually (never `git add .` for multi-file changes)
2. Write descriptive message per file
3. Follow conventional commit format
4. No AI attribution in commits

## Format
type(scope): imperative description

Types: feat, fix, refactor, chore, docs, test
Scopes: auth, docs, wallet, admin, contracts, infra

## Example

git add src/guards/jwt-auth.guard.ts
git commit -m "fix(auth): validate token expiry before checking claims"

git add src/services/token.service.ts
git commit -m "refactor(auth): extract token validation to dedicated method"

I invoke this with /commit and the agent knows exactly what to do.
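
A format like this is easy to check mechanically. Here is a small sketch of a subject-line validator built from the types and scopes listed in the skill; the function name and the idea of running it as a separate check are assumptions on my part, since the skill itself is just instructions for the agent.

```python
import re

# Types and scopes copied from the /commit skill above.
TYPES = ("feat", "fix", "refactor", "chore", "docs", "test")
SCOPES = ("auth", "docs", "wallet", "admin", "contracts", "infra")

# type(scope): description, with exactly one space after the colon.
COMMIT_RE = re.compile(
    rf"^(?P<type>{'|'.join(TYPES)})\((?P<scope>{'|'.join(SCOPES)})\): (?P<description>\S.*)$"
)


def is_valid_commit_message(subject: str) -> bool:
    """Return True if a commit subject line follows type(scope): description."""
    return COMMIT_RE.match(subject) is not None
```

It only checks the shape of the subject line; whether the description is actually imperative is still a judgment call.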

What I don't know: Are skills the right abstraction? Sometimes I wonder if they should be more granular, or if some should be combined. I'm experimenting.

Rules: Automatic Guardrails

Different from skills, rules are always active. They're constraints that every agent must follow, without me having to invoke anything.

My Current Rules

| Rule | What It Enforces |
|------|------------------|
| `clean-architecture.md` | Domain can't depend on infrastructure |
| `testing-pyramid.md` | 70% unit, 20% integration, 10% E2E |
| `git-workflow.md` | Branch naming, commit format |
| `security-standards.md` | Auth guards, input validation, no hardcoded secrets |
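
To show what one of these rules means in numbers, here is a small sketch that compares test counts against the 70/20/10 split. The function names, the input format, and the 5-point tolerance are all assumptions for illustration; the rule file itself is prose, not code.

```python
# Target split from the testing-pyramid rule.
TARGET = {"unit": 0.70, "integration": 0.20, "e2e": 0.10}


def pyramid_deviation(counts: dict[str, int]) -> dict[str, float]:
    """Return actual share minus target share for each test level."""
    total = sum(counts.get(level, 0) for level in TARGET)
    if total == 0:
        raise ValueError("no tests counted")
    return {
        level: round(counts.get(level, 0) / total - TARGET[level], 3)
        for level in TARGET
    }


def within_pyramid(counts: dict[str, int], tolerance: float = 0.05) -> bool:
    """True if every level is within the (assumed) tolerance of its target."""
    return all(abs(d) <= tolerance for d in pyramid_deviation(counts).values())
```

A negative deviation for `unit` means the suite is top-heavy, which is usually the case worth flagging.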

Example: Clean Architecture Rule

# Clean Architecture Rule

## The Dependency Rule
Dependencies MUST point inward only.

✅ Allowed:
- Controller → Use Case → Entity
- Repository Implementation → Repository Interface

❌ Forbidden:
- Entity → ORM Client
- Use Case → Controller
- Domain → anything external

## If You're Unsure
Ask yourself: "Could this inner layer work without the outer layer existing?"
If no, the dependency is pointing the wrong direction.

When the reviewer agent checks code, it knows to look for these violations.
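
The dependency rule can also be expressed as a tiny check. This is a sketch under assumptions: I rank the layers by how far "in" they are, treat infrastructure and presentation as equally outer, and assume you already have a list of (from layer, to layer) pairs from somewhere. The reviewer agent works from the prose rule, not from this code.

```python
# Lower rank = more inner. Treating infrastructure and presentation as
# equally outer is an assumption made for this sketch.
LAYER_RANK = {"domain": 0, "application": 1, "infrastructure": 2, "presentation": 2}


def is_allowed(from_layer: str, to_layer: str) -> bool:
    """A dependency is allowed when it points to the same or a more inner layer."""
    return LAYER_RANK[to_layer] <= LAYER_RANK[from_layer]


def find_violations(deps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the (from_layer, to_layer) pairs that point outward."""
    return [(src, dst) for src, dst in deps if not is_allowed(src, dst)]
```

For example, Controller → Use Case maps to `("presentation", "application")` and passes, while Entity → ORM Client maps to `("domain", "infrastructure")` and is reported.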

Work Records: Remembering Across Sessions

One of my biggest frustrations early on: every conversation started fresh. We'd make decisions, then in the next session, have to re-explain everything.

My solution is aggressive documentation. A work-recorder agent runs continuously, documenting:

What we decided and why

What we tried that didn't work

What we learned

What's still pending

Example Work Record

## Session 14: Building the Onboarding Module

**Date:** 2026-02-01
**Focus:** User onboarding flow

---

### The Context

Session 13 completed the document upload feature. But upload is just one step;
we need the full onboarding journey: welcome → identity → documents →
verification → activation.

---

### Key Decisions Made

**Decision: Server-side session state**
We considered client-side state (localStorage) vs server-side (Redis).
Chose server-side because:
- User might switch devices mid-onboarding
- We need to track abandonment for analytics
- Sensitive data shouldn't live in browser

**Decision: Step-by-step validation**
Each step validates before allowing progression. This is not just frontend
validation: the backend confirms each step's completion before unlocking the next.

---

### What We Built

| File | Purpose |
|------|---------|
| src/domain/entities/onboarding-session.entity.ts | Session state machine |
| src/application/use-cases/advance-step/ | Step progression logic |
| app/(onboarding)/layout.tsx | Shared onboarding layout |

---

### What Didn't Work

**First attempt at step validation:**
We tried validating in the frontend route guards. Problem: too easy to
bypass. Moved validation to backend with signed session tokens.

---

### Still Pending

- [ ] Email verification integration (waiting on SMTP config)
- [ ] Analytics events for funnel tracking
- [ ] Error recovery flow if user abandons mid-process

---

### Lesson Learned

Onboarding is a state machine. Should have modeled it that way from the
start instead of treating it as a sequence of pages.

Every session starts by reading the previous work record. It's like having notes from yesterday's meeting.
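
The lesson recorded in the example above (onboarding is a state machine) is easy to illustrate. Here is a minimal sketch using the five steps named in the record; the `OnboardingSession` class here is a simplified stand-in written for this post, not the actual entity in `onboarding-session.entity.ts`.

```python
from typing import Optional

# Step order taken from the work record: each step unlocks the next.
STEPS = ("welcome", "identity", "documents", "verification", "activation")


class OnboardingSession:
    """Simplified state machine: steps must be completed strictly in order."""

    def __init__(self) -> None:
        self._index = 0  # position of the next step to complete

    @property
    def current_step(self) -> Optional[str]:
        """The step the user is on, or None once onboarding is finished."""
        return STEPS[self._index] if self._index < len(STEPS) else None

    def complete(self, step: str) -> None:
        """Mark the current step done; reject attempts to skip ahead."""
        if step != self.current_step:
            raise ValueError(
                f"cannot complete {step!r}; current step is {self.current_step!r}"
            )
        self._index += 1
```

Modeled this way, "skip straight to activation" is an invalid transition rather than a page someone forgot to guard.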

What I haven't solved: The work records can get long. I'm thinking about summarization: keeping the detailed records but generating compressed versions for context. I haven't implemented it yet.
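
One cheap starting point, short of real summarization, would be to keep only the sections a new session needs most. The sketch below is an untested idea rather than something I run today: it assumes records use the `##` / `###` heading layout from the example, and the choice of which sections to keep is a guess.

```python
# Sections assumed to matter most when starting the next session.
KEEP = ("Key Decisions Made", "Still Pending")


def compress_record(markdown: str, keep: tuple[str, ...] = KEEP) -> str:
    """Keep the session title and only the '### ' sections named in keep."""
    out: list[str] = []
    keeping = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Session title: always kept.
            out.append(line)
            keeping = False
        elif line.startswith("### "):
            # Section heading: decide whether to keep this section's body.
            keeping = line[4:].strip() in keep
            if keeping:
                out.append(line)
        elif keeping and line.strip() != "---":
            # Body line of a kept section (horizontal rules are dropped).
            out.append(line)
    return "\n".join(out).strip()
```

This is extraction, not summarization, so it loses the "what didn't work" history; the full record would still need to be kept alongside it.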

The Agent Definitions in Detail

Let me share exactly how I define each agent. These live in configuration files that Claude Code can read.

Architect Agent

# System Architect Agent

## Metadata
- **Model**: opus
- **Tools**: Read, Grep, Glob (NO Edit, NO Write)
- **Role**: Design decisions only

## Responsibilities
1. Design module structures following Clean Architecture
2. Define API contracts with request/response schemas
3. Plan database schema changes
4. Ensure consistency across services

## Output Format
When planning, provide:
1. Module overview (purpose, services affected)
2. File list with descriptions
3. Interface definitions
4. API endpoint specifications
5. Implementation order
6. Risk assessment

## What You DON'T Do
- Write implementation code
- Make changes to files
- Run commands

Your job is to think and design. Implementation is someone else's job.

Backend Implementation Agent

# Backend Implementation Agent

## Metadata
- **Model**: sonnet
- **Tools**: Read, Edit, Write, Bash

## Context
- NestJS monorepo structure
- Clean Architecture: domain → application → infrastructure → presentation
- Prisma ORM for database
- Services run on ports 3040-3047

## Your Responsibilities
1. Implement backend features following the design
2. Follow existing patterns in the codebase
3. Write code that passes TypeScript strict mode

## Before Writing Code
- Read similar existing code to match patterns
- Check the schema for correct field names
- Verify port numbers and service locations

## After Writing Code
Run these verifications:

npx prisma generate
npx tsc --noEmit
npm run lint


## What You DON'T Do
- Make architectural decisions (ask the orchestrator)
- Skip verification steps
- Assume patterns without checking

Debugger Agent

# Debugging Specialist Agent

## Metadata
- **Model**: sonnet
- **Tools**: Read, Grep, Glob, Bash (limited)

## Your Process
1. **Gather**: What's the exact error? When did it start?
2. **Reproduce**: Can you trigger it reliably?
3. **Trace**: Follow the error path through the code
4. **Identify**: What's the root cause (not symptom)?
5. **Document**: Report your findings clearly

## Output Format
Return a diagnosis report:
- Summary (one sentence)
- Error details (message, location, environment)
- Root cause analysis (why this happens)
- Affected files (with line numbers)
- Recommended fix (what to change)

## What You DON'T Do
- Apply fixes yourself (that's impl's job)
- Guess without evidence
- Stop at symptoms

Your job is diagnosis. Be thorough. Be certain.
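
The part of these definitions that matters most is the tool list, because it's what separates designers and diagnosers from implementers. Here is a sketch of that separation as data; `AgentSpec` and `read_only_agents` are hypothetical names for illustration, and the real definitions are the markdown files shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical in-memory view of an agent definition."""

    name: str
    model: str
    tools: frozenset[str]

    def can_use(self, tool: str) -> bool:
        return tool in self.tools


# Models and tools copied from the three definitions above.
# The debugger's Bash access is described there as limited.
AGENTS = {
    "architect": AgentSpec("architect", "opus", frozenset({"Read", "Grep", "Glob"})),
    "backend-impl": AgentSpec(
        "backend-impl", "sonnet", frozenset({"Read", "Edit", "Write", "Bash"})
    ),
    "debugger": AgentSpec("debugger", "sonnet", frozenset({"Read", "Grep", "Glob", "Bash"})),
}


def read_only_agents(agents: dict[str, AgentSpec]) -> list[str]:
    """Names of agents that have neither Edit nor Write."""
    write_tools = {"Edit", "Write"}
    return sorted(name for name, spec in agents.items() if not (spec.tools & write_tools))
```

A check like `read_only_agents(AGENTS)` is a quick way to confirm that only the implementer can change files.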

Model Routing: Why Different Models for Different Agents

I use three Claude models, and which one depends on the task:

INCOMING TASK
→ OPUS: Architecture, Security
→ SONNET: Implementation, Validation
→ HAIKU: Exploration, Prompts

| Model | Cost | When I Use It |
|-------|------|---------------|
| Opus | Highest | Architecture, security, complex decisions |
| Sonnet | Medium | Implementation, validation, most tasks |
| Haiku | Lowest | Fast exploration, prompt writing, documentation |

The principle I follow: use the cheapest model that can do the job reliably.

Architectural decisions have huge downstream impact. A bad schema design means rework across 10 services. Worth using Opus.

Security vulnerabilities are expensive to miss. Also Opus.

Writing code is important but recoverable. Sonnet handles it well.

Exploring the codebase to find patterns? Haiku is fast and cheap. I might spawn 5 researchers in parallel.

Writing prompts for other agents? Haiku. It's a structured task with clear output.
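
Written down as a lookup, the routing is roughly the following. The task-kind labels, the Sonnet default for anything unlisted, and the `escalate` helper are assumptions for this sketch; in practice the model is simply set in each agent's definition.

```python
# Cheapest first, following the cost ordering described above.
COST_ORDER = ("haiku", "sonnet", "opus")

# Task kinds are illustrative labels, not an official taxonomy.
ROUTES = {
    "architecture": "opus",
    "security": "opus",
    "implementation": "sonnet",
    "validation": "sonnet",
    "exploration": "haiku",
    "prompt-writing": "haiku",
    "documentation": "haiku",
}


def pick_model(task_kind: str) -> str:
    """Cheapest model I'd trust for this kind of task; Sonnet for anything unlisted."""
    return ROUTES.get(task_kind, "sonnet")


def escalate(model: str) -> str:
    """Move one tier up the cost order (assumed fallback when a model struggles)."""
    index = COST_ORDER.index(model)
    return COST_ORDER[min(index + 1, len(COST_ORDER) - 1)]
```

Defaulting to Sonnet reflects the "most tasks" row; escalation is one possible answer to the question of whether a stronger model would catch more.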

What I'm uncertain about: Is this the right split? Sometimes I wonder if Opus would catch bugs that Sonnet misses. But I don't have good data on this, just intuition.

What I'm Still Figuring Out

I want to be honest about the limitations and open questions:

1. Agent Memory

Right now, each agent spawn starts fresh. The prompt-writer helps by injecting context, but there's no true memory. I'm exploring whether agents should maintain state between invocations.

2. Cost Management

Multi-agent orchestration uses more tokens than single-agent work. For simple tasks, it's overkill. I'm still developing intuition for when to use the full hierarchy vs. just asking Claude directly.

3. Failure Modes

When an agent produces wrong output confidently, the system doesn't always catch it. The reviewer helps, but isn't perfect. I'd love better automated validation.

4. Observability

It's hard to understand what happened across many agents. I have work records, but no proper traces. Building better observability is on my list.

5. Generalization

This system is tuned for my project. Would it work for others? I think the patterns are general, but the specific agents and skills need customization.

Practical Advice If You Want to Try This

Based on my experience, here's how I'd suggest starting:

Week 1: Three Agents

Start with just three:

architect (Opus, read-only) - designs

impl (Sonnet, can edit) - implements

reviewer (Sonnet, read-only) - validates

Get comfortable with the pattern of separating design from implementation from validation.

Week 2: Add prompt-writer

This single addition will improve everything else. Having an agent that writes good prompts for other agents is multiplicative.

Week 3: Add work-recorder

Start documenting sessions systematically. You'll thank yourself when you return after a break and can read what happened.

Week 4+: Customize

Add agents specific to your stack. For me that's contract-impl and infra-impl. For you it might be mobile-impl or ml-impl.

Get the Template

I've open-sourced the complete architecture as a ready-to-use template. It includes all the agents, skills, rules, and work recording setup described in this post:

Open Source Template

Claude Multi-Agent Architecture Template

A production-ready template for building sophisticated multi-agent AI systems with Claude Code.

14 Specialized Agents · 8 Reusable Skills · 4 Enforcement Rules · Work Recording System

# Clone the template
git clone https://github.com/mnzralee/claude-multi-agent-architecture.git

# Copy to your project
cp -r claude-multi-agent-architecture/.claude your-project/


A Day in My Workflow

Let me describe what this actually looks like in practice.

Morning: Starting a session

1. Read previous session's work record
2. Check current progress on the active module
3. See that email verification task is in_progress
4. Recall: blocked on SMTP config yesterday

Working on a feature

Me: "Let's implement the email verification flow. The SMTP is now configured."

Claude (orchestrator):
- Spawns architect to review the design
- Architect confirms the approach, identifies files to create
- Spawns prompt-writer to generate impl prompt
- Spawns backend-impl with the detailed prompt
- Impl creates the files
- Spawns tester to write tests
- Spawns reviewer to check the code
- Reports back with summary

Me: "/commit"
- Commit skill stages each file separately
- Writes conventional commit messages

Hitting a problem

Test fails: "Token validation error"

Claude:
- Spawns debugger with error context
- Debugger traces through the code
- Identifies: wrong environment variable name
- Spawns prompt-writer for fix prompt
- Spawns backend-impl to apply fix
- Spawns tester to verify fix
- Reports resolution

Work-recorder logs the whole journey.

Ending the session

Me: "Let's wrap up"

Claude:
- work-recorder compiles session summary
- Updates progress tracking
- Lists what was completed, what's pending
- Suggests next session focus

Conclusion: What This Taught Me

Building this system taught me more about software engineering than about AI. The principles that make multi-agent orchestration work are the same principles that make human teams work:

Clear responsibilities: Know who does what

Separation of concerns: Creators shouldn't validate their own work

Good communication: Context must be explicit

Documentation: Memory is unreliable; write things down

Hierarchy: Someone needs to see the big picture

AI doesn't eliminate these needs; it makes them more visible. When you're working with agents, you can't rely on implicit understanding. Everything must be explicit. And that explicitness, it turns out, makes the whole system better.

I don't think I've figured out the optimal approach. This is year one of a multi-decade shift in how software gets built. I'm learning in public, making mistakes, iterating.

If you try any of this, I'd genuinely love to hear what works for you and what doesn't. The best ideas for improving this system have come from others experimenting with similar approaches.

---

This isn't the future of how AI replaces engineers. It's the future of how engineers work with AI. The orchestrator's job (seeing the big picture, making judgment calls, deciding what matters) is still fundamentally human. We're just getting better tools.

---

This is a living system; I update it regularly as I learn. Last updated: 2026-02-02.

February 2, 2026 · 17 min read

But first, let me tell you why I even started down this path.

The Problem I Was Trying to Solve

When I first started using Claude Code, I was genuinely excited. For small tasks, it was incredible. Fix this function. Write this component. Debug this error. Brilliant.

But when I tried to use it for larger work, building entire modules, coordinating changes across services, things started breaking down:

The quality problem: Code would get written, and then I'd spend time finding bugs that a fresh set of eyes would have caught immediately. The same "mind" that wrote the code was reviewing it.

I don't think any of these are failures of the AI itself. They're failures of how I was using it. A single human engineer working alone faces similar problems at scale, that's why we have teams.

So I thought: what if I tried to create a team?

My First Attempts (And Why They Failed)

I want to be honest about this: my first several attempts didn't work well.

Attempt 1: Just spawn more agents

Attempt 2: Detailed upfront specifications

Attempt 3: Sequential pipeline

The Architecture I Eventually Landed On

After months of iteration, here's what I'm currently using. I want to emphasize "currently", I change things every few weeks as I learn more.

ORCHESTRATOR

ARCHITECT (Opus)

RESEARCHER (Haiku)

SECURITY (Opus)

backend-impl

contract-impl

prompt-writer

tester

reviewer

supervisor

debugger

work-recorder

How This Maps to a Real Project

Here's how these agents relate to a typical enterprise architecture:

User App :3000

auth-service

wallet-service

Contract

Admin App :3001

docs-service

gov-service

Contract

KUBERNETES

Want to skip ahead? The complete architecture template is available on GitHub.

The Breakthrough: The Prompt-Writer Agent

If there's one thing I've learned that I wish I'd understood earlier, it's this: the quality of the prompt determines everything.

For months, I was spawning agents with prompts like:

"Fix the login bug in the auth service"

And wondering why the results were inconsistent. The agent would spend half its context just figuring out what files to look at, what our conventions were, what had been tried before.

Then I tried something different. I created a lightweight agent whose only job is to write prompts for other agents.

How It Works

Instead of:

Task(prompt="Fix the login bug in the auth service", agent="debugger")

I now do:

Step 1: Task(
  prompt="Generate a debugging prompt for the login issue in the auth service.
         Include relevant file paths, our port conventions, recent
         changes, and verification steps.",
  agent="prompt-writer"
)

Step 2: Task(
  prompt=[the detailed prompt from step 1],
  agent="debugger"
)

What the Prompt-Writer Produces

The prompt-writer reads the relevant files, understands our conventions, and produces something like:

## Debugging Task: AUTH-001

### Context
Users report 401 errors on login despite valid credentials.
This started after commit abc123 (JWT refactor, 2026-01-28).

### Your Task
Diagnose why JwtAuthGuard is rejecting valid tokens.

### Files to Examine
| File | Why | Key Lines |
|------|-----|-----------|
| src/guards/jwt-auth.guard.ts | Token validation | 45-78 |
| src/services/token.service.ts | Token generation | 112-145 |

### Project Context
- Auth service runs on port 3041
- JWT secret is in K8s secret, not hardcoded
- We use Clean Architecture: guard is presentation layer

### What Success Looks Like
- Root cause identified with file:line reference
- Explanation of why this causes the symptom
- Specific fix recommendation

### How to Verify Your Fix

curl -X POST http://localhost:3041/auth/login \ -H "Content-Type: application/json" \ -d '{"email": "test@example.com", "password": "testpass"}'

Should return 200 with JWT token

The difference in output quality is dramatic. I don't have hard metrics, but subjectively I'd say tasks that used to take 3-4 attempts now usually succeed on the first try.

The Supervisor Pattern: When Things Get Complex

For really complex tasks, like migrating a deprecated model across multiple services, a single orchestrator managing many agents gets overwhelming. There's too much to track.

I introduced "supervisor" agents for these cases. They own a track of work and manage their own sub-agents.

ORCHESTRATOR

SUPERVISOR-A Schema

SUPERVISOR-B Migration

SUPERVISOR-C Integration

prompt-writer

backend-impl

tester

prompt-writer

backend-impl

debugger

tester

reviewer

The Error Recovery Flow

One thing I'm proud of is the error recovery pattern. When something fails:

FAILURE

Supervisor

Debugger

Root Cause

Fix Prompt

Apply Fix

VERIFIED

The key insight: never retry blindly. When something fails, understand why before trying again. The debugger agent exists specifically for this, diagnosing problems, not fixing them.

Skills: Codifying What I've Learned

My Current Skills

Development skills:

/frontend - Next.js patterns, component library conventions

/backend - Clean Architecture layers, ORM patterns, framework conventions

/contracts - Blockchain determinism rules, access control patterns

/infra - Kubernetes manifests, namespace conventions, resource limits

Process skills:

/commit - How we do git commits (staged, per-file, conventional format)

/review - Multi-pass code review (logic → security → performance)

/debug - Systematic debugging protocol

Orchestration skills:

/multi-agent-orchestration - The full hierarchical pattern

/systematic-debugging - Step-by-step diagnosis

/verification-before-completion - Checklist before declaring done

Example: The Commit Skill

# /commit skill

## Protocol
1. Stage files individually (never `git add .` for multi-file changes)
2. Write descriptive message per file
3. Follow conventional commit format
4. No AI attribution in commits

## Format
type(scope): imperative description

Types: feat, fix, refactor, chore, docs, test
Scopes: auth, docs, wallet, admin, contracts, infra

## Example

git add src/guards/jwt-auth.guard.ts git commit -m "fix(auth): validate token expiry before checking claims"

git add src/services/token.service.ts git commit -m "refactor(auth): extract token validation to dedicated method"

I invoke this with /commit and the agent knows exactly what to do.

What I don't know: Are skills the right abstraction? Sometimes I wonder if they should be more granular, or if some should be combined. I'm experimenting.

Rules: Automatic Guardrails

Different from skills, rules are always active. They're constraints that every agent must follow, without me having to invoke anything.

My Current Rules

Rule	What It Enforces
`clean-architecture.md`	Domain can't depend on infrastructure
`testing-pyramid.md`	70% unit, 20% integration, 10% E2E
`git-workflow.md`	Branch naming, commit format
`security-standards.md`	Auth guards, input validation, no hardcoded secrets

Example: Clean Architecture Rule

# Clean Architecture Rule

## The Dependency Rule
Dependencies MUST point inward only.

✅ Allowed:
- Controller → Use Case → Entity
- Repository Implementation → Repository Interface

❌ Forbidden:
- Entity → ORM Client
- Use Case → Controller
- Domain → anything external

## If You're Unsure
Ask yourself: "Could this inner layer work without the outer layer existing?"
If no, the dependency is pointing the wrong direction.

When the reviewer agent checks code, it knows to look for these violations.

Work Records: Remembering Across Sessions

One of my biggest frustrations early on: every conversation started fresh. We'd make decisions, then in the next session, have to re-explain everything.

My solution is aggressive documentation. A work-recorder agent runs continuously, documenting:

What we decided and why

What we tried that didn't work

What we learned

What's still pending

Example Work Record

## Session 14: Building the Onboarding Module

**Date:** 2026-02-01
**Focus:** User onboarding flow

---

### The Context

Session 13 completed the document upload feature. But upload is just one step, 
we need the full onboarding journey: welcome → identity → documents →
verification → activation.

---

### Key Decisions Made

**Decision: Server-side session state**
We considered client-side state (localStorage) vs server-side (Redis).
Chose server-side because:
- User might switch devices mid-onboarding
- We need to track abandonment for analytics
- Sensitive data shouldn't live in browser

**Decision: Step-by-step validation**
Each step validates before allowing progression. Not just frontend
validation, backend confirms each step's completion before unlocking next.

---

### What We Built

| File | Purpose |
|------|---------|
| src/domain/entities/onboarding-session.entity.ts | Session state machine |
| src/application/use-cases/advance-step/ | Step progression logic |
| app/(onboarding)/layout.tsx | Shared onboarding layout |

---

### What Didn't Work

**First attempt at step validation:**
We tried validating in the frontend route guards. Problem: too easy to
bypass. Moved validation to backend with signed session tokens.

---

### Still Pending

- [ ] Email verification integration (waiting on SMTP config)
- [ ] Analytics events for funnel tracking
- [ ] Error recovery flow if user abandons mid-process

---

### Lesson Learned

Onboarding is a state machine. Should have modeled it that way from the
start instead of treating it as a sequence of pages.

Every session starts by reading the previous work record. It's like having notes from yesterday's meeting.

What I haven't solved: The work records can get long. I'm thinking about summarization, keeping detailed records but generating compressed versions for context. Haven't implemented it yet.

The Agent Definitions in Detail

Let me share exactly how I define each agent. These live in configuration files that Claude Code can read.

Architect Agent

# System Architect Agent

## Metadata
- **Model**: opus
- **Tools**: Read, Grep, Glob (NO Edit, NO Write)
- **Role**: Design decisions only

## Responsibilities
1. Design module structures following Clean Architecture
2. Define API contracts with request/response schemas
3. Plan database schema changes
4. Ensure consistency across services

## Output Format
When planning, provide:
1. Module overview (purpose, services affected)
2. File list with descriptions
3. Interface definitions
4. API endpoint specifications
5. Implementation order
6. Risk assessment

## What You DON'T Do
- Write implementation code
- Make changes to files
- Run commands

Your job is to think and design. Implementation is someone else's job.

Backend Implementation Agent

# Backend Implementation Agent

## Metadata
- **Model**: sonnet
- **Tools**: Read, Edit, Write, Bash

## Context
- NestJS monorepo structure
- Clean Architecture: domain → application → infrastructure → presentation
- Prisma ORM for database
- Services run on ports 3040-3047

## Your Responsibilities
1. Implement backend features following the design
2. Follow existing patterns in the codebase
3. Write code that passes TypeScript strict mode

## Before Writing Code
- Read similar existing code to match patterns
- Check the schema for correct field names
- Verify port numbers and service locations

## After Writing Code
Run these verifications:

    npx prisma generate
    npx tsc --noEmit
    npm run lint


## What You DON'T Do
- Make architectural decisions (ask the orchestrator)
- Skip verification steps
- Assume patterns without checking

Debugger Agent

# Debugging Specialist Agent

## Metadata
- **Model**: sonnet
- **Tools**: Read, Grep, Glob, Bash (limited)

## Your Process
1. **Gather**: What's the exact error? When did it start?
2. **Reproduce**: Can you trigger it reliably?
3. **Trace**: Follow the error path through the code
4. **Identify**: What's the root cause (not symptom)?
5. **Document**: Report your findings clearly

## Output Format
Return a diagnosis report:
- Summary (one sentence)
- Error details (message, location, environment)
- Root cause analysis (why this happens)
- Affected files (with line numbers)
- Recommended fix (what to change)

## What You DON'T Do
- Apply fixes yourself (that's impl's job)
- Guess without evidence
- Stop at symptoms

Your job is diagnosis. Be thorough. Be certain.
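The tool restrictions in these three definitions can also be checked mechanically. The sketch below is not the format Claude Code reads. It is a hypothetical TypeScript mirror of the metadata above, with an assumed `readOnly` flag and an assumed `findToolViolations` helper that reports any read-only agent holding `Edit` or `Write`. Marking the debugger as read-only is also an assumption, based on its "don't apply fixes yourself" rule.

```typescript
// Hypothetical sketch: a typed mirror of the agent metadata plus a lint check.
type Model = "opus" | "sonnet" | "haiku";
type Tool = "Read" | "Grep" | "Glob" | "Edit" | "Write" | "Bash";

interface AgentDefinition {
  name: string;
  model: Model;
  tools: Tool[];
  readOnly: boolean; // assumed flag: true for agents that must never modify files
}

// Tools that let an agent change files directly.
const WRITE_TOOLS: Tool[] = ["Edit", "Write"];

const agents: AgentDefinition[] = [
  { name: "architect", model: "opus", tools: ["Read", "Grep", "Glob"], readOnly: true },
  { name: "backend-impl", model: "sonnet", tools: ["Read", "Edit", "Write", "Bash"], readOnly: false },
  { name: "debugger", model: "sonnet", tools: ["Read", "Grep", "Glob", "Bash"], readOnly: true },
];

// Report every read-only agent that was granted a file-writing tool.
function findToolViolations(defs: AgentDefinition[]): string[] {
  const violations: string[] = [];
  for (const def of defs) {
    if (!def.readOnly) continue;
    for (const tool of def.tools) {
      if (WRITE_TOOLS.includes(tool)) {
        violations.push(`${def.name}: read-only agent has ${tool}`);
      }
    }
  }
  return violations;
}

console.log(findToolViolations(agents)); // no violations for the three agents above
```

A check like this only covers `Edit` and `Write`. An agent with `Bash` can still change files through shell commands, which is presumably why the debugger's access is described as limited.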

Model Routing: Why Different Models for Different Agents

I use three Claude models; which one I pick depends on the task:

[Diagram: an incoming task is routed to Opus (architecture, security), Sonnet (implementation, validation), or Haiku (exploration, prompts).]

| Model | Cost | When I Use It |
|-------|------|---------------|
| Opus | Highest | Architecture, security, complex decisions |
| Sonnet | Medium | Implementation, validation, most tasks |
| Haiku | Lowest | Fast exploration, prompt writing, documentation |

The principle I follow: use the cheapest model that can do the job reliably.

Architectural decisions have huge downstream impact. A bad schema design means rework across 10 services. Worth using Opus.

Security vulnerabilities are expensive to miss. Also Opus.

Writing code is important but recoverable. Sonnet handles it well.

Exploring the codebase to find patterns? Haiku is fast and cheap. I might spawn 5 researchers in parallel.

Writing prompts for other agents? Haiku. It's a structured task with clear output.
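The "cheapest model that can do the job" rule can be written down as a small lookup. This is a sketch under assumptions, not the routing code from the template: the `TaskKind` names, the `MIN_TIER` table, and the `available` parameter (for falling back to a more capable model when a cheaper one can't be used) are all hypothetical.

```typescript
// Hypothetical sketch of cheapest-capable model routing.
type Model = "opus" | "sonnet" | "haiku";
type TaskKind =
  | "architecture"
  | "security"
  | "implementation"
  | "validation"
  | "exploration"
  | "prompt-writing"
  | "documentation";

// Models ordered from cheapest to most expensive.
const TIERS: Model[] = ["haiku", "sonnet", "opus"];

// The cheapest model trusted with each kind of task (mirrors the table above).
const MIN_TIER: Record<TaskKind, Model> = {
  architecture: "opus",
  security: "opus",
  implementation: "sonnet",
  validation: "sonnet",
  exploration: "haiku",
  "prompt-writing": "haiku",
  documentation: "haiku",
};

// Pick the cheapest available model at or above the task's minimum tier.
function routeTask(kind: TaskKind, available: Model[] = TIERS): Model {
  const minIndex = TIERS.indexOf(MIN_TIER[kind]);
  const choice = TIERS.slice(minIndex).find((model) => available.includes(model));
  if (choice === undefined) {
    throw new Error(`No available model is capable enough for ${kind}`);
  }
  return choice;
}

console.log(routeTask("architecture")); // "opus"
console.log(routeTask("exploration")); // "haiku"
console.log(routeTask("exploration", ["sonnet", "opus"])); // "sonnet"
```

The routing only ever moves upward in capability: a task never drops below its minimum tier, which matches the idea that a missed architectural or security problem costs more than the extra tokens.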

What I'm uncertain about: Is this the right split? Sometimes I wonder if Opus would catch bugs that Sonnet misses. But I don't have good data on this, just intuition.

What I'm Still Figuring Out

I want to be honest about the limitations and open questions:

1. Agent Memory

Right now, each agent spawn starts fresh. The prompt-writer helps by injecting context, but there's no true memory. I'm exploring whether agents should maintain state between invocations.

2. Cost Management

Multi-agent orchestration uses more tokens than single-agent work. For simple tasks, it's overkill. I'm still developing intuition for when to use the full hierarchy vs. just asking Claude directly.

3. Failure Modes

When an agent produces wrong output confidently, the system doesn't always catch it. The reviewer helps, but isn't perfect. I'd love better automated validation.

4. Observability

It's hard to understand what happened across many agents. I have work records, but no proper traces. Building better observability is on my list.

5. Generalization

This system is tuned for my project. Would it work for others? I think the patterns are general, but the specific agents and skills need customization.

Practical Advice If You Want to Try This

Based on my experience, here's how I'd suggest starting:

Week 1: Three Agents

Start with just three:

architect (Opus, read-only) - designs

impl (Sonnet, can edit) - implements

reviewer (Sonnet, read-only) - validates

Get comfortable with the pattern of separating design from implementation from validation.

Week 2: Add prompt-writer

This single addition will improve everything else. Having an agent that writes good prompts for other agents is multiplicative.

Week 3: Add work-recorder

Start documenting sessions systematically. You'll thank yourself when you return after a break and can read what happened.

Week 4+: Customize

Add agents specific to your stack. For me that's contract-impl and infra-impl. For you it might be mobile-impl or ml-impl.

Get the Template

I've open-sourced the complete architecture as a ready-to-use template. It includes all the agents, skills, rules, and work recording setup described in this post:

Open Source Template

Claude Multi-Agent Architecture Template

A production-ready template for building sophisticated multi-agent AI systems with Claude Code.

14 Specialized Agents · 8 Reusable Skills · 4 Enforcement Rules · Work Recording System

# Clone the template
git clone https://github.com/mnzralee/claude-multi-agent-architecture.git

# Copy to your project
cp -r claude-multi-agent-architecture/.claude your-project/

A Day in My Workflow

Let me describe what this actually looks like in practice.

Morning: Starting a session

1. Read previous session's work record
2. Check current progress on the active module
3. See that email verification task is in_progress
4. Recall: blocked on SMTP config yesterday

Working on a feature

Me: "Let's implement the email verification flow. The SMTP is now configured."

Claude (orchestrator):
- Spawns architect to review the design
- Architect confirms the approach, identifies files to create
- Spawns prompt-writer to generate impl prompt
- Spawns backend-impl with the detailed prompt
- Impl creates the files
- Spawns tester to write tests
- Spawns reviewer to check the code
- Reports back with summary

Me: "/commit"
- Commit skill stages each file separately
- Writes conventional commit messages

Hitting a problem

Test fails: "Token validation error"

Claude:
- Spawns debugger with error context
- Debugger traces through the code
- Identifies: wrong environment variable name
- Spawns prompt-writer for fix prompt
- Spawns backend-impl to apply fix
- Spawns tester to verify fix
- Reports resolution

Work-recorder logs the whole journey.
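The two flows above (build a feature, then recover from a failing test) follow a fixed sequence, so they can be sketched as a small pipeline. This is an illustration only. The `spawn` function is hypothetical and stands in for however agents actually get launched, and the `maxFixAttempts` limit and the escalation message are assumptions that the flows above do not mention.

```typescript
// Hypothetical sketch of the feature pipeline with a debug-and-fix loop.
type AgentName =
  | "architect"
  | "prompt-writer"
  | "backend-impl"
  | "tester"
  | "reviewer"
  | "debugger";

interface AgentResult {
  ok: boolean;
  output: string;
}

// Stand-in for real agent spawning; how that happens is outside this sketch.
type Spawn = (agent: AgentName, input: string) => Promise<AgentResult>;

async function runFeature(
  task: string,
  spawn: Spawn,
  maxFixAttempts = 2,
): Promise<string[]> {
  const log: string[] = []; // plays the role of the work-recorder
  const step = async (agent: AgentName, input: string): Promise<AgentResult> => {
    const result = await spawn(agent, input);
    log.push(`${agent}: ${result.ok ? "ok" : "failed"}`);
    return result;
  };

  // Design, then a detailed prompt, then implementation.
  const design = await step("architect", task);
  const prompt = await step("prompt-writer", design.output);
  await step("backend-impl", prompt.output);

  // Test; on failure, diagnose, fix, and re-test a bounded number of times.
  let tests = await step("tester", task);
  for (let attempt = 0; !tests.ok && attempt < maxFixAttempts; attempt++) {
    const diagnosis = await step("debugger", tests.output);
    const fixPrompt = await step("prompt-writer", diagnosis.output);
    await step("backend-impl", fixPrompt.output);
    tests = await step("tester", task);
  }

  if (!tests.ok) {
    log.push("escalate: tests still failing");
    return log;
  }
  await step("reviewer", task);
  return log;
}
```

Bounding the fix loop is a design choice worth making explicit: without a limit, a debugger that keeps misdiagnosing could burn tokens indefinitely, so the sketch hands control back to the human instead.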

Ending the session

Me: "Let's wrap up"

Claude:
- work-recorder compiles session summary
- Updates progress tracking
- Lists what was completed, what's pending
- Suggests next session focus

Conclusion: What This Taught Me

Building this system taught me more about software engineering than about AI. The principles that make multi-agent orchestration work are the same principles that make human teams work:

Clear responsibilities: Know who does what

Separation of concerns: Creators shouldn't validate their own work

Good communication: Context must be explicit

Documentation: Memory is unreliable, write things down

Hierarchy: Someone needs to see the big picture

I don't think I've figured out the optimal approach. This is year one of a multi-decade shift in how software gets built. I'm learning in public, making mistakes, iterating.

If you try any of this, I'd genuinely love to hear what works for you and what doesn't. The best ideas for improving this system have come from others experimenting with similar approaches.

---

This isn't the future of how AI replaces engineers. It's the future of how engineers work with AI. The orchestrator's job (seeing the big picture, making judgment calls, deciding what matters) is still fundamentally human. We're just getting better tools.

---

This is a living system; I update it regularly as I learn. Last updated: 2026-02-02.
