I wrote a security hardening plan for Claude Code. Deny rules for SSH keys, .env files, network tools. It covered the threat model, laid out phases, had a rollback strategy. Looked solid.
Then I opened a second Claude session — one configured for adversarial review instead of collaborative planning — and pointed it at the same plan file. In under 60 seconds it found three critical gaps:
| What the plan said | What the review found |
|---|---|
| `Read(~/.ssh/**)` protects SSH keys | Only blocks the Read tool; `cat ~/.ssh/id_ed25519` via Bash is unblocked |
| `Bash(curl *)` blocks exfiltration | `python3 -c "import urllib.request; ..."` bypasses it trivially |
| Sandbox is a “future consideration” | Sandbox is the only layer that actually enforces at the OS level |
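The first two findings generalize: every tool that can reach a secret needs its own deny rule, and even then pattern matching only narrows the gap. Here's a sketch of layered rules, assuming a deny-list in `.claude/settings.json` shaped like the rules quoted above; verify the file location and pattern syntax against your Claude Code version's settings reference:

```shell
# Sketch: layer deny rules per tool, not just per path. The settings
# path and pattern syntax are assumptions -- check your Claude Code
# version before relying on them. Written to demo/ for illustration.
mkdir -p demo/.claude
cat > demo/.claude/settings.json <<'EOF'
{
  "permissions": {
    "deny": [
      "Read(~/.ssh/**)",
      "Bash(cat ~/.ssh/*)",
      "Bash(curl *)"
    ]
  }
}
EOF
```

Even this layered list is a sieve: `python3 -c ...`, `dd`, `xxd`, and every other file-reading binary each need their own pattern, which is exactly why the review flagged the sandbox as the only layer that enforces at the OS level.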
Three critical issues in a plan that passed my own review and the authoring session’s review. The second session found them because it was told to look for gaps, not confirm the approach.
## Why Self-Review Fails
One Claude session can’t effectively review its own work. It has the same blind spots that created the gaps. It’s anchored to its own reasoning — when it reviews the plan, it’s really reviewing its own thought process and finding it sound. That’s not review. That’s confirmation.
A second session reads the plan file cold. No shared context. No memory of the reasoning that led to each decision. It sees what’s written, not what was intended. And it’s loaded with a different system prompt that tells it to be adversarial rather than collaborative.
The file on disk is the only thing they share. That’s the feature.
## The Loop
```
Session 1 (planning mode): Write the plan
  → artifact: plans/thing.md

Session 2 (review mode): Review the plan adversarially
  → artifact: list of findings by severity

Session 1 (planning mode): Address findings, update plan
  → artifact: plans/thing.md (v2)

[If critical issues found] Session 2: Re-review
```
The plan file is the handoff contract. Each session reads it from scratch. No context leaks between them.
In practice, I run this with shell aliases that load different system prompts:
```bash
alias claude-cowork='claude --system-prompt "$(cat ~/.claude/modes/cowork.md)"'
alias claude-review='claude --system-prompt "$(cat ~/.claude/modes/review.md)"'
```
Planning session writes the plan. Review session tears it apart. Planning session addresses the findings. Two terminals, one plan file, zero shared state.
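The mode files the aliases load can start tiny. A sketch with placeholder one-line contents (the paths match the aliases; the prompt text here is illustrative, not the author's actual prompts):

```shell
# Sketch: create the two mode files the aliases load. The contents
# are placeholders -- grow the review prompt into the structured
# version described in the next section.
mkdir -p "$HOME/.claude/modes"

cat > "$HOME/.claude/modes/cowork.md" <<'EOF'
You are a collaborative planning partner. Build on the user's approach
and help turn it into a concrete, phased plan.
EOF

cat > "$HOME/.claude/modes/review.md" <<'EOF'
You are an adversarial reviewer. Find gaps, risks, and incorrect
assumptions in the artifact you are given. Do not confirm the approach.
EOF
```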
## What Makes Review Mode Work
The review system prompt isn’t “find problems.” That’s too vague. It needs structure:
- Read the full artifact before commenting. No drive-by critiques on the first section while ignoring the rest.
- Group by severity. Critical (security, data loss) → medium (performance, maintainability) → low (style, naming). The author addresses critical first.
- Suggest specific fixes. “Move sandbox to Phase 1” is actionable. “Consider sandbox” is a thought, not a review finding.
- Check the threat model. Does the solution actually address the stated threat? My plan said it defended against prompt injection. The review tested that claim against the actual attack chain and found it didn’t.
- Verify assumptions independently. This is the one that matters most. My planning session assumed `Read(~/.ssh/**)` blocked all reads of SSH keys. The review session looked up how deny rules actually work and found they only block the Read tool, not Bash subprocesses. It did its own research instead of trusting the plan's premises.
That last point is what separates a useful review from a rubber stamp. The review session doesn’t trust the plan. It verifies.
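The tool-layer-versus-OS-layer distinction is easy to demonstrate. A toy model (not Claude Code's actual enforcement code): a deny check that lives only inside a read-tool wrapper, and a subprocess path that never touches it:

```shell
# Toy model, not Claude Code's real code: the deny rule is enforced
# only inside read_tool, so a plain subprocess reads the file freely.
SECRET_DIR="$(mktemp -d)"
SECRET="$SECRET_DIR/id_ed25519"
echo "PRIVATE KEY" > "$SECRET"

read_tool() {
  # tool-layer check: paths under the denied directory are refused
  case "$1" in
    "$SECRET_DIR"/*) echo "deny rule blocks $1" >&2; return 1 ;;
  esac
  cat "$1"
}

read_tool "$SECRET" || echo "Read tool: blocked"
cat "$SECRET"   # the "Bash" path: nothing at the OS level stops this
```

The wrapper refuses; plain `cat` prints the key. Only an OS-level sandbox changes the second result, which is the review's whole point.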
## A Minimal Review Prompt
You don’t need a 200-line system prompt to get value. Here’s the core:
```
You are reviewing a technical plan or artifact. Your job is to find
gaps, risks, and incorrect assumptions — not to confirm the approach.

1. Read the entire document before commenting.
2. For each finding, state: what the plan claims, what's actually true,
   and a specific fix.
3. Group findings: critical → medium → low.
4. If the plan makes a claim about how a tool works, verify it yourself
   rather than trusting the claim.
5. If you find zero issues, say so — but only after actively trying to
   break every assumption.
```
Start there. Expand it as you learn what your planning sessions consistently miss.
## When to Use This
Not every plan needs adversarial review. A plan to rename a CSS class doesn't need a second opinion. But run the loop on anything with:
- Security implications — the cost of a missed gap is high
- Architectural decisions — wrong choices compound and are expensive to reverse
- Multi-phase rollouts — dependency ordering bugs hide in sequence diagrams
- External system integration — assumptions about third-party behavior are often wrong
For these, the 60-second review loop is the highest-ROI minute you can spend. The alternative is discovering the gap in production, which is never just 60 seconds.
## The Cost
One extra Claude session per plan. That’s it. The review session is short — it reads a document and produces a list. The planning session was going to iterate on the plan anyway. You’re just giving it better feedback than it can give itself.
Plans that skip adversarial review ship with the author’s blind spots baked in. For security plans, those blind spots are vulnerabilities. For architecture plans, they’re rework. The review loop catches them before they become expensive.