
AdamsReview: Multi-Agent PR Reviews for Claude Code, Reviewed

AdamsReview orchestrates multiple Claude Code agents for PR reviews. We break down how multi-agent review catches what single-pass LLM reviews miss, and where it fits in your pipeline.

6 min read

The pull request review is where a lot of AI code tools stop being useful. You ask one model to read a 40-file diff, it returns six surface comments — formatting nits, an obvious null check, a TODO it spotted — and misses the race condition that ships to production on Friday. AdamsReview, an open-source project from Adam J. G. Miller, takes a different swing at the problem: instead of one model passing once over the diff, it orchestrates several Claude Code agents that each look at the change through a different lens, then consolidates their output into a single review.

Why single-pass LLM reviews leave bugs on the table

Single-agent review has three failure modes you can reproduce on almost any non-trivial PR.

First, attention dilution. When a 2,000-line diff lands in one prompt, the model spreads its attention thin. The first few files get genuine engagement; by the time the model is reading the last test file, it is mostly pattern-matching. Multi-agent setups sidestep this by giving each agent a smaller surface area or a narrower question to answer.

Second, no adversarial perspective. A single pass tends to validate the change rather than attack it. The model reads the diff as written and asks “is this consistent with itself?” rather than “what is the worst input I can think of that breaks this?” You can prompt your way around it, but the moment you ask one agent to do both — write a constructive review and try to break the code — the adversarial half loses.

Third, no cross-checking. If the model hallucinates a function signature or misreads what a helper returns, there is nothing in the loop to catch it. A reviewer who has been through a thousand PRs knows that the second pair of eyes is what catches the embarrassing miss. Multi-agent review approximates that by making one agent’s claims visible to another.

How AdamsReview splits the work across agents

AdamsReview is open-source on GitHub at adamjgmiller/adamsreview and is built to run on top of Claude Code — Anthropic’s CLI/agent runtime — rather than calling the API directly. That choice matters: it means each agent in the review has the same tooling a developer running Claude Code locally would have, including file reads, command execution, and access to whatever MCP servers you have wired into your environment.

The orchestration pattern is the part to pay attention to. Rather than one prompt that says “review this PR,” the tool dispatches multiple agents in parallel, each scoped to a specific concern. From the project’s framing, those concerns are the kinds of review angles you would brief a human reviewer on — correctness, security exposure, test coverage, performance — handled as separate workers whose outputs are then merged. The merge step is what turns “five agents wrote five reports” into one comment thread you can actually act on.
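To make the shape of that pattern concrete, here is a minimal sketch of the fan-out-and-merge loop in Python. The briefs, the run_agent helper, and the headless `claude -p` invocation are illustrative assumptions for this article, not AdamsReview’s actual code or interface.

```python
# Minimal sketch of the fan-out / merge pattern described above. The briefs,
# the run_agent helper, and the merge step are illustrative assumptions,
# not AdamsReview's actual interface.
import subprocess
from concurrent.futures import ThreadPoolExecutor

BRIEFS = {
    "correctness": "Review this diff for logic errors and broken edge cases.",
    "security": "Review this diff for injection, authz gaps, and unsafe input handling.",
    "tests": "Review this diff for missing or weakened test coverage.",
    "performance": "Review this diff for hot-path regressions and N+1 query patterns.",
}

def run_agent(name: str, brief: str, diff: str) -> str:
    """Run one narrowly briefed review pass via headless Claude Code."""
    prompt = f"{brief}\n\nDiff under review:\n{diff}"
    result = subprocess.run(
        ["claude", "-p", prompt],  # assumes Claude Code's non-interactive -p/--print mode
        capture_output=True, text=True, check=True,
    )
    return f"[{name}]\n{result.stdout.strip()}"

def review(diff: str) -> str:
    # Fan out: every agent reads the same diff with a different brief.
    with ThreadPoolExecutor(max_workers=len(BRIEFS)) as pool:
        reports = list(pool.map(lambda kv: run_agent(*kv, diff), BRIEFS.items()))
    # Merge: the naive version just concatenates; a real consolidation step
    # would dedupe overlapping findings and rank them by severity.
    return "\n\n".join(reports)
```

The fan-out is the easy half; the consolidation step is where the signal-to-noise quality of the final review gets decided.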

A few practical consequences:

  • You can run it where Claude Code already runs. If you are scripting Claude Code in CI or invoking it from a developer machine for pre-commit review, AdamsReview slots into the same pipeline rather than asking you to adopt a new platform.
  • Cost scales with agents, not with PR size alone. Five focused agents over a 200-line diff will cost more than one agent over the same diff, and the win has to be measured against that extra spend.
  • Configuration shapes the output. Which agents run, what each one is told to focus on, and how their findings are merged are the levers that determine whether the review reads as signal or noise; a hypothetical example of those levers follows this list.
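
For a sense of what those levers could look like in practice, here is one hypothetical configuration shape (the field names are invented for this article and are not AdamsReview’s actual schema):

```python
# Hypothetical per-repo review configuration illustrating the three levers:
# which agents run, what each focuses on, and how findings are merged.
# Field names are invented for this example, not AdamsReview's schema.
REVIEW_CONFIG = {
    "agents": {
        "correctness": {"enabled": True},
        "security": {"enabled": True, "focus": "auth boundaries and user-supplied input"},
        "tests": {"enabled": True},
        "performance": {"enabled": False},  # skip on a CRUD-heavy service
    },
    "merge": {
        "dedupe_similar_findings": True,
        "min_severity": "medium",  # drop pure nits before they reach the PR thread
        "max_comments": 15,        # keep the consolidated review readable
    },
    "gate": {
        "min_lines_changed": 200,  # below this, fall back to a single-agent pass
    },
}
```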

We have not benchmarked AdamsReview against single-agent review on a controlled corpus, and you should be skeptical of anyone who quotes a “catches 4× more bugs” number without publishing the test set. The honest claim is the structural one: more agents with narrower briefs surface findings that a single broad-brief agent suppresses. Whether those findings are the ones you care about depends on your codebase.

Cursor

If you want an in-editor AI pair that complements PR-time multi-agent review, Cursor handles the moment-of-writing side of the same workflow.

Free tier; Pro $20/mo

Try Cursor

Affiliate link · We earn a commission at no cost to you.

Where it fits (and doesn’t) in your review pipeline

The strongest case for adding AdamsReview to your workflow is the PR that is too large to read carefully but too small to justify a meeting. A 600-line refactor that touches one service, has tests, and is “obviously fine” — that is exactly the kind of change where one human reviewer skims, one AI reviewer rubber-stamps, and a regression slips through. Splitting the review into focused agents raises the floor on what gets caught.

The weakest case is the one-line config change or the three-file dependency bump. The orchestration overhead is real; running five agents on a five-line PR is paying for ceremony you do not need. Use a single fast model, or just merge it.

You should also think about where multi-agent review sits relative to a human reviewer: alongside one, not in place of one. The pattern that holds up is: AI review runs first and surfaces the mechanical findings, leaving the human reviewer free to focus on architecture, naming, and whether the change should exist at all. If your team treats AI review as the only review, you will eventually eat a bug that no orchestration pattern would have caught — because the bug was a product decision, not a code one.

A few notes on running it well

Some practical defaults worth setting before you wire AdamsReview into a team workflow:

  • Gate it on PR size. Run the full orchestration on diffs above some threshold (e.g., 200 lines changed) and a single-agent review below it; a minimal version of this gate is sketched after this list.
  • Cache aggressively. If your Claude Code setup supports prompt caching for repository context, multi-agent review is where that pays off most — every agent is reading the same code.
  • Treat the output as a checklist, not a verdict. The value is the surfacing; the judgment is still yours.
  • Log token spend per PR. The first time you hit a $5 review on a 50-line PR, you will want to know why.
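
Here is a minimal version of the size gate from the first bullet, assuming a git checkout with an origin/main base branch and leaving the actual review invocation to whatever entry point your pipeline uses:

```python
# Minimal CI-side size gate, as described in the first bullet above. The
# threshold and the hand-off are assumptions; wire the chosen mode into
# whatever review entry point your pipeline actually calls.
import subprocess

SIZE_THRESHOLD = 200  # lines changed; tune per repo

def lines_changed(base: str = "origin/main") -> int:
    """Sum added plus deleted lines from git's machine-readable numstat output."""
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added.isdigit() and deleted.isdigit():  # binary files report "-"
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    changed = lines_changed()
    mode = "multi-agent" if changed >= SIZE_THRESHOLD else "single-agent"
    # Logging the size next to the choice makes later token-spend audits easier.
    print(f"{changed} lines changed -> {mode} review")
```

In CI, this runs before any agent is invoked, so the cheap decision happens before the expensive one.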

FAQ

Does AdamsReview replace human code review?
No. It raises the floor on what mechanical, security, and test-coverage issues get caught before a human reviewer sees the PR. The judgment calls — should this feature exist, is this the right abstraction, is the naming clear — still need a human. The realistic framing is that it does the work a tired reviewer would do badly, so the alert reviewer can focus on the work only they can do.
How is this different from GitHub's built-in Copilot PR review?
GitHub's Copilot review is a hosted, single-agent pass that runs inside the PR UI. AdamsReview is open-source, runs on Claude Code, and uses multiple agents with different briefs. The trade-off is control versus convenience: Copilot is one-click and limited; AdamsReview is more setup and more configurable.
What does it cost to run on a typical PR?
Cost scales with the number of agents, the size of the diff, and how much repository context each agent loads. Expect it to be meaningfully more expensive than a single-agent review on the same PR — that is the explicit trade. The way to keep it sane is to gate it on PR size and use prompt caching for repo context where your setup allows.