Why CE’s Agent Design Treats Expert Prompts as Decision Modules, Not Personas
Many teams instinctively create a roster of expert personas for their AI agents. CE instead builds agents as well-defined judgment modules with clear input and output boundaries, explicit non-responsibilities, confidence calibration, and systematic orchestration, resulting in a more reliable and maintainable review pipeline.
1. CE’s agents are judgment modules, not personas
Most teams start AI‑agent design by "creating a few experts". CE’s repository, however, defines agents as stable decision modules rather than role‑playing characters.
Key structural elements of a basic review agent
Frontmatter defines name, description, model, and tools.
The body pins down the exact class of issues the agent hunts for.
Explicit sections cover confidence calibration and "What you don't flag".
Output must follow a fixed JSON schema.
This structure shows that CE views an agent as a deterministic judgment component, not a simulated expert.
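A minimal sketch of what such an agent file might look like; every field name and value below is an illustrative assumption, not copied from the CE repository:

```markdown
---
# Hypothetical frontmatter; CE's files define name, description,
# model, and tools here.
name: correctness-reviewer
description: Flags logical errors in changed code
model: sonnet
tools: Read, Grep
---

You hunt for logical errors: off-by-one mistakes, null/undefined
propagation, race conditions, incorrect state transitions.

## What you don't flag
Style preferences, missing optimizations, naming opinions.

## Confidence calibration
Label every finding high, medium, or low.

## Output
Return JSON with the keys findings, residual_risks, and testing_gaps.
```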
Four essential characteristics of CE’s agents
Clear input boundaries: each reviewer focuses on a narrow problem set (e.g., logical errors, boundary conditions, state-transition bugs, error-propagation failures).
Clear output boundaries: agents return structured JSON fields such as findings, residual_risks, and testing_gaps so the result can be consumed by the downstream orchestration pipeline.
Explicit "non-responsibility" list: the "What you don't flag" section enumerates what the agent deliberately ignores (style preferences, missing optimizations, naming opinions, unnecessary defensive suggestions).
Confidence calibration: outputs are classified as high, medium, or low confidence, letting the system separate findings it can handle automatically from those needing human judgment.
Why confidence matters
Without calibrated confidence the system cannot distinguish between automatically‑resolvable findings and cases that require manual review, breaking downstream routing logic.
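As a concrete illustration of that routing, here is a minimal sketch in Python; the function name, routing labels, and policy are assumptions, not CE's actual code:

```python
def route_finding(finding: dict) -> str:
    """Route a reviewer finding by its calibrated confidence.

    Hypothetical policy: high-confidence findings are handled
    automatically; everything else escalates toward a human.
    """
    confidence = finding.get("confidence", "low")
    if confidence == "high":
        return "auto-fix"      # safe to act on without a human
    if confidence == "medium":
        return "surface"       # show prominently in the review summary
    return "human-review"      # needs manual judgment

# A finding shaped like the structured JSON described above.
finding = {"title": "off-by-one in pagination loop", "confidence": "high"}
assert route_finding(finding) == "auto-fix"
```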
2. Splitting reviewers by judgment dimension instead of persona
CE’s agents/review/ directory contains multiple specialized reviewers:
correctness-reviewer: flags off-by-one errors, null/undefined propagation, race conditions, incorrect state transitions, and broken error propagation.
testing-reviewer: checks coverage of new branches, flags tests that merely assert the code runs without throwing, detects over-mocking and missing error paths, and ensures test additions match behavior changes.
maintainability-reviewer: warns about premature abstraction, unnecessary indirection, dead code, tight coupling, and obscured intent, focusing on future maintenance cost.
adversarial-reviewer: actively constructs failure scenarios (assumption violations, composition failures, cascade constructions, abuse cases) and varies its depth (quick, standard, deep) based on diff size and risk signals; a depth-selection sketch follows below.
These reviewers are independent lenses; mixing them would cause interference and dilute signal quality.
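The depth variation described above implies some sizing rule. A hypothetical sketch, with invented thresholds:

```python
def adversarial_depth(diff_lines: int, risk_signals: int) -> str:
    """Pick a review depth from diff size and risk signals.

    Thresholds are illustrative assumptions, not values from
    the CE repository.
    """
    if diff_lines > 400 or risk_signals >= 3:
        return "deep"
    if diff_lines > 50 or risk_signals >= 1:
        return "standard"
    return "quick"
```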
3. Orchestration of agents
Agents are not scattered files; they are wired into the ce:review orchestration skill:
Always‑on reviewers
correctness
testing
maintainability
project‑standards
agent‑native‑reviewer
learnings‑researcher
Together these ensure every diff is examined along six dimensions: logical correctness, test health, maintainability trend, project-standard compliance, agent-native accessibility, and relevant historical learnings.
Cross‑cutting conditional reviewers
security‑reviewer
performance‑reviewer
api‑contract‑reviewer
data‑migrations‑reviewer
reliability‑reviewer
adversarial‑reviewer
These are added when the diff touches a specific risk domain.
Stack‑specific conditional reviewers
dhh‑rails‑reviewer
kieran‑rails‑reviewer
kieran‑python‑reviewer
kieran‑typescript‑reviewer
julik‑frontend‑races‑reviewer
These handle language or framework‑specific checks.
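A hedged sketch of how this tiered selection could be wired up. The reviewer names follow the lists above, but the risk-domain mapping and file-extension heuristics are assumptions:

```python
ALWAYS_ON = [
    "correctness", "testing", "maintainability",
    "project-standards", "agent-native-reviewer", "learnings-researcher",
]

# Assumed mapping from risk domains to cross-cutting reviewers.
CONDITIONAL = {
    "security": "security-reviewer",
    "performance": "performance-reviewer",
    "api": "api-contract-reviewer",
    "migrations": "data-migrations-reviewer",
    "reliability": "reliability-reviewer",
}

def select_reviewers(diff_paths: list[str], risk_domains: set[str]) -> list[str]:
    """Assemble the roster for one diff: always-on reviewers, plus
    conditional ones for touched risk domains, plus stack-specific
    ones inferred here (simplistically) from file extensions."""
    reviewers = list(ALWAYS_ON)
    reviewers += [CONDITIONAL[d] for d in sorted(risk_domains) if d in CONDITIONAL]
    if any(p.endswith(".rb") for p in diff_paths):
        reviewers.append("dhh-rails-reviewer")
    if any(p.endswith(".py") for p in diff_paths):
        reviewers.append("kieran-python-reviewer")
    return reviewers

print(select_reviewers(["db/migrate/001_add_users.rb"], {"migrations"}))
```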
4. Research and document‑review agents
Not all agents are code reviewers. Example:
plugins/compound-engineering/agents/research/repo-research-analyst.md builds context by scanning technology, architecture, patterns, documentation, issue conventions, and templates. It operates with scoped invocations such as technology, architecture, patterns, conventions, and issues.
Document‑review agents (e.g., product-lens-reviewer, scope-guardian-reviewer, coherence-reviewer, feasibility-reviewer) focus on plan and documentation quality, asking questions like “right problem?”, “actual outcome?”, “what if we did nothing?”, and “what already exists?”. They illustrate that bad solutions and bad code are distinct failure modes.
5. Design principles distilled from CE
One judgment per agent: separate logical correctness, test quality, maintainability, security, product alignment, etc., to avoid interference.
Explicitly state what the agent does NOT handle: the "What you don't flag" list acts as a noise-reduction mechanism.
Always perform confidence calibration: without confidence the system cannot route findings correctly.
Output for the system, not just for humans: agents emit structured JSON that downstream skills can merge, deduplicate, and route (see the sketch below).
Separate orchestration from expertise: skills decide when, whom, and how many agents to invoke, while agents focus solely on judgment.
CE's ce:review skill embodies this separation, invoking the appropriate reviewers based on diff characteristics.
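To make "output for the system" concrete, here is a minimal merge-and-deduplicate sketch; the dedupe key and the confidence-first ordering are assumptions, not CE's actual logic:

```python
def merge_findings(reviewer_outputs: list[dict]) -> list[dict]:
    """Merge JSON findings from several reviewers, dropping duplicates
    that point at the same file/line/title triple."""
    seen: set[tuple] = set()
    merged: list[dict] = []
    for output in reviewer_outputs:
        for finding in output.get("findings", []):
            key = (finding.get("file"), finding.get("line"), finding.get("title"))
            if key in seen:
                continue
            seen.add(key)
            merged.append(finding)
    # Surface the most confident findings first.
    order = {"high": 0, "medium": 1, "low": 2}
    merged.sort(key=lambda f: order.get(f.get("confidence", "low"), 2))
    return merged
```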
Conclusion
Instead of asking "how many expert personas do we need", CE first enumerates the distinct judgment tasks required by the system. By decomposing agents along judgment dimensions, defining clear boundaries, calibrating confidence, and wiring them through a disciplined orchestration layer, the review pipeline becomes predictable, extensible, and less noisy.