
AutoGen Multi‑Agent Demo: Coder, Reviewer, and Executor Automatically Complete a Code Review

The article explains how Microsoft's AutoGen framework supports a Planner-Executor-Critic loop and a three-agent GroupChat workflow. It provides step-by-step Python code that configures AssistantAgent and UserProxyAgent instances as a coder, a reviewer, and an executor that generate, review, and execute code automatically, and it discusses the system's advantages, scalability, and real-world deployments.

DeepHub IMBA

May 28, 2026


What is a Multi‑Agent System

A Multi‑Agent System (MAS) is a computing architecture in which multiple autonomous agents interact, negotiate, and collaborate to solve problems that are too complex or inefficient for a single agent. Each agent perceives its environment, maintains internal state, and takes goal‑directed actions. In practice, several LLM instances—or hybrids of models and tools—communicate via structured messages, forming a virtual team where one agent writes code, another reviews it, a third retrieves background information, and a manager coordinates the overall process.

Why use AutoGen

AutoGen, developed by Microsoft Research, is an open-source Python framework for building multi-agent conversation systems. It handles inter-agent messaging, tool invocation, and conversation-history management while leaving agent behaviour highly customisable.

ConversableAgent: Base class for all agents; supports LLM back-ends, human input, and tool calls.

AssistantAgent: Pre-configured LLM agent that follows instructions and writes code.

UserProxyAgent: Proxy for a human user; it can ask a person for input or run fully autonomously, executing the code blocks it receives locally or in Docker.

GroupChat & GroupChatManager: GroupChat holds the participating agents and the shared message list; GroupChatManager runs the conversation and picks each next speaker using a configurable selection method.
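The division of labour between these classes can be illustrated without AutoGen at all. The sketch below is a hypothetical, heavily simplified stand-in (MiniAgent and run_two_agent_chat are invented names, not AutoGen APIs): an agent is a name plus a reply function, a termination predicate, and a reply cap, and a two-agent chat alternates turns until the next speaker recognises a termination message or exhausts its reply budget. AutoGen's real ConversableAgent adds LLM calls, tool calls, human input, and much more.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"name": sender, "content": text}


@dataclass
class MiniAgent:
    """Hypothetical stand-in for a ConversableAgent: a name plus pluggable behaviour."""
    name: str
    reply_fn: Callable[[List[Message]], str]  # an LLM call, code execution, or human input
    is_termination_msg: Callable[[Message], bool] = lambda msg: False
    max_consecutive_auto_reply: int = 10


def run_two_agent_chat(initiator: MiniAgent, recipient: MiniAgent, task: str) -> List[Message]:
    """Alternate replies, starting with the recipient, until a stop condition fires."""
    history: List[Message] = [{"name": initiator.name, "content": task}]
    replies = {initiator.name: 0, recipient.name: 0}
    speaker, other = recipient, initiator
    while True:
        # The next speaker checks the incoming message and its own reply budget.
        if speaker.is_termination_msg(history[-1]):
            break
        if replies[speaker.name] >= speaker.max_consecutive_auto_reply:
            break
        history.append({"name": speaker.name, "content": speaker.reply_fn(history)})
        replies[speaker.name] += 1
        speaker, other = other, speaker
    return history


# Stub behaviours standing in for an LLM coder and a code executor.
def coder_reply(history: List[Message]) -> str:
    if any(m["content"].startswith("exitcode: 0") for m in history):
        return "TERMINATE"
    return "print('hello')"


def executor_reply(history: List[Message]) -> str:
    return "exitcode: 0\nhello"  # pretend the code ran successfully


coder = MiniAgent("DataAnalystAgent", coder_reply)
executor = MiniAgent(
    "ExecutorAgent",
    executor_reply,
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)

for msg in run_two_agent_chat(executor, coder, "Say hello"):
    print(f"{msg['name']}: {msg['content']!r}")
```

The two stop conditions play the same roles as is_termination_msg and max_consecutive_auto_reply in the AutoGen code later in the article; swapping the stub reply functions for real LLM calls or code execution is what the AutoGen classes do for you.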

Architecture overview

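A common production pattern is the Planner-Executor-Critic (PE-C) loop: a planner breaks the task into steps, an executor carries them out, and a critic evaluates the result and sends feedback to the planner, repeating until the result is accepted or an iteration limit is reached. The demos below follow this shape: in the two-agent version the coder plans and writes code while execution errors serve as the critique, and the three-agent version adds an explicit reviewer as critic.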
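A minimal, framework-free sketch of the PE-C loop follows. The planner, executor, and critic are plain functions standing in for LLM-backed agents, and all names are hypothetical; the point is the control flow, in which the critic's feedback is passed into the next plan until the result is approved or max_iters runs out.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[int]


def pec_loop(
    task: str,
    plan: Callable[[str, Optional[str]], Plan],
    execute: Callable[[Plan], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    max_iters: int = 3,
) -> Tuple[Optional[str], List[Tuple[int, str, bool]]]:
    """Run plan -> execute -> critique until the critic approves or iterations run out."""
    feedback: Optional[str] = None
    trace: List[Tuple[int, str, bool]] = []
    for attempt in range(1, max_iters + 1):
        steps = plan(task, feedback)           # planner sees the critic's last feedback
        result = execute(steps)                # executor carries out the plan
        ok, feedback = critique(task, result)  # critic judges the result
        trace.append((attempt, result, ok))
        if ok:
            return result, trace
    return None, trace  # gave up: no approved result


# Toy stand-ins: the task is "sum 1..4" and the first plan misses a term.
def toy_planner(task: str, feedback: Optional[str]) -> Plan:
    return [1, 2, 3] if feedback is None else [1, 2, 3, 4]


def toy_executor(steps: Plan) -> str:
    return str(sum(steps))


def toy_critic(task: str, result: str) -> Tuple[bool, str]:
    return (True, "looks good") if result == "10" else (False, f"expected 10, got {result}")


result, trace = pec_loop("sum 1..4", toy_planner, toy_executor, toy_critic)
print(result, trace)  # 10 [(1, '6', False), (2, '10', True)]
```

The iteration cap matters as much as the feedback path: without it, a planner that never satisfies the critic would loop forever, which is why the AutoGen examples below also set max_consecutive_auto_reply and max_round.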

Related tools and frameworks

LangChain / LangGraph: LangGraph, from the LangChain team, provides graph-based agent orchestration with explicit state, suitable for cyclic workflows.

CrewAI: Higher-level framework for building role-based agent teams ("crews").

Semantic Kernel: Microsoft SDK for integrating LLMs with memory and plugins (formerly called skills) in C#, Python, and Java.

OpenAI Swarm: Lightweight, experimental framework from OpenAI for exploring task hand-off between agents.

Full code example

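Installation: pip install pyautogen matplotlib pandas

The code targets the classic autogen API of the 0.2 series, imported as autogen. AutoGen 0.4 and later (packages such as autogen-agentchat) redesigned the API, so install a 0.2-compatible release to run the code unchanged.

Step 1 – Configure the LLM backend (OpenAI, Azure OpenAI, Anthropic, or local models). Reading the API key from an environment variable keeps it out of the code.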

import autogen
import os

# -------------------------------------------------------
# Step 1: LLM backend configuration
# -------------------------------------------------------
config_list = [
    {
        "model": "gpt-4o",
        "api_key": os.environ["OPENAI_API_KEY"],  # raises KeyError early if the key is unset
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0.2,   # lower temperature = more deterministic code
    "cache_seed": 42,      # reproducible results for debugging
}

# -------------------------------------------------------
# Step 2: Define AssistantAgent (the coder)
# -------------------------------------------------------
assistant = autogen.AssistantAgent(
    name="DataAnalystAgent",
    llm_config=llm_config,
    system_message="""
    You are an expert Python data analyst.
    When given a data task, write clean, well‑commented Python code.
    Always save any generated charts as PNG files.
    Respond with ONLY the code block. Do not explain unless asked.
    When the task is complete, reply with TERMINATE.
    """,
)

# -------------------------------------------------------
# Step 3: Define UserProxyAgent (the executor)
# -------------------------------------------------------
user_proxy = autogen.UserProxyAgent(
    name="ExecutorAgent",
    human_input_mode="NEVER",            # fully autonomous execution
    max_consecutive_auto_reply=10,        # safety cap on auto replies
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),  # content may be None
    code_execution_config={
        "work_dir": "agent_workspace",   # run code in this directory
        "use_docker": False,             # set True in production for isolation
    },
    llm_config=False,                     # executor does not use an LLM
)

# -------------------------------------------------------
# Step 4: Issue a concrete task to start the dialogue
# -------------------------------------------------------
task = """
Generate a synthetic sales dataset with 100 rows containing:
- 'month' (Jan to Dec, repeated)
- 'product' (randomly from ['Widget A', 'Widget B', 'Widget C'])
- 'revenue' (random float between 1000 and 50000)
- 'units_sold' (random int between 10 and 500)

Then:
1. Compute total revenue and average units sold per product.
2. Plot a bar chart of total revenue by product.
3. Save the chart as 'revenue_by_product.png'.
4. Print the summary statistics to the console.
"""

user_proxy.initiate_chat(
    recipient=assistant,
    message=task,
    clear_history=True,
)

print("Agent task completed. Check agent_workspace/ for output files.")

When initiate_chat is called, the following sequence unfolds: ExecutorAgent sends the task string to DataAnalystAgent. DataAnalystAgent sends its system message together with the conversation so far to the LLM, which returns a Python code block. ExecutorAgent extracts the code block and runs it in the agent_workspace directory.

Standard output and any errors are captured and sent back to DataAnalystAgent as the next message.

If the code fails, DataAnalystAgent reads the error, corrects the code, and resubmits it. The loop continues until DataAnalystAgent replies with TERMINATE or ExecutorAgent has sent max_consecutive_auto_reply (here 10) automatic replies.

Upon successful execution, DataAnalystAgent replies with TERMINATE, ending the round.
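The executor's side of this loop (find the fenced code block in the coder's reply, run it, report the exit code and output) can be sketched with the standard library. This is a simplified, hypothetical re-creation for illustration, not AutoGen's own code: it handles only Python blocks, runs them with the current interpreter in a subprocess without Docker isolation, and the feedback wording is invented.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple

FENCE = "`" * 3  # built at runtime so this sketch can itself sit inside a fenced block
CODE_BLOCK = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(message: str) -> List[Tuple[str, str]]:
    """Return (language, code) pairs for every fenced block in a reply."""
    return [(lang or "python", code) for lang, code in CODE_BLOCK.findall(message)]


def run_python(code: str, work_dir: str, timeout: float = 60.0) -> Tuple[int, str]:
    """Write the code into work_dir, run it there, and return (exit code, stdout + stderr)."""
    folder = Path(work_dir)
    folder.mkdir(parents=True, exist_ok=True)
    script = folder / "tmp_code.py"
    script.write_text(code, encoding="utf-8")
    proc = subprocess.run(
        [sys.executable, script.name],
        cwd=folder, capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def executor_reply(message: str, work_dir: str) -> str:
    """Build the feedback message an executor would send back to the coder."""
    blocks = [code for lang, code in extract_code_blocks(message) if lang in ("python", "py")]
    if not blocks:
        return "No Python code block found."
    outputs = []
    for code in blocks:
        exit_code, output = run_python(code, work_dir)
        outputs.append(output)
        if exit_code != 0:  # stop at the first failure so the coder can fix it
            return f"exitcode: {exit_code} (execution failed)\n" + "".join(outputs)
    return "exitcode: 0 (execution succeeded)\n" + "".join(outputs)


if __name__ == "__main__":
    reply = f"Here is the code:\n{FENCE}python\nprint(2 + 3)\n{FENCE}\n"
    with tempfile.TemporaryDirectory() as work_dir:
        print(executor_reply(reply, work_dir))
```

Feeding the traceback text back verbatim is what lets the coder self-correct: the next LLM call sees exactly which line failed and why.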

Extending to a three‑agent GroupChat

# Add a ReviewerAgent to review code before execution
reviewer = autogen.AssistantAgent(
    name="ReviewerAgent",
    llm_config=llm_config,
    system_message="""
    You are a senior Python code reviewer.
    Review code for correctness, efficiency, and security.
    If the code is acceptable, reply: 'APPROVED: proceed.'
    If not, explain what to fix clearly and concisely.
    """,
)

# GroupChat lets three agents converse around a virtual table
group_chat = autogen.GroupChat(
    agents=[user_proxy, assistant, reviewer],
    messages=[],
    max_round=15,
    speaker_selection_method="auto",  # the manager's LLM chooses the next speaker from the conversation
)

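manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    # end the group chat when a reply contains TERMINATE, not only when it is the bare word
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
)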

# Launch the group workflow
user_proxy.initiate_chat(
    recipient=manager,
    message=task,
    clear_history=True,
)

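In this configuration the intended workflow is Coder → Reviewer → Executor, mirroring a real engineering team's code-review pipeline and running without human input. With speaker_selection_method="auto", however, the manager's LLM chooses each next speaker, so this order is likely but not enforced. Because the reviewer now speaks between the coder and the executor, the executor may also need to look further back than the last message to find the code; depending on the AutoGen version, you may need to raise the last_n_messages setting in code_execution_config.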
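If the order must be guaranteed, the AutoGen 0.2 documentation describes passing a callable as speaker_selection_method: it receives the last speaker and the GroupChat and returns the next agent (verify this against the version you install). The function below is a hypothetical policy for the three agents above: the coder always hands off to the reviewer, a review starting with "APPROVED" goes to the executor, any other review goes back to the coder, and execution output returns to the coder. It reads only .name, .agents, and .messages, so it can be exercised with stand-in objects.

```python
from types import SimpleNamespace


def coder_reviewer_executor(last_speaker, groupchat):
    """Pick the next speaker: Coder -> Reviewer -> (Executor if approved, else Coder)."""
    agents = {agent.name: agent for agent in groupchat.agents}
    last_content = ""
    if groupchat.messages:
        last_content = groupchat.messages[-1].get("content") or ""

    if last_speaker.name == "DataAnalystAgent":
        return agents["ReviewerAgent"]  # every code draft gets reviewed
    if last_speaker.name == "ReviewerAgent":
        if last_content.strip().startswith("APPROVED"):
            return agents["ExecutorAgent"]  # approved code is executed
        return agents["DataAnalystAgent"]  # otherwise the coder revises
    return agents["DataAnalystAgent"]  # the task or execution output goes to the coder


if __name__ == "__main__":
    # Stand-in objects exposing only the attributes the policy reads.
    executor, coder, reviewer = (
        SimpleNamespace(name=n) for n in ("ExecutorAgent", "DataAnalystAgent", "ReviewerAgent")
    )
    chat = SimpleNamespace(agents=[executor, coder, reviewer],
                           messages=[{"content": "APPROVED: proceed."}])
    print(coder_reviewer_executor(reviewer, chat).name)  # ExecutorAgent
```

With the real agents, the policy would be passed as speaker_selection_method=coder_reviewer_executor when constructing the GroupChat. It also saves one LLM call per turn, because the manager no longer asks its model who should speak next.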

Advantages of AutoGen‑based multi‑agent systems

Task decomposition & specialization: Complex workflows are split among role-focused agents, each with a narrow system message, which often yields better output than a single all-purpose prompt.

Self-correction loop: Agents review and revise each other's outputs, and execution errors are fed back automatically, so many mistakes are fixed without human intervention.

Scalability: New agents can be added to a GroupChat without changing the existing ones; the manager routes turns among them.

Backend flexibility: AutoGen works with OpenAI, Azure OpenAI, Anthropic, Mistral, and local models (for example through LiteLLM or other OpenAI-compatible endpoints), avoiding vendor lock-in.

Integrated code execution: Native support for running, testing, and debugging code within the agent loop is a key differentiator for engineering tasks.

Human-in-the-loop support: Setting an agent's human_input_mode to "ALWAYS" or "TERMINATE" instead of "NEVER" lets a person review or override its replies at chosen decision points.

Active open-source community: AutoGen is developed in the open on GitHub with frequent releases and a large contributor community, and it is reportedly used by companies in finance, healthcare, and retail.

Auditability: Complete conversation histories are recorded, which makes the decision process easier to inspect than a single model's black-box output.
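To make the backend-flexibility point concrete: in the classic AutoGen API, llm_config["config_list"] may hold several entries, which AutoGen tries in order, falling back to the next one when a call fails. The helper below is a hypothetical sketch (build_config_list and the candidate table are not AutoGen APIs) that assembles such a list from environment variables and skips backends whose key is missing. The model names are placeholders, and the local entry assumes an OpenAI-compatible server listening at the shown URL.

```python
import os
from typing import Any, Dict, List, Mapping, Optional

# Hypothetical candidates in order of preference; model names and URL are placeholders.
CANDIDATES: List[Dict[str, Any]] = [
    {"model": "gpt-4o", "key_env": "OPENAI_API_KEY"},
    {   # a local OpenAI-compatible server that ignores the API key
        "model": "llama3",
        "base_url": "http://localhost:11434/v1",
        "api_key": "not-needed",
    },
]


def build_config_list(
    candidates: List[Dict[str, Any]],
    env: Optional[Mapping[str, str]] = None,
) -> List[Dict[str, Any]]:
    """Turn candidate backends into config_list entries, skipping ones without a key."""
    env = os.environ if env is None else env
    config_list = []
    for cand in candidates:
        entry = {k: v for k, v in cand.items() if k != "key_env"}
        key_env = cand.get("key_env")
        if key_env:
            key = env.get(key_env)
            if not key:
                continue  # backend not configured on this machine
            entry["api_key"] = key
        config_list.append(entry)
    if not config_list:
        raise RuntimeError("No LLM backend is configured")
    return config_list


llm_config = {"config_list": build_config_list(CANDIDATES), "temperature": 0.2}
print([entry["model"] for entry in llm_config["config_list"]])
```

Failing loudly when no backend is available is a deliberate choice: it surfaces a misconfigured machine at startup instead of partway through an agent conversation.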

Conclusion

Multi-agent systems represent an architectural evolution for solving complex AI problems. By distributing work across a team of specialised, collaborating agents rather than relying on a single monolithic model, developers gain modularity, self-correction, scalability, and auditability within a single workflow. The patterns demonstrated here, a two-agent code-execution loop and a three-agent GroupChat review pipeline, are compact starting points for production systems; deploying them also calls for sandboxed execution (use_docker=True, as noted in the code) and conservative reply and round limits.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

