Mastering Multi‑Agent Systems: Design, Parallel Execution, and Interview Strategies

This article dissects the shortcomings of single‑agent LLM pipelines, introduces the Supervisor‑based Multi‑Agent architecture with LangGraph, demonstrates parallel task execution, robust error handling, and result merging, and provides concrete interview guidance backed by real performance data.

Wu Shixiong's Large Model Academy

Why Multi‑Agent? The Limits of a Single Agent

A single agent loses context over long inputs, cannot parallelize independent tasks, and performs unevenly across diverse subtasks, leading to degraded accuracy and slow responses in complex workflows such as bank loan risk assessment.

Supervisor Mode: Core Architecture

The most widely used pattern in production is the Supervisor mode, where a central Supervisor Agent decomposes tasks and coordinates multiple specialized sub‑agents, with a WriterAgent aggregating final results.

Multi-Agent Supervisor architecture

Implementation with LangGraph defines a typed state and a routing function that decides the next sub‑agent based on completed results.

from langgraph.graph import StateGraph, END
from typing import TypedDict, Literal

class SupervisorState(TypedDict):
    task: str
    subtasks: list[str]
    agent_results: dict
    final_report: str

def supervisor_node(state: SupervisorState) -> SupervisorState:
    """Supervisor: task decomposition and allocation"""
    task = state["task"]
    subtasks = decompose_task(task)  # assumed helper: splits the task into subtasks (e.g. via an LLM call)
    return {**state, "subtasks": subtasks}

def route_to_agents(state: SupervisorState) -> Literal["research", "analysis", "writer", "end"]:
    if not state.get("agent_results", {}).get("research"):
        return "research"
    elif not state.get("agent_results", {}).get("analysis"):
        return "analysis"
    elif not state.get("final_report"):
        return "writer"
    else:
        return "end"
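
To see the order this routing function produces, a minimal stand-alone sketch can drive it in a loop; the stub results standing in for real sub-agent outputs are hypothetical:

```python
from typing import Literal

def route_to_agents(state: dict) -> Literal["research", "analysis", "writer", "end"]:
    # Same routing logic as above, operating on a plain dict for the sketch
    if not state.get("agent_results", {}).get("research"):
        return "research"
    elif not state.get("agent_results", {}).get("analysis"):
        return "analysis"
    elif not state.get("final_report"):
        return "writer"
    else:
        return "end"

state = {"task": "loan risk assessment", "agent_results": {}, "final_report": ""}
visited = []
while True:
    step = route_to_agents(state)
    visited.append(step)
    if step == "end":
        break
    if step == "writer":
        state["final_report"] = "stub report"            # stand-in for writer_node
    else:
        state["agent_results"][step] = {"data": "stub"}  # stand-in for a sub-agent
print(visited)  # ['research', 'analysis', 'writer', 'end']
```

In the real graph, this function is wired in via LangGraph's `add_conditional_edges`, so the framework performs the same loop over the typed state.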

Parallel Execution: Running ResearchAgent Simultaneously

Independent dimensions such as corporate registration, industry competition, and macro policy can be fetched in parallel using asynchronous calls.

import asyncio
async def parallel_research(queries: list[str]) -> list[dict]:
    """Execute multiple research sub‑tasks concurrently"""
    tasks = [research_agent.ainvoke({"query": q}) for q in queries]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    final_results = []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            print(f"Sub‑task {i} failed: {result}, using fallback")
            final_results.append({"query": queries[i], "result": "Retrieval failed, skipped"})
        else:
            final_results.append(result)
    return final_results

Using asyncio.gather(..., return_exceptions=True) prevents a single failure from aborting the whole workflow, while the async ainvoke entry point lets asyncio.gather run the calls concurrently instead of serially.
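
The degradation path can be exercised without any live retrieval backend; StubResearchAgent below is a hypothetical stand-in that fails on one query to trigger the fallback branch:

```python
import asyncio

class StubResearchAgent:
    """Hypothetical stand-in: fails on one dimension to exercise the fallback."""
    async def ainvoke(self, payload: dict) -> dict:
        if payload["query"] == "macro policy":
            raise RuntimeError("upstream timeout")  # simulated failure
        return {"query": payload["query"], "result": "ok"}

research_agent = StubResearchAgent()

async def parallel_research(queries: list[str]) -> list[dict]:
    # Same shape as the function above, using the stub agent
    tasks = [research_agent.ainvoke({"query": q}) for q in queries]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    final_results = []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            final_results.append({"query": queries[i], "result": "Retrieval failed, skipped"})
        else:
            final_results.append(result)
    return final_results

out = asyncio.run(parallel_research(["corporate registration", "macro policy"]))
print(out[1])  # {'query': 'macro policy', 'result': 'Retrieval failed, skipped'}
```

Note that asyncio.gather preserves input order, so failed dimensions stay aligned with their queries.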

Agent Error Handling: Retry + Fallback

Each sub‑agent employs a two‑layer strategy: limited retries followed by graceful degradation.

Agent error handling: retry and fallback strategy

import asyncio
from typing import Optional

async def execute_agent_with_retry(
    agent,
    input_data: dict,
    agent_name: str,
    max_retries: int = 2,
    retry_delay: float = 1.0,
) -> dict:
    """Execute an agent with retries and fallback.
    - max_retries: number of additional attempts (excluding the first)
    - retry_delay: seconds to wait before each retry
    """
    last_exception: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            result = await agent.ainvoke(input_data)
            return {"status": "success", "data": result, "agent": agent_name}
        except Exception as e:
            last_exception = e
            if attempt < max_retries:
                print(f"[{agent_name}] Attempt {attempt + 1} failed: {e}, retrying after {retry_delay}s...")
                await asyncio.sleep(retry_delay)
            else:
                print(f"[{agent_name}] Retries exhausted, triggering fallback")
    return {
        "status": "degraded",
        "data": None,
        "agent": agent_name,
        "error": str(last_exception),
        "message": f"{agent_name} data retrieval failed",
    }
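
A quick self-check of both paths, using a hypothetical FlakyAgent that fails a configurable number of times before succeeding; the retry logic is the same two-layer strategy as above, with retry_delay shortened and the message field dropped for the demo:

```python
import asyncio
from typing import Optional

async def execute_agent_with_retry(agent, input_data: dict, agent_name: str,
                                   max_retries: int = 2,
                                   retry_delay: float = 0.01) -> dict:
    # Limited retries, then graceful degradation
    last_exception: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            result = await agent.ainvoke(input_data)
            return {"status": "success", "data": result, "agent": agent_name}
        except Exception as e:
            last_exception = e
            if attempt < max_retries:
                await asyncio.sleep(retry_delay)
    return {"status": "degraded", "data": None, "agent": agent_name,
            "error": str(last_exception)}

class FlakyAgent:
    """Hypothetical agent: raises `failures` times, then succeeds."""
    def __init__(self, failures: int):
        self.failures = failures
    async def ainvoke(self, data: dict) -> dict:
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("transient error")
        return {"answer": "ok"}

# 2 failures fit inside max_retries=2 (3 attempts); 3 failures exhaust them
recovers = asyncio.run(execute_agent_with_retry(FlakyAgent(2), {}, "research"))
gives_up = asyncio.run(execute_agent_with_retry(FlakyAgent(3), {}, "research"))
print(recovers["status"], gives_up["status"])  # success degraded
```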

The Supervisor checks for degraded agents and records missing data for the WriterAgent to annotate.

def supervisor_node(state: SupervisorState) -> SupervisorState:
    agent_results = state.get("agent_results", {})
    degraded_agents = [
        name for name, result in agent_results.items()
        if isinstance(result, dict) and result.get("status") == "degraded"
    ]
    if degraded_agents:
        print(f"Warning: data missing from agents {degraded_agents}")
        return {**state, "data_gaps": degraded_agents}
    return state

Result Merging: How WriterAgent Handles Conflicts

When research and analysis produce contradictory conclusions, the WriterAgent follows an explicit rule: prioritize analytical (financial) data over retrieved news.

def writer_node(state: SupervisorState) -> SupervisorState:
    """WriterAgent: merge sub‑agent outputs into a final report"""
    research_result = state["agent_results"].get("research", {})
    analysis_result = state["agent_results"].get("analysis", {})
    data_gaps = state.get("data_gaps", [])
    gap_notice = ""
    if data_gaps:
        gap_notice = f"\n\nNote: the following modules lack data: {', '.join(data_gaps)}"
    merge_prompt = f"""
You are a report‑generation expert. Combine the following sections into a structured report:
## Research Findings
{research_result.get('data', 'Data missing')}
## Data Analysis
{analysis_result.get('data', 'Data missing')}
{gap_notice}
Requirements:
1. Deduplicate content, keep key information.
2. If conclusions conflict, prefer the Data Analysis section.
3. Explicitly flag any disagreement: "Research and analysis differ; further verification needed."
4. Provide an executive summary (3‑5 bullet points).
5. Mark missing sections as "[Data missing, for reference only]".
"""
    final_report = llm.invoke(merge_prompt).content
    return {**state, "final_report": final_report}
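
The gap annotation can be verified in isolation, without calling the LLM; build_merge_prompt is a hypothetical helper that mirrors the string assembly above:

```python
def build_merge_prompt(research: str, analysis: str, data_gaps: list[str]) -> str:
    # Mirrors writer_node's prompt assembly, minus the LLM call
    gap_notice = ""
    if data_gaps:
        gap_notice = f"\n\nNote: the following modules lack data: {', '.join(data_gaps)}"
    return (
        "You are a report-generation expert. Combine the following sections:\n"
        f"## Research Findings\n{research}\n"
        f"## Data Analysis\n{analysis}"
        f"{gap_notice}"
    )

prompt = build_merge_prompt("News summary", "Data missing", ["analysis"])
print("lack data" in prompt)  # True
```

Keeping prompt assembly in a plain function like this makes the merging rules unit-testable even though the final report itself is LLM-generated.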

Single Agent vs Multi‑Agent: Real‑World Metrics

Single Agent vs Multi-Agent performance comparison

Total Time: Single Agent averages 45 s; Multi-Agent averages 18 s (≈60% faster).

Context Length: a Single Agent exceeds 32k tokens; the Multi-Agent setup keeps each sub-agent under 8k tokens.

Report Quality Score: human evaluation 72 vs 89 (≈24% improvement).

Applicable Scenarios: a Single Agent is faster for simple single-dimension queries; Multi-Agent shines in multi-dimensional, complex analyses.

Parallelism beyond 3‑5 agents yields diminishing returns due to API rate limits; 2‑3 concurrent agents provide the best trade‑off.
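
One way to enforce that ceiling is an asyncio.Semaphore wrapped around each agent call; CountingAgent below is a hypothetical probe that records peak in-flight calls to show the cap holds:

```python
import asyncio

class CountingAgent:
    """Hypothetical probe agent that tracks peak concurrent calls."""
    def __init__(self):
        self.current = 0
        self.peak = 0
    async def ainvoke(self, payload: dict) -> dict:
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0.01)  # simulated API latency
        self.current -= 1
        return payload

async def run_bounded(queries: list[str], limit: int = 3) -> int:
    agent = CountingAgent()
    sem = asyncio.Semaphore(limit)  # caps in-flight calls at `limit`
    async def bounded(q: str) -> dict:
        async with sem:
            return await agent.ainvoke({"query": q})
    await asyncio.gather(*(bounded(q) for q in queries))
    return agent.peak

peak = asyncio.run(run_bounded([f"dimension-{i}" for i in range(10)], limit=3))
print(peak <= 3)  # True
```

The same pattern composes with the parallel_research function above: wrap each ainvoke in the semaphore and keep gather unchanged.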

How to Answer Multi‑Agent Questions in Interviews

Motivation: Explain why a single LLM pipeline fails on long contexts and suffers slow response times.

Design: Describe the Supervisor pattern, the roles of each sub‑agent, and the central decision‑making.

Robustness: Highlight the retry‑plus‑fallback mechanism and how degraded agents are handled.

Result Merging: Show the explicit priority rule and how contradictions are flagged.

Limitations: Mention the optimal parallelism range and the need to respect strong sequential dependencies.

Conclusion

True Multi‑Agent systems go beyond chaining LLM calls; they require thoughtful task decomposition, dynamic scheduling, resilient error handling, and deterministic result merging. The Supervisor architecture, combined with parallel execution and explicit merging rules, delivers faster, more accurate, and production‑ready solutions for complex AI workflows.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: LLM, error handling, multi-agent, AI architecture, parallel execution, LangGraph
Written by

Wu Shixiong's Large Model Academy

We continuously share large‑model know‑how, helping you master core skills—LLM, RAG, fine‑tuning, deployment—from zero to job offer, tailored for career‑switchers, autumn recruiters, and those seeking stable large‑model positions.
