Agent Architecture: Planner → Executor → Verifier – Adding a “Quality Inspector” to Your AI

This article introduces the PEV (Planner‑Executor‑Verifier) architecture, explains why AI agents need a verification step to avoid blindly trusting faulty tool outputs, demonstrates a full implementation with LangGraph, compares its robustness to a naïve baseline, and discusses its advantages, limitations, and suitable use cases.


Why give AI a “quality inspector”?

Many AI agents act like a naïve script: they follow a plan, call an external API, and blindly forward whatever response they receive, even if the API is down or returns garbage. This leads to nonsensical final answers.

What is the PEV architecture?

The PEV (Planner → Executor → Verifier) pattern adds a verification step after each execution, similar to a strict kitchen inspector checking each ingredient before it goes into the dish.

Planner: Decomposes the user request into concrete tool queries, e.g., search('R&D expenditure'), search('employee count').

Executor: Calls the tool for the next step and records the raw result.

Verifier: Checks whether the tool output is valid data or an error message. If it fails, the verifier aborts the current flow and triggers a re‑planning cycle.

Routing & Iteration: Repeats planning, execution, and verification until all steps pass, then synthesises the final answer (sketched in code below).
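
In control-flow terms the pattern can be sketched in plain Python. This is a schematic with stub components; plan, execute, verify, and synthesize here are placeholders, not the LangGraph nodes built later:

# Schematic PEV loop with stub components (illustrative only;
# the real LangGraph implementation follows in the hands-on demo).
def plan(request: str, history: list[str]) -> list[str]:
    return [request]                          # stub: one query per request

def execute(step: str) -> str:
    return f"result for {step}"               # stub: pretend the tool succeeded

def verify(result: str) -> bool:
    return not result.startswith("Error")     # stub: flag error strings

def synthesize(request: str, history: list[str]) -> str:
    return "; ".join(history)                 # stub: concatenate gathered data

def pev_loop(request: str, max_retries: int = 3) -> str:
    history: list[str] = []
    for _ in range(max_retries):
        failed = False
        for step in plan(request, history):   # Planner
            result = execute(step)            # Executor
            if not verify(result):            # Verifier: abort and re-plan
                history.append(f"Verification failed: {step}")
                failed = True
                break
            history.append(result)
        if not failed:
            return synthesize(request, history)
    return "Error: retry limit reached."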

When to use it?

Safety‑critical applications such as finance or healthcare, where a single wrong datum can have huge consequences.

Systems that rely on unstable external APIs (free or beta services).

High‑precision tasks (legal research, scientific analysis) that require factual correctness at every step.

Advantages and limitations

Advantages:

Robustness & reliability – the agent can detect and recover from errors, acting like an “immune system”.

Modular design – planning, execution, and verification are separate, making the code easier to debug and maintain.

Limitations:

Increased latency and cost – each tool call is followed by an extra LLM call for verification, making PEV slower and more expensive than the simpler patterns it extends (see the arithmetic after this list).

Verifier design complexity – building a verifier that distinguishes minor glitches from critical failures is non‑trivial (a mitigation sketch follows this list).
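
On the cost point: for an N-step plan the baseline makes one planning call, N tool calls, and one synthesis call, while PEV adds N verification calls on top. One common mitigation, sketched below with assumed patterns and thresholds, is a two-tier verifier: a cheap rule-based pre-check handles obvious cases so the LLM verifier only sees ambiguous outputs.

import re

# Hypothetical two-tier check: rules catch obvious failures/successes cheaply,
# and only ambiguous outputs escalate to the slower, costlier LLM verifier.
FATAL_PATTERNS = re.compile(r"error|unavailable|timed out|rate limit", re.I)

def cheap_precheck(tool_output: str):
    """Return False for an obvious failure, True for clearly valid output,
    or None when the LLM verifier should decide."""
    if FATAL_PATTERNS.search(tool_output):
        return False                  # obvious failure: no LLM call needed
    if len(tool_output.strip()) > 200:
        return True                   # heuristic: long structured output is likely real data
    return None                       # ambiguous: escalate to the LLM verifier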

Hands‑on demo: from “dead‑pan” to self‑repair

We first build a naïve planner‑executor agent, then augment it with a verifier to form a full PEV agent.

Stage 0 – Setup

# Install dependencies
# !pip install -q -U langchain-nebius langchain langgraph rich python-dotenv langchain-tavily

import os, re, json
from typing import List, Annotated, TypedDict, Optional
from dotenv import load_dotenv
from langchain_nebius import ChatNebius
from langchain_tavily import TavilySearch
from langchain_core.messages import BaseMessage, ToolMessage
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph, END
from rich.console import Console
from rich.markdown import Markdown

load_dotenv()
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Agentic Architecture - PEV (Nebius)"

# Shared chat model for every node below (requires NEBIUS_API_KEY in the environment).
# The model name is an example; any Nebius-hosted chat model should work.
llm = ChatNebius(model="meta-llama/Meta-Llama-3.1-70B-Instruct")

console = Console()
print("\n🚀 Environment ready!")

Stage 1 – Baseline agent (Planner → Executor)

We define a flaky tool that deliberately fails on queries containing “employee count”.

def flaky_web_search(query: str) -> str:
    """Execute a web search but fail on specific queries."""
    console.print(f"🔧 [cyan]Searching: '{query}'...[/cyan]")
    if "employee count" in query.lower():
        console.print("💥 [bold red]Simulated API failure! Endpoint unavailable.[/bold red]")
        return "Error: unable to retrieve data. API endpoint currently unavailable."
    result = TavilySearch(max_results=2).invoke(query)
    if isinstance(result, (dict, list)):
        return json.dumps(result, indent=2)
    return str(result)

class BasicPEState(TypedDict):
    user_request: str
    plan: Optional[List[str]]
    intermediate_steps: List[str]
    final_answer: Optional[str]

class Plan(BaseModel):
    steps: List[str] = Field(description="List of queries to execute.")

# Planner node
def basic_planner_node(state: BasicPEState):
    console.print("📝 [bold green](Basic) Planner: creating plan...[/bold green]")
    planner_llm = llm.with_structured_output(Plan)
    prompt = f"""
    You are a planning agent. Decompose the user request into a JSON list of tool queries.
    - Return only JSON: {{"steps": ["query1", "query2", ...]}}
    - Use the flaky_web_search tool for every query.
    User request: "{state['user_request']}"
    """
    plan = planner_llm.invoke(prompt)
    return {"plan": plan.steps}

# Executor node
def basic_executor_node(state: BasicPEState):
    console.print("⚙️ [bold blue](Basic) Executor: running next step...[/bold blue]")
    next_step = state["plan"][0]
    result = flaky_web_search(next_step)
    return {"plan": state["plan"][1:], "intermediate_steps": state["intermediate_steps"] + [result]}

# Synthesiser node
def basic_synthesizer_node(state: BasicPEState):
    console.print("📄 [bold magenta](Basic) Synthesiser: generating final answer...[/bold magenta]")
    context = "
".join(state["intermediate_steps"])
    prompt = f"Use the following data to answer the original request '{state['user_request']}':
{context}"
    answer = llm.invoke(prompt).content
    return {"final_answer": answer}

# Build the basic graph
pe_graph_builder = StateGraph(BasicPEState)
pe_graph_builder.add_node("plan", basic_planner_node)
pe_graph_builder.add_node("execute", basic_executor_node)
pe_graph_builder.add_node("synthesize", basic_synthesizer_node)
pe_graph_builder.set_entry_point("plan")
pe_graph_builder.add_conditional_edges(
    "plan",
    lambda s: "execute" if s["plan"] else "synthesize",
)
pe_graph_builder.add_conditional_edges(
    "execute",
    lambda s: "execute" if s["plan"] else "synthesize",
)
pe_graph_builder.add_edge("synthesize", END)
basic_pe_app = pe_graph_builder.compile()
print("✅ Basic planner‑executor agent compiled.")

Running the baseline on the query about Apple's R&D spend and employee count, the planner splits it into two steps: the first succeeds while the second returns an error string, which the synthesiser treats as ordinary data, producing a meaningless answer.

Stage 2 – Add verifier (PEV)

We introduce a verifier that checks each tool result, clears the plan on failure, and forces re‑planning.

class VerificationResult(BaseModel):
    is_successful: bool = Field(description="True if the tool succeeded and data is valid.")
    reasoning: str = Field(description="Reason for the verification decision.")

class PEVState(TypedDict):
    user_request: str
    plan: Optional[List[str]]
    last_tool_result: Optional[str]
    intermediate_steps: List[str]
    final_answer: Optional[str]
    retries: int

# Planner with retry limit
def pev_planner_node(state: PEVState):
    retries = state.get("retries", 0)
    if retries > 3:
        console.print("🚫 (PEV) Planner: retry limit reached. Stopping.")
        return {"plan": [], "final_answer": "Error: unable to complete task after multiple retries."}
    console.print(f"📝 (PEV) Planner: creating/modifying plan (retry {retries})...")
    planner_llm = llm.with_structured_output(Plan, strict=True)
    past_context = "
".join(state["intermediate_steps"])
    base_prompt = f"""
    You are a planning agent. Create a plan to answer: '{state['user_request']}'.
    Use only the flaky_web_search tool. Return JSON {{"steps": ["query1", "query2"]}}.
    Do not repeat failed queries. Max 5 steps.
    Past attempts and results:
{past_context}
    """
    plan = planner_llm.invoke(base_prompt)
    return {"plan": plan.steps, "retries": retries + 1}

# Executor node (stores raw result)
def pev_executor_node(state: PEVState):
    if not state.get("plan"):
        console.print("⚠️ (PEV) Executor: no remaining steps.")
        return {}
    console.print("⚙️ (PEV) Executor: running next step...")
    next_step = state["plan"][0]
    result = flaky_web_search(next_step)
    return {"plan": state["plan"][1:], "last_tool_result": result}

# Verifier node
def verifier_node(state: PEVState):
    console.print("🔍 (PEV) Verifier: checking last tool result...")
    verifier_llm = llm.with_structured_output(VerificationResult)
    prompt = f"""
    Verify whether the following tool output is a successful result or an error message.
    Task: '{state['user_request']}'.
    Tool output: '{state['last_tool_result']}'
    """
    verification = verifier_llm.invoke(prompt)
    console.print(f"✅ Verifier: judged as '{'success' if verification.is_successful else 'failure'}'. Reason: {verification.reasoning}")
    if verification.is_successful:
        return {"intermediate_steps": state["intermediate_steps"] + [state["last_tool_result"]]}
    else:
        return {"plan": [], "intermediate_steps": state["intermediate_steps"] + [f"Verification failed: {state['last_tool_result']}"]}

# Router decides next node
def pev_router(state: PEVState):
    if state.get("final_answer"):
        console.print("✅ Router: final answer ready, go to synthesiser.")
        return "synthesize"
    if not state["plan"]:
        if state["intermediate_steps"] and "Verification failed" in state["intermediate_steps"][-1]:
            console.print("🔄 Router: verification failed, back to planner.")
            return "plan"
        console.print("✅ Router: plan completed, go to synthesiser.")
        return "synthesize"
    console.print("➡️ Router: more steps remain, continue execution.")
    return "execute"

# Synthesiser (same logic as the basic agent, typed for PEVState)
def pev_synthesizer_node(state: PEVState):
    console.print("📄 (PEV) Synthesiser: generating final answer...")
    context = "\n".join(state["intermediate_steps"])
    prompt = f"Use the following data to answer the original request '{state['user_request']}':\n{context}"
    answer = llm.invoke(prompt).content
    return {"final_answer": answer}

# Build PEV graph
pev_graph_builder = StateGraph(PEVState)
pev_graph_builder.add_node("plan", pev_planner_node)
pev_graph_builder.add_node("execute", pev_executor_node)
pev_graph_builder.add_node("verify", verifier_node)
pev_graph_builder.add_node("synthesize", pev_synthesizer_node)
pev_graph_builder.set_entry_point("plan")
pev_graph_builder.add_edge("plan", "execute")
pev_graph_builder.add_edge("execute", "verify")
pev_graph_builder.add_conditional_edges("verify", pev_router)
pev_graph_builder.add_edge("synthesize", END)
pev_agent_app = pev_graph_builder.compile()
print("✅ Planner‑Executor‑Verifier (PEV) agent compiled.")

Running the PEV agent on the same query shows the verifier catching the error, the router sending control back to the planner, a revised plan that avoids the failing query, successful execution, and a correct final answer.

Stage 3 – Result discussion

First plan: e.g., ["Apple R&D spend", "Apple employee count"].

Execution & verification (first round): R&D query succeeds, employee count fails, verifier flags failure.

Routing triggers re‑planning.

Second plan: uses an alternative query such as "Apple global employee number 2023" to bypass the flaky endpoint.

Second execution succeeds, the verifier approves, and the synthesiser computes per‑employee R&D spend and returns the correct answer (illustratively, ~$30B of R&D across ~160,000 employees comes to roughly $187,500 per employee).

Stage 4 – Quantitative evaluation

We let an LLM act as a judge to score task completion and error‑handling for both agents.

class RobustnessEvaluation(BaseModel):
    """Score the robustness and error‑handling of an agent."""
    task_completion_score: int = Field(description="1‑10, whether the task was completed ignoring data errors.")
    error_handling_score: int = Field(description="1‑10, how well the agent detected and recovered from errors.")
    justification: str = Field(description="Brief reason for the scores.")

judge_llm = llm.with_structured_output(RobustnessEvaluation)

def evaluate_agent_robustness(query: str, final_state: dict):
    context = "
".join(final_state.get("intermediate_steps", []))
    final_answer = final_state.get("final_answer", "")
    trace = f"Context:
{context}

Final answer:
{final_answer}"
    prompt = f"""
    You are an expert evaluator of AI agents. Score the agent on:
    - Task completion (1‑10)
    - Error handling (1‑10)
    Provide a short justification.
    User task: {query}
    Agent trace:
{trace}
    """
    return judge_llm.invoke(prompt)

# Evaluate baseline
pe_agent_evaluation = evaluate_agent_robustness(flaky_query, final_pe_output)
print(pe_agent_evaluation.model_dump())
# Evaluate PEV
pev_agent_evaluation = evaluate_agent_robustness(flaky_query, final_pev_output)
print(pev_agent_evaluation.model_dump())

The baseline receives a very low error‑handling score because it never detects the failure. The PEV agent scores near‑perfect, demonstrating its ability to recognise errors, re‑plan, and still complete the task.

Key take‑aways

Principle: PEV adds a self‑correction loop to AI agents.

Practice: Using LangGraph we built a robust agent that recovers from tool failures.

Caveat: The extra verification step adds latency and cost, so PEV fits safety‑critical scenarios where reliability outweighs resource usage.

Building truly reliable AI systems requires more than strong performance on ideal inputs; they must remain stable when external services falter. The PEV architecture provides exactly that protection.
