Reflection Mode: Letting AI Act as Its Own Code Reviewer

This article introduces the Reflection mode—a generate‑critique‑refine loop that enables large language models to self‑review and improve generated code, demonstrates a full implementation with Nebius AI Studio and LangGraph, and evaluates the approach with concrete Fibonacci examples and quantitative scoring.


Reflection Mode Overview

Reflection mode splits an LLM’s work into three sequential stages (Generate, Critique, Refine), allowing the model to act as its own code reviewer and produce higher‑quality output than a single‑shot generator.

Workflow

Generate: the model produces an initial draft (e.g., a Python function).

Critique: a second LLM call analyzes the draft for bugs, inefficiencies, and style issues, and returns a structured critique.

Refine: the model rewrites the code using the critique suggestions, yielding a final version.
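The three stages above compose into a simple pipeline. A minimal sketch with stub functions standing in for the real LLM calls (the full implementation with Nebius and LangGraph follows below):

```python
# Illustrative sketch of the generate → critique → refine loop.
# The three functions are stubs standing in for LLM calls, not real API usage.

def generate(request: str) -> str:
    # Stand-in for the first LLM call that drafts code.
    return f"draft for: {request}"

def critique(draft: str) -> str:
    # Stand-in for the reviewer call that inspects the draft.
    return f"critique of: {draft}"

def refine(draft: str, notes: str) -> str:
    # Stand-in for the final call that applies the critique.
    return f"refined({draft} | {notes})"

def reflection_loop(request: str) -> str:
    draft = generate(request)
    notes = critique(draft)
    return refine(draft, notes)
```

Each stage only needs the previous stage's output, which is why the workflow maps naturally onto a linear graph later in the tutorial.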

Advantages and Limitations

Advantages: immediate quality boost, lightweight implementation (requires only one LLM), clear modular structure.

Limitations: at least two additional LLM calls increase latency and cost; the process inherits the underlying model’s knowledge biases.

Implementation Steps

Stage 0 – Environment Setup

Install the required Python packages:

# !pip install -q -U langchain-nebius langchain langgraph rich python-dotenv

Load API keys from a .env file and enable LangSmith tracing:

import os, json
from typing import List, Optional
from typing_extensions import TypedDict

from dotenv import load_dotenv
from pydantic import BaseModel, Field
from langchain_nebius import ChatNebius
from langgraph.graph import StateGraph, END
from rich.console import Console
from rich.markdown import Markdown
from rich.syntax import Syntax

load_dotenv()
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Reflection Mode (Nebius)"

Stage 1 – Core Components

Define three Pydantic models that serve as contracts for each step.

class DraftCode(BaseModel):
    """Initial code generated by the agent."""
    code: str = Field(description="Python code solving the user request.")
    explanation: str = Field(description="Brief explanation of how the code works.")

class Critique(BaseModel):
    """Structured critique of the draft code."""
    has_errors: bool = Field(description="Whether the code contains logical bugs.")
    is_efficient: bool = Field(description="Whether the algorithm is efficient.")
    suggested_improvements: List[str] = Field(description="Actionable improvement suggestions.")
    critique_summary: str = Field(description="One‑sentence summary of the critique.")

class RefinedCode(BaseModel):
    """Optimized code after applying the critique."""
    refined_code: str = Field(description="Final improved Python code.")
    refinement_summary: str = Field(description="Summary of changes made.")

Initialize the LLM (Meta‑Llama‑3.1‑8B‑Instruct) and a rich console for pretty output:

llm = ChatNebius(model="meta-llama/Meta-Llama-3.1-8B-Instruct", temperature=0.2)
console = Console()

Generator Node

def generator_node(state):
    """Generate the first draft."""
    console.print("--- 1. Generate Draft ---")
    generator_llm = llm.with_structured_output(DraftCode)
    prompt = f"You are a Python expert. Write a function for the request: {state['user_request']}. Provide clean code and a short explanation."
    draft = generator_llm.invoke(prompt)
    return {"draft": draft.model_dump()}

Critic Node

def critic_node(state):
    """Critique the draft code."""
    console.print("--- 2. Critique Draft ---")
    critic_llm = llm.with_structured_output(Critique)
    code_to_critique = state['draft']['code']
    prompt = f"""You are a senior code reviewer. Analyze the following Python code for errors, efficiency, and style. Provide a structured critique and concrete suggestions.

```python
{code_to_critique}
```"""
    critique = critic_llm.invoke(prompt)
    return {"critique": critique.model_dump()}

Refiner Node

def refiner_node(state):
    """Refine the code based on the critique."""
    console.print("--- 3. Refine Code ---")
    refiner_llm = llm.with_structured_output(RefinedCode)
    draft_code = state['draft']['code']
    suggestions = json.dumps(state['critique'], indent=2)
    prompt = f"""You are a Python expert. Rewrite the code below, fully applying the critique suggestions.

**Original Code:**
```python
{draft_code}
```

**Critique & Suggestions:**
{suggestions}

Return the final code and a short summary of changes."""
    refined = refiner_llm.invoke(prompt)
    return {"refined_code": refined.model_dump()}

Stage 2 – Graph Orchestration with LangGraph

Define the shared state that flows between nodes:

class ReflectionState(TypedDict):
    user_request: str
    draft: Optional[dict]
    critique: Optional[dict]
    refined_code: Optional[dict]

Build a linear graph (generator → critic → refiner → END) and compile it:

graph = StateGraph(ReflectionState)
graph.add_node("generator", generator_node)
graph.add_node("critic", critic_node)
graph.add_node("refiner", refiner_node)
graph.set_entry_point("generator")
graph.add_edge("generator", "critic")
graph.add_edge("critic", "refiner")
graph.add_edge("refiner", END)
reflection_app = graph.compile()
console.print("Reflection graph compiled successfully.")

Stage 3 – End‑to‑End Execution and Evaluation

Run the workflow on a concrete task: compute the nth Fibonacci number.

user_request = "Write a Python function that returns the nth Fibonacci number."
initial_input = {"user_request": user_request}
final_state = None
for update in reflection_app.stream(initial_input, stream_mode="values"):
    final_state = update
console.print("\n✅ Reflection workflow completed!")

Print the three stages:

if final_state and all(k in final_state for k in ("draft", "critique", "refined_code")):
    console.print(Markdown("### Draft Code"))
    console.print(Markdown(f"**Explanation:** {final_state['draft']['explanation']}"))
    console.print(Syntax(final_state['draft']['code'], "python", theme="monokai", line_numbers=True))

    console.print(Markdown("\n### Critique"))
    console.print(Markdown(f"**Summary:** {final_state['critique']['critique_summary']}"))
    for imp in final_state['critique']['suggested_improvements']:
        console.print(Markdown(f"- {imp}"))

    console.print(Markdown("\n### Refined Code"))
    console.print(Markdown(f"**Refinement Summary:** {final_state['refined_code']['refinement_summary']}"))
    console.print(Syntax(final_state['refined_code']['refined_code'], "python", theme="monokai", line_numbers=True))
else:
    console.print("[bold red]Error: Incomplete final state.[/bold red]")

Observed outcome: the draft was a naive recursive implementation with exponential time complexity (O(2ⁿ)). The critic flagged the inefficiency and suggested an iterative approach. The refiner produced a loop‑based solution that runs in linear time (O(n)).
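Representative before/after implementations illustrating this outcome (the actual model output may differ in detail, but the shapes are typical):

```python
def fib_naive(n: int) -> int:
    # Draft-style solution: naive recursion, O(2^n) time.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_iterative(n: int) -> int:
    # Refined solution: iterative loop, O(n) time, O(1) space.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_naive(10), fib_iterative(10))  # → 55 55
```

Both return the same values, but the recursive draft recomputes the same subproblems exponentially many times, which is exactly the inefficiency the critic flags.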

Quantitative Evaluation

A second LLM acts as a neutral judge, scoring correctness, efficiency, and style (1‑10). The draft receives a low efficiency score, while the refined version scores high on all dimensions, confirming the practical benefit of the reflection loop.

class CodeEvaluation(BaseModel):
    correctness_score: int = Field(description="Correctness (1‑10).")
    efficiency_score: int = Field(description="Efficiency (1‑10).")
    style_score: int = Field(description="PEP‑8 style (1‑10).")
    justification: str = Field(description="Brief explanation.")

judge_llm = llm.with_structured_output(CodeEvaluation)

def evaluate_code(code):
    prompt = f"""You are a Python code reviewer. Score the following code on correctness, efficiency, and style (1‑10) and give a short justification.

```python
{code}
```"""
    return judge_llm.invoke(prompt)

Evaluation results show a dramatic efficiency improvement after reflection.
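To make the before/after comparison concrete, the per-dimension deltas can be computed with a small helper. The `Scores` dataclass and the example numbers below are hypothetical stand-ins for the judge's `CodeEvaluation` output, chosen to match the pattern described above (correct but inefficient draft, strong refined version):

```python
from dataclasses import dataclass

@dataclass
class Scores:
    # Hypothetical stand-in for the judge's CodeEvaluation fields.
    correctness: int
    efficiency: int
    style: int

def score_delta(before: Scores, after: Scores) -> dict:
    # Per-dimension improvement; positive means the refined code scored higher.
    return {
        "correctness": after.correctness - before.correctness,
        "efficiency": after.efficiency - before.efficiency,
        "style": after.style - before.style,
    }

# Illustrative numbers only, not real judge output.
draft_scores = Scores(correctness=9, efficiency=3, style=7)
refined_scores = Scores(correctness=10, efficiency=9, style=9)
print(score_delta(draft_scores, refined_scores))
# → {'correctness': 1, 'efficiency': 6, 'style': 2}
```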

Applicable Scenarios

Code generation: initial AI‑generated code can be self‑reviewed and optimized before delivery.

Complex text summarization: a first‑pass summary can be critiqued for missing details and refined.

Creative writing: drafts can be iteratively improved for tone, clarity, and impact.

⚠️ The extra LLM calls increase latency and cost, and the process cannot invent knowledge beyond the model’s existing capabilities.

Conclusion

The tutorial demonstrates that a simple "generate‑critique‑refine" pipeline built with Nebius AI Studio and LangGraph transforms a basic LLM into a self‑improving agent. The Fibonacci example illustrates a quality leap from an exponential‑time recursive solution to an optimal linear‑time implementation, and quantitative scoring validates the improvement.
