Reflection Mode: Letting AI Act as Its Own Code Reviewer
This article introduces Reflection mode, a generate-critique-refine loop that lets a large language model review and improve its own generated code. It walks through a full implementation with Nebius AI Studio and LangGraph and evaluates the approach on a concrete Fibonacci example with quantitative scoring.
Reflection Mode Overview
Reflection mode splits an LLM’s work into three sequential stages (Generate, Critique, Refine), allowing the model to act as its own code reviewer and produce higher-quality output than a single-shot generator.
Workflow
Generate: the model produces an initial draft (e.g., a Python function).
Critique: a second LLM call analyzes the draft for bugs, inefficiencies, and style issues, and returns a structured critique.
Refine: the model rewrites the code using the critique’s suggestions, yielding the final version.
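Stripped of any framework, the loop is just three chained calls. Here is a minimal sketch, assuming a generic llm(prompt) callable that returns text (the names are illustrative, not the tutorial’s implementation):
def reflect(llm, user_request: str) -> str:
    # 1. Generate: produce an initial draft.
    draft = llm(f"Write a Python function for: {user_request}")
    # 2. Critique: review the draft with a second call.
    critique = llm(f"Review this code for bugs, efficiency, and style:\n{draft}")
    # 3. Refine: rewrite the draft using the critique.
    return llm(f"Rewrite this code:\n{draft}\napplying this critique:\n{critique}")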
Advantages and Limitations
Advantages: immediate quality boost, lightweight implementation (requires only one LLM), clear modular structure.
Limitations: at least two additional LLM calls increase latency and cost; the process inherits the underlying model’s knowledge biases.
Implementation Steps
Stage 0 – Environment Setup
Install the required Python packages:
# !pip install -q -U langchain-nebius langchain langgraph rich python-dotenv
Load API keys from a .env file and enable LangSmith tracing:
import os, json
from typing import List, Optional, TypedDict

from dotenv import load_dotenv
from pydantic import BaseModel, Field
from langchain_nebius import ChatNebius
from langgraph.graph import StateGraph, END
from rich.console import Console
from rich.markdown import Markdown
from rich.syntax import Syntax

load_dotenv()
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Reflection Mode (Nebius)"
Stage 1 – Core Components
Define three Pydantic models that serve as contracts for each step.
class DraftCode(BaseModel):
    """Initial code generated by the agent."""
    code: str = Field(description="Python code solving the user request.")
    explanation: str = Field(description="Brief explanation of how the code works.")

class Critique(BaseModel):
    """Structured critique of the draft code."""
    has_errors: bool = Field(description="Whether the code contains logical bugs.")
    is_efficient: bool = Field(description="Whether the algorithm is efficient.")
    suggested_improvements: List[str] = Field(description="Actionable improvement suggestions.")
    critique_summary: str = Field(description="One-sentence summary of the critique.")

class RefinedCode(BaseModel):
    """Optimized code after applying the critique."""
    refined_code: str = Field(description="Final improved Python code.")
    refinement_summary: str = Field(description="Summary of changes made.")
Initialize the LLM (Meta-Llama-3.1-8B-Instruct) and a rich console for pretty output:
llm = ChatNebius(model="meta-llama/Meta-Llama-3.1-8B-Instruct", temperature=0.2)
console = Console()
Generator Node
def generator_node(state):
    """Generate the first draft."""
    console.print("--- 1. Generate Draft ---")
    generator_llm = llm.with_structured_output(DraftCode)
    prompt = f"You are a Python expert. Write a function for the request: {state['user_request']}. Provide clean code and a short explanation."
    draft = generator_llm.invoke(prompt)
    return {"draft": draft.model_dump()}
Critic Node
def critic_node(state):
    """Critique the draft code."""
    console.print("--- 2. Critique Draft ---")
    critic_llm = llm.with_structured_output(Critique)
    code_to_critique = state['draft']['code']
    prompt = f"""You are a senior code reviewer. Analyze the following Python code for errors, efficiency, and style. Provide a structured critique and concrete suggestions.
```python
{code_to_critique}
```"""
    critique = critic_llm.invoke(prompt)
    return {"critique": critique.model_dump()}
Refiner Node
def refiner_node(state):
    """Refine the code based on the critique."""
    console.print("--- 3. Refine Code ---")
    refiner_llm = llm.with_structured_output(RefinedCode)
    draft_code = state['draft']['code']
    suggestions = json.dumps(state['critique'], indent=2)
    prompt = f"""You are a Python expert. Rewrite the code below, fully applying the critique suggestions.
**Original Code:**
```python
{draft_code}
```
**Critique & Suggestions:**
{suggestions}
Return the final code and a short summary of changes."""
    refined = refiner_llm.invoke(prompt)
    return {"refined_code": refined.model_dump()}
Stage 2 – Graph Orchestration with LangGraph
Define the shared state that flows between nodes:
class ReflectionState(TypedDict):
    user_request: str
    draft: Optional[dict]
    critique: Optional[dict]
    refined_code: Optional[dict]
Build a linear graph (generator → critic → refiner → END) and compile it:
graph = StateGraph(ReflectionState)
graph.add_node("generator", generator_node)
graph.add_node("critic", critic_node)
graph.add_node("refiner", refiner_node)
graph.set_entry_point("generator")
graph.add_edge("generator", "critic")
graph.add_edge("critic", "refiner")
graph.add_edge("refiner", END)
reflection_app = graph.compile()
console.print("Reflection graph compiled successfully.")Stage 3 – End‑to‑End Execution and Evaluation
Run the workflow on a concrete task: compute the nth Fibonacci number.
user_request = "Write a Python function that returns the nth Fibonacci number."
initial_input = {"user_request": user_request}
final_state = None
for update in reflection_app.stream(initial_input, stream_mode="values"):
    final_state = update
console.print("\n✅ Reflection workflow completed!")
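If the intermediate updates aren’t needed, the compiled graph can also be run in a single call with LangGraph’s standard invoke method (equivalent result, no streaming):
final_state = reflection_app.invoke(initial_input)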
Print the three stages:
if final_state and all(k in final_state for k in ("draft", "critique", "refined_code")):
    console.print(Markdown("--- ### Draft Code ---"))
    console.print(Markdown(f"**Explanation:** {final_state['draft']['explanation']}"))
    console.print(Syntax(final_state['draft']['code'], "python", theme="monokai", line_numbers=True))
    console.print(Markdown("\n--- ### Critique ---"))
    console.print(Markdown(f"**Summary:** {final_state['critique']['critique_summary']}"))
    for imp in final_state['critique']['suggested_improvements']:
        console.print(Markdown(f"- {imp}"))
    console.print(Markdown("\n--- ### Refined Code ---"))
    console.print(Markdown(f"**Refinement Summary:** {final_state['refined_code']['refinement_summary']}"))
    console.print(Syntax(final_state['refined_code']['refined_code'], "python", theme="monokai", line_numbers=True))
else:
    console.print("[bold red]Error: Incomplete final state.[/bold red]")
Observed outcome: the draft was a naive recursive implementation with exponential time complexity (O(2ⁿ)). The critic flagged the inefficiency and suggested an iterative approach. The refiner produced a loop-based solution that runs in linear time (O(n)).
Quantitative Evaluation
A further LLM call, prompted as a neutral judge, scores correctness, efficiency, and style (1-10). The draft receives a low efficiency score, while the refined version scores high on all dimensions, confirming the practical benefit of the reflection loop.
class CodeEvaluation(BaseModel):
    correctness_score: int = Field(description="Correctness (1-10).")
    efficiency_score: int = Field(description="Efficiency (1-10).")
    style_score: int = Field(description="PEP-8 style (1-10).")
    justification: str = Field(description="Brief explanation.")

judge_llm = llm.with_structured_output(CodeEvaluation)

def evaluate_code(code):
    prompt = f"""You are a Python code reviewer. Score the following code on correctness, efficiency, and style (1-10) and give a short justification.
```python
{code}
```"""
    return judge_llm.invoke(prompt)
Evaluation results show a dramatic efficiency improvement after reflection.
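Applied to the Fibonacci run, the judge can score both versions side by side (a usage sketch; field access follows the CodeEvaluation model above):
# Score the draft and the refined code with the same judge.
for label, code in [("draft", final_state['draft']['code']),
                    ("refined", final_state['refined_code']['refined_code'])]:
    result = evaluate_code(code)
    console.print(f"{label}: correctness={result.correctness_score}, "
                  f"efficiency={result.efficiency_score}, style={result.style_score}")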
Applicable Scenarios
Code generation: initial AI-generated code can be self-reviewed and optimized before delivery.
Complex text summarization: a first-pass summary can be critiqued for missing details and refined (see the sketch after this list).
Creative writing: drafts can be iteratively improved for tone, clarity, and impact.
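Only the Pydantic contracts and prompts change between domains; the graph itself stays the same. A hypothetical summarization variant (these models are illustrative, not from the tutorial) might look like:
class DraftSummary(BaseModel):
    summary: str = Field(description="First-pass summary of the source text.")

class SummaryCritique(BaseModel):
    missing_points: List[str] = Field(description="Important details the summary omits.")
    critique_summary: str = Field(description="One-sentence overall assessment.")

class RefinedSummary(BaseModel):
    refined_summary: str = Field(description="Summary rewritten to address the critique.")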
⚠️ The extra LLM calls increase latency and cost, and the process cannot invent knowledge beyond the model’s existing capabilities.
Conclusion
The tutorial demonstrates that a simple "generate-critique-refine" pipeline built with Nebius AI Studio and LangGraph turns a basic LLM into a self-improving agent. The Fibonacci example illustrates the quality leap from an exponential-time recursive solution to a linear-time implementation, and the quantitative scoring validates the improvement.