Enhancing AI Code Review Quality with Contextual Embedding and Function Calling
The article explains how AI code reviews suffer from missing context and how to improve them: embed the codebase, use Retrieval‑Augmented Generation to fetch relevant snippets, and add a function‑calling tool that lets the model autonomously request additional code, resulting in precise, bug‑detecting feedback.
The article revisits the AI Code Review workflow introduced in a previous note, highlighting a common problem: the review comments sometimes do not match the submitted code changes. This mismatch stems from ChatGPT’s stateless nature, which lacks the full context of the code change and may generate speculative responses.
To improve review quality, the author emphasizes the importance of providing complete code context for each change. Without sufficient context, ChatGPT may make incorrect assumptions, especially when only a function call is submitted without its definition.
The piece then analyzes the effect of context on review outcomes, showing that code blocks with full definitions yield accurate feedback, while isolated calls lead to mismatched comments.
To reliably obtain the necessary context, three approaches are discussed:
Including all files touched by the change, which is token‑inefficient and may still miss function definitions.
Building an index tree with semantic analysis (similar to Go‑to‑definition), which is complex and language‑specific.
Treating the codebase as a knowledge base and using embeddings to retrieve relevant code snippets.
The third approach is implemented by splitting code into chunks (using RecursiveCharacterTextSplitter with chunk_size=500 and chunk_overlap=100), embedding each chunk with AzureOpenAIEmbeddings (model text-embedding-ada-002), and storing the vectors in a Chroma database.
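The splitting step can be sketched in plain Python. The function below is a simplified, hypothetical stand-in for langchain's RecursiveCharacterTextSplitter, which additionally tries to break on natural separators ("\n\n", "\n", " ") before falling back to raw characters; only the window/overlap arithmetic is shown here.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=100):
    # Simplified character-window splitter: each chunk is chunk_size
    # characters, and consecutive chunks share chunk_overlap characters
    # so that a definition straddling a boundary is not lost entirely.
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

source = "0123456789" * 120  # pretend this is a 1200-character source file
chunks = split_with_overlap(source)
# Three chunks; the last 100 chars of each chunk repeat at the start of the next.
```

With chunk_size=500 and chunk_overlap=100 the window advances 400 characters at a time, so every boundary region appears in two chunks.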
A test case demonstrates that, given a function TestBanPrefix with a hidden bug, the embedding‑based retrieval successfully finds the function definition among the top‑3 similar chunks, enabling the LLM to flag the bug during review.
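The retrieval idea can be illustrated without a real embedding model. The sketch below swaps text-embedding-ada-002 and Chroma for a toy bag-of-words cosine similarity; the chunk texts and the query are invented for illustration, and a dense embedding model would rank far more robustly.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a term-frequency vector over whitespace tokens.
    # A crude stand-in for a dense model such as text-embedding-ada-002.
    return Counter(text.lower().replace("(", " ").replace(")", " ").split())

def cosine(a, b):
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_search(query, chunks, k=3):
    # Rank stored chunks by similarity to the query, as the vector DB would.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "func TestBanPrefix(s string) bool { return strings.HasPrefix(s, ban) }",
    "func parseConfig(path string) (*Config, error) { return nil, nil }",
    "func main() { fmt.Println(TestBanPrefix(input)) }",
]
hits = similarity_search("TestBanPrefix(input)", chunks, k=3)
# The chunk containing the definition of TestBanPrefix is among the top hits,
# so it can be injected into the review prompt.
```

This mirrors the article's test: given only a call site in the diff, the search surfaces the function definition among the top-k chunks, which is what lets the LLM spot the hidden bug.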
While the embedding‑based Retrieval‑Augmented Generation (RAG) improves accuracy, it still performs a vector search for every submitted change, which can be unnecessary when the change already contains sufficient context. To give the model autonomy in deciding when to fetch additional context, the article introduces Function Calling.
By exposing a search_code_relevance tool to the LLM, the model can request a similarity search only when it deems one necessary. The tool is defined as follows:
from langchain.tools import StructuredTool
from pydantic import BaseModel

def search_code_relevance(code_chunk):
    # `db` is the Chroma vector store built earlier from the codebase chunks.
    res = db.similarity_search(code_chunk, k=1)
    codes = "\n".join([r.page_content for r in res])
    return codes

class SearchCodeRelevanceArgsSchema(BaseModel):
    code_chunk: str

search_code_relevance_tool = StructuredTool.from_function(
    name="search_code_relevance",
    description="Search code relevance. Use this tool when you need to search for code relevance.",
    func=search_code_relevance,
    args_schema=SearchCodeRelevanceArgsSchema,
)

Integrating this tool transforms the AI Code Review system into a Codereview Agent that autonomously decides whether to retrieve additional code context. The agent's execution trace shows two calls to search_code_relevance, each returning the correct function definition, after which ChatGPT provides precise review comments, including the detection of the hidden bug.
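The agent's decide-then-fetch loop can be sketched schematically. Everything below is a hypothetical mock: fake_llm simulates the model's tool-calling decision, search_code_relevance returns a canned definition instead of querying Chroma, and in a real setup langchain's agent executor (or the OpenAI tool-calling API) performs this dispatch.

```python
def search_code_relevance(code_chunk):
    # Mock tool: stands in for the real Chroma-backed similarity search.
    corpus = {"TestBanPrefix": "func TestBanPrefix(s string) bool { return strings.HasPrefix(s, ban) }"}
    for name, definition in corpus.items():
        if name in code_chunk:
            return definition
    return ""

def fake_llm(messages):
    # Simulated model: if the diff calls a function whose definition is not
    # yet in the conversation, request it via a tool call; otherwise review.
    context = "\n".join(m["content"] for m in messages)
    if "TestBanPrefix" in context and "func TestBanPrefix" not in context:
        return {"tool_call": {"name": "search_code_relevance",
                              "arguments": {"code_chunk": "TestBanPrefix(input)"}}}
    return {"content": "Review: checked TestBanPrefix against its definition."}

TOOLS = {"search_code_relevance": search_code_relevance}

def review(diff):
    messages = [{"role": "user", "content": f"Review this change:\n{diff}"}]
    while True:
        reply = fake_llm(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": result})

comment = review("ok := TestBanPrefix(input)")
```

The key property this loop illustrates is autonomy: the model, not the pipeline, decides whether a retrieval round-trip is needed, so diffs that already carry sufficient context skip the vector search entirely.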
In summary, the article presents a progression from a simple API‑based AI review to a RAG‑enhanced workflow and finally to an autonomous agent powered by Function Calling, offering a practical reference for engineers building their own AI‑assisted code review pipelines.
37 Interactive Technology Team