Boosting Codebase Upgrades with Code RAG and Agent‑Driven Fine‑Tuning
This article describes how the Gaode (Amap) terminal team tackled a large‑scale repository upgrade. They first built code‑RAG and code‑Agent tooling and worked through its recall and stability issues, then fine‑tuned a small LLM (Qwen3‑4B) with LoRA on custom datasets to achieve reliable, low‑cost, on‑device code‑query performance.
Project Background
During a major version upgrade of an open‑source repository, the Gaode terminal team found that the sheer volume of code changes made manual knowledge transfer impractical, and existing internal tools could not meet the customized Q&A needs while external tools raised security concerns.
Problems with code RAG: storing code knowledge graphs in a vector database and retrieving them via RAG suffered from low recall, unstable LLM outputs, and performance degradation as prompts grew long.
Problems with the code Agent: layering an AI Agent on top improved accuracy, but its iterative reasoning was only as stable as the underlying code‑search tool, so poor retrieval results compounded across reasoning steps.
Additional challenges included domain‑specific terminology, module‑level code styles, and the need for a model that could understand user requirements in relation to repository modules.
Code Demand Understanding
The goal is to match user requirements to repository modules rather than isolated code snippets, because modules encapsulate stable business architecture and align with natural‑language descriptions.
Transform repository‑wide code retrieval into learning over “code module” knowledge graphs, simplifying the task and enabling low‑parameter models to achieve accurate inference.
Generate high‑quality training/validation data from the full‑repository knowledge graph instead of raw code.
Restrict model outputs to valid module IDs to improve interpretability.
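The module‑matching idea above can be sketched in a few lines. The module graph below is hypothetical (the IDs, descriptions, and the keyword‑overlap scoring are assumptions for illustration, not the team's actual graph or matcher):

```python
# Minimal sketch: match a user requirement to a repository module,
# not an isolated code snippet. MODULE_GRAPH is a made-up example.
MODULE_GRAPH = {
    "MOD_ROUTE_PLAN": "route planning and turn-by-turn navigation",
    "MOD_MAP_RENDER": "map tile rendering and overlay drawing",
    "MOD_SEARCH_POI": "point of interest search and ranking",
}

def match_module(query: str) -> str:
    """Pick the module whose description shares the most words with the query."""
    q = set(query.lower().split())

    def overlap(item):
        module_id, desc = item
        return len(q & set(desc.split()))

    # The answer space is restricted to valid module IDs by construction.
    return max(MODULE_GRAPH.items(), key=overlap)[0]
```

Because the function can only ever return a key of the graph, its output is interpretable by definition, which mirrors the "valid module IDs only" constraint placed on the fine‑tuned model.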
Fine‑Tuning Preparation
The fine‑tuning pipeline includes selecting a base model, a framework, and preparing data.
Base model selection: Qwen3‑4B was chosen for its small size, strong semantic understanding, suitability for on‑device deployment, and support for Chain‑of‑Thought (CoT) reasoning.
Framework selection: the Unsloth high‑performance training framework was adopted for its efficient PEFT support.
Dataset Construction
Using the repository’s code knowledge graph, nodes (module IDs), edges (module relationships), and attributes (function descriptions) were extracted to build a structured dataset. Each entry contains a prompt, a label with the reasoning process, and the target module ID.
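One training entry might look like the following. The field names and the example module ID are assumptions about the schema, not the team's actual dataset format:

```python
import json

# Hypothetical shape of one training sample derived from the knowledge
# graph: a natural-language prompt, a label carrying the reasoning
# process, and a target restricted to a valid module ID.
def build_sample(user_query: str, reasoning: str, module_id: str) -> dict:
    return {
        "prompt": user_query,
        "label": {
            "reasoning": reasoning,   # chain-of-thought explanation
            "module_id": module_id,   # must be a valid graph node ID
        },
    }

sample = build_sample(
    "Where is bus-route drawing implemented?",
    "Bus routes are drawn on the map, so this belongs to the rendering module.",
    "MOD_MAP_RENDER",
)
# Serialize as one JSONL line for the training set.
line = json.dumps(sample, ensure_ascii=False)
```

Deriving samples from graph nodes and edges rather than raw code keeps each entry short and structured, which is what lets a low‑parameter model learn the task.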
Data augmentation techniques such as creating multiple module description variants, enriching prompts, synonym replacement, and sentence restructuring were applied to increase diversity and prevent over‑fitting.
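A toy version of the synonym‑replacement pass might look like this; the synonym table is invented, and a real pipeline would also paraphrase and restructure whole sentences:

```python
import random

# Hypothetical synonym table for prompt augmentation.
SYNONYMS = {
    "implemented": ["handled", "located"],
    "module": ["component"],
}

def augment(prompt: str, rng: random.Random) -> str:
    """Replace each word that has synonyms with a random alternative."""
    out = []
    for word in prompt.split():
        alts = SYNONYMS.get(word)
        out.append(rng.choice(alts) if alts else word)
    return " ".join(out)
```

Generating several such variants per module description increases prompt diversity cheaply and helps the small model generalize instead of memorizing exact phrasings.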
Fine‑Tuning Strategy
1. LoRA fine‑tuning: Low‑Rank Adaptation was used to freeze the pre‑trained weights and train only small low‑rank matrices, reducing memory and compute requirements while maintaining performance.
Key configuration factors considered: task complexity, available GPU memory, dataset size, and required text length.
2. SFT training parameters: detailed hyper‑parameters (learning rate, batch size, epochs, etc.) were set to fit the resource constraints of a single 16 GB GPU.
After training, the LoRA adapters were merged into the base model to produce a complete fine‑tuned checkpoint.
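The hyper‑parameter choices above can be sketched as a config fragment. Every concrete value below is an assumption sized for a single 16 GB GPU, not the team's published settings:

```python
# Hypothetical LoRA settings; in Unsloth these map onto the arguments
# of FastLanguageModel.get_peft_model.
lora_config = dict(
    r=16,                 # low-rank dimension: capacity vs. memory trade-off
    lora_alpha=32,        # scaling factor, commonly set to 2x r
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
)

# Hypothetical SFT settings; gradient accumulation keeps the effective
# batch size at 16 while the per-device batch fits in 16 GB.
sft_config = dict(
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    max_seq_length=2048,
)
```

After training with these settings, the adapters would be folded back into the base weights (Unsloth exposes a merged‑save path for this) so the result ships as one self‑contained checkpoint.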
Engineering Handling
A post‑processing step enforced the predefined module‑ID output format and cleaned away invalid tokens from the model's responses.
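A minimal sketch of that post‑processing step, assuming a `MOD_` + uppercase naming convention for module IDs (the pattern itself is an invention for illustration):

```python
import re

# Accept only strings matching the assumed module-ID pattern;
# strip whitespace and stray markup first.
MODULE_ID_RE = re.compile(r"MOD_[A-Z0-9_]+")

def clean_output(raw: str):
    """Return the first valid module ID in the model output, or None."""
    raw = raw.strip().replace("`", "")  # drop stray backticks/code fences
    m = MODULE_ID_RE.search(raw)
    return m.group(0) if m else None
```

Rejecting anything that fails the pattern (rather than passing it through) is what keeps downstream consumers of the model's answers stable.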
Experimental Results
On the test set the fine‑tuned model achieved an overall accuracy of 78%, meeting the expected performance target.
Edge Deployment
The final model was deployed on macOS using the Metal Performance Shaders (MPS) backend, enabling second‑scale inference on Apple‑silicon GPUs with a CPU fallback. Sample inference code and performance screenshots are provided.
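An inference sketch for that setup might look like the following; the checkpoint path is a placeholder for the merged fine‑tuned model, not a real artifact:

```python
import torch

def pick_device() -> str:
    # Use the Metal (MPS) backend on Apple silicon, else fall back to CPU.
    return "mps" if torch.backends.mps.is_available() else "cpu"

def generate(checkpoint: str, prompt: str) -> str:
    """Load the merged checkpoint and run a single generation pass."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = pick_device()
    tok = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    inputs = tok(prompt, return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)
```

Because the LoRA adapters were merged beforehand, the checkpoint loads through the standard `transformers` path with no PEFT dependency at inference time.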
Conclusion
Fine‑tuning a small LLM with domain‑specific knowledge graphs and LoRA allows rapid adaptation to vertical code‑query tasks, delivering accurate, stable responses at low cost and supporting on‑device deployment. Remaining challenges such as dataset diversity and gradient convergence were mitigated through combined algorithmic and engineering solutions, and future work will explore reinforcement‑learning enhancements.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.
