How Fine‑Tuning Large Models Solves Code Upgrade Challenges and Boosts Stable Module Matching
This article describes how supervised fine-tuning of a large model overcomes the instability of code RAG and code agents during open-source repository upgrades, handling domain-specific terminology and code-style differences while improving recall, accuracy, and deployment efficiency.
Project Background
During a major version upgrade of an open-source repository, the Gaode terminal technology team faced excessive time costs due to the large codebase and the team's limited experience with the legacy version. Existing internal code platforms could not meet custom Q&A needs, and external tools raised security concerns, prompting the development of an in-house R&D-efficiency tool based on code RAG and code agents.
Problems with Existing Approaches
Code RAG stores code knowledge graphs in a vector database and retrieves fragments, but suffers from low recall and accuracy, unstable LLM outputs, and performance degradation when prompts become too long.
A code agent adds reasoning steps to improve retrieval, yet its iterative thinking can yield non-deterministic results, and its effectiveness depends heavily on the underlying code query tool.
Additional challenges include domain‑specific jargon and mismatched code styles, which hinder direct mapping between user requirements and code snippets.
Proposed Solution: Supervised Fine‑Tuning (SFT)
The team reframed the task as matching user requirements to relevant code modules rather than individual snippets. By fine‑tuning a large model on a domain‑specific code knowledge graph, the model learns to understand module‑level semantics, ensuring more stable and accurate code association.
Fine‑Tuning Preparation
Key considerations for base-model selection were task complexity, ease of deployment (preferably on-device), and limited training resources (a single GPU with 16 GB of memory). Qwen3-4B was chosen for its strong semantic understanding, small parameter count, and support for Chain-of-Thought (CoT) reasoning.
The unsloth framework was selected for efficient PEFT (LoRA) training, allowing rapid adaptation on limited hardware.
Dataset Construction
A structured dataset was built from the full repository knowledge graph, extracting module nodes, inter‑module relationships, and functional attributes. Each training example includes a prompt describing the user requirement, a CoT reasoning field, and a label containing the target module ID.
The dataset design pursued three goals: transform raw code retrieval into module-level learning to simplify the task; generate high-quality training and validation data by parsing the knowledge graph; and constrain model output to predefined module-ID formats.
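As a concrete illustration, one training record can be assembled from a knowledge-graph module node roughly as follows. This is a minimal sketch: the field names (`module_id`, `summary`, `depends_on`) and the module-ID format are assumptions, since the article does not show the actual graph schema.

```python
import json

def build_example(module: dict, requirement: str, cot: str) -> dict:
    """Assemble one SFT record from a knowledge-graph module node.

    Field names are illustrative; the real schema is not shown in the article.
    """
    prompt = (
        f"User requirement: {requirement}\n"
        f"Candidate module: {module['summary']} "
        f"(depends on: {', '.join(module['depends_on']) or 'none'})\n"
        "Answer with the target module ID only."
    )
    return {
        "prompt": prompt,
        "cot": cot,                    # chain-of-thought reasoning field
        "label": module["module_id"],  # target module ID the model must emit
    }

# Toy node from a hypothetical knowledge graph
node = {
    "module_id": "MOD-0042",
    "summary": "Route-planning engine for the navigation SDK",
    "depends_on": ["MOD-0007"],
}
record = build_example(
    node,
    "Where is driving-route calculation implemented?",
    "Route calculation is a navigation concern, so the route-planning module is the best match.",
)
print(json.dumps(record, ensure_ascii=False))
```

Serializing one record per line yields a JSONL file that SFT frameworks can consume directly.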
Fine‑Tuning Process
LoRA was applied via FastLanguageModel.get_peft_model, with parameters tuned to task complexity, available memory, data volume, and text length. Training used a 1:1 mix of standard and CoT examples; the hyperparameters are recorded in the accompanying figures.
After training, the LoRA weights were merged into the full model.
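The training-and-merge flow can be sketched as below. The hyperparameter values are placeholders sized for a 16 GB GPU (the article's actual values appear only in its figures), and the exact unsloth/trl call signatures vary by version; treat this as an outline, not the team's script.

```python
def lora_config() -> dict:
    # Hypothetical LoRA hyperparameters; the article's real values
    # are shown only in its figures.
    return dict(
        r=16, lora_alpha=16, lora_dropout=0.0, bias="none",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

def finetune(train_dataset):
    # Requires a GPU plus the unsloth/trl packages; not executed here.
    from unsloth import FastLanguageModel
    from trl import SFTTrainer
    from transformers import TrainingArguments

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="Qwen/Qwen3-4B", max_seq_length=2048, load_in_4bit=True)
    model = FastLanguageModel.get_peft_model(model, **lora_config())

    trainer = SFTTrainer(
        model=model,
        train_dataset=train_dataset,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            num_train_epochs=3,
            learning_rate=2e-4,
            output_dir="outputs",
        ),
    )
    trainer.train()

    # Merge the LoRA adapter back into the base weights for deployment.
    model.save_pretrained_merged("merged-model", tokenizer,
                                 save_method="merged_16bit")
```

Merging the adapter produces a single standalone checkpoint, which simplifies on-device serving compared with shipping base weights plus an adapter.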
Experimental Results
On the test set the fine‑tuned model achieved an overall accuracy of 78%, meeting the expected performance target.
Edge‑Side Deployment
The final model was deployed on macOS using the Metal Performance Shaders (MPS) backend, enabling inference in seconds on a single device. Sample inference code and performance screenshots are included.
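A minimal inference sketch for the merged checkpoint on Apple hardware might look like this, assuming torch and transformers are installed; the `merged-model` directory name and generation settings are illustrative, not taken from the article.

```python
def pick_device(mps_available: bool, cuda_available: bool = False) -> str:
    """Prefer Apple's Metal (MPS) backend on macOS, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    return "cuda" if cuda_available else "cpu"

def generate(prompt: str, model_dir: str = "merged-model") -> str:
    # Requires torch + transformers and the merged checkpoint; call on the
    # target machine rather than at import time.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = pick_device(torch.backends.mps.is_available(),
                         torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype=torch.float16).to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Because the module-ID answers are short, `max_new_tokens` can stay small, which is what keeps single-device latency in the seconds range.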
Conclusion
Supervised fine‑tuning enables large models to quickly adapt to domain‑specific terminology and code styles, delivering more accurate, stable, and low‑cost solutions that can run on edge devices. Future work will explore reinforcement learning to further enhance vertical domain performance.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.