From Code Foundations to AI Agents: A Deep Dive into Code LLMs and Their Applications
This article reviews a comprehensive 303‑page survey on code foundation models, tracing the evolution of code‑focused large language models from 2021 to 2025, comparing general‑purpose and specialized LLMs, and presenting extensive experiments on prompting, fine‑tuning, reinforcement learning, and autonomous coding agents.
The survey, titled From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence, was compiled by contributors from 30 leading institutions and companies, including Alibaba, ByteDance, OPPO, FastFun, Kuaishou, Huawei Cloud, and Tencent.
It provides a systematic overview of code‑LLM development from 2021‑2025, highlighting the shift from rule‑based systems to Transformer‑based architectures that have pushed HumanEval success rates from single‑digit percentages to over 95%.
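Those HumanEval pass rates are conventionally reported as pass@k, estimated from n sampled completions of which c pass the unit tests, using the unbiased estimator popularized by the original Codex work. The survey's own evaluation code is not reproduced here; the following is a minimal sketch of that standard estimator:

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n generations is correct, given
    that c of the n generations pass the unit tests."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset
        # must contain at least one correct solution.
        return 1.0
    result = 1.0
    for i in range(n - c + 1, n + 1):
        result *= 1.0 - k / i  # product form of 1 - C(n-c, k)/C(n, k)
    return 1.0 - result
```

For example, with 2 generations of which 1 is correct, `pass_at_k(2, 1, 1)` gives 0.5, and sampling more completions per problem (larger n) tightens the estimate without changing its expectation.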
Commercial tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic) illustrate how code LLMs are reshaping software development workflows.
The guide analyses the entire model lifecycle—data management, code pre‑training, supervised fine‑tuning, reinforcement learning, and autonomous coding agents—while detailing advanced prompting paradigms, scaling laws, and safety‑aligned data‑generation pipelines.
It critically compares general‑purpose LLMs (GPT‑4, Claude, LLaMA) with code‑specialized LLMs (StarCoder, Code LLaMA, DeepSeek‑Coder, QwenCoder), discussing design decisions, trade‑offs, and relative capabilities.
A key focus is the research‑practice gap: the article examines discrepancies between benchmark performance (e.g., HumanEval) and real‑world deployment, covering code correctness, security, context awareness in large codebases, and integration with development workflows, and maps promising research directions to industry needs.
Extensive experiments explore scaling laws, framework choices, and hyperparameter sensitivity, and compare recent models such as Kimi‑K2‑Instruct and Qwen3‑Coder alongside earlier architectures such as CodeBERT, CodeT5, and GPT, across various datasets. Visualizations of architecture classifications and training stages are included.
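Scaling-law studies of this kind typically fit a Chinchilla-style parametric form, L(N, D) = E + A/N^α + B/D^β, relating loss to parameter count N and training tokens D. The survey's exact fitting procedure and constants are not reproduced here; this sketch uses the published Chinchilla fits purely for illustration:

```python
def predicted_loss(N: float, D: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style parametric loss: an irreducible term E plus
    power-law penalties for finite parameters N and training tokens D.
    Default constants are the Hoffmann et al. (2022) fits, used here
    only as illustrative values, not the survey's own coefficients."""
    return E + A / N**alpha + B / D**beta
```

Fitting E, A, B, α, and β to observed (N, D, loss) triples is what lets such studies extrapolate how much data a code model of a given size needs before returns diminish.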
The guide also presents a safety‑aligned data‑generation pipeline, a taxonomy of reinforcement‑learning techniques, and a typical issue‑resolution workflow for coding agents.
https://arxiv.org/pdf/2511.18538