How LLMs Can Auto-Generate Unit Tests: Insights from ByteDance’s QCon Talk

This article summarizes ByteDance’s quality‑efficiency expert Zhao Liang’s QCon presentation on using large language models to automatically generate unit tests, covering pain points, goals, data‑quality engineering, model‑analysis fusion, architecture, evaluation metrics, and future plans for a production‑grade testing tool.


During the QCon Global Software Development Conference (Shanghai), ByteDance's quality‑efficiency expert Zhao Liang presented a comprehensive solution for automatically generating unit tests with large language models (LLMs). The talk was organized into six parts: pain points, goals and challenges, data‑quality improvement, code‑generation improvement, effect demonstration, and a summary with future planning.

Pain Points and Current Situation

Many developers struggle with low unit‑test coverage because writing tests consumes significant time, especially under tight release schedules. Teams often delegate quality assurance to testing groups that focus on functional verification rather than line‑by‑line code review. Consequently, many large codebases have coverage below 10% and pose a high production risk.

Writing unit tests is time‑consuming (5‑15 minutes per function depending on complexity).

Rapid business iterations leave developers with no bandwidth to create tests, leading to a growing backlog of untested code.

Existing generation tools (search‑based, genetic algorithms, early LLM attempts) suffer from unreadable output, unstable diversity, and low compile‑pass rates.

Goals and Challenges

The team defined four concrete goals: improve coverage, enhance assertion and mock effectiveness, increase repository‑level coverage, and achieve a favorable ROI for developers. Challenges include ensuring high‑quality training data, aligning prompts, and handling complex input/output relationships in business code.

Data Quality Improvement: Engineering Analysis Solves Data Problems

To boost data quality, three engineering steps were introduced:

Traffic Collection and Adoption: Gather real‑world traffic from manual tests, online usage, and interface automation, supplemented by fuzz testing and end‑to‑end replay with code instrumentation.

Traffic Distillation: Process raw traffic into type‑level strategies, perform path inference, and apply privacy‑preserving de‑identification (sketched after this list).

Traffic Distribution: Feed distilled data into scenarios such as legacy test generation, IDE‑assisted generation, and merge‑request pipelines.

These steps dramatically increased the realism of generated test data, leading to higher developer acceptance and better defect discovery rates.
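
To make the distillation step concrete, here is a minimal Python sketch of what de‑identifying captured traffic into deduplicated test fixtures might look like. The `TrafficRecord` shape, the phone‑number pattern, and the placeholder format are illustrative assumptions; the talk does not disclose the actual schema or tooling.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical record shape for captured traffic; the real schema
# used at ByteDance is not described in the talk.
@dataclass
class TrafficRecord:
    endpoint: str
    request: dict
    response: dict

PHONE_RE = re.compile(r"\b1\d{10}\b")  # mainland-China mobile numbers

def de_identify(value: str) -> str:
    """Replace sensitive literals with stable placeholders so the same
    input always maps to the same token (keeps equivalence classes intact)."""
    def mask(match: re.Match) -> str:
        digest = hashlib.sha1(match.group().encode()).hexdigest()[:8]
        return f"<PHONE:{digest}>"
    return PHONE_RE.sub(mask, value)

def distill(records: list[TrafficRecord]) -> list[TrafficRecord]:
    """One distillation pass: de-identify request fields and drop duplicate
    (endpoint, request) pairs so the fixture set stays small and diverse."""
    seen, fixtures = set(), []
    for rec in records:
        clean_req = {k: de_identify(str(v)) for k, v in rec.request.items()}
        key = (rec.endpoint, tuple(sorted(clean_req.items())))
        if key not in seen:
            seen.add(key)
            fixtures.append(TrafficRecord(rec.endpoint, clean_req, rec.response))
    return fixtures
```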

Equivalence‑Class Design and Path Coverage

The team applied equivalence‑class techniques to ensure generated tests cover complex code paths, are driven by accurate prompts, and avoid redundant or low‑value cases. Key considerations include complex input structures, dependencies on external services, and exception‑path handling.
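
As an illustration of equivalence‑class design, the sketch below partitions a single hypothetical `age` parameter into classes and picks one representative per class; the concrete classes and boundaries are invented for the example, not taken from the talk.

```python
# Hypothetical equivalence classes for a function that validates an
# "age" parameter; the concrete partitions are illustrative only.
AGE_CLASSES = {
    "below_range": [-1],           # invalid: negative
    "lower_boundary": [0, 1],      # boundary values
    "typical": [35],               # representative valid value
    "upper_boundary": [149, 150],  # boundary values
    "above_range": [151],          # invalid: too large
}

def representatives(classes: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Pick one representative per class so each generated test targets a
    distinct partition instead of duplicating the same happy path."""
    return [(name, values[0]) for name, values in classes.items()]

for name, value in representatives(AGE_CLASSES):
    print(f"test_validate_age_{name}: input={value}")
```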

Model‑Program‑Analysis Fusion

Model generation is combined with static program analysis (AST, IR) to extract control‑flow and data‑flow information. The workflow consists of three stages: feature analysis of the target function, logical analysis of branches, and path analysis to enumerate all feasible execution paths (e.g., A‑C‑D, A‑C‑E). This information guides the LLM to produce targeted test cases, which are then iteratively refined for syntax and assertion correctness.
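
Below is a minimal sketch of the branch‑analysis idea, using Python's built‑in `ast` module to list the branch conditions an LLM would then be asked to cover. The production system reportedly works across languages on AST/IR and builds full control‑flow paths, so this shows only the shape of the approach.

```python
import ast

SOURCE = """
def classify(x):
    if x < 0:        # branch B1
        return "neg"
    if x == 0:       # branch B2
        return "zero"
    return "pos"
"""

def branch_conditions(source: str) -> list[str]:
    """Collect the textual condition of every `if` in the target function.
    A real analyzer would build full control-flow paths from AST/IR instead."""
    tree = ast.parse(source)
    return [ast.unparse(node.test)
            for node in ast.walk(tree)
            if isinstance(node, ast.If)]

# Each condition becomes a concrete target for the LLM: "generate an input
# that makes `x < 0` true", and so on for every feasible path.
for cond in branch_conditions(SOURCE):
    print("cover branch:", cond)
```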

Overall Model Engineering Architecture

The architecture comprises three layers:

Data Layer: Sample construction from internal codebases and high‑starred GitHub projects, supplemented by GPT‑generated data for scarce scenarios; privacy filtering and format normalization.

Model Layer: Selection of a base LLM, enhanced with chain‑of‑thought prompting, preference alignment, and reinforcement learning.

Evaluation Layer: Human and automated metrics (compile‑pass, coverage, assertion success, runtime pass, path‑lift) feed back into preference scoring and multi‑round DPO fine‑tuning (a scoring sketch follows below).

Two illustrative diagrams accompany the original article, showing the data pipeline and the model‑engineering workflow.
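
To illustrate how the evaluation layer's metrics could feed preference scoring, here is a hypothetical scalar scoring function over the metrics named above; the weights and record format are assumptions, not ByteDance's actual scoring.

```python
# Hypothetical weighting of the automated metrics named in the talk;
# the actual scoring function and weights are not public.
WEIGHTS = {
    "compile_pass": 0.30,
    "runtime_pass": 0.25,
    "coverage": 0.25,
    "assertion_success": 0.15,
    "path_lift": 0.05,
}

def preference_score(metrics: dict[str, float]) -> float:
    """Collapse per-suite metrics (each in [0, 1]) into one scalar so two
    candidate generations can be ordered into a DPO preference pair."""
    return sum(WEIGHTS[k] * metrics.get(k, 0.0) for k in WEIGHTS)

candidate_a = {"compile_pass": 1.0, "runtime_pass": 0.9, "coverage": 0.62,
               "assertion_success": 0.8, "path_lift": 0.4}
candidate_b = {"compile_pass": 0.7, "runtime_pass": 0.6, "coverage": 0.70,
               "assertion_success": 0.5, "path_lift": 0.6}

chosen, rejected = sorted([candidate_a, candidate_b],
                          key=preference_score, reverse=True)
# (chosen, rejected) becomes one training pair for DPO fine-tuning.
```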

Data Engineering Construction

Data engineering ensures trustworthy, diverse, and privacy‑compliant datasets across multiple programming languages. Six key steps are performed: sample labeling, quality filtering, privacy sanitization, format handling, data simplification (removing noisy logs, abstracting business‑specific identifiers), and data shuffling to improve model generalization.
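
A compact sketch of the six steps as composable passes follows; each pass body is a placeholder, since the talk names the steps but not their internals.

```python
import random

def label(samples):
    return [dict(s, label="unit_test_pair") for s in samples]

def quality_filter(samples):
    return [s for s in samples if s.get("compiles", True)]

def sanitize(samples):
    return samples  # privacy scrubbing (PII, credentials) would go here

def normalize(samples):
    return samples  # unify formats/encodings across languages

def simplify(samples):
    return samples  # strip noisy logs, abstract business identifiers

def shuffle(samples):
    random.shuffle(samples)  # break ordering bias to aid generalization
    return samples

def build_dataset(samples):
    """Run the six passes in the order the article lists them."""
    for step in (label, quality_filter, sanitize, normalize, simplify, shuffle):
        samples = step(samples)
    return samples
```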

Code Simplification and Prompt Engineering (PE)

Complex business semantics are abstracted (e.g., translating "China Construction Bank" into a placeholder) while preserving function signatures, control flow, and return statements. This reduces training loss and accelerates convergence. Prompt engineering (PE) further splits the generation process into path enhancement, parameter completion, syntax correction, and assertion fixing, with experiments showing that zero‑shot prompting sometimes outperforms few‑shot examples.
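
The literal‑abstraction idea can be sketched as an AST transform that swaps known business strings for placeholders while leaving structure intact. The mapping table and the restriction to string constants are simplifications for illustration; the real system abstracts far more than this.

```python
import ast

# Hypothetical mapping of business-specific literals to neutral
# placeholders; invented for this example.
ABSTRACTIONS = {"China Construction Bank": "BANK_A"}

class LiteralAbstractor(ast.NodeTransformer):
    """Swap known business literals for placeholders while leaving the
    function signature, control flow, and return statements untouched."""
    def visit_Constant(self, node: ast.Constant) -> ast.Constant:
        if isinstance(node.value, str) and node.value in ABSTRACTIONS:
            return ast.copy_location(
                ast.Constant(value=ABSTRACTIONS[node.value]), node)
        return node

source = 'def pay(bank):\n    return bank == "China Construction Bank"\n'
tree = LiteralAbstractor().visit(ast.parse(source))
print(ast.unparse(tree))  # -> return bank == 'BANK_A'
```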

Evaluation and Results

Custom evaluation suites measure coverage and scenario support. Repository‑level coverage rose from roughly 40% to as high as 60% after DPO‑based re‑training, and per‑method coverage reached 83.09%. Assertion pass rates and compile‑pass rates also improved. The system now delivers a plug‑and‑play test‑generation product that requires no manual post‑generation fixes.
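
For reference, here is a small sketch of how such suite‑level rates can be aggregated from per‑test outcomes; the record shape is hypothetical, and the real evaluation suite tracks more dimensions (e.g., path lift) than shown here.

```python
from dataclasses import dataclass

@dataclass
class TestOutcome:
    compiled: bool
    ran: bool
    assertions_passed: bool
    lines_covered: int
    lines_total: int

def summarize(outcomes: list[TestOutcome]) -> dict[str, float]:
    """Aggregate per-test results into suite-level rates of the kind the
    talk reports: compile-pass, runtime-pass, assertion-pass, coverage."""
    assert outcomes, "need at least one outcome"
    n = len(outcomes)
    total = sum(o.lines_total for o in outcomes)
    covered = sum(o.lines_covered for o in outcomes)
    return {
        "compile_pass_rate": sum(o.compiled for o in outcomes) / n,
        "runtime_pass_rate": sum(o.ran for o in outcomes) / n,
        "assertion_pass_rate": sum(o.assertions_passed for o in outcomes) / n,
        "line_coverage": covered / total if total else 0.0,
    }
```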

Summary and Future Planning

Completed work spans three layers: foundation (analysis, data construction, environment), generation (test framework, path lifting, data conversion), and correction (syntax, runtime, assertion fixes). Ongoing efforts focus on continuous model optimization, test‑case recall analysis, test‑case freshness mechanisms, and product diversification.

Speaker Introduction

Zhao Liang, with 13 years of experience at Ant Group and ByteDance, leads intelligent quality‑technology initiatives, holds four national patents, and specializes in program analysis and AI‑driven quality solutions.


Tags: AI, LLM, software engineering, code analysis, unit testing, test generation
Written by

Volcano Engine Developer Services

The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.