Four AI‑Driven Code Generation Techniques: From Example‑Based to Metadata‑Assisted
This article explores four approaches to AI‑assisted code generation with fine‑tuned LLaMA/ChatGLM models: example‑based, test‑driven, metadata‑assisted, and information‑matching. For each it details the training data, prompts, and sample inputs and outputs, and evaluates its strengths, limitations, and suitable application scenarios.
Fine‑tuning approaches for code generation
Code example generation: Train the model on pairs of natural‑language descriptions and target code so it learns the underlying coding patterns and can synthesize new snippets.
Test‑driven generation: Use existing unit tests as prompts; the model generates business‑logic code that satisfies the supplied tests.
Metadata‑assisted generation: Provide additional context such as variable types, method signatures, or class definitions to guide the model toward more accurate completions.
Information‑matching generation: Incorporate database schema or table metadata into the prompt, enabling the model to produce SQL statements or repository methods that align with the underlying data model.
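All four techniques can share a common fine‑tuning data format: each training sample is an instruction/input/output triple, as in the alpaca‑style datasets commonly used to fine‑tune LLaMA. A minimal sketch of assembling such records (the field names and helper are assumptions, not the exact format used in the experiments):

```python
import json

def make_sample(instruction: str, context: str, output: str) -> dict:
    """Assemble one instruction-tuning record: the task, optional extra
    context (tests, signatures, schema), and the target code."""
    return {"instruction": instruction, "input": context, "output": output}

# One illustrative record each for two of the techniques above.
samples = [
    make_sample("text to sql", "", "SELECT name FROM users"),
    make_sample(
        "generate unit test",
        "public int add(int a, int b) { return a + b; }",
        "@Test public void shouldAdd() { assertEquals(3, add(1, 2)); }",
    ),
]

# Serialize as JSON Lines, one record per line, as fine-tuning
# pipelines typically expect.
dataset = "\n".join(json.dumps(s, ensure_ascii=False) for s in samples)
```

The `input` field is what varies between the four approaches: empty for plain example‑based generation, a unit test for test‑driven generation, signatures for metadata‑assisted generation, or a schema for information‑matching generation.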
Code example generation (text‑to‑SQL)
Training data consists of a natural‑language query and the corresponding SQL. The model is invoked via an evaluate helper with the following signature:
evaluate(
    task: str,
    prompt: str,
    temperature: float,
    top_p: float,
    max_new_tokens: int,
    num_return_sequences: int,
    max_seq_len: int
)
Example call:
evaluate(
    "text to sql",
    "谁是最美丽的人",
    0.1, 0.75, 40, 4, 512
)
(The Chinese prompt reads "Who is the most beautiful person?") Generated output (illustrative):
SELECT MOST BEAUTIFUL FROM table WHERE BEAUTIFUL = 最美丽的人
Limitation: without a table schema, the model cannot produce precise column names; adding schema metadata during fine‑tuning can mitigate this.
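One concrete way to apply that mitigation is to put the table definition directly into the prompt, so the model has real column names to target. A hypothetical prompt builder (the helper name, delimiters, and example schema are assumptions, not from the original experiment):

```python
def build_sql_prompt(question: str, schema: str) -> str:
    """Prepend a CREATE TABLE definition so generated SQL can
    reference real column names instead of inventing them."""
    return f"### Schema:\n{schema}\n### Question:\n{question}\n### SQL:\n"

# Illustrative schema for the "most beautiful person" query.
schema = "CREATE TABLE person (id INT, name VARCHAR(64), beauty_score INT)"
prompt = build_sql_prompt("Who is the most beautiful person?", schema)
```

The same delimiter layout would be used in both the fine‑tuning data and at inference time, so the model learns to ground its column names in the supplied schema.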
Test‑driven generation (test case generation)
Prompt the model with a Java method that lists files, then ask it to produce a matching JUnit test. Example source code:
public Set<String> listFilesUsingJavaIO(String dir) {
    return Stream.of(new File(dir).listFiles())
            .filter(file -> !file.isDirectory())
            .map(File::getName)
            .collect(Collectors.toSet());
}
Corresponding generated test:
@Test
public void shouldListFilesUsingJavaIO() {
    Set<String> files = ListFilesUsingJavaIO.listFilesUsingJavaIO("dir");
    assertThat(files, containsInAnyOrder("file1", "file2", "file3"));
}
Result quality depends on the size and cleanliness of the test‑code dataset; with a small dataset, plus occasional weight‑loading issues with LLaMA, the generated tests sometimes contain defects.
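Building that test‑code dataset amounts to pairing each test class with the class it exercises; one simple cleaning rule is to drop tests whose source file cannot be found. A hypothetical sketch relying on the Java naming convention FooTest.java ↔ Foo.java (the helper and rule are assumptions, not the article's actual pipeline):

```python
import tempfile
from pathlib import Path

def pair_tests_with_sources(repo: Path) -> list[tuple[Path, Path]]:
    """Pair each *Test.java with its source file by naming convention;
    unmatched tests are skipped, which keeps the dataset clean."""
    pairs = []
    for test_file in repo.rglob("*Test.java"):
        source_name = test_file.name.replace("Test.java", ".java")
        candidates = list(repo.rglob(source_name))
        if candidates:
            pairs.append((candidates[0], test_file))
    return pairs

# Tiny demo repository with one matched pair.
root = Path(tempfile.mkdtemp())
(root / "Calc.java").write_text("class Calc {}")
(root / "CalcTest.java").write_text("class CalcTest {}")
pairs = pair_tests_with_sources(root)
```

Each resulting (source, test) pair becomes one training record, with either side usable as the prompt depending on whether the model should generate tests from code or code from tests.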
Metadata‑assisted generation (code completion)
Provide a concise class description and method signatures, then ask the model to implement a specific method. Example prompt:
evaluate(
    "Implement the method updatePostCommentsSize",
    "PostRepository()
     - methods: findById(Long): Optional<Post>, updatePostCommentsSize(int, Long): void, findAllByPostTopics(String): List<Post>, findDistinctByPostTopics(String): List<Post>",
    0.1, 0.75, 40, 4, 512
)
Generated implementation:
public class PostRepository {
    public void updatePostCommentsSize(int size, Long postId) {
        Post post = findById(postId);
        if (post == null) {
            return;
        }
        post.setPostCommentsSize(size);
        save(post);
    }
}
The snippet captures the intended CRUD flow (though it treats findById as returning Post rather than the declared Optional<Post>), and integration with surrounding services (e.g., transaction handling) is still required for production use.
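The class‑description line in the prompt above can be produced mechanically from source code. A hypothetical sketch that condenses Java method declarations into that compact metadata format (the regex and output layout are assumptions, not the article's tooling):

```python
import re

def summarize_signatures(java_source: str, class_name: str) -> str:
    """Condense 'public ReturnType name(ArgType arg, ...)' declarations
    into the one-line 'name(ArgTypes): ReturnType' metadata format."""
    pattern = re.compile(r"(?:public|protected)\s+([\w<>,\s]+?)\s+(\w+)\(([^)]*)\)")
    methods = []
    for ret, name, args in pattern.findall(java_source):
        # Keep only the type of each parameter, dropping its name.
        arg_types = ", ".join(a.strip().split()[0] for a in args.split(",") if a.strip())
        methods.append(f"{name}({arg_types}): {ret.strip()}")
    return f"{class_name}()\n- methods: " + ", ".join(methods)

src = """
public class PostRepository {
    public Optional<Post> findById(Long id) { return Optional.empty(); }
    public void updatePostCommentsSize(int size, Long postId) { }
}
"""
summary = summarize_signatures(src, "PostRepository")
```

Feeding such machine‑extracted summaries into the prompt keeps the metadata consistent with the real codebase, which is the point of the metadata‑assisted approach.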
Information‑matching generation (text‑to‑repository)
Combine a natural‑language request with explicit table definitions to generate a Kotlin @Query method. Example input:
instruction: 我想查找特定月份(monthly_id)下在某个日期(date)之前的所有费用(expense)
(English: I want to find all expenses under a given month (monthly_id) that fall before a given date (date).)
input: data class ExpenseEntity(...)
output: @Query("SELECT * FROM expense WHERE monthly_id = :recurringExpenseId AND date < :beforeDate")
suspend fun getAllExpensesForRecurringExpenseBeforeDate(recurringExpenseId: Long, beforeDate: LocalDate): List<ExpenseEntity>
Another example converting a user‑age query into a repository method:
evaluate(
    "text to kotlin repository with class",
    "我想查询指定年龄的用户(User)的博客数量。
    ###data class User(var age: Int, val blogId: Int) data class Post(val title: String)###",
    0.1, 0.75, 40, 4, 512
)
(The Chinese instruction reads "I want to query the number of blogs for users (User) of a given age.") Generated output:
@Query("SELECT COUNT(*) FROM User WHERE age = :age")
abstract fun getBlogCount(age: Int): Long
The generated code is generally correct, but the model tends to fall back to generic SELECT * statements, which reduces its usefulness.
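Mechanically, the information‑matching prompt is just the natural‑language request concatenated with the entity definitions between ### markers, mirroring the examples above. A minimal sketch (the helper name is an assumption):

```python
def build_repository_prompt(request: str, entities: list[str]) -> str:
    """Join the natural-language request with entity definitions,
    using the ### delimiters seen in the information-matching examples."""
    schema = " ".join(entities)
    return f"{request}\n###{schema}###"

prompt = build_repository_prompt(
    "Count blogs for users of a given age.",
    ["data class User(var age: Int, val blogId: Int)",
     "data class Post(val title: String)"],
)
```

Because the entity definitions carry the column and type information, the model can bind query parameters like :age to real fields rather than guessing.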
Comparison of the four methods
Code example generation: High randomness, no extra context required; best for quick drafts of generic code.
Test‑driven generation: Low randomness, relies on high‑quality test inputs; yields stable business code aligned with test specifications.
Metadata‑assisted generation: Moderate randomness, benefits from detailed type and signature information; suitable for complex logic and CRUD‑style code.
Information‑matching generation: Low randomness, leverages database schema to produce precise SQL or repository methods; ideal for data‑centric queries.
Key takeaways
Providing richer contextual information—whether example code, test cases, type metadata, or schema definitions—consistently improves the accuracy and relevance of generated code. Each approach has trade‑offs between required input preparation and output determinism, allowing practitioners to choose the method that best fits their workflow.
Repository with the experimental data and scripts: https://github.com/unit-mesh/unit-minions
phodal
A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.