Building High‑Quality Code Fine‑Tuning Datasets with UnitEval: An Open‑Source Toolkit

UnitEval is an open‑source toolbox that combines a unified prompt format, a static code‑quality pipeline, and extensible quality thresholds to automatically generate high‑quality code datasets for AI fine‑tuning. This article walks through its design principles, workflow, and usage.


Overview

UnitEval is an open‑source toolbox for building high‑quality code fine‑tuning datasets. It is built around three design principles: a unified prompt format, a static code‑quality pipeline, and configurable quality thresholds.

Design Principle 1 – Unified Prompt

The same prompt template is used by the picker, the fine‑tuning data generator, and the evaluation runtime. A simplified template looks like:

Complete ${context.language} code, return rest code, no explaining
${context.framework}
``` ${context.language}
${context.relatedCode}
```
Code:
``` ${context.language}
${beforeCursor}
```
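For illustration, here is a minimal Kotlin sketch of how such a template can be rendered. The CompletionContext class and buildPrompt function are hypothetical stand-ins, not part of the UnitEval API:

// Hypothetical context holder for illustration; UnitEval's real types may differ.
data class CompletionContext(
    val language: String,
    val framework: String,
    val relatedCode: String,
)

// Fills the unified template with the context and the code before the cursor.
fun buildPrompt(context: CompletionContext, beforeCursor: String): String = """
Complete ${context.language} code, return rest code, no explaining
${context.framework}
```${context.language}
${context.relatedCode}
```
Code:
```${context.language}
$beforeCursor
```
""".trimIndent()

Because the same function would back the picker, the data generator, and the evaluator, a model is always trained and tested against prompts of exactly the same shape.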

Design Principle 2 – Code‑Quality Pipeline

Before a source file is added to the dataset, UnitEval runs static analysis via the ArchGuard platform. Checks include code complexity, various bad‑smell categories (code, test), and architectural rules such as controller API design and repository SQL design. The pipeline can be extended with additional validators (e.g., OpenAPI validation, software composition analysis).
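As a rough sketch of what such an extension point can look like in Kotlin (the QualityValidator interface and Issue class below are illustrative, not UnitEval's actual API):

// Illustrative extension point; names are hypothetical.
data class Issue(val rule: String, val message: String)

interface QualityValidator {
    fun validate(sourceCode: String): List<Issue>
}

// A file enters the dataset only when every validator reports no issues.
fun accept(sourceCode: String, validators: List<QualityValidator>): Boolean =
    validators.all { it.validate(sourceCode).isEmpty() }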

Design Principle 3 – Extensible Quality Thresholds

Quality checks are packaged as a Maven artifact. The built‑in CodeQualityType enum defines the available rule groups:

enum class CodeQualityType {
    BadSmell,        // general code bad smells
    TestBadSmell,    // bad smells specific to test code
    JavaController,  // controller API design rules
    JavaRepository,  // repository SQL design rules
    JavaService,     // service‑layer design rules
}
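A caller can then select the rule groups it wants to enforce, for example:

val checks = listOf(CodeQualityType.BadSmell, CodeQualityType.JavaController)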

Threshold values are supplied through a data class, for example:

data class BsThresholds(
    val bsLongParasLength: Int = 5,  // long parameter list threshold
    val bsIfSwitchLength: Int = 8,   // if/switch complexity threshold
    val bsLargeLength: Int = 20,     // large class threshold
    val bsMethodLength: Int = 30,    // long method threshold
    val bsIfLinesLength: Int = 3,    // if‑statement body length threshold
)
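With the defaults above, an individual check reduces to a comparison against the relevant field. A trivial, purely illustrative example:

// Illustrative only: flag a method whose body exceeds the configured length.
fun isLongMethod(methodLineCount: Int, thresholds: BsThresholds = BsThresholds()): Boolean =
    methodLineCount > thresholds.bsMethodLength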

Custom rule sets can be added programmatically:

val ruleset = RuleSet(
    RuleType.SQL_SMELL,
    "normal",
    UnknownColumnSizeRule(),
    LimitTableNameLengthRule()
    // more rules …
)
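A custom rule implements whatever contract the rule set expects. The sketch below uses a stand-in SqlRule interface and a TableNameLengthRule modeled loosely on the built-in LimitTableNameLengthRule; ArchGuard's actual interface and rule differ in detail:

// Stand-in rule contract for illustration; ArchGuard's real interface differs.
interface SqlRule {
    val name: String
    fun check(sql: String): Boolean // true when the statement passes the rule
}

// Flags CREATE TABLE statements whose table name exceeds a length limit.
class TableNameLengthRule(private val maxLength: Int = 30) : SqlRule {
    override val name = "TableNameLength"
    override fun check(sql: String): Boolean {
        val table = Regex("""(?i)create\s+table\s+(\w+)""")
            .find(sql)?.groupValues?.get(1)
        return table == null || table.length <= maxLength
    }
}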

Workflow

Picker Phase

1. Read a YAML configuration file to discover project repositories.
2. Clone each repository with git clone.
3. Select a language‑specific worker (currently Java and TypeScript are supported).
4. Run the language‑specific code‑quality checks defined in the pipeline.
5. Combine the analysis results with the unified prompt template to create a dataset entry.
6. Emit the generated fine‑tuning dataset.
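Put together, the picker is conceptually a filter-and-map over repositories. Every type in the sketch below is illustrative rather than UnitEval's real API:

// Conceptual picker loop; all names are hypothetical.
data class RepoConfig(val url: String, val language: String)

interface LanguageWorker {
    fun passesQualityChecks(file: String): Boolean // step 4
    fun toDatasetEntry(file: String): String       // step 5: fill the unified prompt
}

fun pick(
    repos: List<RepoConfig>,
    workers: Map<String, LanguageWorker>,          // e.g. "java", "typescript"
    filesOf: (RepoConfig) -> List<String>,         // source files of a cloned repo
): List<String> =
    repos.flatMap { repo ->
        val worker = workers[repo.language] ?: return@flatMap emptyList<String>()
        filesOf(repo).filter(worker::passesQualityChecks).map(worker::toDatasetEntry)
    }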

Eval Phase

1. Read the evaluation configuration (LLM model, prompt template, etc.).
2. Execute the PromptScript using the Chocolate Factory runtime.
3. Validate the model output with the factory’s ValidateRule implementations.
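The validation step can be thought of as a set of predicates over the model output. This sketch only approximates the idea; Chocolate Factory's real ValidateRule API differs:

// Illustrative output validator; not Chocolate Factory's actual interface.
fun interface OutputValidator {
    fun isValid(llmOutput: String): Boolean
}

// Example predicate: the completion must contain a fenced code block.
val hasCodeBlock = OutputValidator { it.contains("```") }

// Pass rate over a batch of model outputs.
fun passRate(outputs: List<String>, validators: List<OutputValidator>): Double =
    outputs.count { out -> validators.all { it.isValid(out) } }.toDouble() / outputs.size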

Getting Started

Clone the repository: https://github.com/unit-mesh/unit-eval

Add the dependencies to your build (the snippet below uses the Gradle Kotlin DSL; the artifacts are published to Maven):

dependencies {
    implementation("cc.unitmesh:unit-picker:0.1.5")
    implementation("cc.unitmesh:code-quality:0.1.5")
}

Alternatively, download a released JAR file and run it directly instead of pulling the artifacts into your own project.

Illustrations

[Figure: high‑quality fine‑tuning dataset illustration]
[Figure: tool‑fine‑tune‑evaluate integration diagram]
[Figure: code‑quality pipeline overview]
Tags: AI, open source, code quality, pipeline, toolkit, dataset generation
Written by phodal

A prolific open-source contributor who constantly starts new projects. Passionate about sharing software development insights to help developers improve their KPIs. Currently active in IDEs, graphics engines, and compiler technologies.
