How AI Can Auto‑Generate Perfect Git Commit Messages

This article explains how a large‑language‑model‑driven tool can automatically create standardized Git commit messages by extracting change summaries, applying customizable plugins, measuring performance with MSE and adoption rate, and optimizing prompts, data pipelines, and fine‑tuning strategies.

Baidu Geek Talk

Background

Large language models (LLMs) are increasingly used to automate tasks across the software development lifecycle. Writing a git commit message that conforms to team‑specific conventions remains tedious and error‑prone.

Problem

Developers must manually compose commit messages that include a category, product version, requirement ID, and a concise change summary. In Baidu APP, the spec defines categories such as Feature, Update, Optimization, Test, Merge, and FixBug, which makes manual entry cumbersome.

Solution Overview

An AI‑powered CommitMessage assistant generates compliant messages by parsing the spec (customizable via plugins) and using an LLM to produce a change‑summary paragraph. The final message combines the spec header with the AI‑generated summary.

Entry Points

git aicommit – a Git alias that invokes the AI service.

mgit aicommit – an MGit plugin that brings the same feature to multi‑repo workflows (MGit is open source at https://github.com/baidu/m-git).

Architecture

Both entry points call a shared Ruby module, git-aicommit, which extracts the staged diff, sends it to a model service, and assembles the final message.

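# Add a Git alias for AI commit: load the git-aicommit gem and forward all CLI arguments;
# the "--" stops Ruby from parsing forwarded arguments such as -m as its own options
git config --global alias.aicommit '!f() { ruby -e "require \"git-aicommit\"; MGit::GitAICommit.run(ARGV);" -- "$@"; }; f'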
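The Ruby module's internals are not shown, so here is a Python sketch (matching the plugin code later in this article) of its first step: read the staged diff with git diff --cached, then trim it per file so it fits a prompt budget. Counting characters as a rough stand-in for tokens, and the function names staged_diff, split_by_file, and fit_to_budget, are all assumptions.

```python
import subprocess
from typing import List


def staged_diff(repo_dir: str = ".") -> str:
    """Return the staged (index vs. HEAD) diff as plain text."""
    return subprocess.run(
        ["git", "diff", "--cached", "--no-color"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout


def split_by_file(diff: str) -> List[str]:
    """Split a unified git diff into per-file sections starting at 'diff --git'."""
    sections, current = [], []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def fit_to_budget(diff: str, max_chars: int) -> str:
    """Keep whole file sections in original order, skipping any that would exceed the budget."""
    kept, used, dropped = [], 0, 0
    for section in split_by_file(diff):
        if used + len(section) <= max_chars:
            kept.append(section)
            used += len(section)
        else:
            dropped += 1
    note = f"\n[{dropped} file(s) omitted to fit the context window]\n" if dropped else ""
    return "".join(kept) + note
```

Keeping whole file sections, rather than cutting mid-hunk, avoids handing the model a half-visible change; the omission note tells it that the summary covers only part of the commit.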

Evaluation Metrics

Model performance: Mean Squared Error (MSE) between generated and reference commit messages, used as a measure of semantic similarity (lower is better).

User adoption: Acceptance Rate (AR) – the proportion of generated messages that users accept without modification.
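MSE needs numeric inputs, and the source does not say how each message is turned into numbers. A minimal sketch, assuming both messages have already been mapped to equal-length vectors by some embedding model (not specified here), computes MSE over those vectors along with the acceptance rate as accepted messages divided by generated messages. Function and parameter names are illustrative.

```python
from typing import Sequence


def mse(generated: Sequence[float], reference: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors (lower means closer)."""
    if len(generated) != len(reference) or not generated:
        raise ValueError("vectors must be non-empty and the same length")
    return sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(generated)


def acceptance_rate(accepted_unmodified: int, total_generated: int) -> float:
    """Share of generated messages committed without any user edit."""
    if total_generated == 0:
        return 0.0
    return accepted_unmodified / total_generated
```

For example, mse([1.0, 2.0], [1.0, 4.0]) is (0 + 4) / 2 = 2.0.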

The chosen model is ERNIE‑4, served through Baidu's Qianfan platform, as a balance of cost, generation quality, and safety.

Prompt Engineering

Stop token: the model is asked to end its output with a custom %STOP% marker, which cuts off unnecessary token generation.

JSON‑only output: the prompt explicitly asks for a single JSON object wrapped in a Markdown code block, with no other text.

Few‑shot examples: Provide example JSON to constrain format.

SFT (Supervised Fine‑Tuning): fine‑tune a lightweight ERNIE‑Speed model on the cases where the base model produced low‑quality messages, optionally adding a MoE routing classifier that chooses between ERNIE‑4 and the fine‑tuned ERNIE‑Speed for each request.

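The JSON object the model is asked to return has the following shape (the // annotations describe each field and are not part of the output):

{
  "summary": "string", // under 30 Chinese characters
  "reason": "string",  // detailed cause of the bug
  "fixup": "string"    // concise description of the fix
}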
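A minimal sketch of how the stop token, JSON‑only output, and few‑shot rules might fit together: the prompt requests one fenced JSON object followed by %STOP%, shows one example reply, and the parser trims at the marker, extracts the JSON, and checks the summary. The prompt wording, example content, and function names are hypothetical, and treating reason and fixup as optional for non‑FixBug commits is an assumption. If the model API supports stop sequences, passing %STOP% there as well makes the service itself stop generating at the marker.

```python
import json
import re
from typing import Dict

STOP = "%STOP%"
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a Markdown fence
SCHEMA_KEYS = ("summary", "reason", "fixup")

# One illustrative few-shot reply (hypothetical content)
EXAMPLE = {
    "summary": "Fix null-pointer crash on list page",
    "reason": "The adapter read an item before the data source finished loading",
    "fixup": "Added a null check before binding list items",
}


def build_prompt(diff: str) -> str:
    """Assemble a prompt that requests fenced JSON only, terminated by the stop marker."""
    example = json.dumps(EXAMPLE, ensure_ascii=False, indent=2)
    return (
        "Summarize the following git diff as a commit message.\n"
        f"Reply with exactly one JSON object inside a {FENCE}json block "
        f"with keys {', '.join(SCHEMA_KEYS)}, then write {STOP} and nothing else.\n"
        "The summary must be under 30 Chinese characters.\n\n"
        f"Example reply:\n{FENCE}json\n{example}\n{FENCE}\n{STOP}\n\n"
        f"Diff:\n{diff}\n"
    )


def parse_response(text: str) -> Dict[str, str]:
    """Cut the reply at the stop marker, extract the fenced JSON, and validate it."""
    text = text.split(STOP, 1)[0]
    pattern = re.escape(FENCE) + r"(?:json)?\s*(\{.*?\})\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON object found")
    data = json.loads(match.group(1))  # raises json.JSONDecodeError (a ValueError) if malformed
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ValueError("missing or empty 'summary'")
    return data
```

Failing loudly on a missing fence or summary lets the caller retry or fall back instead of committing a malformed message.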

Data Processing Pipeline

Define schema (diff, reference message, category, auxiliary metadata).

Collect data from model‑generated messages, existing commits written by R&D engineers, and public datasets.

Clean (deduplication, noise removal).

Annotate reference quality and auxiliary fields.

Split into evaluation, training, validation, and test sets.

For small‑scale work use Pandas (https://pandas.pydata.org/); larger volumes may require Spark (https://spark.apache.org/).
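A small‑scale sketch of the clean and split steps, using only the standard library so it runs without extra installs; in Pandas, DataFrame.drop_duplicates and a boolean filter play the same roles. The diff and message fields follow the schema step above. The noise rule (blank diffs or messages), the whitespace normalization, the split ratios, and hashing the diff to pick a split are all assumptions.

```python
import hashlib
from typing import Dict, Iterable, List

# Hypothetical ratios; the four set names follow the pipeline above.
SPLITS = (("evaluation", 0.10), ("training", 0.70), ("validation", 0.10), ("test", 0.10))


def clean(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records with an empty diff or message, then exact duplicates (order kept)."""
    seen, out = set(), []
    for rec in records:
        diff = rec.get("diff", "").strip()
        msg = " ".join(rec.get("message", "").split())  # collapse whitespace noise
        if not diff or not msg:
            continue
        key = (diff, msg)
        if key in seen:
            continue
        seen.add(key)
        out.append({**rec, "diff": diff, "message": msg})
    return out


def assign_split(diff: str) -> str:
    """Map a diff to a split via a stable hash, so reruns give the same assignment."""
    bucket = int(hashlib.sha256(diff.encode("utf-8")).hexdigest(), 16) % 100 / 100
    cumulative = 0.0
    for name, ratio in SPLITS:
        cumulative += ratio
        if bucket < cumulative:
            return name
    return SPLITS[-1][0]  # guard against floating-point rounding in the cumulative sum


def split(records: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group cleaned records into the four sets."""
    out: Dict[str, List[Dict[str, str]]] = {name: [] for name, _ in SPLITS}
    for rec in records:
        out[assign_split(rec["diff"])].append(rec)
    return out
```

Hashing the diff rather than shuffling keeps identical diffs in the same set, which avoids leaking training examples into evaluation.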

Performance Optimization Strategies

Use a stop token to cut off unnecessary generation.

Refine prompts to reduce MSE and improve speed.

Apply SFT on low‑quality cases, optionally via a MoE routing classifier.
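A minimal sketch of the optional routing step, assuming the fine‑tuned ERNIE‑Speed model is meant for the cases where ERNIE‑4 tends to do poorly: a classifier scores that risk per diff, and high‑risk diffs go to the fine‑tuned model. The source does not describe the classifier's features or threshold, so placeholder_risk is a stand‑in heuristic based on diff size rather than a trained model; the model identifiers and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model identifiers; real Qianfan endpoint names may differ.
BASE_MODEL = "ERNIE-4"
FINETUNED_MODEL = "ERNIE-Speed-SFT"


def placeholder_risk(diff: str) -> float:
    """Stand-in for a trained classifier: scores larger diffs as riskier, in [0, 1]."""
    changed = sum(
        1 for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
    return min(changed / 400, 1.0)


@dataclass
class Router:
    risk: Callable[[str], float] = placeholder_risk
    threshold: float = 0.5

    def choose(self, diff: str) -> str:
        """Send diffs predicted to be hard for the base model to the fine-tuned model."""
        return FINETUNED_MODEL if self.risk(diff) >= self.threshold else BASE_MODEL
```

Taking the scorer as a callable keeps the router independent of how the classifier is built, so a trained model can replace the heuristic without other changes.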

Custom Commit Specification via Plugins

An abstract Python plugin interface allows teams to define custom commit formats.

from abc import ABC, abstractmethod

class IPluginHook(ABC):
    """Plugin interface definition"""
    @abstractmethod
    def hook_prepare(self, ctx):
        """Prepare context"""
    @abstractmethod
    def hook_is_fix_bug(self, ctx) -> bool:
        """Determine if the commit is a FixBug type"""
    @abstractmethod
    def hook_language(self, ctx) -> str:
        """Target language, default Chinese"""
    @abstractmethod
    def hook_generate_variables(self, ctx):
        """Generate template variables"""
    @abstractmethod
    def hook_generate_message(self, ctx) -> str:
        """Render the final CommitMessage"""
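A hypothetical plugin that implements the interface above. The source does not show the structure of ctx or the order in which the host calls the hooks, so this sketch assumes ctx is a plain dict and that run_plugin (a hypothetical driver) calls prepare, then generate_variables, then generate_message. The interface is repeated so the block runs standalone. The bracketed header layout, the default category, the "zh" language code, and the FixBug body built from the reason and fixup JSON fields are illustrative.

```python
from abc import ABC, abstractmethod


class IPluginHook(ABC):
    """Same five hooks as the interface above (docstrings omitted)."""
    @abstractmethod
    def hook_prepare(self, ctx): ...
    @abstractmethod
    def hook_is_fix_bug(self, ctx) -> bool: ...
    @abstractmethod
    def hook_language(self, ctx) -> str: ...
    @abstractmethod
    def hook_generate_variables(self, ctx): ...
    @abstractmethod
    def hook_generate_message(self, ctx) -> str: ...


class BracketHeaderPlugin(IPluginHook):
    """Hypothetical team format: '[Category][Version][ReqID] summary'."""

    def hook_prepare(self, ctx):
        ctx.setdefault("category", "Update")  # assumed default category

    def hook_is_fix_bug(self, ctx) -> bool:
        return ctx["category"] == "FixBug"

    def hook_language(self, ctx) -> str:
        return "zh"  # assumed code for the interface's documented default, Chinese

    def hook_generate_variables(self, ctx):
        ai = ctx["ai_output"]  # parsed model JSON: summary, plus reason/fixup for bugs
        ctx["vars"] = {
            "header": f"[{ctx['category']}][{ctx['version']}][{ctx['req_id']}] {ai['summary']}",
            "body": f"Reason: {ai['reason']}\nFix: {ai['fixup']}" if self.hook_is_fix_bug(ctx) else "",
        }

    def hook_generate_message(self, ctx) -> str:
        v = ctx["vars"]
        return v["header"] + (f"\n\n{v['body']}" if v["body"] else "")


def run_plugin(plugin: IPluginHook, ctx: dict) -> str:
    """Hypothetical driver with an assumed hook order."""
    plugin.hook_prepare(ctx)
    plugin.hook_generate_variables(ctx)
    return plugin.hook_generate_message(ctx)
```

Usage: with category FixBug, version 13.2.0, and requirement REQ-7, the plugin renders the header line, a blank line, and the reason and fix lines. Other categories get only the header.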

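Plugins can be installed at runtime with pip and loaded (or reloaded) using pkg_resources and importlib. In the code below, __module_name is an assumed helper that derives the import name by replacing dashes with underscores:

import importlib
import subprocess
import sys

def __module_name(pkg_name: str) -> str:
    """Assumed helper: derive the import name by replacing dashes with underscores"""
    return pkg_name.replace("-", "_")

def __install_plugin(pkg_name: str, version: str):
    """Install a pinned plugin version via pip, then load it"""
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', f"{pkg_name}=={version}"])
    importlib.invalidate_caches()  # make packages installed after startup importable
    return __load_module(pkg_name, force=True)

def __load_module(pkg_name: str, force: bool = False):
    """Load a module, optionally reloading"""
    module_name = __module_name(pkg_name)
    loaded = sys.modules.get(module_name)
    if loaded and force:
        return importlib.reload(loaded)
    if loaded:
        return loaded
    return importlib.import_module(module_name)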
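To skip the pip call when a plugin is already installed, the host needs the currently installed version. pkg_resources can report it but is deprecated by setuptools, and the standard library's importlib.metadata answers the same question. A minimal sketch: ensure_plugin and its install callback are hypothetical names, and the callback could be any installer, such as the pip-based __install_plugin function above.

```python
from importlib import metadata
from typing import Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed distribution's version, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def ensure_plugin(dist_name: str, version: str,
                  install: Callable[[str, str], object]) -> bool:
    """Call install only when the pinned version is missing; return True if it ran."""
    if installed_version(dist_name) == version:
        return False
    install(dist_name, version)
    return True
```

Passing the installer in as a callback keeps the version check testable without touching the network or the active environment.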

Future Outlook

LLMs excel at code understanding but still struggle with proprietary terminology, fixed formats, and long diff contexts limited by token budgets. Combining Retrieval‑Augmented Generation (RAG) with richer datasets and improving interactive entry points and custom specs will be essential for achieving true AI‑native development.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
