How AI Can Auto‑Generate Standardized Git Commit Messages

This article details the design, implementation, and evaluation of an AI‑powered tool that automatically creates compliant Git commit messages by leveraging large language models, custom plugins, and performance‑focused optimizations to improve developer productivity and commit quality.

Baidu App Technology

Background

With the rapid growth of large language models (LLMs), many AI‑driven tools have emerged across the software development lifecycle, such as RAG assistants for on‑call support, Copilot‑style coding aids, and automated agents for delivery. One persistent pain point is the cumbersome process of writing commit messages that conform to strict guidelines.

Function and Design

The proposed git aicommit command (and its mgit aicommit counterpart for Baidu's multi‑repo MGit tool) acts as an extension that extracts the staged diff, sends it to an LLM service, and returns a formatted commit message. The core logic resides in a shared Ruby module git‑aicommit, which can be invoked via a Git alias or an MGit plugin.
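The flow described above (staged diff in, formatted message out) can be sketched roughly as follows. This is a minimal illustration rather than the tool's actual code: the function names, the prompt wording, and the `llm` callable that stands in for the LLM service are all assumptions.

```python
import subprocess
from typing import Callable


def staged_diff() -> str:
    """Return the staged changes, i.e. what `git commit` would record."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_prompt(diff: str) -> str:
    # Hypothetical prompt wording; the article does not give the real prompt.
    return (
        "Write a commit message for the following staged diff.\n"
        "Follow the team's commit specification.\n\n"
        f"{diff}"
    )


def generate_commit_message(diff: str, llm: Callable[[str], str]) -> str:
    """Send the diff to an LLM (any callable: prompt -> text) and tidy the reply."""
    if not diff.strip():
        raise ValueError("nothing staged; run `git add` first")
    return llm(build_prompt(diff)).strip()
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular service client; inside a repository it would be invoked as `generate_commit_message(staged_diff(), some_llm_client)`, where `some_llm_client` is whatever wrapper the team uses.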

[Figure: CommitMessage composition diagram]

Evaluation Metrics

Two primary metrics are used to assess the system:

Model performance: Mean Squared Error (MSE) between generated and reference commit messages, measuring semantic similarity.

User adoption: Acceptance Rate (AR), the proportion of generated messages that users adopt without modification.

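The MSE is computed by embedding both texts, taking the cosine distance (1 minus the cosine similarity) between the two embeddings, squaring it, and averaging over all evaluated message pairs. Lower values mean the generated messages are semantically closer to the references:

# Sketch of the MSE calculation; embed() stands for the text-embedding model, which the article does not name
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mse(pairs, embed):
    # pairs: iterable of (reference_message, generated_message)
    errors = [(1 - cosine_similarity(embed(ref), embed(gen))) ** 2 for ref, gen in pairs]
    return float(np.mean(errors))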
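The Acceptance Rate defined above is a simple ratio, and one way to compute it from logged data is sketched here. The `CommitRecord` type and its field names are hypothetical, and treating messages that differ only in surrounding whitespace as "unmodified" is an assumption, not something the article specifies.

```python
from dataclasses import dataclass


@dataclass
class CommitRecord:
    generated: str  # message proposed by the tool
    committed: str  # message the user actually committed


def acceptance_rate(records: list[CommitRecord]) -> float:
    """Share of records where the user kept the generated message as-is."""
    if not records:
        return 0.0
    accepted = sum(
        1 for r in records if r.generated.strip() == r.committed.strip()
    )
    return accepted / len(records)


records = [
    CommitRecord("fix: null check in parser", "fix: null check in parser"),
    CommitRecord("feat: add cache", "feat: add LRU cache for diff results"),
    CommitRecord("docs: update README", "docs: update README\n"),
    CommitRecord("chore: bump deps", "chore: bump deps"),
]
print(acceptance_rate(records))  # 0.75
```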

Data Processing

Effective data handling is crucial. The pipeline defines several dataset types (training, validation, test, anomaly) and draws on several sources (LLM‑generated messages, existing commits written by R&D engineers, open‑source datasets). After collection, the data are cleaned, deduplicated, annotated, and split for use in model selection, prompt tuning, and SFT.

Tools such as Pandas are used for small‑scale processing, while Spark is recommended for larger volumes.
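The clean, deduplicate, and split steps can be illustrated with a small standard-library sketch. The sample layout (dicts with `diff` and `message` keys), the 80/10/10 ratios, and the hash-based split are assumptions made for illustration; with Pandas the same steps would map onto dropping rows with empty fields and `DataFrame.drop_duplicates`.

```python
import hashlib


def clean(samples):
    """Normalize whitespace and drop samples with an empty diff or message."""
    cleaned = []
    for s in samples:
        diff, message = s["diff"].strip(), s["message"].strip()
        if diff and message:
            cleaned.append({"diff": diff, "message": message})
    return cleaned


def deduplicate(samples):
    """Keep the first occurrence of each (diff, message) pair."""
    seen, unique = set(), []
    for s in samples:
        key = (s["diff"], s["message"])
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def split(samples, train=0.8, validation=0.1):
    """Assign each sample to a split from a hash of its diff.

    Hash-based assignment is deterministic, so a sample lands in the same
    split on every run. The ratios are assumed, not taken from the article.
    """
    buckets = {"train": [], "validation": [], "test": []}
    for s in samples:
        digest = hashlib.sha256(s["diff"].encode("utf-8")).digest()
        u = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        if u < train:
            buckets["train"].append(s)
        elif u < train + validation:
            buckets["validation"].append(s)
        else:
            buckets["test"].append(s)
    return buckets
```

Splitting on a hash of the diff, rather than shuffling, also keeps identical diffs from leaking across the training and test sets.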

Performance Optimization

Three techniques are applied to improve both quality and speed:

Stop token: Having the model end its reply with a custom %STOP% marker, and stopping generation at that marker, avoids unnecessary trailing tokens.

Prompt engineering: Providing clear instructions and few‑shot examples to enforce JSON output format.

SFT (Supervised Fine‑Tuning): Fine‑tuning a lightweight ERNIE‑Speed model on the cases where output quality was low. Optionally, a MoE classifier routes each request to either ERNIE‑4 or the fine‑tuned model.
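The first two techniques fit together naturally: the prompt asks for JSON followed by the stop marker, and the caller cuts the reply at the marker before parsing. The sketch below shows one way this might look; the prompt text, the few-shot example, and the JSON keys (`type`, `summary`) are assumptions, not the article's actual prompt or schema.

```python
import json

STOP_MARKER = "%STOP%"

# Hypothetical few-shot prompt; the JSON keys are illustrative only.
PROMPT_TEMPLATE = """You write Git commit messages.
Reply with one JSON object with keys "type" and "summary", then the marker %STOP%.

Example diff:
- return a / b
+ return a / b if b else 0
Example reply:
{{"type": "fix", "summary": "Avoid division by zero"}}%STOP%

Diff:
{diff}
Reply:
"""


def build_prompt(diff: str) -> str:
    return PROMPT_TEMPLATE.format(diff=diff)


def parse_reply(raw: str) -> dict:
    """Cut the reply at the stop marker, then parse the JSON that precedes it."""
    body = raw.split(STOP_MARKER, 1)[0].strip()
    data = json.loads(body)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = {"type", "summary"} - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data
```

If the serving API accepts stop sequences, passing the marker there lets the server halt generation early, which is where the speed gain comes from; truncating on the client side as above only guards against trailing text.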

Custom Commit Specification

Because different products have distinct commit formats, the system defines an abstract Python plugin interface (IPluginHook) that a product can implement to generate the final message according to its own schema. Plugins are loaded dynamically via importlib and can be installed with pip.

from abc import ABC, abstractmethod

class IPluginHook(ABC):
    @abstractmethod
    def hook_prepare(self, ctx):
        """Prepare context"""
    @abstractmethod
    def hook_is_fix_bug(self, ctx) -> bool:
        """Detect FixBug type"""
    @abstractmethod
    def hook_language(self, ctx) -> str:
        """Select language (default Chinese)"""
    @abstractmethod
    def hook_generate_variables(self, ctx):
        """Generate template variables"""
    @abstractmethod
    def hook_generate_message(self, ctx) -> str:
        """Compose final CommitMessage"""

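To make the interface concrete, here is a sketch of a plugin that implements it, plus a loader built on importlib. Everything beyond the hook names is assumed for illustration: the `ConventionalPlugin` class, the use of a plain dict for `ctx`, the context keys, and the order in which the hooks are called.

```python
import importlib
from abc import ABC, abstractmethod


class IPluginHook(ABC):
    """Condensed copy of the interface above, repeated so this sketch runs alone."""

    @abstractmethod
    def hook_prepare(self, ctx): ...
    @abstractmethod
    def hook_is_fix_bug(self, ctx) -> bool: ...
    @abstractmethod
    def hook_language(self, ctx) -> str: ...
    @abstractmethod
    def hook_generate_variables(self, ctx): ...
    @abstractmethod
    def hook_generate_message(self, ctx) -> str: ...


class ConventionalPlugin(IPluginHook):
    """Hypothetical plugin that emits '<type>(<scope>): <summary>' messages."""

    def hook_prepare(self, ctx):
        ctx.setdefault("scope", "core")

    def hook_is_fix_bug(self, ctx) -> bool:
        return "fix" in ctx["summary"].lower()

    def hook_language(self, ctx) -> str:
        return "en"

    def hook_generate_variables(self, ctx):
        ctx["type"] = "fix" if ctx["is_fix_bug"] else "feat"

    def hook_generate_message(self, ctx) -> str:
        return f"{ctx['type']}({ctx['scope']}): {ctx['summary']}"


def load_plugin(module_name: str, class_name: str) -> IPluginHook:
    """Import an installed module by name and instantiate its plugin class."""
    module = importlib.import_module(module_name)
    plugin_cls = getattr(module, class_name)
    if not issubclass(plugin_cls, IPluginHook):
        raise TypeError(f"{class_name} does not implement IPluginHook")
    return plugin_cls()


def run_plugin(plugin: IPluginHook, ctx: dict) -> str:
    # Assumed call order; the article does not state how hooks are sequenced.
    plugin.hook_prepare(ctx)
    ctx["is_fix_bug"] = plugin.hook_is_fix_bug(ctx)
    ctx["language"] = plugin.hook_language(ctx)
    plugin.hook_generate_variables(ctx)
    return plugin.hook_generate_message(ctx)


print(run_plugin(ConventionalPlugin(), {"summary": "Fix crash on empty diff"}))
# fix(core): Fix crash on empty diff
```

Checking `issubclass` at load time surfaces a misconfigured plugin immediately, instead of failing later when a missing hook is first called.
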
Future Directions

While LLMs excel at code understanding, they still struggle with proprietary terminology and long diff contexts limited by token windows. Enhancing datasets, integrating Retrieval‑Augmented Generation (RAG), and improving multi‑repo diff handling are identified as next steps toward a truly AI‑native commit workflow.

[Figure: Future architecture diagram]