How AI Can Auto‑Generate Perfect Git Commit Messages
This article explains how a large‑language‑model‑driven tool can automatically create standardized Git commit messages by extracting change summaries, applying customizable plugins, measuring performance with MSE and adoption rate, and optimizing prompts, data pipelines, and fine‑tuning strategies.
Background
Large language models (LLMs) are increasingly used to automate tasks across the software development lifecycle. Writing a Git commit message that conforms to team‑specific conventions, however, remains tedious and error‑prone.
Problem
Developers must manually compose commit messages that include a category, product version, requirement ID, and a concise change summary. In the Baidu App, the spec defines categories such as Feature, Update, Optimization, Test, Merge, and FixBug, so filling in every field by hand is cumbersome.
Solution Overview
An AI‑powered CommitMessage assistant generates compliant messages by parsing the spec (customizable via plugins) and using an LLM to produce a change‑summary paragraph. The final message combines the spec header with the AI‑generated summary.
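As a minimal sketch of how a spec header and an AI-generated summary might be combined, the Python below renders the fields named in the Problem section. The bracketed layout, the `SpecHeader` and `assemble_message` names, and the sample version and requirement ID are all assumptions for illustration; the article does not show the real Baidu format.

```python
from dataclasses import dataclass

# Categories named in the article; the real spec may define more.
CATEGORIES = {"Feature", "Update", "Optimization", "Test", "Merge", "FixBug"}


@dataclass(frozen=True)
class SpecHeader:
    """Fields the spec requires in addition to the change summary."""
    category: str
    version: str
    requirement_id: str

    def render(self) -> str:
        # Hypothetical layout; a team plugin would define the real one.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        return f"[{self.category}][{self.version}][{self.requirement_id}]"


def assemble_message(header: SpecHeader, ai_summary: str) -> str:
    """Combine the spec header with the model-generated summary."""
    summary = ai_summary.strip()
    if not summary:
        raise ValueError("empty summary")
    return f"{header.render()} {summary}"


if __name__ == "__main__":
    header = SpecHeader("FixBug", "13.50", "REQ-1024")  # made-up values
    print(assemble_message(header, "Fix crash on empty feed"))
```

Validating the category before rendering keeps a malformed header from ever reaching the commit, which is the part of the message the model is not trusted to produce.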
Entry Points
git aicommit: a Git alias that invokes the AI service.
mgit aicommit: an MGit plugin (MGit is open source at https://github.com/baidu/m-git) that extends multi‑repo workflows.
Architecture
Both entry points call a shared Ruby module, git-aicommit, which extracts staged diffs, sends them to a model service, and assembles the final message.
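The three steps of the shared module (extract, send, assemble) can be sketched roughly as follows. The source describes the module as Ruby; this illustration uses Python, and the function names and the injected `summarize` callable are hypothetical stand-ins for the real model-service call, which the article does not show.

```python
import subprocess
from typing import Callable


def staged_diff() -> str:
    """Return the staged diff of the repository in the current directory."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_commit_message(
    header: str,
    get_diff: Callable[[], str],
    summarize: Callable[[str], str],
) -> str:
    """Extract the diff, ask the model for a summary, assemble the message."""
    diff = get_diff()
    if not diff.strip():
        raise ValueError("nothing is staged; stage changes before committing")
    summary = summarize(diff).strip()
    # Header on the first line, summary as the body (layout is an assumption).
    return f"{header}\n\n{summary}"
```

Passing `get_diff` and `summarize` in as callables keeps the orchestration testable without a Git repository or network access; in real use `get_diff` would be `staged_diff`.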
# Add Git alias for AI commit
git config --global alias.aicommit '!f() { ruby -e "require \"git-aicommit\"; MGit::GitAICommit.run(ARGV);" "$@"; }; f'
Evaluation Metrics
Model performance: Mean Squared Error (MSE) between generated and reference commit messages, used as a proxy for semantic similarity (lower is better).
User adoption: Acceptance Rate (AR) – the proportion of generated messages that users accept without modification.
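Both metrics are simple to compute once the inputs are numeric. The sketch below assumes each message has already been mapped to a fixed-length vector (for example by an embedding model); the article does not say how messages are turned into numbers, so that step and the function names are assumptions. MSE is the mean of squared element-wise differences, and AR is the share of suggestions accepted unchanged.

```python
from typing import Sequence


def mean_squared_error(generated: Sequence[float], reference: Sequence[float]) -> float:
    """Mean of squared element-wise differences between two equal-length vectors."""
    if len(generated) != len(reference) or not generated:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(generated)


def acceptance_rate(accepted_unmodified: Sequence[bool]) -> float:
    """Share of generated messages that users accepted without editing."""
    if not accepted_unmodified:
        return 0.0
    return sum(accepted_unmodified) / len(accepted_unmodified)


if __name__ == "__main__":
    print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))   # (0 + 4) / 2 = 2.0
    print(acceptance_rate([True, False, True, True]))   # 3 / 4 = 0.75
```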
The chosen model is Baidu Qianfan’s ERNIE‑4, balancing cost, generation quality, and safety.
Prompt Engineering
Stop token: Model output ends with a custom %STOP% marker to prevent unnecessary token generation.
JSON‑only output: Prompt explicitly requests a Markdown‑wrapped JSON object.
Few‑shot examples: Provide example JSON to constrain format.
SFT (Supervised Fine‑Tuning): Fine‑tune a lightweight ERNIE‑Speed model on low‑quality cases, optionally routed through a MoE classifier that selects between ERNIE‑4 and the fine‑tuned ERNIE‑Speed.
{
  "summary": "string", // < 30 Chinese characters
  "reason": "string",  // detailed bug cause
  "fixup": "string"    // concise fix description
}
Data Processing Pipeline
Define schema (diff, reference message, category, auxiliary metadata).
Collect data from model‑generated messages, existing commits written by R&D engineers (RD), and public datasets.
Clean (deduplication, noise removal).
Annotate reference quality and auxiliary fields.
Split into evaluation, training, validation, and test sets.
For small‑scale work use Pandas (https://pandas.pydata.org/); larger volumes may require Spark (https://spark.apache.org/).
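The clean and split steps above can be sketched with the standard library alone. The record fields (`diff`, `message`), the function names, and the split proportions are assumptions for illustration; with Pandas, the deduplication step would typically map onto `DataFrame.drop_duplicates`.

```python
import random
from typing import Dict, List

Record = Dict[str, str]


def clean(records: List[Record]) -> List[Record]:
    """Drop records with an empty diff or message, then drop exact duplicates."""
    seen = set()
    kept = []
    for rec in records:
        diff = rec.get("diff", "").strip()
        message = rec.get("message", "").strip()
        if not diff or not message:
            continue  # noise: nothing to learn from
        key = (diff, message)
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        kept.append({**rec, "diff": diff, "message": message})
    return kept


def split(records: List[Record], ratios: Dict[str, int], seed: int = 0) -> Dict[str, List[Record]]:
    """Shuffle deterministically and cut into named subsets by relative weight."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios.values())
    names = list(ratios)
    result, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(shuffled)  # last subset takes the remainder
        else:
            end = start + int(len(shuffled) * ratios[name] / total)
        result[name] = shuffled[start:end]
        start = end
    return result
```

A fixed seed keeps the subsets stable across runs, so evaluation numbers stay comparable while prompts or models change.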
Performance Optimization Strategies
Use a stop token to cut off unnecessary generation.
Refine prompts to reduce MSE and improve speed.
Apply SFT on low‑quality cases, optionally via a MoE routing classifier.
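On the client side, the stop token and the Markdown‑wrapped JSON format can be handled with a small parser like the sketch below. The `%STOP%` marker comes from the article; the function names are hypothetical, and whether the model service also accepts the marker as a server-side stop sequence is not stated in the source, so this only shows truncation after the response arrives.

```python
import json
import re

STOP_TOKEN = "%STOP%"   # marker named in the article
FENCE = "`" * 3         # Markdown code-fence delimiter

# Matches one fenced block, with or without a "json" language tag.
_FENCED_JSON = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(\{.*\})\s*" + re.escape(FENCE),
    re.DOTALL,
)


def truncate_at_stop(text: str, stop: str = STOP_TOKEN) -> str:
    """Discard the stop marker and everything after it."""
    return text.split(stop, 1)[0]


def extract_json(text: str) -> dict:
    """Parse the JSON object from a model reply, fenced or bare."""
    body = truncate_at_stop(text)
    match = _FENCED_JSON.search(body)
    payload = match.group(1) if match else body.strip()
    return json.loads(payload)  # raises json.JSONDecodeError on malformed output
```

Letting `json.loads` raise on malformed output gives the caller a clear signal to retry or fall back to manual entry rather than committing a broken message.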
Custom Commit Specification via Plugins
An abstract Python plugin interface allows teams to define custom commit formats.
from abc import ABC, abstractmethod

class IPluginHook(ABC):
    """Plugin interface definition"""

    @abstractmethod
    def hook_prepare(self, ctx):
        """Prepare context"""

    @abstractmethod
    def hook_is_fix_bug(self, ctx) -> bool:
        """Determine if the commit is a FixBug type"""

    @abstractmethod
    def hook_language(self, ctx) -> str:
        """Target language, default Chinese"""

    @abstractmethod
    def hook_generate_variables(self, ctx):
        """Generate template variables"""

    @abstractmethod
    def hook_generate_message(self, ctx) -> str:
        """Render the final CommitMessage"""

Plugins can be installed and loaded at runtime using pkg_resources and importlib:
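import importlib
import subprocess
import sys

def __install_plugin(pkg_name: str, version: str):
    """Install a pinned plugin version via pip, then (re)load it"""
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', f"{pkg_name}=={version}"])
    # Let the import system notice packages installed after startup
    importlib.invalidate_caches()
    return __load_module(pkg_name, force=True)

def __load_module(pkg_name: str, force: bool = False):
    """Load a module, optionally reloading"""
    # __module_name is a helper not shown in this excerpt; it maps the
    # pip package name to an importable module name
    module_name = __module_name(pkg_name)
    loaded = sys.modules.get(module_name)
    if loaded and force:
        return importlib.reload(loaded)
    if loaded:
        return loaded
    return importlib.import_module(module_name)
Future Outlook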
LLMs excel at code understanding but still struggle with proprietary terminology, fixed formats, and long diff contexts limited by token budgets. Combining Retrieval‑Augmented Generation (RAG) with richer datasets and improving interactive entry points and custom specs will be essential for achieving true AI‑native development.
