EcomGPT: Training an E-commerce Domain Large Language Model via Instruction Tuning
EcomGPT is an e-commerce large language model from Alibaba, instruction-tuned on EcomInstruct, a dataset of roughly 1.5 million samples. The results show that domain-specific instruction tuning substantially outperforms general-purpose models on e-commerce tasks, reducing hallucinations and improving task accuracy, with generalization improving as data diversity grows.
This paper presents EcomGPT, an e-commerce domain large language model developed by the Alibaba NLP team. The research addresses a fundamental question: why are domain-specific LLMs necessary when general-purpose models have already been trained on massive datasets?
Why Domain Models Matter: The authors argue that while general models are trained on vast amounts of publicly available data, they lack the proprietary domain knowledge specific to particular industries. Their experiments demonstrate that general models (including ChatGPT) perform significantly worse on domain-specific tasks, such as e-commerce matching and recommendation, than on general tasks, and exhibit more severe knowledge hallucination. This performance gap is more pronounced in smaller models.
Data Construction: The team constructed EcomInstruct, a large-scale e-commerce instruction dataset comprising 122 training tasks/datasets (held-in) with approximately 1.5 million samples, plus 12 evaluation tasks (held-out). The dataset has two components: (1) 65 public e-commerce task datasets drawn from academic papers and competition platforms, covering named entity recognition, review QA, product category prediction, and multi-turn dialogue; (2) atom tasks, constructed around fundamental e-commerce data types (product information, user conversations, reviews, search queries), such as entity span recognition and entity classification. These atom tasks capture the basic semantic-understanding capabilities invoked at intermediate stages of solving larger tasks.
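To make the "atom task" idea concrete, here is a minimal sketch of what one such training record might look like. The field names and the example product title are assumptions for illustration; the paper's actual schema is not reproduced here.

```python
# Hypothetical atom-task record for entity span recognition on a product title.
# Field names ("task", "input", "output") are illustrative, not EcomInstruct's schema.
atom_sample = {
    "task": "entity_span_recognition",
    "input": "Apple iPhone 14 Pro 256GB deep purple smartphone",
    "output": ["Apple", "iPhone 14 Pro", "256GB", "deep purple"],
}

def spans_are_valid(sample: dict) -> bool:
    """Check that every labeled span actually occurs in the input text."""
    return all(span in sample["input"] for span in sample["output"])
```

A companion atom task, entity classification, would then map each extracted span to a type such as brand, model, or attribute; chaining the two is the kind of intermediate capability the authors describe.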
Training Methodology: The instruction tuning approach combines task descriptions, task instructions, and input sentences. Through comparative experiments, the authors found that providing task descriptions, using a single language consistently, and diversifying task instructions all improve model generalization.
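The three-part input described above can be sketched as a simple template. This is a hypothetical assembly function, not the paper's exact prompt format; the template string and field order are assumptions.

```python
def build_sample(task_description: str, instruction: str, input_text: str) -> str:
    """Assemble one instruction-tuning prompt from a task description,
    a task instruction, and an input sentence (illustrative template only)."""
    return f"{task_description}\n{instruction}\nInput: {input_text}\nOutput:"

prompt = build_sample(
    "This is a product category prediction task for e-commerce listings.",
    "Predict the category of the following product title.",
    "Apple iPhone 14 Pro 256GB deep purple smartphone",
)
```

Diversifying instructions, per the paper's finding, would mean sampling the second argument from a pool of paraphrases rather than fixing a single wording per task.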
Experimental Results: Evaluation on the 12 held-out datasets using Rouge-L and F1 metrics shows that domain-specific instruction fine-tuning significantly outperforms general models. The fine-tuned model demonstrates a better understanding of e-commerce tasks and produces more domain-compliant responses. Analysis reveals a strong correlation between Rouge scores and win rates in human evaluations. Scaling experiments indicate that greater data diversity leads to better generalization, suggesting further gains from collecting additional domain tasks.
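Rouge-L, one of the two reported metrics, scores a candidate against a reference by the length of their longest common subsequence (LCS) of tokens, combining LCS-based precision and recall into an F-measure. A minimal sketch, using whitespace tokenization (real evaluations typically use a tokenizer appropriate to the language):

```python
def lcs_len(a: list, b: list) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def rouge_l(reference: str, candidate: str) -> float:
    """Rouge-L F-measure over whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)   # fraction of candidate tokens in the LCS
    recall = lcs / len(ref)       # fraction of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)
```

Because the LCS preserves token order without requiring contiguity, Rouge-L rewards outputs that cover the reference's content in the right order, which is one reason it tracks human win rates reasonably well.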
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.