Boosting E‑Commerce AIGC with Knowledge Graphs: From Multimodal Inputs to Controlled LLMs

This article summarizes how JD.com uses domain‑specific and generic knowledge graphs to enrich multimodal product inputs, control text generation, and improve LLM performance on e‑commerce copywriting. It covers the model architecture, the copy‑only mechanism, token‑type encoding, experimental results, and practical deployment scenarios.

NewBeeNLP

The presentation introduces JD.com’s exploration of AI‑generated content (AIGC) in e‑commerce, focusing on how knowledge graphs can be integrated to produce accurate and controllable product copy.

System Overview

Input data consists of heterogeneous multimodal sources: product images, textual descriptions, titles, and structured knowledge graph triples (converted to attribute‑value pairs). Initial processing extracts selling points and key elements from the knowledge graph.
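As a minimal sketch of the triple-to-pair conversion described above, the snippet below groups knowledge graph triples by product and linearizes them into text an encoder could consume. The function names, the `sku_…` identifiers, and the `attr: value | …` format are illustrative assumptions, not details from the presentation.

```python
from collections import defaultdict

def triples_to_attribute_pairs(triples):
    """Group (subject, predicate, object) triples into per-product attribute-value pairs."""
    products = defaultdict(list)
    for subject, predicate, obj in triples:
        products[subject].append((predicate, obj))
    return dict(products)

def linearize(pairs, sep=" | "):
    """Flatten attribute-value pairs into one string (hypothetical input format)."""
    return sep.join(f"{attr}: {value}" for attr, value in pairs)

# Toy triples; product IDs and attributes are made up for illustration.
triples = [
    ("sku_001", "capacity", "500 ml"),
    ("sku_001", "material", "stainless steel"),
    ("sku_002", "collar", "round neck"),
]
pairs = triples_to_attribute_pairs(triples)
text = linearize(pairs["sku_001"])
print(text)
```

Keeping the pairs structured until the last step makes it easy to select only the attributes that count as selling points before linearizing.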

Model Architecture

The generation model comprises separate text and image encoders feeding a decoder. The decoder employs restricted and copy‑based decoding to satisfy strict e‑commerce requirements, and incorporates pretrained language models and a sentence‑level fluency model to improve coherence between short sentences.

Controlled Generation Techniques

Input‑side control: filtering noisy or prohibited terms (e.g., “best”, “top‑level”) to avoid illegal advertising.

Vocabulary control: adjusting token probabilities to encourage desired words (selling points, attribute terms) and suppress forbidden ones.

Model‑side control: modifying encoder/decoder initializations, adding auxiliary tasks, and applying multi‑task learning.
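The first two controls can be illustrated with a small sketch, assuming a token-level filter on the input and an additive adjustment of decoder logits before the softmax. The term list, the `bonus` value, and all function names are hypothetical; the presentation does not specify how the adjustment is implemented.

```python
import math

# Example terms taken from the text; a production list would be much longer.
PROHIBITED = {"best", "top-level"}

def filter_input(tokens, prohibited=PROHIBITED):
    """Input-side control: drop prohibited terms before they reach the encoder."""
    return [t for t in tokens if t.lower() not in prohibited]

def adjust_logits(logits, boost=(), forbid=(), bonus=2.0):
    """Vocabulary control: raise logits of desired words, mask forbidden ones."""
    adjusted = dict(logits)
    for tok in boost:
        if tok in adjusted:
            adjusted[tok] += bonus
    for tok in forbid:
        if tok in adjusted:
            adjusted[tok] = float("-inf")  # probability becomes exactly zero
    return adjusted

def softmax(logits):
    """Numerically stable softmax over a token -> logit mapping."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

tokens = filter_input(["The", "best", "insulated", "bottle"])
logits = {"durable": 1.0, "best": 3.0, "cheap": 1.0}
probs = softmax(adjust_logits(logits, boost={"durable"}, forbid={"best"}))
print(tokens, probs)
```

Masking with negative infinity guarantees a forbidden word is never sampled, whereas the additive bonus only makes a selling-point word more likely without forcing it.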

A copy‑only mechanism is introduced for attribute values: during decoding, the generation probability for attribute tokens is set to zero, forcing the model to copy the exact value from the knowledge graph and eliminating errors such as mis‑stating capacity.
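One way to read this is as a pointer-generator style mixture in which the generation term is zeroed for attribute tokens, so those tokens can only receive probability through copy attention over the knowledge graph input. The sketch below follows that reading; the mixture form, the `p_gen` weight, and all names are assumptions rather than the authors' implementation.

```python
def copy_only_distribution(p_vocab, attention, source_tokens, p_gen, attribute_tokens):
    """Mix generation and copy probabilities, but allow attribute tokens
    to be produced only by copying from the source (knowledge graph) tokens."""
    final = {}
    for tok, p in p_vocab.items():
        # Generation probability is forced to zero for attribute tokens.
        final[tok] = 0.0 if tok in attribute_tokens else p_gen * p
    for tok, a in zip(source_tokens, attention):
        # Copy probability comes from attention over the source positions.
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    total = sum(final.values())
    if total == 0.0:
        raise ValueError("no probability mass left after masking")
    return {tok: p / total for tok, p in final.items()}

# Toy example: the vocabulary distribution prefers the wrong capacity "300ml",
# but only "500ml" appears in the knowledge graph input.
p_vocab = {"holds": 0.2, "300ml": 0.5, "500ml": 0.3}
source_tokens = ["capacity", "500ml"]
attention = [0.1, 0.9]
dist = copy_only_distribution(
    p_vocab, attention, source_tokens, p_gen=0.5,
    attribute_tokens={"300ml", "500ml"},
)
print(dist)
```

In this toy case the incorrect value receives zero probability because it never appears in the source, which is the behavior the copy-only constraint is meant to guarantee.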

Knowledge Graph Integration

Domain‑specific graphs provide precise attribute‑value pairs, while generic graphs enrich descriptions with commonsense information. Token‑type embeddings indicate the source of each token (product description, domain graph, or generic graph), allowing the model to trust domain data fully and treat generic data selectively.
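A rough sketch of token-type conditioning, assuming it works like segment embeddings: each input token carries an ID for its source, and the matching type embedding is added to the word embedding. The ID assignment, the toy embedding values, and the function names are illustrative assumptions.

```python
# Hypothetical ID assignment for the three sources named in the text.
TOKEN_TYPES = {"description": 0, "domain_kg": 1, "generic_kg": 2}

def build_inputs(description, domain_kg, generic_kg):
    """Concatenate the three sources and record each token's origin."""
    tokens, type_ids = [], []
    segments = (
        ("description", description),
        ("domain_kg", domain_kg),
        ("generic_kg", generic_kg),
    )
    for source, segment in segments:
        tokens.extend(segment)
        type_ids.extend([TOKEN_TYPES[source]] * len(segment))
    return tokens, type_ids

def embed(tokens, type_ids, word_emb, type_emb):
    """Add the token-type embedding to each word embedding, element-wise."""
    return [
        [w + t for w, t in zip(word_emb[tok], type_emb[tid])]
        for tok, tid in zip(tokens, type_ids)
    ]

tokens, type_ids = build_inputs(
    ["wireless", "headphones"], ["battery", "30h"], ["music"]
)
# Toy 2-dimensional embeddings, chosen only to make the sum easy to inspect.
word_emb = {tok: [1.0, 0.0] for tok in tokens}
type_emb = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
vectors = embed(tokens, type_ids, word_emb, type_emb)
print(type_ids, vectors)
```

Because the source is encoded in the input representation, the model can learn different levels of trust per source instead of treating all tokens alike.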

Visual gates are employed: a local visual gate focuses on image regions relevant to a specific attribute (e.g., collar style), and a global visual gate enhances overall text understanding, aiding knowledge‑graph completion for sparse product entries.
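The presentation summary does not give the gate equations, so the following is only a generic gating sketch: a sigmoid gate computed from text and image features scales how much visual information is added to the text representation. Under this assumption, a local gate would apply the function per image region and a global gate would apply it to a pooled image vector. The weights and function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def visual_gate(text_vec, image_vec, w_text, w_image, bias=0.0):
    """Scalar gate in (0, 1) from a linear score over text and image features."""
    score = sum(w * t for w, t in zip(w_text, text_vec))
    score += sum(w * v for w, v in zip(w_image, image_vec))
    return sigmoid(score + bias)

def fuse(text_vec, image_vec, gate):
    """Add gated visual features to the text representation."""
    return [t + gate * v for t, v in zip(text_vec, image_vec)]

text_vec = [1.0, 2.0]
image_vec = [2.0, 4.0]
# Zero weights give a neutral gate of 0.5; trained weights would differ.
gate = visual_gate(text_vec, image_vec, w_text=[0.0, 0.0], w_image=[0.0, 0.0])
fused = fuse(text_vec, image_vec, gate)
print(gate, fused)
```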

Experiments and Results

Evaluation on e‑commerce copy generation shows that adding the copy‑only mechanism raises fidelity from 64% to over 93% in human assessments. ROUGE scores improve when generic knowledge is incorporated, and token‑type information further improves product‑specific accuracy (e.g., correctly distinguishing headphones from phones).

Comparisons among three models, C‑PLUG (baseline), E‑PLUG (generic knowledge), and K‑PLUG (domain knowledge), demonstrate significant gains on downstream tasks such as knowledge‑graph completion, multi‑turn dialogue, and product summarization. Human reviews indicate higher readability, longer copy (80+ characters vs. 60 for T5), and higher audit pass rates (90% vs. 76%).

Challenges and Future Directions

Key challenges include sparse knowledge graphs across millions of products, ensuring the fidelity of generic knowledge, and scaling model training while hardware improves only slowly. Proposed solutions involve multimodal fusion, token‑type conditioning, and adding noise to both encoder and decoder during pre‑training to better capture entity knowledge.

Conclusion

The work demonstrates that integrating both domain‑specific and generic knowledge graphs, combined with controlled decoding strategies, substantially improves the quality, accuracy, and trustworthiness of e‑commerce AIGC, paving the way for more reliable large‑model applications in industry.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
