Boosting E‑Commerce AIGC with Knowledge Graphs: From Multimodal Inputs to Controlled LLMs
The article details how JD.com leverages domain‑specific and generic knowledge graphs to enhance multimodal product information, improve controlled text generation, and boost LLM performance for e‑commerce copywriting, covering model architecture, copy‑only mechanisms, token‑type encoding, experimental results, and practical deployment scenarios.
The presentation introduces JD.com’s exploration of AI‑generated content (AIGC) in e‑commerce, focusing on how knowledge graphs can be integrated to produce accurate and controllable product copy.
System Overview
Input data consists of heterogeneous multimodal sources: product images, textual descriptions, titles, and structured knowledge graph triples (converted to attribute‑value pairs). Initial processing extracts selling points and key elements from the knowledge graph.
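The triple-to-pair conversion can be pictured with the minimal sketch below. The source does not describe the actual data format, so the triple layout `(product_id, attribute, value)`, the function names, and the `[TITLE]`/`[DESC]`/`[KG]` markers are all hypothetical.

```python
from __future__ import annotations


# Hypothetical triple layout: (product_id, attribute, value).
def triples_to_attribute_values(
    triples: list[tuple[str, str, str]], product_id: str
) -> dict[str, str]:
    """Collect one product's triples as attribute -> value pairs (first value wins)."""
    pairs: dict[str, str] = {}
    for subject, attribute, value in triples:
        if subject == product_id and attribute not in pairs:
            pairs[attribute] = value
    return pairs


def build_model_input(title: str, description: str, pairs: dict[str, str]) -> str:
    """Linearize title, description and attribute-value pairs into one source string.

    The [TITLE]/[DESC]/[KG] markers are illustrative, not JD.com's actual format.
    """
    kg_part = " ; ".join(f"{attr} : {value}" for attr, value in pairs.items())
    return f"[TITLE] {title} [DESC] {description} [KG] {kg_part}"


triples = [
    ("p1", "color", "black"),
    ("p1", "capacity", "256GB"),
    ("p2", "color", "white"),  # belongs to another product, ignored
    ("p1", "color", "red"),    # duplicate attribute, first value kept
]
pairs = triples_to_attribute_values(triples, "p1")
print(build_model_input("Phone X", "A slim phone.", pairs))
```

Keeping the first value per attribute is an arbitrary tie-break chosen for the sketch; a real pipeline would need an explicit conflict-resolution rule.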
Model Architecture
The generation model comprises separate text and image encoders feeding a decoder. The decoder employs restricted and copy‑based decoding to satisfy strict e‑commerce requirements, and incorporates pretrained language models and a sentence‑level fluency model to improve coherence between short sentences.
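The source does not say how the text and image encoder outputs are combined before decoding. The toy sketch below assumes one common option: concatenating both encoders' states into a single memory that each decoder step attends over. Both encoders are trivial stand-ins, and every name here is hypothetical.

```python
from __future__ import annotations

import math


def encode_text(token_ids: list[int], embedding: dict[int, list[float]]) -> list[list[float]]:
    # Stand-in text encoder: a plain embedding lookup (a real model would use a Transformer).
    return [list(embedding[t]) for t in token_ids]


def encode_image(regions: list[list[float]], scale: float = 1.0) -> list[list[float]]:
    # Stand-in image encoder: rescales precomputed region features (e.g. from a CNN).
    return [[scale * x for x in region] for region in regions]


def attend(query: list[float], memory: list[list[float]]) -> list[float]:
    """Softmax dot-product attention of one decoder state over the fused memory."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memory]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * mem[i] for w, mem in zip(weights, memory)) / total
        for i in range(len(query))
    ]


embedding = {1: [1.0, 0.0], 2: [0.0, 1.0]}
text_states = encode_text([1, 2], embedding)
image_states = encode_image([[1.0, 1.0]], scale=0.5)
memory = text_states + image_states  # the decoder sees both modalities
context = attend([0.0, 0.0], memory)  # equal scores, so a plain average
```

Concatenation lets a single attention step mix evidence from both modalities; gated fusion, as in the visual gates described later, is an alternative.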
Controlled Generation Techniques
Input‑side control: filtering noisy or prohibited terms (e.g., “best”, “top‑level”) so the generated copy does not violate advertising regulations.
Vocabulary control: adjusting token probabilities to encourage desired words (selling points, attribute terms) and suppress forbidden ones.
Model‑side control: modifying encoder/decoder initializations, adding auxiliary tasks, and applying multi‑task learning.
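The first two controls above can be sketched on a toy next‑token distribution. This assumes a hard mask for forbidden tokens and a fixed additive boost for encouraged ones; the blocklist, the `boost` value, and the function names are illustrative, not JD.com's settings.

```python
from __future__ import annotations

import math

# Example terms from the text; a production blocklist would be far longer.
PROHIBITED = {"best", "top-level"}


def filter_input(tokens: list[str], prohibited: set[str] = PROHIBITED) -> list[str]:
    """Input-side control: drop prohibited advertising terms before encoding."""
    return [t for t in tokens if t.lower() not in prohibited]


def bias_logits(
    logits: dict[str, float],
    encourage: set[str],
    forbid: set[str],
    boost: float = 2.0,
) -> dict[str, float]:
    """Vocabulary control: raise encouraged tokens, hard-mask forbidden ones."""
    biased = {}
    for token, score in logits.items():
        if token in forbid:
            biased[token] = -math.inf  # probability becomes exactly zero
        elif token in encourage:
            biased[token] = score + boost
        else:
            biased[token] = score
    return biased


def softmax(logits: dict[str, float]) -> dict[str, float]:
    # Assumes at least one token is left unmasked.
    top = max(logits.values())
    exps = {t: math.exp(v - top) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


tokens = filter_input(["the", "best", "wireless", "headphones"])
probs = softmax(
    bias_logits({"best": 3.0, "wireless": 1.0, "cheap": 1.0},
                encourage={"wireless"}, forbid={"best"})
)
```

A soft boost keeps encouraged words optional, while `-inf` makes forbidden words impossible regardless of how confident the model is.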
A copy‑only mechanism is introduced for attribute values: during decoding, the generation probability for attribute tokens is set to zero, forcing the model to copy the exact value from the knowledge graph and eliminating errors such as mis‑stating capacity.
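One way to read this mechanism is as a pointer‑generator mixture, P(w) = p_gen · P_gen(w) + (1 − p_gen) · Σ attention on source positions holding w, where P_gen assigns zero to every attribute‑value token. The sketch below assumes that formulation; the source does not give the exact equations, and the names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def copy_only_distribution(
    p_vocab: dict[str, float],
    attention: list[float],
    source_tokens: list[str],
    p_gen: float,
    attribute_tokens: set[str],
) -> dict[str, float]:
    """Pointer-generator mixture in which attribute values can only be copied."""
    # Zero the generation probability of attribute tokens, then renormalize.
    gen = {w: (0.0 if w in attribute_tokens else p) for w, p in p_vocab.items()}
    z = sum(gen.values())
    if z == 0.0:
        p_gen = 0.0  # nothing left to generate: fall back to pure copying
    else:
        gen = {w: p / z for w, p in gen.items()}
    final = defaultdict(float)
    for w, p in gen.items():
        final[w] += p_gen * p
    # Copy path: attention mass flows to the tokens at each source position.
    for token, a in zip(source_tokens, attention):
        final[token] += (1.0 - p_gen) * a
    return dict(final)


dist = copy_only_distribution(
    p_vocab={"capacity": 0.2, "is": 0.3, "128GB": 0.4, "large": 0.1},
    attention=[0.1, 0.9],
    source_tokens=["capacity", "256GB"],  # 256GB comes from the knowledge graph
    p_gen=0.5,
    attribute_tokens={"128GB", "256GB"},
)
```

Here the decoder's generator favours a wrong capacity ("128GB"), but that token can no longer be generated; the correct "256GB" wins through the copy path.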
Knowledge Graph Integration
Domain‑specific graphs provide precise attribute‑value pairs, while generic graphs enrich descriptions with commonsense information. Token‑type embeddings indicate the source of each token (product description, domain graph, or generic graph), allowing the model to trust domain data fully and treat generic data selectively.
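Token‑type embeddings work like BERT's segment embeddings: each token gets an id for its source, and a learned vector for that id is added to the token embedding. The minimal sketch below assumes three ids (0 = description, 1 = domain graph, 2 = generic graph); the ids, table values, and function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical id assignment for the three token sources named in the text.
TYPE_IDS = {"description": 0, "domain_kg": 1, "generic_kg": 2}


def build_type_ids(segments: list[tuple[str, list[str]]]) -> tuple[list[str], list[int]]:
    """Concatenate segments and record which source each token came from."""
    tokens: list[str] = []
    type_ids: list[int] = []
    for source, seg_tokens in segments:
        tokens.extend(seg_tokens)
        type_ids.extend([TYPE_IDS[source]] * len(seg_tokens))
    return tokens, type_ids


def add_type_embeddings(
    token_vectors: list[list[float]],
    type_ids: list[int],
    type_table: list[list[float]],
) -> list[list[float]]:
    """Add the (learned) embedding of each token's source type to its token embedding."""
    return [
        [x + y for x, y in zip(vec, type_table[t])]
        for vec, t in zip(token_vectors, type_ids)
    ]


tokens, type_ids = build_type_ids([
    ("description", ["wireless", "earbuds"]),
    ("domain_kg", ["battery", ":", "30h"]),
    ("generic_kg", ["earbuds", "are", "headphones"]),
])
```

Because the type vector is shared by every token from the same source, the model can learn to weight domain‑graph tokens differently from generic‑graph tokens.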
Visual gates are employed: a local visual gate focuses on image regions relevant to a specific attribute (e.g., collar style), and a global visual gate enhances overall text understanding, aiding knowledge‑graph completion for sparse product entries.
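The two gates can be sketched with sigmoid gating. This assumes the local gate scores each image region against an attribute query vector and averages regions by gate value, and the global gate is a scalar that controls how much of a whole‑image vector is added to the text vector. The source does not give the gate equations, so `attribute_query`, `w_text`, `w_image`, and `bias` are hypothetical parameters.

```python
from __future__ import annotations

import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_visual_gate(attribute_query: list[float], regions: list[list[float]]) -> list[float]:
    """Gate each image region by its relevance to one attribute, then take the gated average."""
    gates = [sigmoid(dot(attribute_query, region)) for region in regions]
    total = sum(gates)
    return [
        sum(g * region[i] for g, region in zip(gates, regions)) / total
        for i in range(len(attribute_query))
    ]


def global_visual_gate(
    text_vec: list[float],
    image_vec: list[float],
    w_text: list[float],
    w_image: list[float],
    bias: float = 0.0,
) -> list[float]:
    """Scalar gate controlling how much of the whole-image vector is added to the text vector."""
    g = sigmoid(dot(w_text, text_vec) + dot(w_image, image_vec) + bias)
    return [t + g * v for t, v in zip(text_vec, image_vec)]


# e.g. a "collar style" query that matches the first region far better than the second
collar_view = local_visual_gate([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
fused = global_visual_gate([1.0, 1.0], [2.0, 4.0], [0.0, 0.0], [0.0, 0.0])
```

With zero weights the global gate sits at 0.5, so half of the image vector is mixed in; training would move the gate toward whatever mix helps the text task.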
Experiments and Results
Evaluation on e‑commerce copy generation shows that adding the copy‑only mechanism raises fidelity from 64% to over 93% in human assessments. ROUGE scores improve when generic knowledge is incorporated, and token‑type information further improves the accuracy of product‑specific descriptions (e.g., correctly distinguishing headphones from phones).
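For readers unfamiliar with ROUGE, the sketch below computes ROUGE‑1 F1, the unigram‑overlap variant. The source does not say which ROUGE variant or tokenization was used; whitespace tokens are assumed here, and Chinese copy would need a different tokenization, such as characters.

```python
from __future__ import annotations

from collections import Counter


def rouge1_f1(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall (clipped counts)."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # each token counted at most min(count) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1(
    "noise cancelling wireless headphones".split(),
    "wireless noise cancelling headphones with long battery".split(),
)
```

All four candidate words appear in the reference (precision 1), but only four of seven reference words are covered (recall 4/7), giving F1 = 8/11.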
Three models were compared: C‑PLUG (baseline), E‑PLUG (generic knowledge), and K‑PLUG (domain knowledge). The comparisons show significant gains on downstream tasks such as knowledge‑graph completion, multi‑turn dialogue, and product summarization. Human reviews indicate higher readability, longer generated copy (80+ characters vs. 60 for T5), and higher audit pass rates (90% vs. 76%).
Challenges and Future Directions
Key challenges include sparse knowledge graphs covering millions of products, ensuring the fidelity of generic knowledge, and scaling model training when hardware improvements are limited. Proposed solutions include multimodal fusion, token‑type conditioning, and adding noise to both the encoder and the decoder during pre‑training so the model better captures entity knowledge.
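The source only says that noise is added to both the encoder and the decoder. The sketch below assumes two common denoising choices: masking known entity tokens on the encoder side, and randomly replacing tokens in the decoder's teacher‑forced input, with the clean sequence as the target. The mask token, rates, and function names are hypothetical, and the decoder input is shown unshifted for simplicity.

```python
from __future__ import annotations

import random

MASK = "[MASK]"  # hypothetical mask token


def mask_entities(tokens: list[str], entity_positions: set[int],
                  rng: random.Random, rate: float) -> list[str]:
    """Encoder-side noise: replace each known entity token with MASK with probability `rate`."""
    return [
        MASK if i in entity_positions and rng.random() < rate else tok
        for i, tok in enumerate(tokens)
    ]


def corrupt_decoder_input(target: list[str], vocab: list[str],
                          rng: random.Random, rate: float) -> list[str]:
    """Decoder-side noise: randomly swap teacher-forced input tokens for vocabulary tokens."""
    return [rng.choice(vocab) if rng.random() < rate else tok for tok in target]


def make_pretraining_example(tokens: list[str], entity_positions: set[int], vocab: list[str],
                             enc_rate: float = 0.5, dec_rate: float = 0.1,
                             seed: int = 0) -> dict[str, list[str]]:
    rng = random.Random(seed)  # seeded so examples are reproducible
    return {
        "encoder_input": mask_entities(tokens, entity_positions, rng, enc_rate),
        "decoder_input": corrupt_decoder_input(tokens, vocab, rng, dec_rate),
        "target": list(tokens),  # the model must reconstruct the clean sequence
    }


tokens = ["battery", "life", "30h", "wireless"]
example = make_pretraining_example(tokens, {2}, ["x"], enc_rate=1.0, dec_rate=0.0)
```

Masking the entity forces the model to recover "30h" from context or knowledge, while decoder‑side corruption discourages it from relying too heavily on its own previous tokens.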
Conclusion
The work demonstrates that integrating both domain‑specific and generic knowledge graphs, combined with controlled decoding strategies, substantially improves the quality, accuracy, and trustworthiness of e‑commerce AIGC, paving the way for more reliable large‑model applications in industry.
