How Alibaba’s “Cangjingge” Knowledge Engine Powers AI with Massive Graphs
Alibaba, together with top Chinese universities and research institutes, unveiled the Cangjingge Knowledge Engine project, detailing its massive data assets, five‑module architecture, large‑scale knowledge construction techniques, and initial deployments in safety and tourism knowledge graphs to boost AI applications.
In April 2018, Alibaba partnered with Tsinghua University, Zhejiang University, the Institute of Automation of the Chinese Academy of Sciences, the Institute of Software of the Chinese Academy of Sciences, and Soochow University to launch the Cangjingge (Knowledge Engine) research program, aiming to build an open knowledge‑engine service platform within a year.
Massive Knowledge Behind AI Applications
Over the past 19 years, Alibaba’s ecosystem has generated massive data from consumers, sellers, brands, and operators. For product‑related data, Alibaba maintains billions of entities (brands, products, barcodes) and billions of relational edges; for encyclopedia‑style data, it holds tens of millions of entities (people, places, companies) with billions of edges.
Data sources span national standards (e.g., GS1 barcodes), Alibaba’s e‑commerce platforms (Taobao, Tmall, Hema), and other services (Gaode, UC). To serve AI applications, this data must be highly complete, consistent, and de‑duplicated, forming a unified knowledge base that supports business‑mid‑platform intelligence, search & recommendation, and intelligent interaction.
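As a rough illustration of the de-duplication requirement, the sketch below merges product records from several sources into one entity per barcode. The record fields (`barcode`, `name`, `brand`, `source`), the sample values, and the "first non-empty value wins" rule are assumptions made for this example; the article does not describe Alibaba's actual fusion logic.

```python
from collections import defaultdict


def fuse_by_barcode(records):
    """Merge records that share a barcode into a single entity.

    Hypothetical rule: for each attribute, keep the first non-empty
    value encountered across sources.
    """
    merged = defaultdict(dict)
    for rec in records:
        entity = merged[rec["barcode"]]
        for key, value in rec.items():
            if value and key not in entity:
                entity[key] = value
    return dict(merged)


# Illustrative records; barcodes, names, and brands are made up.
records = [
    {"barcode": "6901234567892", "name": "Green Tea 500ml", "brand": "", "source": "taobao"},
    {"barcode": "6901234567892", "name": "", "brand": "ExampleBrand", "source": "tmall"},
    {"barcode": "6909876543210", "name": "Rice Cracker", "brand": "OtherBrand", "source": "hema"},
]

entities = fuse_by_barcode(records)
```

Here the two records that share a barcode collapse into one entity whose name comes from the first source and whose brand comes from the second, which is the kind of completeness and consistency a unified knowledge base needs.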
Beyond factual knowledge, Alibaba also curates formalized knowledge essential for vertical knowledge graphs, such as product tagging, classification, relationship generation, and cross‑market product publishing.
Key Challenges
Rapidly building knowledge graphs for numerous vertical domains.
Connecting knowledge graphs across different domains efficiently.
Managing and continuously updating massive factual and formalized knowledge.
Providing a unified knowledge representation for search, recommendation, intelligent interaction, and business‑intelligence upgrades.
Achieving brain‑like reasoning by integrating perception and cognition.
First Disclosure of Large‑Scale Knowledge Construction Techniques
The plan builds on Alibaba’s computing infrastructure (e.g., the Igraph graph database) and machine‑learning platforms (e.g., PAI), and divides the knowledge engine into five modules: knowledge modeling, knowledge acquisition, knowledge fusion, knowledge reasoning & computation, and knowledge empowerment.
These modules deliver end‑to‑end services from raw data to knowledge services, with plug‑in vertical knowledge graphs that can be loaded on demand.
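One way to picture the five modules and the plug‑in vertical graphs is as a staged pipeline. The sketch below is only a structural illustration: the `KnowledgeEngine` class, `load_vertical`, and the placeholder stages are hypothetical names invented for this example, not part of any published Cangjingge API.

```python
from typing import Callable, Dict, List

# A stage takes the pipeline state and returns the updated state.
Stage = Callable[[dict], dict]


class KnowledgeEngine:
    """Hypothetical pipeline: raw data flows through ordered modules."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages
        self.vertical_graphs: Dict[str, dict] = {}

    def load_vertical(self, name: str, graph: dict) -> None:
        # Vertical knowledge graphs are plugged in on demand.
        self.vertical_graphs[name] = graph

    def run(self, raw: list) -> dict:
        state = {"raw": raw, "verticals": sorted(self.vertical_graphs), "trace": []}
        for stage in self.stages:
            state = stage(state)
        return state


def make_stage(name: str) -> Stage:
    """Placeholder stage that only records that it ran."""
    def stage(state: dict) -> dict:
        state["trace"].append(name)
        return state
    return stage


# The five modules named in the article, in order.
MODULES = ["modeling", "acquisition", "fusion", "reasoning", "empowerment"]

engine = KnowledgeEngine([make_stage(m) for m in MODULES])
engine.load_vertical("tourism", {"entities": []})
result = engine.run(["some raw text"])
```

A real system would replace each placeholder stage with the corresponding module; the point of the sketch is only the ordering and the on‑demand loading of vertical graphs.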
Scalable Knowledge Fusion & Acquisition
To support diverse domains, the fusion and acquisition algorithms must scale, so training data is collected rapidly through crowdsourcing. Because crowdsourced annotations are noisy, adversarial learning is applied so that the model learns the features annotators have in common, which improves recognition accuracy.
Improving Entity‑Relation Extraction
Syntactic information is crucial for extracting entity relations. A tree‑structured representation embeds hierarchical syntax into deep‑learning networks, enabling more accurate identification of relations such as who founded a company.
Logical Reasoning Combined with Deep Learning
The reasoning engine represents millions of formalized facts as first‑order‑logic Horn clauses, which support deterministic reasoning, and complements them with deep‑learning‑based relation completion. The engine supports plug‑in algorithms, vocabularies, and vertical graphs, allowing, for example, food origin queries that combine lexical, geographic, and recommendation knowledge.
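To make the deterministic side concrete, here is a minimal sketch of forward chaining over Horn clauses, using a food‑origin rule as the example. The predicate names (`origin_of`, `located_in`, `produced_in_region`), the tuple encoding, and the `?variable` convention are assumptions chosen for illustration; the article does not specify how the engine encodes or evaluates its rules.

```python
def match(pattern, fact, bindings):
    """Unify one pattern tuple with one ground fact; return new bindings or None."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def match_body(body, facts, bindings):
    """Yield every binding that satisfies all atoms in the rule body."""
    if not body:
        yield bindings
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        b = match(first, fact, bindings)
        if b is not None:
            yield from match_body(rest, facts, b)


def forward_chain(facts, rules):
    """Apply Horn rules (head, body) until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Collect all matches first so the fact set is not mutated mid-iteration.
            for b in list(match_body(body, facts, {})):
                derived = tuple(b.get(term, term) for term in head)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


facts = [
    ("origin_of", "longjing_tea", "hangzhou"),
    ("located_in", "hangzhou", "zhejiang"),
]
# produced_in_region(?food, ?region) :- origin_of(?food, ?city), located_in(?city, ?region)
rules = [
    (
        ("produced_in_region", "?food", "?region"),
        [("origin_of", "?food", "?city"), ("located_in", "?city", "?region")],
    )
]

closure = forward_chain(facts, rules)
```

After running, `closure` contains the two original facts plus the derived fact `("produced_in_region", "longjing_tea", "zhejiang")`. A learned relation‑completion model could then propose additional candidate facts that such rules cannot derive.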
First Application Deployments of the Cangjingge Plan
The plan has already been applied to safety and tourism knowledge graphs. In safety, the engine powers a city‑brain service that offers full‑element search in support of urban safety. In tourism, it structures travel guides and improves destination information, giving users richer content.
Future goals include delivering a suite of services within a year: semi‑automated ontology construction tools, text structuring algorithms, multi‑source knowledge fusion tools, formal‑knowledge reasoning tools, and natural‑language, logical‑language, and database‑language query services, ultimately building a universal knowledge engine for all vertical domains.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
