How TransE+ Boosts Knowledge Graph Embedding on WeChat’s Plato Framework
This article presents the development and deployment of the TransE+ knowledge‑graph embedding model on the Plato graph‑computing platform, detailing its architectural upgrades, training optimizations, performance gains, and business‑oriented adaptations for large‑scale real‑world applications.
Knowledge Graphs (KGs) have become a research hotspot for storing factual information, and Knowledge Graph Embedding (KGE) has emerged as an effective auxiliary technique in NLP and recommendation scenarios. To strengthen KG data capabilities, WeChat's graph‑computing framework Plato and the WeKB knowledge graph teams jointly developed the TransE+ model, which improves link prediction by more than 3% over the strongest baselines.
1 Introduction
Knowledge Graphs represent numerous real‑world facts as graphs. With the rise of deep learning and the "everything can be embedded" paradigm, KGE methods, especially the Trans* family, have flourished, and they significantly improve downstream NLP and recommendation tasks by injecting prior knowledge.
Figure 1. Knowledge Graph
KGE effectiveness is well‑established in academia. Leveraging KGE theory, we aim to explore large‑scale KG data in WeChat scenarios, providing generalized entity information for long‑term business support. This paper introduces the practical exploration of KG representation learning by the Plato‑WeKB team, summarizing common academic models, extending graph engine support, and presenting the upgraded TransE+ model that achieves state‑of‑the‑art results on link prediction.
2 Existing Academic Work
A KG consists of entities (nodes) and relations (edges). In a directed graph, an entity acts as either the head (source) or the tail (target) of an edge, so each fact is encoded as a triple (head, relation, tail). KGE seeks compact embeddings for both entities and relations; the goal is to find a distance metric under which true triples score better than false ones, and to optimize the embeddings accordingly.
TransE [1] introduced the first translation‑based KGE model: each relation is modeled as a translation in embedding space, so the tail embedding should lie close to the head embedding plus the relation embedding, scored by the distance ||h + r − t|| and trained with a pairwise hinge loss over negative samples (a minimal sketch follows below). However, TransE struggles with one‑to‑many, many‑to‑one, many‑to‑many, and symmetric relations, as illustrated in Figures 2 and 3.
Figure 2. TransE principle
Figure 3. TransE limitation
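For concreteness, here is a minimal NumPy sketch of the TransE score and pairwise hinge loss described above; the dimension and margin are placeholders, and this is illustrative rather than Plato code:

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE distance score ||h + r - t||: lower means more plausible."""
    return np.linalg.norm(h + r - t, ord=norm)

def pairwise_hinge_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss from [1]: push positives at least `margin` below negatives."""
    return max(0.0, margin + pos_score - neg_score)

# Toy usage: one positive triple and one head-corrupted negative.
rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=50) for _ in range(3))
h_corrupt = rng.normal(size=50)
loss = pairwise_hinge_loss(transe_score(h, r, t), transe_score(h_corrupt, r, t))
```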
Subsequent models such as TransH [2], TransR [3], and others (see Figure 4) address these limitations by projecting entities into relation‑specific spaces or designing alternative scoring functions.
Figure 4. Comparison of Trans* models
3 TransE+ on Plato
3.1 Architecture: Upgrading the Graph Engine for KG
Figure 5. PlatoDeep architecture
PlatoDeep is WeChat’s large‑scale graph‑computing framework that integrates TensorFlow, abstracts GNN and traditional graph operations, and provides APIs for sampling points/neighbors, loading attributes, and random walks. Compared with GNN training, KGE training differs mainly in sampling triples instead of edges and handling multi‑type relations.
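To make that contrast concrete, the sketch below shows the difference in the basic sampling unit; it is plain illustrative Python, not the PlatoDeep API:

```python
from typing import NamedTuple

class Edge(NamedTuple):
    """The basic sample unit in ordinary GNN training."""
    src: int
    dst: int

class Triple(NamedTuple):
    """The basic sample unit in KGE training: the relation is a first-class,
    trainable object rather than an untyped edge."""
    head: int
    relation: int
    tail: int

def group_by_relation(triples):
    """Bucket triples by relation id so multi-type relations stay addressable,
    e.g. for per-relation negative sampling or statistics."""
    buckets = {}
    for tr in triples:
        buckets.setdefault(tr.relation, []).append(tr)
    return buckets
```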
3.2 Model
3.2.1 Classic TransE: Shortcomings?
Classic TransE follows a pairwise training pipeline: positive triples are sampled, heads or tails are randomly replaced to generate an equal number of negative triples, and a hinge loss is applied (a sketch follows Figure 6).
Figure 6. TransE training flow
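The whole pairwise pipeline can be condensed into a few lines of NumPy; this is an illustrative toy with manual subgradient updates, not the production TensorFlow implementation:

```python
import numpy as np

def corrupt(triple, num_entities, rng):
    """Generate one negative by replacing the head or tail uniformly at random."""
    h, r, t = triple
    if rng.random() < 0.5:
        return (int(rng.integers(num_entities)), r, t)   # replace head
    return (h, r, int(rng.integers(num_entities)))       # replace tail

def train_step(E, R, batch, margin=1.0, lr=0.01, rng=None):
    """One pairwise pass: E and R are (num_entities, dim) and (num_relations, dim)
    embedding matrices; each positive is paired with one corrupted negative."""
    rng = rng or np.random.default_rng()
    for (h, r, t) in batch:
        h_n, _, t_n = corrupt((h, r, t), len(E), rng)
        pos = E[h] + R[r] - E[t]
        neg = E[h_n] + R[r] - E[t_n]
        if margin + np.linalg.norm(pos) - np.linalg.norm(neg) > 0:
            # Subgradients of the two L2 distances w.r.t. the embeddings involved.
            g_pos = pos / (np.linalg.norm(pos) + 1e-9)
            g_neg = neg / (np.linalg.norm(neg) + 1e-9)
            E[h] -= lr * g_pos
            E[t] += lr * g_pos
            R[r] -= lr * (g_pos - g_neg)
            E[h_n] += lr * g_neg
            E[t_n] -= lr * g_neg
```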
3.2.2 Upgraded TransE: State‑of‑the‑Art
We adopt a point‑wise learning‑to‑rank approach, treating each sample as an independent binary classification problem with cross‑entropy loss. This allows richer negative sampling and enables curriculum learning (CL) to focus on hard samples. Embedding vectors are initialized with a margin‑based uniform distribution, and self‑adversarial negative sampling assigns differentiated weights to negatives.
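The article does not publish the exact TransE+ loss, but the ingredients named above can be sketched as follows: a point-wise logistic loss, self-adversarial negative weighting in the style popularized by Sun et al.'s RotatE, and a margin-tied uniform initializer. All hyperparameter values are placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def self_adv_pointwise_loss(pos_dist, neg_dists, gamma=6.0, alpha=1.0):
    """Point-wise loss with self-adversarial negative weighting (RotatE style;
    the exact TransE+ formulation is not published here). Harder negatives,
    i.e. those with smaller distance, receive larger weights."""
    w = softmax(-alpha * neg_dists)   # would be detached in a real autograd setting
    pos_term = -np.log(sigmoid(gamma - pos_dist))
    neg_term = -(w * np.log(sigmoid(neg_dists - gamma))).sum()
    return pos_term + neg_term

def margin_uniform_init(num_rows, dim, gamma=6.0, eps=2.0, rng=None):
    """Margin-based uniform init: the range is tied to the margin, a common
    choice in open-source KGE code (an assumption, not confirmed by the text)."""
    rng = rng or np.random.default_rng()
    bound = (gamma + eps) / dim
    return rng.uniform(-bound, bound, size=(num_rows, dim))
```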
Table 4. Plato model performance comparison (FB15K)
Our ablation study (Table 5) quantifies the contribution of each strategy: replacing hinge loss with point‑wise loss, margin‑based initialization, and self‑adversarial sampling all yield noticeable gains.
Table 5. Ablation results
3.2.3 Business‑Oriented Model Pruning and Generalization
KGE is widely used in knowledge‑base question answering (KBQA) and dialogue generation. However, in practical scenarios such as video‑tag recommendation, relation information may be unnecessary, and similarity search based on entity embeddings is sufficient. We therefore prune relation‑related components and adopt a dual‑tower architecture that focuses on entity similarity, incorporating entity descriptions via a single‑layer mapping and aligning structural and semantic embeddings with symmetric KL divergence.
Figure 7. KBQA workflow
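The alignment objective is not given in closed form; one plausible sketch is below, where turning each tower's output into a distribution via a temperature softmax is our assumption:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def symmetric_kl(p, q, eps=1e-9):
    """0.5 * (KL(p||q) + KL(q||p)) between two discrete distributions."""
    p, q = p + eps, q + eps
    return 0.5 * ((p * np.log(p / q)).sum() + (q * np.log(q / p)).sum())

def alignment_loss(struct_emb, desc_emb, W, temperature=1.0):
    """Align a structural entity embedding with its description embedding
    mapped through a single linear layer; a sketch of the dual-tower
    alignment, not necessarily the exact TransE+ objective."""
    mapped = desc_emb @ W                      # the single-layer mapping
    p = softmax(struct_emb / temperature)      # assumption: softmax over dimensions
    q = softmax(mapped / temperature)
    return symmetric_kl(p, q)
```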
Experiments on the WeKB dataset show that the business‑aligned similarity search reduces the mean rank (MR, lower is better) from 40017 to 10399 while maintaining strong link‑prediction performance.
Table 6. WeKB evaluation results
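For reference, MR here is the standard mean-rank metric; a minimal sketch of its computation (illustrative, not the WeKB evaluation code):

```python
import numpy as np

def mean_rank(scores, true_idx):
    """Mean rank (MR) of the ground-truth entity among all candidates.
    `scores` is (num_queries, num_candidates) similarity; lower MR is better."""
    ranks = []
    for row, t in zip(scores, true_idx):
        order = np.argsort(-row)                            # descending similarity
        ranks.append(int(np.where(order == t)[0][0]) + 1)   # 1-based rank
    return float(np.mean(ranks))
```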
3.3 Performance: Scaling Concurrency
Training large‑scale KGE models demands efficient graph sampling and distributed computation. We observe that sampling is not a bottleneck; the main focus is optimizing TensorFlow’s distributed training. By increasing operator parallelism and worker CPU utilization, we achieve 130–150 global steps per second on a dataset with 4.21 M nodes and 18.15 M triples.
Figure 9. TensorFlow distributed training
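In TF 1.x [12], operator parallelism is controlled through the session configuration; the thread counts below are illustrative, since the production values are not disclosed:

```python
import tensorflow as tf  # TF 1.x API, the stack referenced in [12]

# Raising the intra-/inter-op thread pools is the standard TF 1.x knob
# for operator parallelism on CPU-heavy workers.
config = tf.ConfigProto(
    intra_op_parallelism_threads=16,  # parallelism inside a single op (e.g. matmul)
    inter_op_parallelism_threads=16,  # independent ops that may run concurrently
)

with tf.Session(config=config) as sess:
    # sess.run(train_op) ...
    pass
```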
Additional optimizations such as unique‑gather before embedding lookup and replacing inefficient operators further boost training speed (see Table 7).
Table 7. Training performance improvements
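The unique-gather optimization can be sketched as follows in TF 1.x; whether Plato implements it exactly this way is not stated:

```python
import tensorflow as tf  # TF 1.x

def deduped_lookup(embedding_table, ids):
    """Look up each distinct id once, then scatter results back to their
    original positions. Saves bandwidth when a batch repeats hot entities."""
    unique_ids, idx = tf.unique(ids)                      # dedupe the batch
    unique_vecs = tf.nn.embedding_lookup(embedding_table, unique_ids)
    return tf.gather(unique_vecs, idx)                    # restore original order
```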
4 Summary and Outlook
We presented the design and engineering of TransE+ on Plato, demonstrating its effectiveness for WeChat's knowledge graph. Future work will proceed along two lines:
- Research more effective negative‑sampling methods, such as the local closed‑world assumption [13].
- Innovate on end‑to‑end business models that incorporate additional context and constraints.
References
[1] Bordes, Antoine, et al. "Translating embeddings for modeling multi‑relational data." Advances in Neural Information Processing Systems 26 (2013).
[2] Wang, Zhen, et al. "Knowledge graph embedding by translating on hyperplanes." Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28, no. 1 (2014).
[3] Lin, Yankai, et al. "Learning entity and relation embeddings for knowledge graph completion." Proceedings of the AAAI Conference on Artificial Intelligence (2015).
[4] https://github.com/thunlp/OpenKE/tree/OpenKE‑Tensorflow1.0
[5] Nickel, Maximilian, Lorenzo Rosasco, and Tomaso Poggio. "Holographic embeddings of knowledge graphs." Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, no. 1 (2016).
[6] https://graphvite.io/docs/latest/benchmark.html#knowledge-graph-embedding
[7] Wang, Xiaozhi, et al. "KEPLER: A unified model for knowledge embedding and pre‑trained language representation." Transactions of the Association for Computational Linguistics 9 (2021): 176‑194.
[8] Yu, Mo, et al. "Improved neural relation detection for knowledge base question answering." Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (2017).
[9] Huang, Xiao, et al. "Knowledge graph embedding based question answering." Proceedings of the 12th ACM International Conference on Web Search and Data Mining (2019).
[10] Luo, Kangqi, et al. "Knowledge base question answering via encoding of complex query graphs." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018).
[11] Zhang, Zhengyan, et al. "ERNIE: Enhanced language representation with informative entities." Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019).
[12] Abadi, Martín, et al. "TensorFlow: Large‑scale machine learning on heterogeneous distributed systems." (2015).
[13] Dong, Xin, et al. "Knowledge vault: A web‑scale approach to probabilistic knowledge fusion." Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014).
[14] Ji, Shaoxiong, et al. "A survey on knowledge graphs: Representation, acquisition, and applications." IEEE Transactions on Neural Networks and Learning Systems 33.2 (2021): 494‑514.
