
OPPO's CHAOS Pretrained Large Model and GammaE Knowledge‑Graph Multi‑hop Reasoning: Techniques and Insights

This article presents OPPO Research Institute's recent advances in large‑model AI, detailing the CHAOS pretrained model that topped the CLUE leaderboard, the knowledge‑enhanced training pipeline, and the GammaE model for multi‑hop reasoning over knowledge graphs, together with experimental results and practical training tips.

DataFunSummit

The rapid evolution of large‑scale pretrained models has raised new challenges, such as integrating logical reasoning and structured knowledge, and improving knowledge‑graph effectiveness. OPPO researcher Yang Dong shares two key projects: the CHAOS pretrained model and the GammaE model for knowledge‑graph multi‑hop inference.

1. CHAOS Model Overview – CHAOS, a 30‑billion‑parameter encoder‑only model, achieved first place on three CLUE benchmarks, outperforming many larger models. The talk reviews the current landscape of pretrained models (BERT, GPT‑1/2/3, T5, PaLM, MT‑NLG) and highlights the trade‑off between model size, training steps, and downstream performance.

2. Training Data and Trends – Modern models consume trillions of tokens, often sourced from open‑domain corpora like Common Crawl, while structured knowledge data remains a small fraction. OPPO built a high‑quality 5.56‑billion‑entity knowledge graph to augment pretraining data.

3. Knowledge‑Enhanced Pretraining Solutions – The pipeline adopts knowledge‑enhanced data augmentation, inspired by Baidu's ERNIE 3.0, and selects RoBERTa/DeBERTa as the base encoder. Techniques include word‑embedding complementarity, transformer‑based encoder/decoder architectures, and efficient model selection (favoring encoder‑only for CLUE tasks).
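The knowledge-enhanced data augmentation described above can be illustrated with a minimal sketch: pair a training sentence with knowledge-graph triples whose head entity is mentioned in it, loosely in the spirit of ERNIE 3.0's knowledge-text pairing. The function name, the `[SEP]` joining convention, and the relation names are all illustrative assumptions, not OPPO's actual pipeline.

```python
def augment_with_triples(sentence, kg, max_triples=2):
    """Prepend up to `max_triples` KG triples whose head entity appears
    in the sentence, separated by [SEP]. A hypothetical sketch of
    knowledge-enhanced data augmentation, not the production recipe."""
    triples = []
    for head, rel, tail in kg:
        if head in sentence and len(triples) < max_triples:
            triples.append(f"{head} {rel} {tail}")
    if not triples:
        return sentence  # no linked entities: leave the text unchanged
    return " [SEP] ".join(triples) + " [SEP] " + sentence
```

In a real pipeline the entity linking step would be far more careful than a substring match, but the shape of the augmented example (triples prepended to raw text) is the point.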

4. CHAOS Model Details – DeBERTa's disentangled attention preserves relative‑position information that absolute position encodings lose; training used 48 nodes × 16 V100 GPUs for 30 days, processing ~10 trillion Chinese tokens. Span denoising with a span length of 2 was chosen to accommodate long entities while keeping sequence lengths manageable.
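The span-denoising objective with span length 2 can be sketched as T5-style span corruption: contiguous 2-token spans are replaced by sentinel tokens, and the model learns to reconstruct them. The masking rate, sentinel naming (`<extra_id_N>`), and sampling scheme below are assumptions for illustration, not the exact CHAOS configuration.

```python
import random

def span_denoise(tokens, span_len=2, mask_rate=0.15, seed=0):
    """Replace contiguous spans of `span_len` tokens with sentinel tokens.
    Returns (corrupted tokens, list of (sentinel, original span)).
    A sketch of span denoising under assumed hyperparameters."""
    rng = random.Random(seed)
    n_spans = max(1, int(len(tokens) * mask_rate / span_len))
    starts = sorted(rng.sample(range(len(tokens) - span_len + 1), n_spans))
    out, targets, i, sid = [], [], 0, 0
    for s in starts:
        if s < i:               # skip spans overlapping one already masked
            continue
        out.extend(tokens[i:s])
        out.append(f"<extra_id_{sid}>")        # one sentinel per span
        targets.append((f"<extra_id_{sid}>", tokens[s:s + span_len]))
        sid += 1
        i = s + span_len
    out.extend(tokens[i:])
    return out, targets
```

With span length 2, each masked span can cover a two-token Chinese entity in one piece, which is the motivation the talk gives for that choice.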

5. GammaE Model for Multi‑hop Reasoning – GammaE introduces a Gamma‑distribution‑based embedding with a three‑layer MLP for projection, using Gamma mixture models for union and negation, and an elasticity factor to improve KL‑divergence stability. It outperforms traditional methods (TransE, Query2Box, ConE) on EPFO, negation, and composite query benchmarks.
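Since GammaE embeds queries and entities as Gamma distributions, the core distance is the KL divergence between two Gamma distributions, which has a closed form. The sketch below implements that closed form (with a small digamma approximation so it is self-contained); the elasticity factor the talk mentions is a further stabilization described in the GammaE paper and is not reproduced here.

```python
import math

def digamma(x):
    """Digamma via recurrence plus asymptotic expansion; roughly 1e-5
    accurate for x > 0. Stand-in for scipy.special.digamma."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252)))

def gamma_kl(a1, b1, a2, b2):
    """Closed-form KL( Gamma(shape a1, rate b1) || Gamma(shape a2, rate b2) )."""
    return ((a1 - a2) * digamma(a1)
            - math.lgamma(a1) + math.lgamma(a2)
            + a2 * (math.log(b1) - math.log(b2))
            + a1 * (b2 - b1) / b1)
```

The KL is zero when the two distributions coincide and positive otherwise, which is what makes it usable as a ranking distance between a query embedding and candidate entity embeddings.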

6. Experimental Results – GammaE shows superior MRR, Spearman, and Pearson scores, better robustness on logical operations, and lower time complexity compared with prior models.

7. Q&A Highlights – The simplest way to inject knowledge graphs is to extend the vocabulary with high‑frequency entities; however, more sophisticated integration is needed for substantial gains. The team emphasizes building smaller, more generalizable models rather than ever‑larger ones.
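The "simplest way" mentioned in the Q&A, extending the vocabulary with high-frequency entities, amounts to appending entity strings as whole tokens to an existing vocabulary. A minimal sketch, with a hypothetical frequency threshold and plain dict vocabulary (a real setup would also resize the model's embedding matrix):

```python
def extend_vocab(vocab, entities, min_freq, entity_counts):
    """Append entities whose corpus frequency meets `min_freq` to an
    existing token->id vocab, in order. Returns the list actually added.
    A sketch of vocabulary-based knowledge injection."""
    next_id = max(vocab.values()) + 1
    added = []
    for ent in entities:
        if entity_counts.get(ent, 0) >= min_freq and ent not in vocab:
            vocab[ent] = next_id   # each entity becomes a single token
            next_id += 1
            added.append(ent)
    return added
```

As the speaker notes, this keeps frequent entities from being fragmented by subword tokenization, but deeper integration is needed for substantial gains.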

Overall, the presentation underscores the importance of knowledge‑enhanced pretraining, efficient model design, and innovative reasoning mechanisms to advance AI capabilities.

Tags: model optimization, large language models, AI research, pretraining, knowledge graph, CHAOS, GammaE
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
