BlackPearl Team Wins All Three Tracks of KDD 2024 OAG‑Challenge Cup with Large‑Model Solutions

The BlackPearl team from Meituan’s Dazhong Dianping division swept all three tracks of the KDD 2024 OAG‑Challenge Cup (WhoIsWho, PST, and AQA) using large‑model techniques: iterative text clustering, a graft‑learning‑enhanced LLM with a RAG feature pipeline, and a Boosting LLM‑for‑Vector search solution. The team has released the code publicly on GitHub.

Meituan Technology Team

The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) is the premier international conference in data mining. The KDD Cup, organized by SIGKDD since 1997, is the most influential competition in the field.

Recently, the BlackPearl team from the Search and Content Intelligence group of Meituan's Dazhong Dianping Technology Department took part in the KDD 2024 OAG‑Challenge Cup, competing in three tracks: WhoIsWho‑IND, PST, and AQA. The team took first place in all three tracks by a large margin.

In the WhoIsWho (same‑name disambiguation) track, the team introduced an iterative large‑model text clustering method enhanced by self‑feedback. This approach can directly process structured information and output clustering results end‑to‑end, achieving an 83% gAUC score that significantly surpasses traditional machine‑learning baselines.
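For readers unfamiliar with the metric, the sketch below shows one common reading of gAUC: AUC is computed separately for each group (here, each author) and the per-group values are averaged with weights. The weighting by group size, the label convention (1 means the paper belongs to the author), and the function names are assumptions made for illustration; the challenge's official scorer may weight authors differently.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined when one class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grouped_auc(groups, labels, scores):
    """Average of per-group AUC, weighted by group size (an assumed weighting)."""
    by_group = defaultdict(lambda: ([], []))
    for g, y, s in zip(groups, labels, scores):
        by_group[g][0].append(y)
        by_group[g][1].append(s)

    total, weight_sum = 0.0, 0
    for ys, ss in by_group.values():
        value = auc(ys, ss)
        if value is None:
            continue  # skip groups where AUC cannot be computed
        total += value * len(ys)
        weight_sum += len(ys)
    return total / weight_sum if weight_sum else None
```

Grouping by author keeps a prolific author with easy cases from hiding poor ranking quality on other authors, which a single pooled AUC would allow.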

In the PST (paper source tracing) track, the team applied graft learning to inject the strong semantic matching capability of BERT‑like models into a large language model, boosting sample confidence. They also built an automatic feature‑engineering pipeline based on Retrieval‑Augmented Generation (RAG) to mitigate noisy, multi‑modal textual data. Their single 7B‑parameter model outperformed a ChatGPT + RAG solution by 10% on the MAP metric.
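As a reminder of what the MAP figure measures, the sketch below computes mean average precision over ranked candidate lists using the standard definition. The function names and the data layout (a dict of ranked reference IDs per paper, and a dict of true source IDs per paper) are hypothetical and not taken from the competition's scoring code.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevance):
    """Average the per-paper AP over every paper that has a ranking."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranked, relevance.get(paper_id, ()))
        for paper_id, ranked in rankings.items()
    )
    return total / len(rankings)
```

For example, a paper whose two true sources are ranked second and third out of three candidates gets an average precision of (1/2 + 2/3) / 2 = 7/12.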

For the AQA (academic paper question answering) track, the team addressed heavy noise in the data. By combining LLM‑for‑Vector and Boosting techniques, they built Boosting LLM for Searching, a solution that integrates recall and ranking. It outperformed traditional text‑embedding search methods and carried the LLM’s semantic understanding over to noisy academic search scenarios.
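The article does not describe how the vector models and the boosting step are combined. Purely as an illustration of ranking candidate papers with several embedding models at once, the sketch below uses weighted cosine-score fusion. This is a simpler ensemble than boosting and is not the team's method. The function names, the data layout, and the default `top_k` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fused_ranking(query_vecs, paper_vecs, weights, top_k=5):
    """Rank papers by a weighted sum of per-model cosine similarities.

    query_vecs[m]  : the query embedding produced by model m
    paper_vecs[m]  : dict mapping paper ID to its embedding from model m
    weights[m]     : how much model m contributes to the fused score
    """
    scores = {}
    for query, papers, weight in zip(query_vecs, paper_vecs, weights):
        for paper_id, vec in papers.items():
            scores[paper_id] = scores.get(paper_id, 0.0) + weight * cosine(query, vec)
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    return ranked[:top_k]
```

In a real system the weights would be learned from labeled question and paper pairs rather than set by hand, and the embeddings would come from the trained models.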

The BlackPearl team has open‑sourced all three solutions on GitHub (https://github.com/BlackPearl-Lab/KddCup-2024-OAG-Challenge-1st-Solutions) for the research community and will present their findings at KDD 2024 in Barcelona. In future work, the team will continue to explore large‑model technologies to enhance Meituan’s products and services.

Tags: machine learning, data mining, Large Language Model, KDD Cup, Academic Disambiguation, Paper Retrieval