
Multi-step Reasoning over Large-scale Knowledge Graphs: Query2Box and SMORE Framework

This talk presents recent advances in multi-step reasoning over large-scale, noisy knowledge graphs, introducing the Query2Box model that uses box embeddings for complex queries and the SMORE framework that enables efficient multi-hop inference on massive graphs through scalable query generation, embedding computation, and training pipelines.

DataFunSummit

The presentation begins with an overview of knowledge graphs, describing them as collections of triples (h, r, t) that encode factual information such as "Mona Lisa was created by Da Vinci". Large public graphs like Wikidata and Freebase contain millions of entities but also substantial noise, making edge prediction and graph completion essential tasks.
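At its simplest, a knowledge graph of this kind is just a set of (head, relation, tail) triples. The sketch below illustrates the idea with a toy graph and a lookup helper; the entity and relation names are illustrative, not drawn from any real dataset.

```python
# A toy knowledge graph as a set of (head, relation, tail) triples.
# Entity and relation names here are illustrative examples only.
triples = {
    ("Mona Lisa", "created_by", "Da Vinci"),
    ("Da Vinci", "born_in", "Italy"),
    ("Mona Lisa", "located_in", "Louvre"),
}

def tails(graph, head, relation):
    """Return all tail entities reachable from `head` via `relation`."""
    return {t for h, r, t in graph if h == head and r == relation}

print(tails(triples, "Mona Lisa", "created_by"))  # {'Da Vinci'}
```

Edge prediction and graph completion then amount to guessing which triples are missing from (or wrongly present in) such a set.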

It then defines the multi-step reasoning problem: given a complex logical query, directly predict the answer entity without enumerating intermediate graph walks. Example queries include finding the university of a Turing Award winner, identifying the president of a European country that never hosted a World Cup, or predicting drugs that target proteins associated with COVID‑19.

Query2Box is introduced as a model that performs reasoning in an embedding space using box embeddings. Each entity and relation is represented by a vector, while a query is mapped to a high‑dimensional box defined by a learnable center and offset. Logical operators are implemented as geometric operations: projection translates a box via a relation embedding, and intersection computes the overlap of multiple boxes, ensuring the resulting box’s center lies in the convex hull of inputs and its size shrinks accordingly. This formulation enables efficient k‑nearest‑neighbor search for answers and scales linearly with the number of hops.
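The geometric operators can be sketched in a few lines. This is a simplified illustration, not the trained model: Query2Box learns its parameters and uses an attention-weighted intersection, whereas the sketch below uses a plain mean of centers and an elementwise minimum of offsets, which is enough to show the two properties mentioned above (the resulting center lies in the convex hull of the inputs, and the box can only shrink).

```python
import numpy as np

# A box is a (center, offset) pair of vectors; offsets are non-negative,
# so the box spans [center - offset, center + offset] in each dimension.

def project(box, relation):
    """Projection: translate the box by a relation's center embedding;
    the relation's own offset widens the box to cover more answers."""
    center, offset = box
    rel_center, rel_offset = relation
    return (center + rel_center, offset + rel_offset)

def intersect(boxes):
    """Intersection: center in the convex hull of inputs, size shrinks.
    (The real model uses learned attention instead of a plain mean.)"""
    centers = np.stack([c for c, _ in boxes])
    offsets = np.stack([o for _, o in boxes])
    return (centers.mean(axis=0), offsets.min(axis=0))

def distance(entity, box):
    """Simplified ranking distance: zero inside the box, growing outside."""
    center, offset = box
    return np.maximum(np.abs(entity - center) - offset, 0.0).sum()
```

Because each answer entity is scored by a simple distance to the final query box, retrieving the top candidates reduces to a k-nearest-neighbor search in the embedding space.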

The talk then describes the SMORE framework, which extends embedding‑based multi‑hop reasoning to ultra‑large graphs (e.g., Freebase with ~90 M nodes). SMORE tackles two challenges: (1) constructing multi‑step query templates and generating training data without prohibitive exponential cost, and (2) handling the massive sparse embedding matrices alongside dense logical operators. It employs a template‑driven query generation process that starts from answer nodes and samples anchor entities, achieving O(md) complexity where m is the number of hops and d the average degree. For negative sampling, a bidirectional search reduces complexity from O(|V|) to O(m d) by walking m/2 steps from both anchor and answer nodes and intersecting the visited sets.
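The bidirectional idea can be sketched on a toy path query. The helper below is an assumption-laden simplification of SMORE's approach, not its actual implementation: to decide whether a candidate entity answers an m-hop relation path, it walks half the path forward from the anchor and half backward from the candidate, then intersects the two frontiers; candidates whose frontiers never meet can serve as negative samples.

```python
from collections import defaultdict

def build_index(triples):
    """Forward and backward adjacency indexes over (h, r, t) triples."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        fwd[(h, r)].add(t)   # h -r-> t
        bwd[(t, r)].add(h)   # reverse direction for backward walks
    return fwd, bwd

def is_answer(fwd, bwd, anchor, relations, candidate):
    """Meet-in-the-middle reachability check along a relation path:
    walk half the hops from each end and intersect the frontiers."""
    mid = len(relations) // 2
    front = {anchor}
    for r in relations[:mid]:
        front = set().union(*(fwd[(e, r)] for e in front))
    back = {candidate}
    for r in reversed(relations[mid:]):
        back = set().union(*(bwd[(e, r)] for e in back))
    return bool(front & back)
```

Walking m/2 hops from each end keeps the explored neighborhood small relative to a full m-hop forward expansion, which is the source of the complexity reduction described above.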

System‑level optimizations include asynchronous computation pipelines, multi‑GPU training with near‑linear speedup, and a memory‑efficient design whose GPU usage is almost independent of graph size. SMORE also introduces several new large‑scale benchmark datasets, increasing the scale of existing benchmarks by over a thousandfold. Empirical results show SMORE achieving up to 2.2× speedup on small graphs, 30.6% lower GPU memory consumption, and the ability to train multi‑hop models on graphs that were previously infeasible.

The session concludes with a summary: embedding‑space reasoning (Query2Box) addresses complex query answering, while SMORE makes such methods practical for massive knowledge graphs, and the code has been open‑sourced on GitHub. A brief Q&A follows, covering the motivation behind Query2Box, the definition of box size, and extensions to union and difference operators.

Tags: AI · Knowledge Graph · Large Scale · box embedding · multi-hop reasoning · Query2Box · SMORE
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing news and speaker talks from big data and AI industry summits, along with regularly released downloadable resource packs.
