From Google’s Graphd to Dgraph: Building Distributed Graph Database Systems
Manish Rai Jain recounts his journey from Google’s single-process Graphd, built for Freebase, to creating Dgraph, a distributed graph database that shards SPO (subject-predicate-object) triples by predicate, avoids fan-out broadcasts, and supports deep traversals. The account illustrates the technical evolution and design choices behind modern scalable graph systems.
The author, Manish Rai Jain, founder of Dgraph Labs, shares his experience building graph database systems, from his time at Google to the creation of Dgraph.
He explains why Google needed a new graph database service capable of handling knowledge‑graph data and high‑throughput search queries, describing the limitations of existing systems such as Bigtable, SSTable, and Borg.
Google’s internal project Graphd, which originally powered Freebase, ran as a single-process daemon that kept all data in memory and required machines with more than 64 GB of RAM. Attempts to rewrite Graphd for distributed operation led to projects like MindMeld and eventually to the Dgraph system.
Dgraph was designed as a distributed graph database service that stores SPO triples, shards data by predicate, and executes arbitrary-depth traversals with at most two network round-trips, eliminating the fan-out broadcast problem that plagues many non-native graph stores.
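To make predicate-based sharding concrete, here is a minimal toy sketch in Python. It is not Dgraph's implementation: the class name `PredicateShardedStore`, the round-robin placement policy, and the `network_calls` counter are all hypothetical, chosen only to show that every triple sharing a predicate lands on the same shard, so expanding one predicate for any number of subjects touches a single shard.

```python
from collections import defaultdict


class PredicateShardedStore:
    """Toy model (hypothetical): all triples with the same predicate share a shard."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        # One dict per shard: predicate -> subject -> set of objects.
        self.shards = [defaultdict(lambda: defaultdict(set)) for _ in range(num_shards)]
        self.predicate_to_shard = {}
        self.network_calls = 0  # simulated count of shard requests

    def shard_for(self, predicate):
        # Assumed placement policy: round-robin in order of first appearance.
        if predicate not in self.predicate_to_shard:
            self.predicate_to_shard[predicate] = len(self.predicate_to_shard) % self.num_shards
        return self.predicate_to_shard[predicate]

    def add(self, subject, predicate, obj):
        self.shards[self.shard_for(predicate)][predicate][subject].add(obj)

    def expand(self, subjects, predicate):
        """One hop: a single request to the shard that owns the predicate."""
        self.network_calls += 1
        by_subject = self.shards[self.shard_for(predicate)][predicate]
        result = set()
        for s in subjects:
            result |= by_subject.get(s, set())
        return result


store = PredicateShardedStore(num_shards=3)
store.add("alice", "friend", "bob")
store.add("bob", "friend", "carol")
store.add("alice", "lives_in", "paris")

# Two subjects, one predicate: still only one simulated shard request.
friends = store.expand({"alice", "bob"}, "friend")
print(sorted(friends), store.network_calls)  # prints ['bob', 'carol'] 1
```

In this sketch the cost of a hop depends on the predicate being followed, not on how many entities are in the current result set, which is the property the predicate-sharded design relies on.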
The article also describes the “Cerebro” knowledge‑graph engine built on top of Google’s search index, its ability to assign IDs to triples, generate multiple interpretations of queries, and provide rich filtering and sorting based on entity types.
It discusses the connection‑depth issue in distributed queries, the latency impact of broadcasting across many servers, and how Dgraph’s predicate‑based sharding mitigates this problem.
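The fan-out difference can be illustrated with a small Python sketch. The hashing scheme, the server count of 16, and the function names are assumptions made for this illustration rather than details from the article. The sketch compares how many servers one traversal hop must contact when data is placed by entity versus by predicate.

```python
import zlib


def stable_hash(key: str) -> int:
    # CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8"))


def servers_touched_entity_sharding(frontier, num_servers):
    """Entity-sharded layout: each subject's edges live on hash(subject) % N."""
    return {stable_hash(subject) % num_servers for subject in frontier}


def servers_touched_predicate_sharding(predicate, num_servers):
    """Predicate-sharded layout: all edges of one predicate live on one server."""
    return {stable_hash(predicate) % num_servers}


# A hop that follows "friend" edges from 1000 intermediate results.
frontier = [f"person{i}" for i in range(1000)]
by_entity = servers_touched_entity_sharding(frontier, num_servers=16)
by_predicate = servers_touched_predicate_sharding("friend", num_servers=16)

print(len(by_entity), len(by_predicate))
```

With entity-based placement, a large intermediate result spreads across many servers, so the hop becomes a broadcast whose latency is set by the slowest responder. With predicate-based placement, the same hop contacts one server no matter how large the frontier grows.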
Additional projects such as Plasma (a real‑time graph index unifying Google OneBox services) and the challenges of management turnover are covered, illustrating the evolution from internal prototypes to the open‑source Dgraph product.
Overall, the piece offers a historical and technical perspective on the design decisions that shaped modern distributed graph databases.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.