
How Beike Achieved Millisecond Queries on a 48‑Billion‑Triple Graph with Dgraph

This article details how Beike stores a 48‑billion‑triple industry graph and queries it in milliseconds. It covers graph database fundamentals, a comparative evaluation of JanusGraph and Dgraph, the design and deployment of a Docker‑ and Kubernetes‑based Dgraph platform, data ingestion pipelines, a custom Graph‑SQL layer, performance testing, optimizations, and the future roadmap.

dbaplus Community

Introduction

Beike’s industry graph contains roughly 48 billion RDF‑style triples describing properties, agents, developers, neighborhoods, and related entities. The system must serve millisecond‑level queries under high concurrency.

Graph‑Database Overview

A graph database stores vertices (nodes) and edges (relationships) directly, enabling expressive traversals that are inefficient in relational or document stores. Typical use cases include social networks, road networks, recommendation systems, risk analysis, and knowledge graphs.
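To make the triple model concrete, here is a minimal Python sketch (not Beike's code). The entity names and predicates such as `agent_1`, `sells`, and `located_in` are invented for illustration. It indexes (subject, predicate, object) triples and answers a multi-hop question by following edges directly instead of joining tables.

```python
from collections import defaultdict

# Hypothetical sample triples: (subject, predicate, object).
TRIPLES = [
    ("agent_1", "sells", "house_9"),
    ("agent_2", "sells", "house_7"),
    ("house_9", "located_in", "community_3"),
    ("house_7", "located_in", "community_3"),
    ("community_3", "built_by", "developer_5"),
]

def build_index(triples):
    """Index objects by (subject, predicate) for direct edge lookups."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        index[(subj, pred)].append(obj)
    return index

def traverse(index, start, path):
    """Follow a sequence of predicates from `start`, one hop per predicate."""
    frontier = {start}
    for pred in path:
        frontier = {obj for node in frontier for obj in index.get((node, pred), [])}
    return frontier

index = build_index(TRIPLES)
# Which developers built the communities where agent_1 sells houses?
developers = traverse(index, "agent_1", ["sells", "located_in", "built_by"])
print(sorted(developers))
```

Each hop is a dictionary lookup per frontier node, so the cost grows with the number of edges actually followed rather than with the size of the whole data set.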

Technology Selection

Five open‑source candidates were evaluated: Neo4j, OrientDB, ArangoDB, JanusGraph, and Dgraph. Evaluation criteria included open‑source license, maturity, scalability, documentation, performance, stability, operational cost, and usability.

Neo4j – single‑node only in the community edition, unsuitable for petabyte‑scale workloads.

OrientDB / ArangoDB – originally single‑node; later added clustering but with limited stability.

JanusGraph – requires external storage (Cassandra/HBase) and external indexing (Elasticsearch), leading to high operational overhead.

Dgraph – native distributed storage, Raft‑based strong consistency, built‑in indexing, and a single binary executable.

Because Dgraph provides a simpler stack, automatic data balancing, and lower maintenance cost, it was chosen as the graph‑database engine.

Platform Construction

Cluster topology: Three physical servers, each running Docker containers managed by Kubernetes. On each server four Dgraph processes are launched: three alpha nodes (data, index, query) and one zero node (cluster coordinator). Each group consists of three alpha replicas; Raft ensures strong consistency.

Deployment commands (example):

# start a zero node; --replicas 3 sets three alpha replicas per group
dgraph zero --replicas 3 --my=zero1:5080

# start an alpha and point it to the zero service
dgraph alpha --zero=zero1:5080 --my=alpha1:7080
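The arithmetic behind this topology can be sketched in Python. Three servers with three alphas each give nine alphas, and with three replicas per group that yields three groups. The function name `plan_groups`, the `server/alphaN` labels, and the choice to spread each group's replicas across different servers are assumptions for illustration; the source does not say how replicas are placed.

```python
def plan_groups(servers, alphas_per_server, replicas):
    """Return {group_id: [alpha labels]} with one replica of each group per server."""
    if replicas != len(servers):
        # Simplifying assumption of this sketch, not a Dgraph requirement.
        raise ValueError("this sketch assumes one replica per server")
    return {
        group: [f"{server}/alpha{group}" for server in servers]
        for group in range(1, alphas_per_server + 1)
    }

layout = plan_groups(["server1", "server2", "server3"], alphas_per_server=3, replicas=3)
for group, members in layout.items():
    print(group, members)
```

Placing the replicas of a group on different machines means that losing one server removes only one replica from every group.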

Data ingestion modes:

Real‑time stream: Changes are sent to a Data‑Accepter service, buffered in Kafka, and consumed by a Graph‑Import module that writes to Dgraph via gRPC/HTTP.

Batch stream: Hive/HDFS data is extracted with Spark, written to Kafka, and then imported the same way as the real‑time path.

Bulk initialization: The Dgraph Bulk Loader generates predicate‑sharded data and index files in a MapReduce‑style map phase followed by a reduce phase. Alpha nodes then load these files directly; running the load in parallel across multiple nodes cut a 48‑hour import to about 15 hours.
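As a rough illustration of what a Graph‑Import style consumer might do between Kafka and Dgraph, the Python sketch below turns change records into RDF N‑Quad lines and groups them into batches. The record layout, the helper names `to_nquad` and `batch_nquads`, and the sample data are hypothetical; the Kafka consumer and the Dgraph client call are deliberately left out.

```python
def to_nquad(subject, predicate, value, is_uid=False):
    """Render one triple as an RDF N-Quad line using blank-node subjects."""
    if is_uid:
        obj = f"_:{value}"  # edge to another node
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'  # literal value
    return f"_:{subject} <{predicate}> {obj} ."

def batch_nquads(changes, batch_size):
    """Group change records into mutation payloads of at most `batch_size` lines."""
    batch = []
    for change in changes:
        batch.append(to_nquad(*change))
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)

# Hypothetical change records: (subject, predicate, value[, is_uid]).
changes = [
    ("house_9", "name", "Two-bedroom flat"),
    ("house_9", "located_in", "community_3", True),
    ("community_3", "name", 'The "Garden" estate'),
]
payloads = list(batch_nquads(changes, batch_size=2))
for payload in payloads:
    print(payload)
    print("---")
```

Batching amortises the per-request overhead of a mutation; the right batch size would have to be found by measurement.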

Query interface: Users can query through Dgraph’s visual UI (Ratel) or via a custom “Graph‑SQL” layer that translates familiar SQL‑style statements into GraphQL+- queries (Dgraph’s query language, since renamed DQL), lowering the learning curve for developers and analysts.
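The source does not show the Graph‑SQL grammar, so the following Python sketch only illustrates the idea of such a translation layer. The accepted statement shape (`SELECT ... FROM ... WHERE pred = "value" [LIMIT n]`), the function name `translate`, and the use of the table name as the label of the result block are all assumptions. The generated text uses the `func: eq(...)` root function and the `first:` pagination argument of Dgraph's query language.

```python
import re

# Hypothetical, deliberately tiny SQL-like grammar.
PATTERN = re.compile(
    r'^\s*SELECT\s+(?P<fields>[\w\s,]+?)\s+FROM\s+(?P<label>\w+)'
    r'\s+WHERE\s+(?P<pred>\w+)\s*=\s*"(?P<value>[^"]*)"'
    r'(?:\s+LIMIT\s+(?P<limit>\d+))?\s*;?\s*$',
    re.IGNORECASE,
)

def translate(sql):
    """Translate one supported SQL-style statement into a query string."""
    match = PATTERN.match(sql)
    if match is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    fields = [f.strip() for f in match["fields"].split(",")]
    args = f'func: eq({match["pred"]}, "{match["value"]}")'
    if match["limit"]:
        args += f', first: {match["limit"]}'
    body = "\n".join(f"    {field}" for field in fields)
    return f'{{\n  {match["label"]}({args}) {{\n{body}\n  }}\n}}'

query = translate('SELECT name, price FROM house WHERE city = "Beijing" LIMIT 10')
print(query)
```

In Dgraph, `eq` at the query root requires an index on the predicate, so a real translation layer would also need schema knowledge to reject or rewrite filters on unindexed predicates.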

Principles, Optimizations, and Limitations

Storage engine: Dgraph uses Badger, a Go‑implemented LSM‑tree key‑value store. Badger claims ~3.5× faster random reads than RocksDB. Data is stored as (predicate, subject) → [sorted list of value IDs], so a single RPC can retrieve all values for a predicate, which speeds up multi‑hop traversals.
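The key layout described above can be mimicked with a small in-memory Python sketch. It is an illustration of the idea only, not Badger's or Dgraph's implementation; the class name `PostingStore` and the sample uids are made up. Keeping each list sorted allows two lists to be intersected in one linear merge pass, which is useful when a query combines several conditions.

```python
from bisect import insort

class PostingStore:
    """Toy key-value layout: (predicate, subject uid) -> sorted list of object uids."""

    def __init__(self):
        self._postings = {}

    def add_edge(self, predicate, subject, obj):
        plist = self._postings.setdefault((predicate, subject), [])
        if obj not in plist:
            insort(plist, obj)  # keep the list sorted on insert

    def get(self, predicate, subject):
        """One key lookup returns every object uid for this predicate and subject."""
        return self._postings.get((predicate, subject), [])

def intersect_sorted(a, b):
    """Merge-style intersection, linear in len(a) + len(b)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

store = PostingStore()
for subject, obj in [(1, 5), (1, 3), (1, 9), (2, 9), (2, 3), (2, 4)]:
    store.add_edge("friend", subject, obj)

common = intersect_sorted(store.get("friend", 1), store.get("friend", 2))
print(common)
```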

Sharding & rebalancing: Data is sharded by predicate. The zero node periodically runs a rebalance routine (rebalance_interval) to keep shard sizes balanced across alphas.
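A predicate-level rebalance step could look roughly like the Python sketch below. This is a simplified greedy heuristic written for illustration, not Dgraph's actual algorithm: it moves a single predicate from the largest group to the smallest, choosing the one that narrows the size gap the most. The function name, the predicate names, and the sizes are invented.

```python
def rebalance_step(groups):
    """groups: {group_id: {predicate: size}}. Move one predicate; return the move or None."""
    totals = {gid: sum(preds.values()) for gid, preds in groups.items()}
    src = max(totals, key=totals.get)
    dst = min(totals, key=totals.get)
    gap = totals[src] - totals[dst]
    # Moving a predicate of size s changes the gap to |gap - 2s|,
    # so only predicates smaller than the gap reduce the imbalance.
    candidates = {p: s for p, s in groups[src].items() if 0 < s < gap}
    if not candidates:
        return None
    pred = min(candidates, key=lambda p: abs(gap - 2 * candidates[p]))
    groups[dst][pred] = groups[src].pop(pred)
    return pred, src, dst

# Hypothetical predicate sizes per group (arbitrary units).
groups = {
    1: {"name": 40, "price": 30, "located_in": 50},
    2: {"sells": 20},
    3: {"built_by": 40},
}
move = rebalance_step(groups)
print(move)
print({gid: sum(preds.values()) for gid, preds in groups.items()})
```

Because a predicate is the unit of movement, one very large predicate cannot be split by a step like this, which is a known consequence of sharding by predicate.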

High availability: Each group has at least three alphas; Raft replicates writes and guarantees strong consistency. Write‑ahead logs (WAL) are flushed to disk before in‑memory writes, ensuring durability after crashes.
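Why three alphas per group is the practical minimum follows from Raft's majority rule: a write commits once a majority of replicas acknowledge it. The short Python sketch below works through that arithmetic; the function names are illustrative.

```python
def quorum(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1

def tolerated_failures(replicas):
    """Replicas that can be down while a majority is still reachable."""
    return replicas - quorum(replicas)

for n in (1, 3, 5):
    print(f"replicas={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

With three replicas the quorum is two, so a group keeps accepting writes with one alpha down; a single replica tolerates no failure at all.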

Performance testing (48‑core, 128 GB RAM, SATA disks, 30 GB dataset ≈ 4.5 billion triples):

Write throughput: up to 15 000 triples/s with sustained latency < 50 ms under 1 000 concurrent queries.

Read latency: sub‑50 ms for simple attribute queries; complex three‑hop traversals remain in the low‑millisecond range on Dgraph, whereas JanusGraph degrades to > 700 ms.

QPS: ~15 000 queries per second at 1 000 concurrent clients.

Identified limitations:

Only a single graph per cluster (multi‑graph support is under development).

No native multi‑edge (multiple edges with the same label between two vertices).

Integration with big‑data ecosystems (e.g., Spark bulk writes) requires custom tooling; Dgraph’s Go client may become a bottleneck for massive parallel writes.

Relative immaturity compared to long‑standing solutions; occasional bugs are being fixed rapidly.

Future Plans

Planned work includes deeper performance tuning of Badger, further optimisation of the Bulk Loader for petabyte‑scale imports, and tighter integration of the graph engine into a unified search cloud platform alongside Elasticsearch and Faiss. Additional goals are:

Provide UI‑driven configuration and monitoring for the Dgraph cluster.

Expose the Graph‑SQL layer as a first‑class service with extended syntax (e.g., shortestpath, degree, GROUP BY, HAVING, ORDER BY, LIMIT).

Support diverse graph use cases such as risk management, knowledge graphs, and recommendation systems.

