
Distributed Graph Database Practice at Beike: From JanusGraph to Dgraph

This article presents Beike's experience building a large‑scale graph database platform, covering the need for graph databases, technology selection between JanusGraph and Dgraph, detailed architecture, data ingestion pipelines, query interfaces, performance benchmarks, and future roadmap.

DataFunTalk

The presentation opens with a question: how can an industry graph of 48 billion triples be queried in milliseconds? Relational databases, Elasticsearch, and HBase cannot serve such complex, multi‑dimensional queries efficiently, which leads to the conclusion that a graph database is required.

It then introduces graph databases, explaining that they store nodes and edges as graph structures rather than tables, and lists typical use cases such as social networks, recommendation systems, risk control, and knowledge graphs.
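To make the node-and-edge model concrete, here is a minimal sketch in Python. It assumes a toy in-memory list of (subject, predicate, object) triples; the entity names and predicates are hypothetical and are not taken from Beike's graph. The point is only that a multi-hop question becomes a chain of neighbor lookups instead of a chain of table joins.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, predicate, object); not Beike's actual schema.
TRIPLES = [
    ("community_a", "located_in", "district_x"),
    ("house_1", "belongs_to", "community_a"),
    ("house_2", "belongs_to", "community_a"),
    ("agent_li", "manages", "house_1"),
]


def build_adjacency(triples):
    """Index outgoing edges by subject so each hop is a dictionary lookup."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return adjacency


def k_hop(adjacency, start, max_hops):
    """Breadth-first search returning {node: hop distance} within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _predicate, neighbor in adjacency.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances
```

Calling `k_hop(build_adjacency(TRIPLES), "agent_li", 3)` follows outgoing edges from the agent to the house, its community, and the district. A real graph database performs the same kind of traversal against persisted, indexed, and sharded data.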

For technology selection, the authors compare several open‑source graph databases (Neo4j, OrientDB, ArangoDB, JanusGraph, Dgraph). They evaluate criteria like open‑source status, maturity, scalability, documentation, performance, stability, operational cost, and usability. JanusGraph relies on external storage (Cassandra, HBase) and indexing (Elasticsearch, Solr), resulting in high maintenance overhead, while Dgraph provides a native, all‑in‑one distributed solution.

The comparison highlights that Dgraph’s architecture consists of a zero node for cluster coordination and multiple alpha nodes for storage, indexing, and query execution, with automatic data balancing and strong consistency via Raft. JanusGraph, by contrast, depends on external systems for storage and indexing, leading to higher complexity and lower stability.

After selecting Dgraph, the team builds the platform using Docker and Kubernetes, deploying three servers each running four Dgraph nodes (three alphas and one zero). The deployment script ensures that replicas are spread across different machines to guarantee high availability.
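The placement rule can be illustrated with a short Python sketch. It is not the team's deployment script: the host names, the grouping of the nine Alphas into three groups, and the replication factor of three are all assumptions made for illustration. The function only shows the constraint that every replica of a group lands on a different machine.

```python
def place_replicas(hosts, num_groups, replicas):
    """Assign each replica of each Alpha group to a distinct host.

    Assumption: groups are numbered from 1 and hosts are interchangeable.
    """
    if replicas > len(hosts):
        raise ValueError("need at least as many hosts as replicas per group")
    placement = {}
    for group in range(1, num_groups + 1):
        # Rotate the starting host per group so load spreads evenly.
        placement[group] = [
            hosts[(group - 1 + r) % len(hosts)] for r in range(replicas)
        ]
    return placement


# Hypothetical host names for the three servers.
HOSTS = ["server-1", "server-2", "server-3"]
PLACEMENT = place_replicas(HOSTS, num_groups=3, replicas=3)
```

Under these assumptions each server ends up hosting three Alphas, one per group, so losing a single machine still leaves two of the three replicas of every group available.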

Data ingestion is handled through three modes: real‑time streaming via a Data‑Accepter module feeding Kafka, batch streaming from Hive/HDFS also via Kafka, and bulk loading using Dgraph’s Bulk Loader with MapReduce to generate shard files before loading them into the cluster.
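Dgraph's loaders accept RDF N‑Quads, so each ingestion path eventually has to turn source records into triple lines. The sketch below shows one way that conversion might look in Python. The record layout, the field names, and the `record_to_nquads` helper are hypothetical; the source does not describe the actual format used by the Data‑Accepter or the MapReduce job.

```python
def escape_literal(value):
    """Escape characters that are special inside an N-Quads string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_nquads(record, id_field="id", edge_fields=()):
    """Turn one flat record into N-Quad lines keyed by a blank node.

    Assumption: fields listed in edge_fields hold the id of another entity.
    """
    subject = f"_:{record[id_field]}"
    lines = []
    for field, value in record.items():
        if field == id_field or value is None:
            continue
        if field in edge_fields:
            # The value names another entity, so emit an edge to its blank node.
            lines.append(f"{subject} <{field}> _:{value} .")
        else:
            lines.append(f'{subject} <{field}> "{escape_literal(value)}" .')
    return lines


# Hypothetical record, e.g. one row exported from Hive.
EXAMPLE = record_to_nquads(
    {"id": "house_1", "name": "Unit A", "belongs_to": "community_a"},
    edge_fields=("belongs_to",),
)
```

Blank nodes (`_:house_1`) let the loader assign internal identifiers; a streaming path would more likely need a stable mapping from business keys to existing node IDs, which this sketch does not cover.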

For querying, the platform provides a visual UI (Ratel) and a custom Graph‑SQL layer that translates SQL‑like statements into Dgraph’s GraphQL+‑ syntax, simplifying usage for developers and analysts familiar with SQL.
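The source does not document the Graph‑SQL grammar, so the following Python sketch is only a toy illustration of the translation idea. It assumes a single statement shape (`SELECT fields FROM type WHERE predicate = 'value'`) and assumes that the table name maps to a Dgraph type; both choices are hypothetical. The output uses the `eq` and `type` functions of Dgraph's query language.

```python
import re

# Assumed toy grammar: SELECT a, b FROM SomeType WHERE pred = 'value'
_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<fields>[\w\s,]+?)\s+FROM\s+(?P<type>\w+)"
    r"\s+WHERE\s+(?P<pred>\w+)\s*=\s*'(?P<value>[^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_dql(statement):
    """Translate one SQL-like statement into a Dgraph query string."""
    match = _PATTERN.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    fields = [f.strip() for f in match.group("fields").split(",") if f.strip()]
    # Escape the value for use inside a double-quoted query string.
    value = match.group("value").replace("\\", "\\\\").replace('"', '\\"')
    body = "\n".join(f"    {field}" for field in fields)
    return (
        "{\n"
        f'  q(func: eq({match.group("pred")}, "{value}")) '
        f'@filter(type({match.group("type")})) {{\n'
        f"{body}\n"
        "  }\n"
        "}"
    )


QUERY = sql_to_dql("SELECT name, age FROM Person WHERE name = 'Alice'")
```

Note that `eq` at the query root requires an index on the predicate in the Dgraph schema. A production translator would also need joins mapped to edge traversals, pagination, and proper parsing rather than a regular expression.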

Performance tests on a 48‑core, 128 GB machine with 30 GB of data show Dgraph achieving up to 35,000 writes/s for real‑time inserts and maintaining sub‑50 ms response times under 1,000 concurrent threads, outperforming JanusGraph especially on deep‑graph queries.
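A concurrency test of this kind can be approximated with a small harness. The Python sketch below is not the authors' benchmark: `query_fn` is a hypothetical stand-in for whatever client call issues a query, and the nearest-rank percentile is one of several common definitions. It shows only how latencies under a fixed number of worker threads could be collected and summarized.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(query_fn):
    """Run one query and return its latency in milliseconds."""
    start = time.perf_counter()
    query_fn()
    return (time.perf_counter() - start) * 1000.0


def run_load(query_fn, threads, requests):
    """Issue `requests` calls from `threads` workers; return sorted latencies."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        latencies = list(
            pool.map(lambda _: timed_call(query_fn), range(requests))
        )
    return sorted(latencies)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]
```

For example, `percentile(run_load(my_query, threads=1000, requests=100_000), 99)` would report the 99th-percentile latency, where `my_query` is a placeholder for a real client call. Python threads are adequate for I/O-bound client calls, but a dedicated load-testing tool gives more trustworthy numbers at high concurrency.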

The authors also discuss Dgraph’s limitations: no multi‑edge support, only a single graph per cluster, limited integration with the big‑data ecosystem, and relative immaturity compared with longer‑standing projects.

Future work includes deeper performance and stability optimizations, extending support for multiple graphs, integrating the platform into the broader search cloud architecture, and adding vector‑search capabilities alongside existing Elasticsearch and Dgraph engines.
