Inside ByteGraph: How ByteDance Built a Scalable Distributed Graph Database

The article offers a comprehensive technical deep‑dive into ByteDance’s home‑grown distributed graph database and graph‑processing engine, ByteGraph, covering its directed‑property graph model, Gremlin query support, multi‑layer architecture, storage strategies for massive data, and real‑world graph‑computing practices.

Volcano Engine Developer Services

May 13, 2021

This article provides an in‑depth analysis of ByteDance’s self‑developed distributed graph database and graph‑processing engine, ByteGraph, describing its data model, architecture, key design challenges, and practical graph‑computing use cases.

Graph‑structured Data in ByteDance

Most business data at ByteDance falls into three groups: user information and relationships; content (videos, articles, ads); and interactions between users and content (likes, comments, shares, ad clicks). Taken together, users, content, and the relationships among them form a directed property graph.

ByteGraph: A Distributed Graph Storage System

To support online CRUD scenarios for the social graph, ByteDance built ByteGraph, a distributed graph storage system. It supports a directed property-graph model, the Gremlin query language, and flexible read/write APIs, and it scales to tens of millions of QPS with millisecond-level latency. ByteGraph serves OLTP workloads for products such as Toutiao, Douyin, Xigua, and Volcano.

Data Model and API

The core elements of ByteGraph are vertices, directed edges, and the properties attached to each. Vertices store relatively static information about an entity, while edges capture relationships such as "User A follows User B".

Vertex and edge schema examples: [figures not available in this copy]
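
Since the schema examples themselves are not shown, the data model described above can be sketched as follows. This is only an illustration of a directed property graph: the class names, field names (`vid`, `label`, `props`), and the `out` helper are hypothetical and are not ByteGraph's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    vid: int                     # hypothetical numeric vertex id
    label: str                   # hypothetical vertex type, e.g. "user" or "video"
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: int                     # id of the vertex the edge leaves
    dst: int                     # id of the vertex the edge points to
    label: str                   # relationship type, e.g. "follow"
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory directed property graph, for illustration only."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Vertex] = {}
        self.out_edges: Dict[int, List[Edge]] = {}

    def add_vertex(self, v: Vertex) -> None:
        self.vertices[v.vid] = v
        self.out_edges.setdefault(v.vid, [])

    def add_edge(self, e: Edge) -> None:
        # Edges are directed, so they are indexed by their source vertex only.
        self.out_edges.setdefault(e.src, []).append(e)

    def out(self, vid: int, label: str) -> List[int]:
        # Destination ids reachable over outgoing edges with the given label.
        return [e.dst for e in self.out_edges.get(vid, []) if e.label == label]


g = PropertyGraph()
g.add_vertex(Vertex(1, "user", {"name": "A"}))
g.add_vertex(Vertex(2, "user", {"name": "B"}))
g.add_edge(Edge(1, 2, "follow", {"since": 2021}))  # "User A follows User B"
followees_of_a = g.out(1, "follow")
```

Because the edge is directed, the relationship is visible from User A's side only; User B's outgoing "follow" list stays empty.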

Storage Strategies for Massive Graphs

ByteGraph's storage design has to cope with the following workload characteristics:

Massive data volume: billions of vertices and trillions of edges.

High throughput: cluster QPS reaches tens of millions.

Low latency: 99th-percentile latency is kept at the millisecond level.

Read-heavy: read traffic is about 100× write traffic.

Light queries dominate: 90% of queries involve two-hop traversals.

Disaster recovery: supports multi-region active-standby and multi-active deployments.

ByteGraph stores a vertex together with all of its outgoing edges as a group. A vertex with a small out-degree is stored in a single KV pair (first-level storage). When the out-degree grows, ByteGraph switches to a distributed B-Tree (second-level storage) that splits the edge list across multiple KV pairs, so that reads and writes can locate the relevant pair by binary search instead of loading the whole edge list.
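
The two-level idea can be sketched in a few lines. This is a deliberately simplified two-level index over a plain dictionary, not ByteGraph's actual B-Tree: the `EdgeGroup` class, the key format, and the `PAGE_CAPACITY` threshold are all assumptions made for illustration, and vertex ids are assumed to be non-negative.

```python
import bisect
from typing import Dict, List

PAGE_CAPACITY = 4  # hypothetical threshold; tiny so that splits are easy to observe


class EdgeGroup:
    """Sorted out-edge list of one vertex, spread over one or more KV pairs."""

    def __init__(self, kv: Dict[str, List[int]], vid: int) -> None:
        self.kv = kv
        self.vid = vid
        # Index over pages; a real system would persist it as its own KV pair.
        self.page_keys: List[str] = [f"{vid}/page/0"]
        self.lower_bounds: List[int] = [0]  # page 0 covers ids from 0 upward
        self.next_page_no = 1
        kv[self.page_keys[0]] = []

    def _page_index(self, dst: int) -> int:
        # Binary search over the page lower bounds.
        return max(bisect.bisect_right(self.lower_bounds, dst) - 1, 0)

    def add_edge(self, dst: int) -> None:
        i = self._page_index(dst)
        page = self.kv[self.page_keys[i]]
        pos = bisect.bisect_left(page, dst)
        if pos < len(page) and page[pos] == dst:
            return  # edge already present
        page.insert(pos, dst)
        if len(page) > PAGE_CAPACITY:
            self._split(i)

    def _split(self, i: int) -> None:
        # Move the upper half of an overfull page into a new KV pair.
        page = self.kv[self.page_keys[i]]
        mid = len(page) // 2
        left, right = page[:mid], page[mid:]
        new_key = f"{self.vid}/page/{self.next_page_no}"
        self.next_page_no += 1
        self.kv[self.page_keys[i]] = left
        self.kv[new_key] = right
        self.page_keys.insert(i + 1, new_key)
        self.lower_bounds.insert(i + 1, right[0])

    def has_edge(self, dst: int) -> bool:
        page = self.kv[self.page_keys[self._page_index(dst)]]
        pos = bisect.bisect_left(page, dst)
        return pos < len(page) and page[pos] == dst

    def all_edges(self) -> List[int]:
        return [d for key in self.page_keys for d in self.kv[key]]


kv_store: Dict[str, List[int]] = {}
group = EdgeGroup(kv_store, vid=42)
for dst in [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]:
    group.add_edge(dst)
```

While the vertex has at most `PAGE_CAPACITY` out-edges, the whole group lives in one KV pair. Beyond that, each lookup or insert touches only the page whose range covers the destination id, which is the property the text attributes to second-level storage.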

Three‑Layer Architecture

ByteGraph is divided into three layers:

Query layer (bgdb): parses Gremlin queries, generates execution plans, routes requests to storage nodes, and aggregates the results.

Storage/transaction layer (bgkv): a sharded KV engine written in C++ that provides vertex and edge read/write APIs and supports operator push-down and cache-storage integration.

Disk storage layer: a distributed KV store that persists data across data centers.
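
The division of labor between the three layers can be sketched roughly as below. Everything here is an assumption for illustration: the class names, the modulo-based shard routing, and the read-through cache are hypothetical, the source does not describe how bgdb actually routes requests, and Gremlin parsing is left out entirely.

```python
from typing import Dict, List


class DiskKV:
    """Stand-in for the persistent distributed KV store."""

    def __init__(self) -> None:
        self.data: Dict[int, List[int]] = {}

    def get(self, key: int) -> List[int]:
        return list(self.data.get(key, []))

    def put(self, key: int, value: List[int]) -> None:
        self.data[key] = list(value)


class StorageNode:
    """Stand-in for one storage shard: an in-memory cache in front of the disk KV."""

    def __init__(self, disk: DiskKV) -> None:
        self.disk = disk
        self.cache: Dict[int, List[int]] = {}

    def out_edges(self, vid: int) -> List[int]:
        if vid not in self.cache:              # read-through on a cache miss
            self.cache[vid] = self.disk.get(vid)
        return self.cache[vid]

    def add_edge(self, src: int, dst: int) -> None:
        edges = self.out_edges(src)
        if dst not in edges:
            edges.append(dst)
            self.disk.put(src, edges)          # write-through to persistent storage


class QueryLayer:
    """Stand-in for the query layer: routes by vertex id and aggregates results."""

    def __init__(self, nodes: List[StorageNode]) -> None:
        self.nodes = nodes

    def route(self, vid: int) -> StorageNode:
        return self.nodes[vid % len(self.nodes)]   # hypothetical sharding rule

    def add_edge(self, src: int, dst: int) -> None:
        self.route(src).add_edge(src, dst)

    def two_hop(self, vid: int) -> List[int]:
        # Fan out to the shard of each first-hop neighbor, then merge and dedupe.
        result = set()
        for mid in self.route(vid).out_edges(vid):
            result.update(self.route(mid).out_edges(mid))
        return sorted(result)


disk = DiskKV()
query = QueryLayer([StorageNode(disk) for _ in range(3)])
for s, d in [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)]:
    query.add_edge(s, d)
second_degree = query.two_hop(1)
```

A two-hop query starting at vertex 1 touches the shard that owns vertex 1 and then the shards that own vertices 2 and 3, and the query layer merges the partial results into one answer.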

Graph Computing System

While ByteGraph focuses on OLTP, ByteDance also built a dedicated graph‑processing engine for OLAP workloads. The system supports batch processing, vertex‑centric (Pregel/GAS), edge‑centric, and sub‑graph models, and offers multiple execution and communication models (synchronous, asynchronous, semi‑synchronous; push, pull, shared memory).
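
As an illustration of the vertex-centric, synchronous, push-style combination mentioned above, the following sketch computes connected components by minimum-label propagation. It is a generic single-process textbook example, not code from ByteDance's engine, and the function name and input format are assumptions.

```python
from typing import Dict, List, Set, Tuple


def connected_components(num_vertices: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Label every vertex with the smallest vertex id in its component."""
    neighbors: List[List[int]] = [[] for _ in range(num_vertices)]
    for a, b in edges:                  # treat the graph as undirected
        neighbors[a].append(b)
        neighbors[b].append(a)

    label = list(range(num_vertices))   # each vertex starts in its own component
    active: Set[int] = set(range(num_vertices))

    while active:                       # one loop iteration is one superstep
        # Push phase: every active vertex sends its label to its neighbors.
        inbox: Dict[int, int] = {}
        for v in active:
            for u in neighbors[v]:
                inbox[u] = min(inbox.get(u, label[v]), label[v])

        # Barrier, then update phase: only vertices whose label shrank stay active.
        active = set()
        for u, smallest in inbox.items():
            if smallest < label[u]:
                label[u] = smallest
                active.add(u)
    return label


labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```

All messages of a superstep are produced from the labels of the previous superstep, which is what makes the execution synchronous. An asynchronous variant would let a vertex see updated labels within the same pass.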

Key inspirations include Pregel, Giraph, GraphX, and Gemini. ByteDance customized the open-source Gemini engine to handle trillions of edges and billions of vertices, introducing chunk-based partitioning, adaptive push/pull, and hierarchical B-Tree storage.
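
Two of these techniques can be sketched briefly. The code below is a rough illustration under stated assumptions, not ByteDance's or Gemini's implementation: the balancing weight `alpha`, the greedy cut rule, and the 1/20 density threshold are hypothetical parameters chosen for the example.

```python
from typing import List, Tuple


def chunk_partition(out_degrees: List[int], num_chunks: int,
                    alpha: float = 1.0) -> List[Tuple[int, int]]:
    """Cut the vertex id range into at most num_chunks contiguous chunks.

    Each vertex weighs alpha + out_degree, so chunks balance vertices
    and edges together rather than vertex counts alone.
    """
    weights = [alpha + d for d in out_degrees]
    target = sum(weights) / num_chunks
    chunks: List[Tuple[int, int]] = []
    start, acc = 0, 0.0
    for v, w in enumerate(weights):
        acc += w
        # Close the chunk once it reaches the target, keeping one slot for the tail.
        if acc >= target and len(chunks) < num_chunks - 1:
            chunks.append((start, v + 1))   # half-open range [start, v + 1)
            start, acc = v + 1, 0.0
    chunks.append((start, len(weights)))
    return chunks


def choose_mode(active_out_edges: int, total_edges: int,
                threshold: float = 1 / 20) -> str:
    """Pick pull when the active frontier is dense, push when it is sparse."""
    return "pull" if active_out_edges > total_edges * threshold else "push"


chunks = chunk_partition([8, 1, 1, 1, 1, 1, 1, 2], num_chunks=2)
sparse_mode = choose_mode(active_out_edges=10, total_edges=1000)
dense_mode = choose_mode(active_out_edges=100, total_edges=1000)
```

Keeping each chunk a contiguous id range means a partition is described by two integers, and a high-degree vertex simply makes its chunk contain fewer vertices. The push/pull switch reflects the usual trade-off: pushing along the out-edges of a few active vertices is cheap, while pulling over in-edges avoids write contention once most of the graph is active.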

Future Directions

Hybrid memory‑and‑storage computing using AEP/NVMe.

Dynamic graph computation for continuously changing data.

Heterogeneous acceleration with GPU/FPGA.

Graph‑specific language and compiler to decouple business logic from engine.

Conclusion

In a short period, ByteDance built a full‑stack graph solution that powers billions of users, yet many challenges remain. Ongoing research will continue to push the limits of scalability, latency, and usability for massive graph workloads.

