
ByteGraph: ByteDance’s In‑house Graph Database Architecture and Implementation

ByteGraph is ByteDance’s internally developed graph database that stores and queries massive graph data efficiently, featuring a three‑layer architecture of query engine, storage engine, and disk storage, supporting Gremlin, partitioning, indexing, caching, high availability, and integration with online/offline data pipelines.

DataFunSummit

ByteGraph is ByteDance’s self‑developed graph database designed to handle massive graph data used in social networks, risk control, recommendation systems, and bio‑informatics, addressing the challenges of efficient storage, query, computation, and analysis at scale.

1. Understanding Graph Databases

Graph databases store data as vertices, edges, and properties, enabling more efficient traversal and filtering compared to relational joins.

They have gained popularity in recent years, with query languages such as Cypher and Gremlin and distributed capabilities that require consistency, sharding, and fault‑tolerance mechanisms.

2. Application Scenarios

ByteGraph was launched in 2018 to replace MySQL for storing user behavior and friendship data on Toutiao, later expanding to Douyin and many other internal services.

It now runs on more than 15,000 physical machines and serves 600+ business clusters.

3. Data Model and Query Language

ByteGraph uses a directed property graph model where both vertices and edges carry attribute tables, essentially a key‑value store for graph elements.

Gremlin is chosen as the query language because it is Turing‑complete, integrates well with graph computation, and is familiar to Python‑oriented data analysts.

Example query: retrieve all one‑hop neighbors of user A whose fan count exceeds 100.
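In Gremlin, this query would be written roughly as `g.V("A").out().has("fan_count", gt(100))`. A minimal Python sketch over a toy in-memory property graph (vertex/edge layout and all names are hypothetical, not ByteGraph's storage format) illustrates the same semantics:

```python
# Toy directed property graph: vertices carry attribute tables,
# edges are adjacency lists per source vertex (single edge type).
graph = {
    "vertices": {
        "A": {"fan_count": 5},
        "B": {"fan_count": 250},
        "C": {"fan_count": 80},
        "D": {"fan_count": 1200},
    },
    "edges": {"A": ["B", "C", "D"], "B": ["A"]},
}

def one_hop_neighbors(g, src, predicate):
    """Return one-hop neighbors of `src` whose properties satisfy `predicate`."""
    return [v for v in g["edges"].get(src, [])
            if predicate(g["vertices"][v])]

# Equivalent of: g.V("A").out().has("fan_count", gt(100))
result = one_hop_neighbors(graph, "A", lambda props: props["fan_count"] > 100)
print(result)  # ['B', 'D']
```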

4. ByteGraph Architecture and Implementation

4.1 Overall Architecture

ByteGraph consists of three layers: the Graph Query Engine (GQ), the Graph Storage Engine (GS), and the underlying disk storage. The design separates compute from storage, with each layer running multiple process instances in a cluster.

4.2 Read/Write Flow

A read request (e.g., fetch the one-hop neighbors of user A) is routed from the client to a GQ instance, which determines the target GS node. That GS node then serves the data from its local cache, or pulls the required pages from the underlying KV store on a miss.
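The read path above can be sketched as follows, assuming hash-based routing on the (source vertex, edge type) pair; the actual routing policy and cache layout are ByteGraph internals not described in the talk:

```python
import hashlib

GS_NODES = ["gs-0", "gs-1", "gs-2"]  # hypothetical GS cluster

def route(src_vertex: str, edge_type: str) -> str:
    """GQ side: pick the GS node owning this (vertex, edge-type) partition."""
    key = f"{src_vertex}:{edge_type}".encode()
    return GS_NODES[int(hashlib.md5(key).hexdigest(), 16) % len(GS_NODES)]

class GSNode:
    """GS side: serve from local cache, fall back to the KV store on a miss."""
    def __init__(self, kv_store: dict):
        self.cache = {}
        self.kv = kv_store

    def read(self, partition_key: str):
        if partition_key in self.cache:       # cache hit: no disk access
            return self.cache[partition_key]
        value = self.kv[partition_key]        # cache miss: pull from KV store
        self.cache[partition_key] = value
        return value

kv = {"A:follow": ["B", "C"]}
node = GSNode(kv)
neighbors = node.read("A:follow")             # miss, populates the cache
```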

4.3 GQ Implementation

Parser stage: a recursive-descent parser converts the Gremlin query into an abstract syntax tree.

Query plan generation: rule-based (RBO) and cost-based (CBO) optimizers produce an executable plan.

Plan execution: the plan is pushed down to the GS partitions, minimizing network traffic before results are merged.
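A classic rule-based rewrite in this pipeline is predicate pushdown: evaluating a filter inside the GS partition instead of shipping every edge back to GQ. A toy sketch (plan-node names are illustrative, not ByteGraph's actual plan format):

```python
# Toy plan: a list of (operator, *args) tuples produced from the AST.
def rbo_pushdown(plan):
    """If a Filter immediately follows a NeighborScan, merge the filter
    into the scan so it executes inside the storage partition."""
    out = []
    for op in plan:
        if op[0] == "Filter" and out and out[-1][0] == "NeighborScan":
            scan = out.pop()
            out.append(("NeighborScan", scan[1], op[1]))  # scan with pushed filter
        else:
            out.append(op)
    return out

# Plan for: fetch neighbors of A, then filter by fan count.
plan = [("NeighborScan", "A", None), ("Filter", "fan_count>100")]
optimized = rbo_pushdown(plan)
# -> [("NeighborScan", "A", "fan_count>100")]
```

A cost-based optimizer would additionally use statistics (e.g., vertex degree) to choose among candidate plans; the sketch covers only the rule-based step.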

4.4 GS Implementation

Data is stored in partitions keyed by (source vertex, edge type); each partition is a B-tree with its own write-ahead log (WAL).

Edge pages and meta pages hold edge data and index keys respectively; pages are stored as KV pairs on disk.

In‑memory cache uses LRU eviction; dirty pages are asynchronously flushed via WAL.

Multiple caching strategies (graph‑native, high‑performance LRU, write‑through) improve latency and throughput.
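The page-cache behavior described above can be sketched as an LRU cache with write-ahead logging and deferred flushing. This is a minimal illustration; page format, WAL encoding, and eviction policy details are assumptions, not ByteGraph's actual implementation:

```python
from collections import OrderedDict

class PageCache:
    """Toy LRU page cache: writes are logged to a WAL first, dirty pages
    are flushed to the (simulated) KV store later or on eviction."""
    def __init__(self, capacity: int, disk: dict):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> (data, dirty_flag)
        self.wal = []                # append-only log of modifications
        self.disk = disk             # simulated on-disk KV store

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)       # mark most-recently used
            return self.pages[page_id][0]
        data = self.disk[page_id]
        self._insert(page_id, data, dirty=False)
        return data

    def write(self, page_id, data):
        self.wal.append((page_id, data))          # log before mutating
        self._insert(page_id, data, dirty=True)

    def flush(self):
        """What the asynchronous flusher would do in the background."""
        for page_id, (data, dirty) in self.pages.items():
            if dirty:
                self.disk[page_id] = data
                self.pages[page_id] = (data, False)
        self.wal.clear()

    def _insert(self, page_id, data, dirty):
        self.pages[page_id] = (data, dirty)
        self.pages.move_to_end(page_id)
        while len(self.pages) > self.capacity:
            evicted, (edata, edirty) = self.pages.popitem(last=False)
            if edirty:                            # evicted dirty page must hit disk
                self.disk[evicted] = edata

disk = {"p1": "old"}
cache = PageCache(capacity=2, disk=disk)
cache.write("p1", "new")
# Before flushing, the disk still holds the old page; the WAL has the change.
flushed_before = disk["p1"]
cache.flush()
```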

5. Key Issues Analysis

Indexing: local indexes on (source, edge-type) accelerate attribute filtering; a global vertex-property index is also supported, with consistency maintained via distributed transactions.

Hotspot reads/writes: read hotspots are served by multiple GQ nodes and cached in GS; write hotspots use copy-on-write and group commit to reduce I/O contention.
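Group commit amortizes disk syncs across many concurrent writes to a hot partition: instead of one sync per write, a batch of buffered writes shares a single sync. A toy sketch (batching threshold and API are illustrative):

```python
class GroupCommitLog:
    """Toy group commit: buffer log records and perform one (simulated)
    disk sync per batch rather than one per record."""
    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = 0         # number of simulated disk syncs

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.commit()

    def commit(self):
        if self.buffer:
            self.flushes += 1    # one sync covers the whole batch
            self.buffer.clear()

log = GroupCommitLog(batch_size=4)
for i in range(8):               # 8 writes to a hot edge list
    log.append(("add_edge", "A", i))
# Only 2 syncs were needed instead of 8.
```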

Resource allocation: separate thread pools for lightweight and heavyweight queries allow efficient scheduling.

High availability: dual-datacenter deployment with a single-write, multi-read replica strategy; WAN disaster recovery uses binlog replication, with hybrid logical clocks to order events across regions.
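A hybrid logical clock (HLC) combines physical time with a logical counter so that replicated binlog events can be totally ordered even when machine clocks drift. A minimal sketch of the standard HLC algorithm (Kulkarni et al.); this is not ByteGraph's implementation:

```python
import time

class HybridLogicalClock:
    """Minimal hybrid logical clock producing (logical_time, counter)
    timestamps that never move backwards, even with clock skew."""
    def __init__(self, now=time.time):
        self.now = now   # physical clock source (injectable for testing)
        self.l = 0       # logical component: max physical time observed
        self.c = 0       # counter: breaks ties within one physical tick

    def send(self):
        """Timestamp a local event or outgoing message."""
        pt = int(self.now())
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def recv(self, remote):
        """Merge a remote timestamp on message receipt."""
        rl, rc = remote
        pt = int(self.now())
        if pt > self.l and pt > rl:
            self.l, self.c = pt, 0
        elif rl > self.l:
            self.l, self.c = rl, rc + 1
        elif self.l > rl:
            self.c += 1
        else:                          # equal logical components
            self.c = max(self.c, rc) + 1
        return (self.l, self.c)

# Even with a lagging local clock, timestamps stay monotonic:
clk = HybridLogicalClock(now=lambda: 100)
t1 = clk.send()              # (100, 0)
t2 = clk.recv((120, 3))      # remote clock is ahead -> (120, 4)
t3 = clk.send()              # (120, 5)
```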

Online/offline data fusion: batch imports of historical data and real-time writes are unified on an internal data platform for downstream analytics.

Overall, ByteGraph demonstrates a production‑grade graph database solution that combines a flexible property‑graph model, Gremlin query support, sophisticated partitioning and indexing, robust caching, and multi‑region high availability to meet ByteDance’s massive online workloads.

Tags: High Availability · Graph Database · Distributed Storage · Gremlin · ByteGraph
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
