How ByteGraph 3.0 Redefines Scalable Graph Database Architecture
This article presents a comprehensive technical overview of ByteGraph, covering its evolution from 2.0 to 3.0, core graph modeling capabilities, Gremlin query interface, architectural layers, performance bottlenecks, cost‑reduction strategies, and future roadmap for large‑scale graph data services.
ByteGraph Overview
ByteGraph is a graph database developed by the ByteDance graph team. This article focuses on its storage layer and its third‑generation distributed graph storage architecture. ByteGraph models user‑centric data such as user profiles, relationships, content, and interaction events, which gives social and recommendation scenarios a clear representation.
What ByteGraph Can Do
User information and relationships
Content objects (videos, articles, ads)
User‑content interactions (likes, comments, shares)
Modeling these entities as a directed property graph simplifies queries for friend‑of‑friend, recommendation, and risk‑control use cases.
Query Interface
ByteGraph exposes a Gremlin‑compatible query interface. Gremlin is a Turing‑complete graph traversal language that is more expressive than Cypher and is widely supported by cloud providers.
Data Model
It uses a directed property graph where both vertices and edges can carry multiple attributes, supporting dynamic addition or removal of property columns similar to MySQL DDL.
Example: find all one‑hop friends of user A whose fan count exceeds 100.
g.V(vertex(A.id, A.type)).out('好友').where(in('粉丝关注').count().is(gt(100))).toList()
The query first locates vertex A, traverses outgoing "friend" edges (label '好友'), keeps the neighbors whose inbound "fan‑follow" edge count (label '粉丝关注') is greater than 100, and returns the matching vertices.
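As a rough illustration of what this traversal computes, the following Python sketch evaluates the same logic over a tiny in-memory directed property graph. The `Graph` class, its method names, the English edge labels `friend` and `follows`, and the sample data are all hypothetical; this is not ByteGraph's API or storage format.

```python
from collections import defaultdict


class Graph:
    """Toy directed property graph: edges are indexed per (vertex, label)."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # (src, label) -> [dst, ...]
        self.in_edges = defaultdict(list)   # (dst, label) -> [src, ...]

    def add_edge(self, src, label, dst):
        self.out_edges[(src, label)].append(dst)
        self.in_edges[(dst, label)].append(src)

    def out(self, v, label):
        return self.out_edges[(v, label)]

    def in_count(self, v, label):
        return len(self.in_edges[(v, label)])


def popular_friends(g, user, min_fans):
    """One-hop friends of `user` with more than `min_fans` inbound follows."""
    return [f for f in g.out(user, "friend")
            if g.in_count(f, "follows") > min_fans]


g = Graph()
g.add_edge("A", "friend", "B")
g.add_edge("A", "friend", "C")
for i in range(101):                 # B has 101 followers
    g.add_edge(f"fan{i}", "follows", "B")
g.add_edge("fan0", "follows", "C")   # C has only 1 follower

result = popular_friends(g, "A", 100)
```

Only B passes the filter here, because C has a single inbound `follows` edge.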
Business Scenarios
Douyin user relationships (likes, followers)
Recommendation pipelines (friends‑watch, friends‑liked)
Knowledge‑graph applications (search, education, e‑commerce)
Internal micro‑service tracing
Payment risk‑control (cycle detection for cash‑out)
ByteGraph currently serves over 1,000 business clusters with a server fleet exceeding ten thousand machines.
ByteGraph 2.0 Architecture
The 2.0 architecture, built since 2018, consists of three layers:
Graph Query Engine (GQ) – the query front‑end.
Graph Storage Engine (GS) – the storage back‑end.
Distributed KV store – the underlying key‑value layer, inspired by TiDB.
This layered design allows independent scaling of each tier. Complex queries can scale the query layer, memory pressure can scale the middle layer, and storage growth can scale the KV layer. However, the many layers also introduce latency and CPU overhead.
Modules in 2.0
The query engine parses Gremlin into an AST, generates a physical plan, and executes it. The storage engine abstracts data as pure vertex storage and pure edge storage.
Execution Engine Example
g.V().has('id', 1).has('type', 'person').out('knows').has('age', gt(18)).values('name')
This query locates the person vertex with id 1, follows its outgoing 'knows' edges, keeps the neighbors whose age is greater than 18, and returns their names.
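One way to picture how a plan for this query executes is as a chain of steps, each consuming the previous step's output. The sketch below is a simplified Python model with hypothetical data and function names; the real engine parses Gremlin into an AST and builds a physical plan, which is not shown here.

```python
# Hypothetical sample data: vertex properties and 'knows' adjacency lists.
vertices = {
    1: {"type": "person", "name": "Ann", "age": 30},
    2: {"type": "person", "name": "Bob", "age": 17},
    3: {"type": "person", "name": "Cid", "age": 42},
}
knows = {1: [2, 3]}


def has(stream, key, pred):
    """Filter step: keep vertex ids whose property satisfies pred."""
    return (v for v in stream if pred(vertices[v].get(key)))


def out(stream, adjacency):
    """Expand step: emit every out-neighbor of every incoming vertex."""
    return (dst for v in stream for dst in adjacency.get(v, []))


def values(stream, key):
    """Projection step: emit one property value per vertex."""
    return (vertices[v][key] for v in stream)


# Plan: lookup by id and type -> out('knows') -> age filter -> project name
start = has(iter([1]), "type", lambda t: t == "person")
adults = has(out(start, knows), "age", lambda a: a is not None and a > 18)
names = list(values(adults, "name"))
```

With this sample data the only neighbor older than 18 is vertex 3, so `names` contains a single entry.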
Storage Engine Details
Vertex storage stores vertex attributes in KV. Edge storage aggregates adjacency lists into an Edge Storage component organized as a B‑tree, avoiding per‑edge KV entries that would cause write‑amplification.
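The idea of aggregating an adjacency list into pages can be sketched as follows. Each page would be stored as one KV value, so the number of KV entries tracks the number of pages rather than the number of edges. The class name `EdgePages`, the tiny page capacity, and the split-in-half policy are illustrative assumptions; the sketch keeps a flat list of leaf pages and omits the inner nodes of a real B-tree.

```python
import bisect


class EdgePages:
    """Adjacency list of one vertex, kept sorted and split into pages."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.pages = [[]]  # sorted pages, themselves in key order

    def insert(self, dst):
        # Locate the page whose key range should contain dst.
        firsts = [p[0] for p in self.pages[1:]]
        i = bisect.bisect_right(firsts, dst)
        page = self.pages[i]
        bisect.insort(page, dst)
        if len(page) > self.capacity:      # split an overfull page in half
            mid = len(page) // 2
            self.pages[i:i + 1] = [page[:mid], page[mid:]]

    def scan(self):
        """One-hop scan: read pages in order and concatenate their edges."""
        return [dst for page in self.pages for dst in page]


adj = EdgePages(capacity=4)
for dst in [7, 3, 9, 1, 5, 8, 2, 6, 4, 10]:
    adj.insert(dst)
```

Ten edges end up in a handful of pages, and a scan returns them in sorted order without touching ten separate keys.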
Problems in ByteGraph 2.0
Cost
Replicating data across three data centers with 3 or 5 copies incurs high storage cost; erasure coding (EC) could reduce replicas to 2 per data center, saving 30‑60%.
LSM‑Tree based KV suffers from write amplification and heavy compaction CPU usage.
Multiple cache layers (BlockCache, GS BufferPool) duplicate hot data, reducing overall system efficiency.
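The replication-versus-EC cost argument in the list above can be made concrete with simple arithmetic. The sketch below compares the raw bytes stored per logical byte under full replication and under a (k data + m parity) erasure code. The 4+2 parameters are an illustrative assumption; the source does not state which code ByteGraph uses.

```python
from fractions import Fraction


def replication_overhead(copies):
    """Raw bytes stored per logical byte with full replication."""
    return Fraction(copies)


def ec_overhead(data_shards, parity_shards):
    """Raw bytes stored per logical byte with a (k + m) erasure code."""
    return Fraction(data_shards + parity_shards, data_shards)


def saving(before, after):
    """Fraction of raw storage saved when moving from `before` to `after`."""
    return 1 - after / before


# Hypothetical example: 3 full copies versus a 4+2 erasure code.
three_copies = replication_overhead(3)   # 3 raw bytes per logical byte
ec_4_2 = ec_overhead(4, 2)               # 3/2 raw bytes per logical byte
saved = saving(three_copies, ec_4_2)     # 1/2, i.e. half the raw storage
```

Under these assumed parameters the saving is 50%, which falls inside the 30‑60% range quoted above.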
Performance
The request path crosses four layers (graph → memory → KV‑Proxy → KV‑Server), because the KV tier is itself split into a proxy and a server. Each hop adds latency and CPU overhead, especially for multi‑hop traversals.
Multi‑hop queries generate many RPC calls, limiting performance for recommendation and risk‑graph workloads.
Per‑vertex WAL introduces distributed‑transaction overhead.
One‑hop scans dominate workload, but the current page layout stores rows, forcing full‑page scans even when only a few columns are needed, leading to high memory latency and bandwidth consumption.
ByteGraph 3.0 Solution
Cost Reductions
Replace KV with a DFS‑based pooled storage (ByteStore) using EC to achieve two‑replica durability, lowering TCO.
Replace the LSM‑Tree based KV engine with a Bw‑Tree engine, reducing write amplification from tens of times to 2‑3×.
Adopt column‑store layout inside B‑tree pages to improve cache locality for one‑hop scans.
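The benefit of a columnar page layout for one-hop scans can be sketched by counting how many values a scan touches. The page contents and field names below are made up for illustration, and value counts stand in for bytes read; the real page format is not described in the source.

```python
# One page of edges in row layout: each row carries every property.
row_page = [
    {"dst": 11, "ts": 1000, "weight": 0.5, "tag": "a"},
    {"dst": 12, "ts": 1001, "weight": 0.7, "tag": "b"},
    {"dst": 13, "ts": 1002, "weight": 0.1, "tag": "c"},
]

# The same page in column layout: one contiguous array per property.
col_page = {key: [row[key] for row in row_page] for key in row_page[0]}


def scan_rows(page, wanted):
    """Row layout: every row is read in full to extract the wanted fields."""
    touched = sum(len(row) for row in page)
    result = [tuple(row[k] for k in wanted) for row in page]
    return result, touched


def scan_columns(page, wanted):
    """Column layout: only the wanted arrays are read."""
    touched = sum(len(page[k]) for k in wanted)
    result = list(zip(*(page[k] for k in wanted)))
    return result, touched


rows_out, rows_touched = scan_rows(row_page, ["dst"])
cols_out, cols_touched = scan_columns(col_page, ["dst"])
```

Both scans return the same destination ids, but the row scan touches 12 values and the column scan touches 3.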
Performance Improvements
Merge query and storage engines into a single process, eliminating cross‑process RPC for multi‑hop traversals.
Increase shard granularity: prefer single‑tablet deployments with one primary and multiple replicas; when sharding is necessary, use hash partitioning to balance load and to raise the share of transactions that can use one‑phase commit (1PC).
Introduce a new pipeline execution engine that fuses multiple steps, reduces channel‑based communication, and exploits NUMA‑aware task scheduling.
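The sharding item in the list above can be illustrated with a small sketch: vertices are hashed to tablets, and a transaction can commit in one phase only if every vertex it writes lands on the same tablet. The hash function, the tablet counts, and the transaction shapes are assumptions made for illustration.

```python
import zlib


def tablet_of(vertex_id, num_tablets):
    """Stable hash partitioning of a vertex id onto a tablet."""
    return zlib.crc32(str(vertex_id).encode()) % num_tablets


def is_single_phase(txn_vertices, num_tablets):
    """A transaction qualifies for 1PC if it touches exactly one tablet."""
    return len({tablet_of(v, num_tablets) for v in txn_vertices}) == 1


def one_pc_ratio(transactions, num_tablets):
    """Share of transactions that can commit without a second phase."""
    ok = sum(is_single_phase(t, num_tablets) for t in transactions)
    return ok / len(transactions)


# Hypothetical workload: each transaction writes a few vertices.
txns = [[1, 2], [3], [4, 5, 6], [7, 8]]
ratio_single_tablet = one_pc_ratio(txns, num_tablets=1)
ratio_many_tablets = one_pc_ratio(txns, num_tablets=16)
```

With a single tablet every transaction is local, which is one reason the text prefers single-tablet deployments; with more tablets the ratio depends on how the written vertices happen to hash.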
New Features
Log‑based master‑slave synchronization with controllable latency, supporting strongly consistent reads when needed.
Unified storage layer (shared storage) that serves graph queries, full‑graph computation, and GNN data loading.
ByteGraph 3.0 Architecture
The architecture combines query and storage layers similar to MySQL, adds a graph‑specific query language (GQL), and builds on a shared storage foundation (ByteStore) implemented with a DFS‑backed Bw‑Tree.
Data is partitioned into Tablets (usually a single tablet per cluster). Each Tablet contains a hash‑partitioned subset of the graph, with separate RW and RO instances. A shared journal synchronizes writes, and an Append‑Only Blob layer provides multi‑replica durability.
Parallel Execution Engine
Pipeline merges multiple steps into a single execution unit, reducing channel overhead.
Data‑parallel pipelines launch multiple tasks to fully utilize multi‑core CPUs.
NUMA‑aware scheduler improves data locality.
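The difference between step-at-a-time execution and a fused pipeline can be sketched in Python. The unfused version materializes an intermediate list between steps (standing in for a channel hand-off), the fused version applies all steps to each element in one pass, and a thread pool runs the fused unit over data partitions. Function names and data are hypothetical, and Python threads only illustrate the task structure, not real multi-core speedup or NUMA placement.

```python
from concurrent.futures import ThreadPoolExecutor

ages = {v: 10 + v for v in range(20)}                # hypothetical property
neighbors = {v: [v + 1, v + 2] for v in range(18)}   # hypothetical edges


def unfused(start):
    """Each step materializes its full output before the next step runs."""
    expanded = [n for v in start for n in neighbors.get(v, [])]
    filtered = [n for n in expanded if ages[n] > 18]
    return [ages[n] for n in filtered]


def fused(start):
    """Expand, filter, and project in a single pass with no hand-off."""
    out = []
    for v in start:
        for n in neighbors.get(v, []):
            age = ages[n]
            if age > 18:
                out.append(age)
    return out


def parallel_fused(start, num_tasks=4):
    """Data parallelism: split the input and run the fused unit per chunk."""
    chunks = [start[i::num_tasks] for i in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        parts = pool.map(fused, chunks)
    return [x for part in parts for x in part]


seeds = list(range(18))
```

All three variants produce the same multiset of results; the parallel version may return them in a different order because each task handles a different slice of the input.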
Storage Engine Modules
Journal Engine – manages write‑ahead logs.
Mem Engine – in‑memory B‑tree structures, similar to GS2.
Page Engine – handles on‑disk B‑tree pages, split into Page Index (metadata) and Page Data (base/delta streams).
Storage Engine Flow
During a flush, the Mem Engine writes dirty pages to the Page Engine, which stores base and delta streams in separate blobs. Index updates are persisted via WAL. Reads first consult the Page Index to locate the appropriate blob, then perform at most two I/O operations.
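The read path described above can be modeled with a page index that records where each page's latest base image and pending delta live. The sketch uses Python lists as stand-ins for append-only blobs and assumes each flush writes either a new base or one consolidated delta per page, which is one way to keep a read at no more than two I/Os. All names are hypothetical, and the real on-disk format is not given in the source.

```python
class PageStore:
    def __init__(self):
        self.base_blob = []    # append-only stream of base page images
        self.delta_blob = []   # append-only stream of page deltas
        self.index = {}        # page_id -> {"base": offset, "delta": offset}
        self.io_count = 0      # blob reads performed so far

    def flush_base(self, page_id, image):
        """Write a full page image; any older delta becomes obsolete."""
        self.base_blob.append(dict(image))
        self.index[page_id] = {"base": len(self.base_blob) - 1, "delta": None}

    def flush_delta(self, page_id, changes_since_base):
        """Write all changes accumulated since the base as one delta."""
        self.delta_blob.append(dict(changes_since_base))
        self.index[page_id]["delta"] = len(self.delta_blob) - 1

    def read(self, page_id):
        """Index lookup, then one base read and at most one delta read."""
        entry = self.index[page_id]
        page = dict(self.base_blob[entry["base"]])
        self.io_count += 1
        if entry["delta"] is not None:
            page.update(self.delta_blob[entry["delta"]])
            self.io_count += 1
        return page


store = PageStore()
store.flush_base("p1", {"e1": "v1", "e2": "v2"})
store.flush_delta("p1", {"e2": "v2b"})
store.flush_delta("p1", {"e2": "v2c", "e3": "v3"})
page = store.read("p1")
```

The superseded delta stays in the delta blob as garbage until space reclamation runs, which is why the index, not the blob, decides what a read sees.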
Master‑slave replication shares the same journal and page index, allowing RW and RO nodes to stay consistent with bounded latency.
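A minimal sketch of log-based replication under these assumptions: the RW node appends to a shared journal, each RO node replays entries up to its applied log sequence number (LSN), and a strongly consistent read first catches up to the journal tail. Class and method names are hypothetical.

```python
class Journal:
    """Shared append-only log; an entry's LSN is its 1-based position."""

    def __init__(self):
        self.entries = []

    def append(self, key, value):
        self.entries.append((key, value))
        return len(self.entries)

    def tail_lsn(self):
        return len(self.entries)


class ReadOnlyNode:
    def __init__(self, journal):
        self.journal = journal
        self.applied_lsn = 0
        self.data = {}

    def replay(self, up_to_lsn):
        """Apply journal entries in order until applied_lsn >= up_to_lsn."""
        while self.applied_lsn < up_to_lsn:
            key, value = self.journal.entries[self.applied_lsn]
            self.data[key] = value
            self.applied_lsn += 1

    def lag(self):
        """Number of journal entries this node has not applied yet."""
        return self.journal.tail_lsn() - self.applied_lsn

    def read(self, key, strong=False):
        if strong:  # catch up to the tail before serving the read
            self.replay(self.journal.tail_lsn())
        return self.data.get(key)


journal = Journal()
ro = ReadOnlyNode(journal)
journal.append("v1.name", "Ann")
ro.replay(1)
journal.append("v1.name", "Anna")   # not yet replayed on the RO node

stale = ro.read("v1.name")
lag_before = ro.lag()
fresh = ro.read("v1.name", strong=True)
```

A default read may return the older value while the node lags by one entry; the strong read replays the journal first and sees the latest write.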
Advanced Reclaim Strategies
Separate Base and Delta streams so that frequently updated deltas can be GC‑ed independently of cold base data.
Maintain per‑blob statistics (usage, last update) to guide space reclamation, selecting low‑usage or low‑update blobs for garbage collection.
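The per-blob statistics can drive a simple victim-selection policy. The sketch below requires both signals at once, keeping blobs that are mostly garbage and have not been updated recently, then reclaims the emptiest first. The thresholds, field names, and the choice to combine the two signals are illustrative assumptions rather than ByteGraph's actual policy.

```python
from dataclasses import dataclass


@dataclass
class BlobStats:
    blob_id: str
    total_bytes: int
    live_bytes: int
    last_update: int  # logical timestamp of the most recent write

    @property
    def usage(self):
        return self.live_bytes / self.total_bytes


def pick_gc_victims(blobs, now, max_usage=0.5, min_idle=100, limit=2):
    """Choose blobs that are mostly garbage and no longer being updated."""
    candidates = [b for b in blobs
                  if b.usage <= max_usage and now - b.last_update >= min_idle]
    candidates.sort(key=lambda b: b.usage)   # reclaim the emptiest first
    return [b.blob_id for b in candidates[:limit]]


blobs = [
    BlobStats("base-1", 1000, 900, last_update=10),    # cold but mostly live
    BlobStats("delta-1", 1000, 100, last_update=50),   # mostly garbage, idle
    BlobStats("delta-2", 1000, 300, last_update=990),  # garbage, still hot
    BlobStats("delta-3", 1000, 400, last_update=200),  # garbage, idle
]
victims = pick_gc_victims(blobs, now=1000)
```

The mostly live base blob and the recently updated delta blob are both skipped, so only the idle, low-usage delta blobs are selected.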
Future Outlook
Business Benefits
Storage cost reduction of 30‑50%.
Multi‑hop query performance several times higher than the previous system in single‑tablet deployments.
Planned Work
Complete all 3.0 features and continue scaling adoption across internal services and on Volcano Engine.
Position ByteGraph as a unified storage substrate for graph query engines, full‑graph compute engines, and GNN data loaders.
Build a "Single‑Engine" ecosystem that offers end‑to‑end graph data services across databases, GNN, graph compute, and Spark/Hadoop ecosystems.
Q&A
Q1: Is the Bw‑Tree implementation based on Microsoft’s 2013 paper?
A1: It is not a direct copy; ByteGraph adopts three ideas from the paper: memory design, disk layout, and some delta implementation details, while customizing other aspects.
Q2: How much has write‑amplification been reduced?
A2: Write amplification is now around 2‑3×, down from more than 40× previously, which was caused by deep compaction layers.
Q3: For multi‑hop scenarios, how do you handle data affinity across many machines?
A3: While co‑locating highly related data can improve latency, massive fan‑out graphs make full co‑location impractical; ByteGraph relies on engineering optimizations such as large‑memory nodes and columnar storage rather than exhaustive affinity placement.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
