Understanding Nebula Graph: Data Model and Architecture Explained

This article introduces Nebula Graph, an open‑source distributed graph database, detailing its directed property graph model, vertex and edge schemas, graph partitioning, storage, metadata, query engine, and client APIs, highlighting its strong schema design, high‑availability architecture, and scalability for trillion‑scale graphs.

Programmer DD

Sep 10, 2019

Nebula Graph is an open‑source distributed graph database. Billed as the only online graph database capable of storing trillions of vertices and edges with properties, it serves queries with millisecond‑level latency under high concurrency while ensuring high availability and data safety.

This article mainly introduces Nebula Graph's data model and system architecture design.

Directed Property Graph

Nebula Graph uses an easy‑to‑understand directed property graph for modeling, meaning the graph consists of two element types: vertices and edges.

Vertex

In Nebula Graph a vertex is composed of a tag and its associated attribute group; the tag represents the vertex type, and the attribute group (the schema) defines one or more properties belonging to that tag. A vertex must have at least one type (tag) and can have multiple types, each with its own schema.

In the example diagram there are two vertex tags: player and team. The player schema includes three properties: ID (vid), Name (string) and Age (int). The team schema includes two properties: ID (vid) and Name (string).

Like MySQL, Nebula Graph is a strong‑schema database; attribute names and data types are defined before data insertion.
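The strong‑schema idea can be illustrated with a small sketch. This is not Nebula Graph's implementation: the TagSchema class, the TYPE_MAP table, and the sample values are hypothetical, and the vid is left out because it identifies the vertex rather than being an ordinary property. The point is only that property names and types are declared first and every insert is checked against them.

```python
from dataclasses import dataclass

# Hypothetical mapping from the type names used in this article to Python
# types; this is not Nebula Graph's real type system.
TYPE_MAP = {"string": str, "int": int, "double": float}


@dataclass(frozen=True)
class TagSchema:
    name: str
    properties: dict  # property name -> type name, declared before any insert

    def validate(self, values: dict) -> None:
        """Raise if values do not match the declared properties exactly."""
        unknown = set(values) - set(self.properties)
        missing = set(self.properties) - set(values)
        if unknown or missing:
            raise ValueError(f"unknown={sorted(unknown)} missing={sorted(missing)}")
        for prop, type_name in self.properties.items():
            value = values[prop]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, TYPE_MAP[type_name]):
                raise TypeError(f"{self.name}.{prop} expects {type_name}")


player = TagSchema("player", {"name": "string", "age": "int"})
team = TagSchema("team", {"name": "string"})

player.validate({"name": "Alice", "age": 30})  # matches the schema, no error

try:
    player.validate({"name": "Alice", "age": "thirty"})
except TypeError as err:
    print(err)  # player.age expects int
```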

Edge

In Nebula Graph an edge consists of an edge type and edge attributes. All edges are directed, indicating a relationship from a source vertex (src) to a destination vertex (dst). The edge type is called an edgetype, and each edge has exactly one edgetype whose schema defines the edge attributes.

In the diagram there are two edge types: a like relationship from player to player with a likeness (double) attribute, and a serve relationship from player to team with start_year (int) and end_year (int) attributes.

Note: multiple edges of the same or different types can exist between the same source and destination.
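A minimal in‑memory sketch of directed edges may make the note above concrete. The Edge and PropertyGraph classes and the IDs "player100", "player101" and "team200" are invented for illustration; the sketch only shows that several edges, of the same or different types, can coexist between one source and one destination, and that direction matters.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Edge:
    src: str
    dst: str
    edgetype: str  # each edge has exactly one edgetype
    props: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        # One list per source vertex, so parallel edges are kept, not overwritten.
        self.out_edges = defaultdict(list)

    def add_edge(self, edge: Edge) -> None:
        self.out_edges[edge.src].append(edge)

    def edges_between(self, src: str, dst: str) -> list:
        """Return all edges directed from src to dst."""
        return [e for e in self.out_edges.get(src, []) if e.dst == dst]


g = PropertyGraph()
# Two serve edges between the same player and team (two separate stints).
g.add_edge(Edge("player100", "team200", "serve", {"start_year": 2010, "end_year": 2012}))
g.add_edge(Edge("player100", "team200", "serve", {"start_year": 2015, "end_year": 2018}))
g.add_edge(Edge("player100", "player101", "like", {"likeness": 0.9}))

print(len(g.edges_between("player100", "team200")))    # 2
print(len(g.edges_between("player101", "player100")))  # 0, edges are directed
```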

Graph Partition

Because ultra‑large graphs can contain billions to trillions of vertices and even more edges, the data must be split across logical partitions. Nebula Graph uses edge‑cut partitioning: vertices are assigned to partitions, by default with a hash‑based strategy, and an edge whose endpoints fall in different partitions is cut across them. The number of partitions is configured statically and cannot be changed afterwards.
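A rough sketch of hash‑based placement, and of why a fixed partition count matters, might look like this. The CRC32 hash, the modulo step, the partition counts and the vertex IDs are all assumptions made for the example; the article does not specify the actual hash function.

```python
import zlib

NUM_PARTITIONS = 100  # hypothetical value, chosen once and then fixed


def partition_of(vid: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a vertex ID to a partition with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(vid.encode("utf-8")) % num_partitions


# If the partition count changed from 100 to 101, a plain modulo hash would
# send most vertices to a different partition, forcing a large reshuffle.
vids = [f"player{i}" for i in range(1000)]
moved = sum(1 for v in vids if partition_of(v, 100) != partition_of(v, 101))
print(f"{moved} of {len(vids)} vertices would land on a different partition")
```

With a scheme like this, changing the partition count would relocate most of the data, which is one plausible reason to keep the count static.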

Data Model

Each vertex in Nebula Graph is modeled as a key-value pair; its vertex ID (vid) is hashed to determine the partition on which the pair is stored.

A logical edge is represented by two independent key-value entries called out-key and in-key. The out-key is stored on the same partition as the source vertex, while the in-key is stored with the destination vertex.
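The placement of the two entries can be sketched as follows. The tuple layout of the keys, the CRC32 hash, the partition count and the IDs are hypothetical; only the rule that the out-key lives with the source vertex and the in-key with the destination vertex comes from the text.

```python
import zlib

NUM_PARTITIONS = 10  # hypothetical


def partition_of(vid: str) -> int:
    # Stable hash of the vertex ID (CRC32 is an assumption for this sketch).
    return zlib.crc32(vid.encode("utf-8")) % NUM_PARTITIONS


def edge_entries(src: str, dst: str, edgetype: str, props: dict) -> list:
    """Return the two (partition, key, value) entries for one logical edge."""
    out_key = ("out", src, edgetype, dst)  # co-located with the source vertex
    in_key = ("in", dst, edgetype, src)    # co-located with the destination vertex
    return [
        (partition_of(src), out_key, props),
        (partition_of(dst), in_key, props),
    ]


# partition id -> {key: value}, standing in for the distributed store
store = {p: {} for p in range(NUM_PARTITIONS)}
for part, key, value in edge_entries("player100", "team200", "serve", {"start_year": 2010}):
    store[part][key] = value

# Outgoing edges of player100 can be read from a single partition.
src_part = store[partition_of("player100")]
print([k for k in src_part if k[0] == "out" and k[1] == "player100"])
```

Storing each edge twice costs extra space, but both outgoing and incoming traversals from a vertex then stay on that vertex's own partition.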

Architecture

Nebula Graph consists of four main functional modules: Storage, Meta Service, Compute, and Client.

Storage

The storage layer runs the nebula-storaged process, which implements a Raft‑based distributed key-value store. Supported storage engines include RocksDB and HBase. Raft keeps the replicas consistent through leader/follower replication.
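As a reminder of how leader/follower replication reaches agreement in Raft generally, the sketch below shows the majority rule for committing a log entry. It illustrates the general Raft rule rather than Nebula Graph's code; the server names are made up.

```python
def majority(group_size: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return group_size // 2 + 1


def is_committed(acks: set, voters: set) -> bool:
    """An entry is committed once a majority of voters (leader included) stores it."""
    return len(acks & voters) >= majority(len(voters))


voters = {"s1", "s2", "s3"}  # hypothetical three-replica Raft group

print(is_committed({"s1"}, voters))        # False, only the leader has the entry
print(is_committed({"s1", "s2"}, voters))  # True, 2 of 3 is a majority
```

With three replicas the group keeps accepting writes while any one machine is down, which is where the high‑availability property comes from.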

Parallel Raft: the replicas of one partition (those sharing a partition ID) on different machines form a Raft group, and the groups of different partitions operate concurrently.

Write Path & Batch: batch and out‑of‑order submissions improve throughput compared to strict log‑id ordering.

Learner: learner nodes can be added to a Raft group; a learner replicates data asynchronously from the leader and becomes a follower once it has caught up.

Load‑balance: hot partitions can be migrated to less loaded machines for better balance.
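The load‑balance item can be pictured with a simple greedy step. This is a sketch under stated assumptions, not the scheduler Nebula Graph uses: the function names, machine names and request rates are invented, and load is reduced to a single number per partition.

```python
def machine_load(placement: dict, load: dict, machine: str) -> int:
    """Total load of the partitions currently placed on one machine."""
    return sum(load[p] for p in placement[machine])


def rebalance_once(placement: dict, load: dict):
    """Move one partition from the busiest machine to the least loaded one.

    Returns (partition, source, target), or None when no move helps.
    """
    totals = {m: machine_load(placement, load, m) for m in placement}
    source = max(totals, key=totals.get)
    target = min(totals, key=totals.get)
    gap = totals[source] - totals[target]
    # Moving a partition only helps if its load is smaller than the gap;
    # otherwise the target would end up busier than the source was.
    candidates = [p for p in placement[source] if load[p] < gap]
    if not candidates:
        return None
    hottest = max(candidates, key=lambda p: load[p])
    placement[source].remove(hottest)
    placement[target].append(hottest)
    return hottest, source, target


placement = {"m1": [1, 2, 3], "m2": [4]}  # machine -> partition IDs
load = {1: 50, 2: 10, 3: 5, 4: 5}         # partition ID -> requests per second

print(rebalance_once(placement, load))  # (1, 'm1', 'm2')
print(placement)                        # {'m1': [2, 3], 'm2': [4, 1]}
```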

Meta Service

The meta service runs the nebula-metad process and provides:

User management with four roles: Goduser, Admin, User, Guest, each with different permissions.

Cluster configuration management for adding or removing servers.

Graph space management: create, delete, and modify graph spaces, including Raft replica settings.

Schema management: the strong schema design records the attribute types of tags and edges (int, double, timestamp, list, etc.). It supports schema versioning as well as TTL fields, which allow data to expire automatically and its space to be reclaimed.

Meta Service state is persisted via the same KVStore mechanism as the storage layer.
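The TTL behaviour mentioned under schema management can be sketched as below. The expiry rule (timestamp column plus duration compared with the current time), the function names and the created_at column are assumptions for illustration; the article only states that TTL fields drive automatic expiration and space reclamation.

```python
import time
from typing import Optional


def is_expired(row: dict, ttl_col: str, ttl_seconds: int, now: Optional[float] = None) -> bool:
    """A row expires once its TTL column plus the TTL duration lies in the past."""
    now = time.time() if now is None else now
    return row[ttl_col] + ttl_seconds < now


def compact(rows: list, ttl_col: str, ttl_seconds: int, now: Optional[float] = None) -> list:
    """Keep only live rows; dropping the rest is where space would be reclaimed."""
    return [r for r in rows if not is_expired(r, ttl_col, ttl_seconds, now)]


rows = [
    {"id": 1, "created_at": 1000},
    {"id": 2, "created_at": 1900},
]
# With a 500-second TTL evaluated at time 2000, only the second row survives.
print(compact(rows, "created_at", 500, now=2000))  # [{'id': 2, 'created_at': 1900}]
```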

Query Engine & Query Language (nGQL)

The compute layer runs the nebula-graphd process as a set of stateless query nodes that are peers of one another. The Query Engine parses nGQL text with a lexer and parser, generates an execution plan, optimizes it, and hands it to the execution engine, which fetches schema from the Meta Service and data from the storage layer.
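The stages described above can be imitated with a toy pipeline. Everything here is a simplification made up for illustration: it accepts only one statement shape written in the style of nGQL's GO FROM ... OVER ..., skips the optimization stage, and uses a set and a dict as stand‑ins for the Meta Service and the storage layer.

```python
KEYWORDS = {"GO", "FROM", "OVER"}


def lex(text: str) -> list:
    """Split the statement into (kind, value) tokens."""
    tokens = []
    for word in text.split():
        if word.upper() in KEYWORDS:
            tokens.append(("KEYWORD", word.upper()))
        elif word.isdigit():
            tokens.append(("INT", int(word)))
        else:
            tokens.append(("IDENT", word))
    return tokens


def parse(tokens: list) -> dict:
    """Accept only the shape: GO FROM <vid> OVER <edgetype>."""
    kinds = [kind for kind, _ in tokens]
    values = [value for _, value in tokens]
    shape_ok = kinds == ["KEYWORD", "KEYWORD", "INT", "KEYWORD", "IDENT"]
    if not shape_ok or values[:2] != ["GO", "FROM"] or values[3] != "OVER":
        raise SyntaxError("expected: GO FROM <vid> OVER <edgetype>")
    return {"op": "go", "src": values[2], "edgetype": values[4]}


def plan(ast: dict) -> list:
    """Turn the parsed statement into an ordered list of execution steps."""
    return [
        ("check_schema", ast["edgetype"]),
        ("get_neighbors", ast["src"], ast["edgetype"]),
    ]


def execute(steps: list, meta_edgetypes: set, storage: dict) -> list:
    """Run the steps: schema comes from 'meta', data from 'storage'."""
    result = []
    for step in steps:
        if step[0] == "check_schema" and step[1] not in meta_edgetypes:
            raise LookupError(f"unknown edgetype {step[1]}")
        if step[0] == "get_neighbors":
            result = storage.get((step[1], step[2]), [])
    return result


meta_edgetypes = {"like", "serve"}          # stand-in for the Meta Service
storage = {(100, "like"): [101, 102]}       # stand-in for the storage layer

steps = plan(parse(lex("GO FROM 100 OVER like")))
print(execute(steps, meta_edgetypes, storage))  # [101, 102]
```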

Key optimizations include:

Asynchronous and concurrent execution with per‑query resource pools to guarantee QoS.

Computation push‑down: filter conditions, such as those in a WHERE clause, are sent to the storage nodes and evaluated there to reduce data transfer.

Execution plan optimization: plan caching and context‑independent statement concurrency.
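The benefit of computation push‑down can be shown with a small comparison. The storage_scan function and the sample rows are hypothetical stand‑ins for a storage node and its data; the sketch only counts how many rows cross the "network" with and without the filter applied at the storage side.

```python
def storage_scan(rows: list, predicate=None) -> list:
    """Stand-in for a storage node; filters locally when a predicate is pushed down."""
    if predicate is None:
        return list(rows)
    return [r for r in rows if predicate(r)]


# Ten hypothetical like edges with likeness 0.0, 0.1, ..., 0.9
rows = [{"dst": 100 + i, "likeness": i / 10} for i in range(10)]


def likes_a_lot(row: dict) -> bool:
    return row["likeness"] > 0.75


# Without push-down: every row is transferred, then filtered by the query node.
transferred = storage_scan(rows)
filtered_late = [r for r in transferred if likes_a_lot(r)]

# With push-down: the storage node filters first and returns only matches.
filtered_early = storage_scan(rows, likes_a_lot)

print(len(transferred), len(filtered_early))  # 10 2
```

Both paths produce the same result; the difference is that the second one moves two rows instead of ten.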

Client API & Console

Nebula Graph provides C++, Java, and Golang client libraries. Communication with the server uses RPC over the Facebook‑Thrift protocol. Users can also operate Nebula Graph via a Linux console; a web UI is under development.

Official Assistant

For any questions, you can contact the Nebula Graph assistant on WeChat (NebulaGraphbot) or star the project on GitHub.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
