
Exploring Data Models: From Hierarchical to Graph and Schema-on-Read/Write

This article examines the evolution of data models—from conceptual, logical, and physical layers to hierarchical, network, relational, document, and graph structures—explaining their characteristics, implementation examples, and the contrasting schema‑on‑read versus schema‑on‑write approaches for modern data storage systems.

Xiaokun's Architecture Exploration Notes

What Is a Data Model

A data model is an abstract framework that describes how data is organized, stored, and manipulated. It defines structures, relationships, constraints, and operations, forming the foundation of database system design and determining both logical views and physical storage methods.

Conceptual, Logical, and Physical Models

Conceptual model: Focuses on core business entities and relationships, capturing essential data to clarify requirements.

Logical model: Refines the conceptual model by specifying attributes, data types, and constraints, independent of any specific database technology.

Physical model: Implements the logical model in a concrete DBMS, detailing tables, indexes, partitions, and other storage specifics.
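As a rough illustration of the three layers, the sketch below describes one small domain at each level. The `Customer` and `Order` entities, their attributes, and the index name are all hypothetical, and SQLite stands in for "a concrete DBMS" only because it ships with Python.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual: only business entities and how they relate (hypothetical domain).
CONCEPTUAL = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}


# Logical: attributes, types, and keys, still independent of any DBMS.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # refers to Customer.customer_id
    total_cents: int


# Physical: tables and indexes for one specific engine (SQLite here).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total_cents INTEGER NOT NULL
);
CREATE INDEX idx_order_customer ON customer_order(customer_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(PHYSICAL_DDL)
table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

Only the last layer mentions indexes or engine-specific types; the first two would stay the same if the data moved to a different DBMS.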

Evolution of Data Models

Data storage systems have progressed through several model types: hierarchical, network, relational, document, and graph. Understanding the earlier models helps grasp the later ones.

Hierarchical Model

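A tree‑like structure with strict parent‑child relationships: every record has exactly one parent, so one‑to‑one and one‑to‑many links are natural. Many‑to‑many relationships cannot be represented directly; they require duplicating records or resolving references in application code.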
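A minimal sketch of the single-parent constraint, using a hypothetical `Node` class and made-up record names. It is not modeled on any particular hierarchical DBMS.

```python
from typing import List, Optional


class Node:
    """One record in a hierarchy: at most one parent, any number of children."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Follow the single parent chain up to the root."""
        parts = []
        node: Optional[Node] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


company = Node("company")
sales = company.add_child("sales")
alice = sales.add_child("alice")
print(alice.path())  # company/sales/alice
```

Because `parent` holds a single reference, placing `alice` under a second department would mean creating a second `alice` record, which is the duplication problem that many-to-many data causes in this model.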

Network Model (CODASYL)

Generalizes the hierarchical model by allowing a record to have multiple parents, so many‑to‑many relationships can be expressed through pointer links. Because queries must navigate these links along predefined access paths, ad hoc queries are awkward to write and changing the access paths is costly.
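The sketch below imitates the multi-parent links with plain Python object references. The `Record` class, the `link` helper, and the student and course names are hypothetical; real CODASYL systems used record sets and a navigational API rather than anything like this code.

```python
class Record:
    def __init__(self, name):
        self.name = name
        self.owners = []   # links to parents; more than one is allowed
        self.members = []  # links to children


def link(owner, member):
    """Connect two records in both directions, like a pointer pair."""
    owner.members.append(member)
    member.owners.append(owner)


def classmates(student):
    """Navigate student -> courses -> other students by following links."""
    found = set()
    for course in student.owners:
        for other in course.members:
            if other is not student:
                found.add(other.name)
    return sorted(found)


databases, networks = Record("databases"), Record("networks")
alice, bob, carol = Record("alice"), Record("bob"), Record("carol")
link(databases, alice)
link(networks, alice)  # alice now has two parents
link(databases, bob)
link(networks, carol)

print(classmates(alice))  # ['bob', 'carol']
```

The query is written as an explicit walk over links. Asking a different question, such as which courses have no students, needs a different hand-written traversal.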

Relational Model

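Uses two‑dimensional tables (relations) to represent data. Relationships between tables are expressed logically through key values, and foreign‑key constraints enforce referential integrity; how rows are physically stored is left to the DBMS and hidden from queries.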
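To make the role of keys concrete, here is a small sketch using Python's built-in `sqlite3` module. The `users` and `orders` tables and their columns are assumptions chosen to mirror the Alice example used later in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id      INTEGER PRIMARY KEY,
           user_id INTEGER NOT NULL REFERENCES users(id),
           product TEXT NOT NULL,
           price   INTEGER NOT NULL
       )"""
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders (user_id, product, price) VALUES (?, ?, ?)",
    [(1, "Laptop", 1200), (1, "Phone", 800)],
)

# The relationship is followed by matching key values, not by pointers.
rows = conn.execute(
    """SELECT u.name, o.product, o.price
       FROM users u JOIN orders o ON o.user_id = u.id
       ORDER BY o.id"""
).fetchall()

# An order that points at a missing user violates referential integrity.
try:
    conn.execute(
        "INSERT INTO orders (user_id, product, price) VALUES (99, 'Tablet', 300)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The join names the columns to match and leaves the access path to the database, in contrast to the hand-written navigation of the network model.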

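<code>// Java example: variable types are declared and checked at compile time
int s = 10;
String text = "this is text";</code>
<code># Python example: types belong to values and are checked at run time
s = 10
text = "this is text"</code>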

Document Model

Represents data as self‑contained documents (e.g., JSON). References provide logical links, while embedding offers a hierarchical, tree‑like structure for small related datasets.

<code>// User document
{
  "_id": ObjectId("60d5ec9f8b487a3e8aef1234"),
  "name": "Alice",
  "orders": [ObjectId("60d5ec9f8b487a3e8aef5678"), ObjectId("60d5ec9f8b487a3e8aef9abc")]
}

// Order document
{
  "_id": ObjectId("60d5ec9f8b487a3e8aef5678"),
  "product": "Laptop",
  "price": 1200
}</code>
<code>// Embedded orders example
{
  "_id": ObjectId("60d5ec9f8b487a3e8aef1234"),
  "name": "Alice",
  "orders": [
    {"product": "Laptop", "price": 1200},
    {"product": "Phone", "price": 800}
  ]
}</code>
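The trade-off between the two layouts above can be sketched with plain Python dictionaries standing in for a document store. The short string IDs and the `orders_collection` lookup table are assumptions for illustration; no database driver is involved.

```python
# Referenced layout: the user stores order IDs, orders live elsewhere.
orders_collection = {
    "o1": {"_id": "o1", "product": "Laptop", "price": 1200},
    "o2": {"_id": "o2", "product": "Phone", "price": 800},
}
user_with_refs = {"_id": "u1", "name": "Alice", "orders": ["o1", "o2"]}

# Embedded layout: the orders are nested inside the user document.
user_embedded = {
    "_id": "u1",
    "name": "Alice",
    "orders": [
        {"product": "Laptop", "price": 1200},
        {"product": "Phone", "price": 800},
    ],
}


def total_referenced(user, orders):
    """Each reference costs one extra lookup (a second query in a real store)."""
    return sum(orders[order_id]["price"] for order_id in user["orders"])


def total_embedded(user):
    """Everything needed is already inside the one document."""
    return sum(order["price"] for order in user["orders"])


print(total_referenced(user_with_refs, orders_collection))  # 2000
print(total_embedded(user_embedded))  # 2000
```

Embedding keeps reads local to one document, while references avoid copying an order into every document that mentions it.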

Graph Model

Extends the network model by abstracting physical pointers into logical edges between vertices, which makes it well suited to complex, highly connected data. Graph queries are typically written in a declarative language such as Cypher.

Example Cypher query (shown as a diagram in the original): find users born in the United States but living in Europe.
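The idea behind such a query can be approximated in plain Python over a list of labeled edges. This is not Cypher and not the article's original query: the people, places, and edge labels (`BORN_IN`, `LIVES_IN`, `WITHIN`) are hypothetical sample data.

```python
# Each edge is (source vertex, label, destination vertex).
edges = [
    ("lucy", "BORN_IN", "idaho"),
    ("idaho", "WITHIN", "united_states"),
    ("united_states", "WITHIN", "north_america"),
    ("lucy", "LIVES_IN", "london"),
    ("london", "WITHIN", "england"),
    ("england", "WITHIN", "europe"),
    ("alain", "BORN_IN", "beaune"),
    ("beaune", "WITHIN", "france"),
    ("france", "WITHIN", "europe"),
    ("alain", "LIVES_IN", "london"),
]


def targets(source, label):
    """Vertices reachable from `source` over one edge with the given label."""
    return [dst for src, lbl, dst in edges if src == source and lbl == label]


def is_within(place, region):
    """Follow zero or more WITHIN edges from `place` looking for `region`."""
    seen = set()
    frontier = [place]
    while frontier:
        current = frontier.pop()
        if current == region:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(targets(current, "WITHIN"))
    return False


def born_in_and_living_in(birth_region, home_region):
    people = sorted({src for src, lbl, _ in edges if lbl == "BORN_IN"})
    return [
        person
        for person in people
        if any(is_within(p, birth_region) for p in targets(person, "BORN_IN"))
        and any(is_within(p, home_region) for p in targets(person, "LIVES_IN"))
    ]


print(born_in_and_living_in("united_states", "europe"))  # ['lucy']
```

A graph query language expresses the variable-length `WITHIN` walk as a pattern, so the traversal loop written by hand here is left to the database.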

Read‑Time vs. Write‑Time Schema

Read‑Time (Schema‑on‑Read)

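Data is stored without a schema enforced by the database; the structure is implicit and is interpreted by the application when the data is read. This offers flexibility similar to dynamic (run‑time) type checking in programming languages, but every reader must handle records of differing shapes, which adds cost at query time.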

<code>if (user && user.name && !user.first_name) {
    user.first_name = user.name.split(" ")[0];
}</code>
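The same read-time fix-up can be sketched in Python over a few stored JSON strings. The sample records and the `read_user` helper are hypothetical; the point is only that old and new record shapes coexist in storage and are reconciled by the reader.

```python
import json

# Records written at different times, with different shapes.
raw_records = [
    '{"name": "Alice Smith"}',                    # old shape: only "name"
    '{"name": "Bob Jones", "first_name": "Bob"}',  # new shape
    '{"first_name": "Carol"}',                    # new shape, no "name"
]


def read_user(raw):
    """Apply the expected structure when the record is read, not when stored."""
    user = json.loads(raw)
    if user.get("name") and not user.get("first_name"):
        user["first_name"] = user["name"].split(" ")[0]
    return user


first_names = [read_user(raw)["first_name"] for raw in raw_records]
print(first_names)  # ['Alice', 'Bob', 'Carol']
```

No stored record is rewritten, so the conversion cost is paid on every read of an old-shape record.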

Write‑Time (Schema‑on‑Write)

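Data must conform to a predefined schema before it is written. A relational DBMS enforces this itself and rejects nonconforming writes; in data warehouses, ETL processes reshape incoming data to fit the schema before loading. This ensures integrity at the cost of flexibility, much like static (compile‑time) type checking.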

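<code>-- MySQL syntax: change the schema first by adding the new column
ALTER TABLE users ADD COLUMN first_name varchar(50) DEFAULT NULL COMMENT 'user first name';
-- Then backfill existing rows with the text before the first space in name
UPDATE users SET first_name = substring_index(name, ' ', 1);</code>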
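A runnable approximation of the same migration, using Python's built-in `sqlite3` module instead of MySQL. SQLite has no `substring_index`, so the backfill uses `substr` and `instr`, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("Alice Smith",), ("Bob",)]
)

# Schema change first, then a backfill so existing rows fit the new schema.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT DEFAULT NULL")
conn.execute(
    # Appending a space guarantees instr() finds one, even for single-word names.
    "UPDATE users SET first_name = substr(name, 1, instr(name || ' ', ' ') - 1)"
)
first_names = [
    row[0] for row in conn.execute("SELECT first_name FROM users ORDER BY id")
]

# A write that violates the declared schema is rejected by the database.
try:
    conn.execute("INSERT INTO users (name) VALUES (NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(first_names, rejected)  # ['Alice', 'Bob'] True
```

The conversion cost is paid once at migration time, after which every reader can rely on `first_name` being populated for the migrated rows.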

Read‑Write Mode Comparison

Data Model Summary

The article concludes with a comparative table (image) that highlights the distinctions among hierarchical, network, relational, document, and graph models, noting that modern applications predominantly use relational, document, and graph models, alongside column‑family (e.g., HBase) and key‑value (e.g., Redis) stores.
