
Exploring Data Models: From Hierarchical to Graph and Schema-on-Read/Write

This article introduces the conceptual, logical, and physical layers of data modeling, then traces the evolution of data models through hierarchical, network, relational, document, and graph structures, explaining their characteristics with implementation examples. It closes by contrasting the schema‑on‑read and schema‑on‑write approaches used by modern data storage systems.

Xiaokun's Architecture Exploration Notes

Mar 29, 2025


What Is a Data Model

A data model is an abstract framework that describes how data is organized, stored, and manipulated. It defines structures, relationships, constraints, and operations, forming the foundation of database system design and determining both logical views and physical storage methods.

Conceptual, Logical, and Physical Models

Conceptual model: Focuses on core business entities and relationships, capturing essential data to clarify requirements.

Logical model: Refines the conceptual model by specifying attributes, data types, and constraints, independent of any specific database technology.

Physical model: Implements the logical model in a concrete DBMS, detailing tables, indexes, partitions, and other storage specifics.
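To make the three layers concrete, here is a minimal sketch built around a hypothetical `Customer` entity (the entity, its fields, and the index name are illustrative assumptions, not from the original article). The logical model is written as a typed Python dataclass, and the physical model as SQLite DDL that adds storage details such as an index.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual model (plain language): "A customer has a name and an email address."

# Logical model: attributes, types and constraints, independent of any DBMS.
@dataclass
class Customer:
    customer_id: int  # primary identifier
    name: str         # required
    email: str        # required, must be unique

# Physical model: the same entity mapped onto a concrete DBMS (SQLite here),
# including storage-level choices such as an index on a frequently queried column.
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);
CREATE INDEX idx_customer_name ON customer (name);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(PHYSICAL_DDL)

alice = Customer(1, "Alice", "alice@example.com")
conn.execute(
    "INSERT INTO customer VALUES (?, ?, ?)",
    (alice.customer_id, alice.name, alice.email),
)

# List the indexes SQLite holds for the table (the UNIQUE constraint adds one automatically).
indexes = [row[1] for row in conn.execute("PRAGMA index_list('customer')")]
print(indexes)
```

The logical layer says nothing about indexes; only the physical layer decides how the data is laid out and accessed in a specific engine.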

Evolution of Data Models

Data storage systems have progressed through several model types: hierarchical, network, relational, document, and graph. Understanding the earlier models helps grasp the later ones.

Hierarchical Model

A tree‑like structure in which every record has exactly one parent, supporting one‑to‑one and one‑to‑many relationships. It is inflexible: many‑to‑many relationships cannot be represented directly, so a record shared by several parents has to be duplicated under each of them.
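A minimal sketch of the hierarchical idea, using hypothetical departments, courses, and students stored as nested Python structures. Because each record can sit under only one parent, a student enrolled in two courses ends up stored twice.

```python
# Hierarchical model sketch: every record has exactly one parent, so the data is a tree.
# Hypothetical data: a department owns courses, and each course owns its enrolled students.
school = {
    "Math": {
        "Algebra": [{"student_id": 1, "name": "Alice"}],
        "Calculus": [
            {"student_id": 1, "name": "Alice"},
            {"student_id": 2, "name": "Bob"},
        ],
    },
}

def students_in(dept, course):
    # Access follows the single parent-child path: department -> course -> students.
    return [s["name"] for s in school[dept][course]]

# Alice takes two courses, but a tree cannot give her record two parents,
# so the many-to-many relationship is forced into the tree by duplicating her record.
copies = sum(
    1
    for courses in school.values()
    for students in courses.values()
    for s in students
    if s["student_id"] == 1
)
print(students_in("Math", "Calculus"), copies)
```

If Alice's name changes, every copy must be updated, which is the flexibility problem the network model set out to solve.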

Network Model (CODASYL)

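Generalizes the hierarchical model by allowing a record to have multiple parents, so many‑to‑many relationships can be modeled through pointer links. However, applications must navigate these pointer chains (access paths) record by record, which ties queries to the physical layout: a query that does not follow an existing access path is awkward to write and can be inefficient.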
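Continuing the same hypothetical course data, a rough sketch of the network (CODASYL‑style) approach: object references stand in for physical pointers, so one student record can be linked from several course records without copying. The `Record` class and `navigate` function are illustrative, not a CODASYL API.

```python
# Network model sketch: records hold direct links to related records, and one record
# may be linked from several owners, which models many-to-many relationships.
class Record:
    def __init__(self, name):
        self.name = name
        self.links = []  # stand-in for physical pointers to member records

alice = Record("Alice")
bob = Record("Bob")
algebra = Record("Algebra")
calculus = Record("Calculus")

# Alice's single record is reachable from both courses (two "parents"), with no duplication.
algebra.links.append(alice)
calculus.links.extend([alice, bob])

def navigate(start):
    # Queries are navigational: the program walks the pointer chain from a known entry point.
    return [member.name for member in start.links]

print(navigate(calculus), algebra.links[0] is calculus.links[0])
```

The catch is visible in `navigate`: answering "which courses does Alice take?" needs a different access path (back-links from students to courses) that the program must maintain and walk itself.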

Relational Model

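Organizes data into two‑dimensional tables (relations) of rows and columns. Relationships between tables are expressed logically through shared key values rather than physical pointers, and foreign key constraints let the DBMS enforce that those references remain valid.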

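// Java: statically typed. Each variable's type is declared up front and checked
// at compile time, comparable to a schema that is enforced on write.
int s = 10;
String text = "this is text";

# Python: dynamically typed. Types belong to values and are checked only at run
# time, comparable to a schema that is interpreted on read.
s = 10
text = "this is text"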
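A minimal relational sketch using Python's built‑in `sqlite3` module, with hypothetical `users` and `orders` tables that mirror the document examples in the next section. It shows a relationship stored as a key value, resolved with a join, and protected by a foreign key constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),  -- logical link via a key value
    product TEXT NOT NULL,
    price   INTEGER NOT NULL
);
INSERT INTO users  VALUES (1, 'Alice');
INSERT INTO orders VALUES (1, 1, 'Laptop', 1200), (2, 1, 'Phone', 800);
""")

# The relationship is resolved at query time by joining on the shared key.
rows = conn.execute("""
SELECT u.name, o.product, o.price
FROM users AS u JOIN orders AS o ON o.user_id = u.id
ORDER BY o.id
""").fetchall()

# The DBMS rejects an order that points to a user who does not exist.
rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (3, 99, 'Tablet', 500)")
except sqlite3.IntegrityError:
    rejected = True  # "FOREIGN KEY constraint failed"

print(rows, rejected)
```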

Document Model

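Represents data as self‑contained documents (for example, JSON). A document can reference another by storing its _id, which provides a logical link much like a foreign key, or it can embed related data directly, which yields a hierarchical, tree‑like structure suited to small, closely related datasets. The examples use MongoDB shell notation, where ObjectId(...) wraps a document identifier.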

// User document
{
  "_id": ObjectId("60d5ec9f8b487a3e8aef1234"),
  "name": "Alice",
  "orders": [ObjectId("60d5ec9f8b487a3e8aef5678"), ObjectId("60d5ec9f8b487a3e8aef9abc")]
}

// Order document
{
  "_id": ObjectId("60d5ec9f8b487a3e8aef5678"),
  "product": "Laptop",
  "price": 1200
}

// Embedded orders example
{
  "_id": ObjectId("60d5ec9f8b487a3e8aef1234"),
  "name": "Alice",
  "orders": [
    {"product": "Laptop", "price": 1200},
    {"product": "Phone", "price": 800}
  ]
}
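To contrast the two designs above, here is a sketch with hypothetical in‑memory "collections" (plain strings stand in for ObjectId values, and the second order is invented for illustration). With references the application must perform extra lookups, essentially a join done in code (MongoDB offers the `$lookup` aggregation stage for this); with embedding the orders arrive together with the user.

```python
# Referenced design: users store the _id values of their orders.
users = {"u1": {"_id": "u1", "name": "Alice", "orders": ["o1", "o2"]}}
orders = {
    "o1": {"_id": "o1", "product": "Laptop", "price": 1200},
    "o2": {"_id": "o2", "product": "Phone", "price": 800},
}

def orders_by_reference(user_id):
    # One lookup for the user, then one per referenced order (an application-side join).
    user = users[user_id]
    return [orders[oid] for oid in user["orders"]]

# Embedded design: the orders live inside the user document itself.
embedded_user = {
    "_id": "u1",
    "name": "Alice",
    "orders": [
        {"product": "Laptop", "price": 1200},
        {"product": "Phone", "price": 800},
    ],
}

def orders_embedded(user_doc):
    # A single read returns the user together with all of their orders.
    return user_doc["orders"]

total_ref = sum(o["price"] for o in orders_by_reference("u1"))
total_emb = sum(o["price"] for o in orders_embedded(embedded_user))
print(total_ref, total_emb)
```

Embedding keeps reads cheap when the nested data is small and always needed together; references avoid duplication when the related data is large or shared.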

Graph Model

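Extends the network model by replacing physical pointers with logical edges between vertices, which makes it well suited to complex, highly connected data. Graph query languages such as Cypher express traversals over these edges declaratively.

For example, a query that finds users born in the United States but living in Europe has to follow each user's born‑in and lives‑in edges and then climb the location hierarchy (city, region, country, continent) for however many levels it takes.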
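As a rough stand‑in for such a query, the following Python sketch stores a small hypothetical graph as labeled edges and answers "born in the United States, living in Europe" by following `WITHIN` edges any number of times with a breadth‑first search. The vertices, edge labels, and helper functions are assumptions for illustration; a graph database would evaluate the equivalent pattern (in Cypher, a variable‑length path such as `-[:WITHIN*0..]->`) for you.

```python
from collections import deque

# Hypothetical graph: (from_vertex, edge_label, to_vertex)
EDGES = [
    ("Lucy", "BORN_IN", "Idaho"),
    ("Lucy", "LIVES_IN", "London"),
    ("Idaho", "WITHIN", "United States"),
    ("United States", "WITHIN", "North America"),
    ("London", "WITHIN", "England"),
    ("England", "WITHIN", "United Kingdom"),
    ("United Kingdom", "WITHIN", "Europe"),
    ("Alain", "BORN_IN", "Beaune"),
    ("Alain", "LIVES_IN", "London"),
    ("Beaune", "WITHIN", "France"),
    ("France", "WITHIN", "Europe"),
]

def neighbors(vertex, label):
    return [dst for src, lbl, dst in EDGES if src == vertex and lbl == label]

def within(start, target):
    # Follow WITHIN edges zero or more times (breadth-first), like a variable-length path.
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for nxt in neighbors(v, "WITHIN"):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def born_in_us_living_in_europe():
    people = {src for src, lbl, _ in EDGES if lbl == "BORN_IN"}
    return sorted(
        p for p in people
        if any(within(loc, "United States") for loc in neighbors(p, "BORN_IN"))
        and any(within(loc, "Europe") for loc in neighbors(p, "LIVES_IN"))
    )

print(born_in_us_living_in_europe())
```

The hierarchy depth differs per location (Idaho is one hop from the United States, London three hops from Europe), which is exactly why a fixed number of joins is awkward and a traversal fits better.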

Read‑Time vs. Write‑Time Schema

Read‑Time (Schema‑on‑Read)

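The database does not enforce a predefined schema when data is written; the structure is implicit and is interpreted by the code or query that reads the data. This resembles dynamic typing in programming languages: it offers flexibility, but every reader must cope with records in older or inconsistent formats, and interpreting structure on each read can cost performance.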

// Schema-on-read: a user record written before first_name existed is upgraded when it is read
if (user && user.name && !user.first_name) {
    user.first_name = user.name.split(" ")[0];
}
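The same read‑time upgrade can be sketched in Python as a reader function (the name `read_user` and the sample records are hypothetical). Old and new records coexist in storage, and the reader normalizes them on the fly.

```python
def read_user(doc):
    # Schema-on-read: the stored record may predate the first_name field,
    # so the reader derives it instead of the database enforcing it.
    user = dict(doc)  # copy, leaving the stored record untouched
    if user.get("name") and not user.get("first_name"):
        user["first_name"] = user["name"].split(" ")[0]
    return user

old_doc = {"name": "Alice Smith"}                   # written before the field existed
new_doc = {"name": "Bob Lee", "first_name": "Bob"}  # written after the change
print(read_user(old_doc)["first_name"], read_user(new_doc)["first_name"])
```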

Write‑Time (Schema‑on‑Write)

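Data must conform to a predefined schema before it is written. The DBMS enforces the schema (column types, NOT NULL, and other constraints) on every insert and update, and data arriving from other systems is often transformed to fit it beforehand, for example in ETL pipelines. This protects integrity at the cost of flexibility: changing the structure requires an explicit schema migration, as in this MySQL example: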

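-- Add the new column (MySQL syntax), then backfill it for every existing row
ALTER TABLE users ADD COLUMN first_name varchar(50) DEFAULT NULL COMMENT 'user first name';
-- substring_index(name, ' ', 1) returns the part of name before the first space
UPDATE users SET first_name = substring_index(name, ' ', 1);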
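A runnable approximation of the migration using Python's `sqlite3` module (the table contents are hypothetical). SQLite has no `substring_index` or column `COMMENT` clause, so `substr`/`instr` are swapped in to take the text before the first space. The final insert shows the schema being enforced at write time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Alice Smith",), ("Bob",)])

# Schema-on-write migration: change the schema first, then backfill existing rows.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT DEFAULT NULL")
# Appending ' ' guarantees instr() finds a space, so single-word names are kept whole.
conn.execute("UPDATE users SET first_name = substr(name, 1, instr(name || ' ', ' ') - 1)")
print(conn.execute("SELECT name, first_name FROM users ORDER BY id").fetchall())

# The declared schema is checked on every write: a row without a name is rejected.
rejected = False
try:
    conn.execute("INSERT INTO users (first_name) VALUES ('Carol')")
except sqlite3.IntegrityError:
    rejected = True  # "NOT NULL constraint failed: users.name"
print(rejected)
```

Compared with the schema‑on‑read reader, the cost moves from every read to a one‑time migration, after which all rows share one shape.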

Read‑Write Mode Comparison

Data Model Summary

The article concludes with a comparative table (image) that highlights the distinctions among hierarchical, network, relational, document, and graph models, noting that modern applications predominantly use relational, document, and graph models, alongside column‑family (e.g., HBase) and key‑value (e.g., Redis) stores.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Xiaokun's Architecture Exploration Notes

10 years of backend architecture design | AI engineering infrastructure, storage architecture design, and performance optimization | Former senior developer at NetEase, Douyu, Inke, etc.
