Overview of Database Evolution, Types, and Data Structure Design
This article traces the history of database development from early file systems to relational models, outlines major database categories such as hierarchical, relational, NoSQL, NewSQL, and columnar stores, and explores practical data structure designs like chain tables, bitwise operations, and circular queues for modern software engineering.
Initially, data was managed through file systems where programmers handled storage and maintenance. Tree and network models emerged later but suffered from scalability issues. In the 1970s, Edgar Codd introduced the relational model, organizing data in tables with clear relationships, which became the mainstream database paradigm.
With the rise of high‑concurrency big‑data scenarios, databases have been refined into many specialized types. The following table summarizes typical categories, representative products, and suitable use cases:
Type
Representative Products
Applicable Scenarios
Hierarchical DB (NDB)
IMS/IDMS
Tree‑structured data with parent‑child relationships; fast queries but hard to scale.
Relational DB (RDBMS)
Oracle, MySQL
Transactional consistency requirements.
Key‑Value DB (KVDB)
Redis, Memcached
High‑performance read/write, simple data models.
Document DB (DDB)
MongoDB, CouchDB
Massive semi‑structured data access.
Graph DB (GDB)
Neo4j
Node‑edge based storage; efficient graph queries.
Time‑Series DB (TSDB)
InfluxDB, OpenTSDB
Persisting and aggregating time‑stamped data.
Object DB (ODB)
Db4O
Full OO concepts; niche usage.
Search Engine (SE)
Elasticsearch, Solr
Search‑centric business scenarios.
Columnar DB (WCDB)
HBase, ClickHouse
Distributed storage for massive data and analytical queries.
XML DB (NXD)
MarkLogic
XML document storage and querying.
Content Repository (CDB)
Jackrabbit
Large‑scale, high‑performance content storage.
In the 1970s, Edgar Frank Codd published the seminal paper "A Relational Model of Data for Large Shared Data Banks," which launched the relational database revolution. Relational databases model data as sets of rows and columns, using mathematical set theory to ensure consistency and integrity.
NoSQL (Not Only SQL) databases arose in the era of big data to handle distributed, massive, and schema‑less data that traditional relational systems struggle with. They sacrifice strong ACID guarantees for horizontal scalability and flexible data models. Typical NoSQL categories include:
Type
Representative Products
Application Scenarios
Data Model
Pros & Cons
Key‑Value
Redis, Memcached
Cache, session storage, high‑frequency reads/writes
Key‑value pairs via hash tables
Pros: high scalability, performance; Cons: no structured queries.
Column‑Family
HBase, ClickHouse
Distributed data storage management
Columns stored together
Pros: simple, fast queries; Cons: limited transactional guarantees.
Document
MongoDB, CouchDB
Web apps, semi‑structured JSON documents
Key‑value where value is JSON
Pros: flexible schema; Cons: lack of unified query language.
Graph
Neo4j, InfoGrid
Social networks, recommendation systems, fraud detection
Graph structures (nodes & edges)
Pros: complex relationship queries; Cons: higher complexity, limited scale.
NewSQL combines the scalability of NoSQL with the ACID and SQL features of traditional relational databases. Representative examples are TiDB and OceanBase.
OLTP (On‑Line Transaction Processing) focuses on fast, transactional workloads where user actions are immediately processed and results returned. OLAP (On‑Line Analytical Processing) targets analytical workloads, enabling fast, multidimensional queries for data insight. The following diagram contrasts OLTP and OLAP characteristics.
HTAP (Hybrid Transactional/Analytical Processing) databases integrate both OLTP and OLAP capabilities on a unified architecture, reducing data movement and resource contention.
Domain‑specific databases:
Columnar Databases : Store data by columns, offering higher compression, faster column‑wise queries, better scalability, and strong analytical support, but slower writes and more complex data modeling. Typical use cases include finance, healthcare, telecom, energy, and logistics.
Time‑Series Databases : Optimized for timestamped data, widely used in IoT monitoring and APM. They can handle billions of points per day, with examples like InfluxDB and Prometheus.
Graph Databases : Represent data as nodes and edges, excelling in relationship‑heavy queries such as fraud detection and knowledge graphs. Popular products include Neo4j and various enterprise solutions.
Data Structure Design (Section 4) expands on practical techniques:
4.1 Chain Table (Slowly Changing Dimension)
A chain table records historical changes of dimension data. By adding effective and expiry timestamps to a base table, each change creates a new row while updating the previous row's expiry time. This enables point‑in‑time queries without storing full history in every row.
Primary Key: uniquely identifies each record (often composite).
Effective Time: when the record becomes valid.
Expiry Time: when the record ceases to be valid.
Version: optional version number.
Other dimension attributes (e.g., customer, product).4.2 Bitwise Operations
Bitwise tricks provide compact storage and fast computation. Examples include representing a month’s sign‑in status with a 32‑bit integer, or defining permission and priority flags using bit masks.
public abstract class PriorityManager {
// Business priority constants
public static final int PRIORITY_LOW = 1; // 001
public static final int PRIORITY_NORMAL = 2; // 010
public static final int PRIORITY_HIGH = 4; // 100
// Permission constants
public static final int PERMISSION_READ = 1; // 001
public static final int PERMISSION_WRITE = 2; // 010
public static final int PERMISSION_DELETE = 4; // 100
// Combined values
public static final int PERMISSION_LOW_PRIORITY = PRIORITY_LOW | PERMISSION_READ; // 001
public static final int PERMISSION_NORMAL_PRIORITY = PRIORITY_NORMAL | PERMISSION_READ | PERMISSION_WRITE; // 011
public static final int PERMISSION_HIGH_PRIORITY = PRIORITY_HIGH | PERMISSION_READ | PERMISSION_WRITE | PERMISSION_DELETE; // 111
public static boolean checkPermission(int permission, int priority) {
return (permission & priority) == priority;
}
}4.2.2 Bitmap
A bitmap uses a single bit to represent the existence of an element, achieving massive space savings. For example, 2 billion integers require only ~0.23 GB when stored as bits instead of ~7.45 GB as 4‑byte ints.
4.2.3 Bloom Filter
Bloom filters extend the bitmap idea with multiple hash functions, allowing probabilistic membership tests with a controllable false‑positive rate but zero false negatives. They are employed in web crawlers, spam filters, cache‑penetration protection, and storage engines like HBase and Cassandra.
4.3 Circular Queue
Circular queues model fixed‑size, wrap‑around buffers useful for streaming data, network protocols, and internal components of middleware such as Dubbo, Netty, Akka, Quartz, ZooKeeper, and Kafka. Two common variants are consistent hash rings and time wheels.
4.3.1 Consistent Hash Ring
By mapping both keys and nodes onto a virtual ring, a key is assigned to the first node encountered clockwise. Virtual nodes improve load balance and reduce data movement when physical nodes join or leave.
4.3.2 Time Wheel
A time wheel implements efficient delayed task scheduling. Each bucket represents a time slice (e.g., 1 second). Tasks are placed into the bucket corresponding to their expiration time; a single thread advances the wheel, executing all tasks in the current bucket. This yields O(1) add/remove complexity and low CPU overhead.
In summary, the article explains fundamental database concepts, surveys a wide range of modern database technologies, and demonstrates how classic data‑structure techniques—chain tables, bitwise flags, bitmap, Bloom filter, and circular queues—can be applied to solve real‑world software engineering problems.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.