Databases 28 min read

Overview of Database Evolution, Types, and Data Structure Design

This article traces the history of database development from early file systems to relational models, outlines major database categories such as hierarchical, relational, NoSQL, NewSQL, and columnar stores, and explores practical data structure designs like chain tables, bitwise operations, and circular queues for modern software engineering.

JD Retail Technology

May 26, 2023

Overview of Database Evolution, Types, and Data Structure Design

Initially, data was managed through file systems where programmers handled storage and maintenance. Tree and network models emerged later but suffered from scalability issues. In the 1970s, Edgar Codd introduced the relational model, organizing data in tables with clear relationships, which became the mainstream database paradigm.

With the rise of high‑concurrency big‑data scenarios, databases have been refined into many specialized types. The following table summarizes typical categories, representative products, and suitable use cases:

Type

Representative Products

Applicable Scenarios

Hierarchical DB (NDB)

IMS/IDMS

Tree‑structured data with parent‑child relationships; fast queries but hard to scale.

Relational DB (RDBMS)

Oracle, MySQL

Transactional consistency requirements.

Key‑Value DB (KVDB)

Redis, Memcached

High‑performance read/write, simple data models.

Document DB (DDB)

MongoDB, CouchDB

Massive semi‑structured data access.

Graph DB (GDB)

Neo4j

Node‑edge based storage; efficient graph queries.

Time‑Series DB (TSDB)

InfluxDB, OpenTSDB

Persisting and aggregating time‑stamped data.

Object DB (ODB)

Db4O

Full OO concepts; niche usage.

Search Engine (SE)

Elasticsearch, Solr

Search‑centric business scenarios.

Columnar DB (WCDB)

HBase, ClickHouse

Distributed storage for massive data and analytical queries.

XML DB (NXD)

MarkLogic

XML document storage and querying.

Content Repository (CDB)

Jackrabbit

Large‑scale, high‑performance content storage.

In the 1970s, Edgar Frank Codd published the seminal paper "A Relational Model of Data for Large Shared Data Banks," which launched the relational database revolution. Relational databases model data as sets of rows and columns, using mathematical set theory to ensure consistency and integrity.

NoSQL (Not Only SQL) databases arose in the era of big data to handle distributed, massive, and schema‑less data that traditional relational systems struggle with. They sacrifice strong ACID guarantees for horizontal scalability and flexible data models. Typical NoSQL categories include:

Type

Representative Products

Application Scenarios

Data Model

Pros & Cons

Key‑Value

Redis, Memcached

Cache, session storage, high‑frequency reads/writes

Key‑value pairs via hash tables

Pros: high scalability, performance; Cons: no structured queries.

Column‑Family

HBase, ClickHouse

Distributed data storage management

Columns stored together

Pros: simple, fast queries; Cons: limited transactional guarantees.

Document

MongoDB, CouchDB

Web apps, semi‑structured JSON documents

Key‑value where value is JSON

Pros: flexible schema; Cons: lack of unified query language.

Graph

Neo4j, InfoGrid

Social networks, recommendation systems, fraud detection

Graph structures (nodes & edges)

Pros: complex relationship queries; Cons: higher complexity, limited scale.

NewSQL combines the scalability of NoSQL with the ACID and SQL features of traditional relational databases. Representative examples are TiDB and OceanBase.

OLTP (On‑Line Transaction Processing) focuses on fast, transactional workloads where user actions are immediately processed and results returned. OLAP (On‑Line Analytical Processing) targets analytical workloads, enabling fast, multidimensional queries for data insight. The following diagram contrasts OLTP and OLAP characteristics.

HTAP (Hybrid Transactional/Analytical Processing) databases integrate both OLTP and OLAP capabilities on a unified architecture, reducing data movement and resource contention.

Domain‑specific databases:

Columnar Databases : Store data by columns, offering higher compression, faster column‑wise queries, better scalability, and strong analytical support, but slower writes and more complex data modeling. Typical use cases include finance, healthcare, telecom, energy, and logistics.

Time‑Series Databases : Optimized for timestamped data, widely used in IoT monitoring and APM. They can handle billions of points per day, with examples like InfluxDB and Prometheus.

Graph Databases : Represent data as nodes and edges, excelling in relationship‑heavy queries such as fraud detection and knowledge graphs. Popular products include Neo4j and various enterprise solutions.

Data Structure Design (Section 4) expands on practical techniques:

4.1 Chain Table (Slowly Changing Dimension)

A chain table records historical changes of dimension data. By adding effective and expiry timestamps to a base table, each change creates a new row while updating the previous row's expiry time. This enables point‑in‑time queries without storing full history in every row.

<span>Primary Key: uniquely identifies each record (often composite).</span></code><code><span>Effective Time: when the record becomes valid.</span></code><code><span>Expiry Time: when the record ceases to be valid.</span></code><code><span>Version: optional version number.</span></code><code><span>Other dimension attributes (e.g., customer, product).</span>

4.2 Bitwise Operations

Bitwise tricks provide compact storage and fast computation. Examples include representing a month’s sign‑in status with a 32‑bit integer, or defining permission and priority flags using bit masks.

<span>public abstract class PriorityManager {</span></code><code><span>    // Business priority constants</span></code><code><span>    public static final int PRIORITY_LOW = 1;    // 001</span></code><code><span>    public static final int PRIORITY_NORMAL = 2; // 010</span></code><code><span>    public static final int PRIORITY_HIGH = 4;   // 100</span></code><code><span>    // Permission constants</span></code><code><span>    public static final int PERMISSION_READ = 1;   // 001</span></code><code><span>    public static final int PERMISSION_WRITE = 2;  // 010</span></code><code><span>    public static final int PERMISSION_DELETE = 4; // 100</span></code><code><span>    // Combined values</span></code><code><span>    public static final int PERMISSION_LOW_PRIORITY = PRIORITY_LOW | PERMISSION_READ; // 001</span></code><code><span>    public static final int PERMISSION_NORMAL_PRIORITY = PRIORITY_NORMAL | PERMISSION_READ | PERMISSION_WRITE; // 011</span></code><code><span>    public static final int PERMISSION_HIGH_PRIORITY = PRIORITY_HIGH | PERMISSION_READ | PERMISSION_WRITE | PERMISSION_DELETE; // 111</span></code><code><span>    public static boolean checkPermission(int permission, int priority) {</span></code><code><span>        return (permission & priority) == priority;</span></code><code><span>    }</span></code><code><span>}</span>

4.2.2 Bitmap

A bitmap uses a single bit to represent the existence of an element, achieving massive space savings. For example, 2 billion integers require only ~0.23 GB when stored as bits instead of ~7.45 GB as 4‑byte ints.

4.2.3 Bloom Filter

Bloom filters extend the bitmap idea with multiple hash functions, allowing probabilistic membership tests with a controllable false‑positive rate but zero false negatives. They are employed in web crawlers, spam filters, cache‑penetration protection, and storage engines like HBase and Cassandra.

4.3 Circular Queue

Circular queues model fixed‑size, wrap‑around buffers useful for streaming data, network protocols, and internal components of middleware such as Dubbo, Netty, Akka, Quartz, ZooKeeper, and Kafka. Two common variants are consistent hash rings and time wheels.

4.3.1 Consistent Hash Ring

By mapping both keys and nodes onto a virtual ring, a key is assigned to the first node encountered clockwise. Virtual nodes improve load balance and reduce data movement when physical nodes join or leave.

4.3.2 Time Wheel

A time wheel implements efficient delayed task scheduling. Each bucket represents a time slice (e.g., 1 second). Tasks are placed into the bucket corresponding to their expiration time; a single thread advances the wheel, executing all tasks in the current bucket. This yields O(1) add/remove complexity and low CPU overhead.

In summary, the article explains fundamental database concepts, surveys a wide range of modern database technologies, and demonstrates how classic data‑structure techniques—chain tables, bitwise flags, bitmap, Bloom filter, and circular queues—can be applied to solve real‑world software engineering problems.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

NewSQL Data Structures Databases Hashing NoSQL bitwise operations

Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.