
Comprehensive Overview of Database Evolution, Architecture, and Core Technologies

This article surveys database fundamentals and traces their evolution from early relational systems to modern distributed and cloud‑native solutions. It explains the main architectural components and core technologies, such as query processing, indexing, and transaction management, and discusses business‑driven classifications, distributed designs, SMP versus MPP, and resource‑access models.

Architects' Tech Alliance

Overview

A database is a structured repository that stores, manages, and shares large volumes of data, providing data independence so applications are insulated from changes in logical or physical data organization.

Evolution of Databases

Over the past 40 years, databases have progressed through several stages. In the 1980s and 1990s, commercial RDBMSs such as Oracle, IBM DB2, Sybase, SQL Server, and Informix rose to prominence. From the 1990s into the 2000s, open‑source systems (PostgreSQL, MySQL) spread widely, and analytical databases (Teradata, Greenplum) were adopted for OLAP workloads. Between 2000 and 2010, Google's big‑data infrastructure (Google File System, Bigtable, MapReduce) helped spark the NoSQL movement, alongside document and key‑value stores such as MongoDB and Redis. After 2010, cloud‑native distributed databases such as Amazon Aurora, Amazon Redshift, Azure SQL Database, and Google Spanner emerged, and HTAP (hybrid transactional/analytical processing) capabilities began to appear in distributed systems.

Database Development – Business Perspective

Databases can be classified into three broad groups: classic relational OLTP systems (e.g., banking transactions, e‑commerce orders); NoSQL and specialized stores for semi‑structured data (documents, graphs, time series), which typically favor horizontal scalability over strong consistency; and analytical OLAP systems designed for complex analysis over massive data sets.

Beyond the core engines, auxiliary services such as data transfer, backup, and management tools are essential, and a database control platform is needed for provisioning, monitoring, and resource allocation across private clouds, public clouds, and on‑premises data centers.

Technical Evolution of Databases

Traditional centralized databases have evolved into distributed architectures to meet the demands of massive data volumes, real‑time OLTP/OLAP convergence, and higher availability.

1. Traditional Centralized Database Architecture

The core modules are the application interface, the SQL interface, the query execution engine (planner, optimizer, executor), the data access layer (transaction, memory, security, and file/index management), and the storage engine (data files, index files, metadata). Execution follows the Volcano‑style iterator model: each operator implements an open/next/close interface, and each parent operator pulls rows from its children one at a time.
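The open/next/close contract is easiest to see in a toy pipeline. The sketch below is a minimal Python illustration, not any particular engine's executor: the operator names `Scan`, `Filter`, `Project`, and the driver `run` are hypothetical, and an in‑memory list of tuples stands in for the storage engine.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[object, ...]  # rows are plain tuples in this sketch


class Operator:
    """Volcano-style operator: open() prepares state, next() returns one row or None, close() releases state."""

    def open(self) -> None:
        raise NotImplementedError

    def next(self) -> Optional[Row]:
        raise NotImplementedError

    def close(self) -> None:
        raise NotImplementedError


class Scan(Operator):
    """Leaf operator; a hypothetical in-memory table stands in for a storage-engine scan."""

    def __init__(self, table: List[Row]) -> None:
        self.table = table
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.table):
            return None  # exhausted
        row = self.table[self.pos]
        self.pos += 1
        return row

    def close(self) -> None:
        pass


class Filter(Operator):
    """Pulls rows from its child and passes through only those matching the predicate."""

    def __init__(self, child: Operator, pred: Callable[[Row], bool]) -> None:
        self.child = child
        self.pred = pred

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

    def close(self) -> None:
        self.child.close()


class Project(Operator):
    """Keeps only the columns at the given positions."""

    def __init__(self, child: Operator, cols: Sequence[int]) -> None:
        self.child = child
        self.cols = cols

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        row = self.child.next()
        if row is None:
            return None
        return tuple(row[i] for i in self.cols)

    def close(self) -> None:
        self.child.close()


def run(plan: Operator) -> List[Row]:
    """Drives the plan from the root, pulling rows until next() signals exhaustion."""
    plan.open()
    try:
        rows = []
        while (row := plan.next()) is not None:
            rows.append(row)
        return rows
    finally:
        plan.close()


# Roughly: SELECT name FROM users WHERE age > 30
users = [(1, "alice", 34), (2, "bob", 25), (3, "carol", 41)]
plan = Project(Filter(Scan(users), lambda r: r[2] > 30), cols=[1])
print(run(plan))  # [('alice',), ('carol',)]
```

Because every operator only asks its child for the next row, rows stream through the pipeline without materializing intermediate results; the trade‑off is one virtual call per row per operator, which is why some engines batch rows or compile plans instead.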

Key components:

JDBC component providing connectivity between Java applications and the database.

Session management handling client sessions and permissions.

Permission management, which caches system and object privileges.

SQL parser, built with tools such as Flex/Bison, ANTLR, or JavaCC, that turns SQL text into an abstract syntax tree (AST).

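Query optimizer, combining rule‑based optimization (RBO), which applies heuristic rewrite rules, with cost‑based optimization (CBO), which estimates the cost of candidate plans and selects the cheapest.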

Physical executor (Volcano model) that streams rows through operator pipelines.

Index component (B‑tree, hash, inverted, bitmap, etc.) that trades extra storage space for faster lookups.

Transaction component implementing ACID properties with undo logs, redo logs, locking, timestamps, and MVCC.

Storage engine handling concurrency, transaction support, referential integrity, physical storage formats, indexing strategies, memory buffers, performance aids, and other features.
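Of the components above, MVCC is the one whose core idea fits in a few lines: writers append new versions instead of overwriting, and each reader sees the version that was current at its snapshot timestamp. The Python sketch below is a deliberately simplified, single‑threaded illustration under stated assumptions: every write commits immediately, and there are no in‑progress transactions, undo logs, or garbage collection, all of which production engines must handle. `MVCCStore`, `Version`, and their methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

INF = float("inf")


@dataclass
class Version:
    value: str
    begin_ts: int        # commit timestamp of the write that created this version
    end_ts: float = INF  # commit timestamp of the write that replaced it (INF = still current)


class MVCCStore:
    """Hypothetical single-threaded store; every write commits immediately with a new timestamp."""

    def __init__(self) -> None:
        self.ts = 0                                   # last assigned commit timestamp
        self.versions: Dict[str, List[Version]] = {}  # version chain per key, oldest first

    def snapshot(self) -> int:
        # A reader's snapshot is the latest commit timestamp at the moment it starts.
        return self.ts

    def commit_write(self, key: str, value: str) -> int:
        self.ts += 1
        chain = self.versions.setdefault(key, [])
        if chain:
            chain[-1].end_ts = self.ts  # close the old version instead of overwriting it
        chain.append(Version(value, self.ts))
        return self.ts

    def read(self, key: str, snap: int) -> Optional[str]:
        # Return the version that was current at the snapshot: begin_ts <= snap < end_ts.
        for v in reversed(self.versions.get(key, [])):
            if v.begin_ts <= snap < v.end_ts:
                return v.value
        return None


store = MVCCStore()
store.commit_write("acct:1", "balance=100")
snap = store.snapshot()                     # reader starts here
store.commit_write("acct:1", "balance=80")  # a later writer commits
print(store.read("acct:1", snap))              # balance=100
print(store.read("acct:1", store.snapshot()))  # balance=80
```

The reader holding `snap` keeps seeing `balance=100` even after the newer write commits, without taking any read lock; this is the property that lets MVCC let readers and writers proceed without blocking each other.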

2. Distributed Database Architecture

Distributed databases consist of multiple management nodes and data nodes, presenting a system that is logically unified but physically partitioned. They distribute data by hash, range, or list partitioning, can fully replicate a (typically small) table to every node so that joins against it run locally, and are built along one of two paths: an open‑source database combined with sharding middleware, or a fully self‑developed engine.
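The three distribution methods differ only in how a row's key is mapped to a shard. The Python functions below are a hypothetical sketch of that mapping (the names, the CRC32 hash, and the bound/mapping formats are assumptions for illustration, not any product's configuration syntax).

```python
import bisect
import zlib
from typing import Dict, List


def hash_shard(key: str, num_shards: int) -> int:
    # crc32 gives the same result in every process, unlike Python's salted hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def range_shard(key: int, upper_bounds: List[int]) -> int:
    # upper_bounds is sorted and exclusive: [100, 200] means
    # shard 0 holds key < 100, shard 1 holds 100 <= key < 200, shard 2 holds key >= 200.
    return bisect.bisect_right(upper_bounds, key)


def list_shard(key: str, mapping: Dict[str, int]) -> int:
    # Explicit value-to-shard assignment, e.g. by region; unknown values raise KeyError.
    return mapping[key]


print(hash_shard("order-42", 4))                         # a shard in 0..3, stable across runs
print(range_shard(150, [100, 200]))                      # 1
print(list_shard("EU", {"EU": 0, "US": 1, "APAC": 2}))   # 0
```

Hash partitioning spreads keys evenly but forces range scans to touch every shard; range partitioning keeps ordered keys together but can concentrate hot keys on one shard; list partitioning suits low‑cardinality columns such as region. Note that plain modulo hashing remaps most keys when `num_shards` changes, which is why systems often use consistent hashing or a fixed set of virtual buckets instead.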

Typical workloads include OLTP, OLAP, and HTAP. Core distributed techniques include data sharding; distributed transactions via two‑phase commit, or one‑phase commit with compensation; multi‑version concurrency control; and replication schemes in which a write is acknowledged only after a majority of replicas have persisted it, preserving consistency while tolerating the failure of a minority of replicas.
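Two‑phase commit can be sketched as a coordinator that first asks every participant to prepare (phase 1) and then commits only if all of them vote yes, aborting everywhere otherwise (phase 2). The Python below is a minimal in‑process illustration: `Participant` and `two_phase_commit` are hypothetical, and the sketch omits the durable decision log, timeouts, and coordinator failure handling that a real implementation needs.

```python
from typing import List


class Participant:
    """Hypothetical shard participant; can_commit simulates whether its local prepare succeeds."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: a real participant would durably log its changes and keep its locks before voting yes.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: a real coordinator would persist its decision here before broadcasting it.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


shards = [Participant("shard-a"), Participant("shard-b")]
print(two_phase_commit(shards), [p.state for p in shards])  # True ['committed', 'committed']
shards = [Participant("shard-a"), Participant("shard-b", can_commit=False)]
print(two_phase_commit(shards), [p.state for p in shards])  # False ['aborted', 'aborted']
```

A single "no" vote aborts the whole transaction, which gives atomicity across shards. The cost is an extra round trip and, in the full protocol, the possibility that prepared participants block while waiting for a coordinator that has failed, one reason some systems prefer one‑phase commit with compensation.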

3. SMP vs. MPP

SMP (Symmetric Multi‑Processor) systems share CPU, memory, and I/O resources within a single machine, which limits scalability because of contention, especially on the memory bus. MPP (Massively Parallel Processing) connects many SMP nodes over a network, each with private resources, and scales better for workloads that require little inter‑node communication.
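The "low inter‑node communication" condition is what makes aggregations a good fit for MPP: each node can reduce its own partition to a small partial result, and only those partials cross the network. The Python sketch below illustrates this with a GROUP BY SUM; threads merely stand in for separate nodes, and the function names and `(region, amount)` row layout are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical (region, amount) rows


def local_aggregate(partition: List[Row]) -> Counter:
    # Runs "on one node" against its private partition; no shared memory is needed.
    partial: Counter = Counter()
    for region, amount in partition:
        partial[region] += amount
    return partial


def mpp_group_sum(partitions: List[List[Row]]) -> Dict[str, int]:
    # Each worker thread stands in for a shared-nothing node computing a partial aggregate.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    # Only the small partial results are merged, mimicking the coordinator's final step.
    total: Counter = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts rather than replacing them
    return dict(total)


partitions = [[("EU", 10), ("US", 5)], [("EU", 7)], [("US", 3), ("APAC", 4)]]
print(mpp_group_sum(partitions))  # {'EU': 17, 'US': 8, 'APAC': 4}
```

The merge step handles one entry per group per node rather than every row, so adding nodes mostly adds local work. Operations that must move many rows between nodes, such as joins on non‑co‑located keys, erode this advantage.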

4. Resource Access Models

Three models are described:

Shared Everything – all processors within a single server share CPU, memory, and I/O (e.g., SQL Server).

Shared Disk – each node has private CPU and memory, but all nodes share storage (e.g., Oracle RAC).

Shared Nothing – each node has its own CPU, memory, and disk, typical of MPP and sharding architectures.

Conclusion

While big‑data platforms like Hadoop provide storage, computation, and scheduling, traditional databases remain indispensable for transactional consistency, high‑performance OLTP/OLAP, and enterprise‑critical applications. The article will continue with deeper dives into each technology.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
