Understanding How Relational Databases Work: Architecture, Query Processing, and Transaction Management
This article explains the inner workings of relational databases, covering their core components, query optimization flow, indexing, merge sort, transaction handling, concurrency control, and lock management, while illustrating concepts with examples and diagrams.
When people think of relational databases they often feel something is missing; this article dives deep into the principles that have kept relational databases relevant for over 40 years, focusing on how they process SQL queries.
Basic concepts review
It starts with a reminder of algorithmic complexity, comparing O(1), O(log n), O(n) and O(n²) with concrete examples, showing how operation counts grow with data size.
Merge sort
The article introduces merge sort as a key algorithm for ordering data in a database and explains the merging of two sorted halves into a fully sorted set.
Overall architecture
The core components of a database are listed:
Process manager (thread pool management)
Network manager (handling I/O, especially for distributed systems)
File system manager (disk I/O bottleneck)
Memory manager (large‑scale data and query handling)
Security manager (authentication and authorization)
Client manager (client connection handling)
Database tools
Backup manager
Recovery manager
Monitor manager
Administrator manager (metadata, tablespaces, data pump)
Query manager
Data manager (transaction, cache, data access)
Client manager
Manages communication with clients (servers or terminals) via APIs such as JDBC, ODBC, OLE‑DB, handling authentication, load checking, request routing, buffering, and error handling.
Query manager
Processes a query through several stages: validation, rewrite (pre‑optimization), optimization, compilation, and execution.
Query rewrite rules
View merging
Sub‑query flattening
Removing unnecessary operators (e.g., DISTINCT with UNIQUE)
Eliminating redundant JOINs
Partition pruning
Custom rewrite rules
Data manager
Handles transaction models, buffering, and caching. It explains that disk I/O is the main bottleneck, so modern DBMS use a buffer pool with LRU (or LRU‑K) replacement to keep frequently accessed pages in memory.
Transaction manager and ACID
Describes the four ACID properties—Atomicity, Consistency, Isolation, Durability—and why they are essential for reliable data modifications.
Concurrency control and locking
Explains exclusive (write) locks and shared (read) locks, how they prevent conflicts, and how deadlocks can arise when two transactions wait for each other. It outlines the two‑phase locking protocol used by systems like DB2 and SQL Server to avoid such problems.
Conclusion
The author encourages developers to understand the underlying mechanisms of relational databases, noting that despite the popularity of NoSQL, many fundamental concepts remain best learned through relational systems.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.