Choosing the Right Database Architecture: A Practical Guide
The article walks through the challenges of growing data volumes, compares master‑slave, vertical, horizontal, and sharding‑plus‑partitioning architectures, explains their pros and cons, and provides concrete guidance on selecting and implementing a suitable database solution, including middleware options and real‑world case studies.
As business and data volumes increase, databases face uncontrolled growth that strains disk space, I/O, and system overhead, eventually hitting performance limits on a single server. Applications therefore demand higher robustness, security, and scalability.
Typical Database Challenges
I/O bottleneck: Large numbers of reads and writes exceed the I/O capacity of a single instance.
Storage bottleneck: Massive data sets outgrow the original schema design, and queries slow down as tables keep growing.
Availability: Critical services must stay online and recover quickly from failures.
Security: Data must be protected against loss or accidental deletion.
Architectural Options
1. Master‑Slave (Read‑Write Separation) – Writes go to the master, reads are served by multiple replicas. It improves availability and safety because a replica can be promoted when the master fails.
Advantages
Simple structure, mature technology, widely supported.
Low entry barrier for existing applications.
Multiple replicas increase data safety.
Fast failover to a replica when the master crashes.
Disadvantages
Cannot scale writes horizontally; the master becomes a bottleneck.
Replicas store full copies, wasting storage.
Potential data‑consistency and replication‑lag issues.
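The read‑write split above can be sketched in a few lines. This is a minimal illustration, not any particular middleware; the `ReadWriteRouter` class and the naive SQL classification are assumptions made for the example:

```python
import itertools

class ReadWriteRouter:
    """Route writes to the master and round-robin reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Naive classification: anything that is not a SELECT is a write.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.master

router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM users"))        # replica-1
print(router.route("UPDATE users SET name = 'x'"))  # master-db
print(router.route("SELECT * FROM orders"))       # replica-2
```

Real middleware must also handle transactions (routing an in-flight transaction's reads to the master to dodge replication lag), which this sketch ignores.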
2. Vertical Partitioning (Business‑Level Sharding) – Hot business modules are split into dedicated databases to isolate intensive I/O from other workloads.
Advantages
Clear separation of services, aligns with micro‑service decomposition.
Cold and hot data are isolated, preventing interference.
Each database can be maintained and iterated independently.
Disadvantages
Cross‑database joins are impossible; joins must be assembled in the application layer or avoided through redundant (denormalized) data.
Distributed transaction handling becomes necessary.
3. Horizontal Sharding (Data‑Level Partitioning) – Large tables are split into multiple tables based on a sharding rule, but all tables remain in the same logical database.
Advantages
Reduces per‑table size, improving query performance.
No cross‑database transaction issues because all shards share the same DB instance.
Disadvantages
Cross‑shard queries require additional logic.
Adding shards entails data migration.
I/O limits still exist for extremely high request volumes.
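The "additional logic" for cross‑shard queries is typically scatter‑gather: query every split table, then merge the results. A sketch, assuming four split tables and that each table returns its rows already sorted (the table naming and counts are illustrative):

```python
import heapq

TABLE_COUNT = 4

def table_for(order_id):
    """Route a row to one of the split tables, e.g. t_order_2."""
    return f"t_order_{order_id % TABLE_COUNT}"

def merged_top_n(per_table_rows, n):
    """Cross-shard ORDER BY ... LIMIT n: merge each table's
    already-sorted result stream and keep the first n rows."""
    return list(heapq.merge(*per_table_rows))[:n]

assert table_for(10) == "t_order_2"
# Each inner list simulates one split table's sorted result set.
print(merged_top_n([[1, 5], [2, 9], [3], [4, 8]], 3))  # [1, 2, 3]
```

Note that LIMIT-with-OFFSET pagination is worse than this suggests: every table must return `offset + limit` rows before the merge, so deep pagination across shards gets expensive.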
4. Combined Database‑and‑Table Partitioning (分库分表) – When both storage and I/O become bottlenecks, data is split across multiple servers and tables, effectively alleviating pressure on single‑machine resources.
Advantages
Reduces per‑table size and spreads load across many servers.
Disadvantages
Cross‑shard queries, data migration during scaling, broadcast‑table handling, and distributed transaction support are required.
Sharding Strategies
Range Sharding – Split data by a sorted key (commonly time). Example: one table for IDs 0‑10000, another for 10001‑20000. Benefits include natural horizontal scaling and fast range queries; drawbacks are hotspot creation on recent time ranges.
Hash Sharding – Apply a hash (e.g., user ID % N) to distribute rows evenly, reducing hotspots; the trade‑off is that range queries become inefficient, since predicates not on the hashed key must scan every shard.
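The two strategies can be contrasted directly; the shard count and range size below are illustrative, not from the article:

```python
SHARDS = 4
RANGE_SIZE = 10_000

def range_shard(record_id):
    """Range sharding: IDs 0-9999 -> shard 0, 10000-19999 -> shard 1, ..."""
    return record_id // RANGE_SIZE

def hash_shard(user_id):
    """Hash sharding: spreads keys evenly but loses range locality."""
    return user_id % SHARDS

# A range query over [10001, 19999] touches exactly one range shard...
assert {range_shard(i) for i in (10001, 15000, 19999)} == {1}
# ...while consecutive IDs scatter across hash shards, so the same
# range query would have to visit every shard.
print({hash_shard(i) for i in (10001, 10002, 10003)})  # {1, 2, 3}
```

This is also why the hotspot trade-off cuts both ways: range sharding concentrates recent inserts on the newest shard, while hash sharding spreads them but gives up cheap range scans.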
Key Selection Considerations
Distributed Transactions: XA or application‑level solutions (TCC, Saga) incur high overhead.
Multi‑DB Result Merging: Group‑by, order‑by, and pagination across shards must be merged in middleware or synchronized into an analytical store (e.g., TiDB, Kudu).
Data Latency: Replication lag in master‑slave setups and aggregation lag after horizontal sharding.
Cross‑Shard Joins: Not possible directly; workarounds include index tables, the "gene" method, or wide tables in NoSQL.
Shard Expansion: Adding shards requires data migration; the cost grows with data volume.
ID Generation: Options include UUID, Redis INCR, Snowflake (64‑bit IDs), and Meituan Leaf (segment or snowflake mode).
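A simplified Snowflake‑style generator makes the 64‑bit structure concrete. The bit layout below (41‑bit timestamp, 10‑bit worker ID, 12‑bit sequence) follows the common convention, but the epoch and clock‑rollback handling here are illustrative simplifications:

```python
import threading
import time

class SnowflakeLike:
    """Sketch of a Snowflake-style ID: | 41-bit ms timestamp | 10-bit
    worker ID | 12-bit per-millisecond sequence |. Real implementations
    also guard against the system clock moving backwards."""

    def __init__(self, worker_id, epoch_ms=1_600_000_000_000):
        assert 0 <= worker_id < 1024          # fits in 10 bits
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 4096 IDs per ms
                if self.seq == 0:                  # sequence exhausted:
                    while now <= self.last_ms:     # spin to the next ms
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.seq

gen = SnowflakeLike(worker_id=7)
a, b = gen.next_id(), gen.next_id()
assert a < b  # IDs are monotonically increasing per worker
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, which is what makes Snowflake-style IDs attractive as sharding and index keys compared with random UUIDs.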
Middleware Classification
Application‑Layer (JDBC) Middleware – Re‑implements JDBC interfaces (DataSource, PreparedStatement) to provide transparent sharding. Examples: TDDL, sharding‑jdbc, TSharding. Pros: No extra deployment, works with any JDBC‑compatible DB. Cons: Language‑specific (Java only) and higher connection‑pool usage.
Proxy‑Layer Middleware – Sits between the client and the database, speaking the native MySQL protocol. Any language that can speak MySQL can use it. Examples: Amoeba, Cobar, Mycat. Pros: Language‑agnostic, lower connection count due to shared pool. Cons: Limited to supported DBs, requires a separate deployment, adds an extra network hop.
Deployment Options
JDBC Scheme: Decentralized, suitable for high‑performance OLTP Java services.
Proxy Scheme: Central entry point, supports heterogeneous languages, fits OLAP and operational scenarios.
Hybrid Scheme: Combines both for large systems with mixed front‑end and back‑end workloads.
ShardingSphere Ecosystem
ShardingSphere offers three independent products: Sharding‑JDBC , Sharding‑Proxy , and the planned Sharding‑Sidecar . Core capabilities include data sharding, database‑and‑table partitioning, read‑write separation, customizable sharding strategies, distributed primary‑key generation, distributed transactions (XA, flexible), and standardized transaction APIs.
Case Study: E‑commerce SaaS
User Database – High‑traffic enterprise users are isolated in separate databases; slow‑growing user data does not require table partitioning.
Order Database – Fast‑growing order data is both sharded by enterprise ID (database) and by user ID (table) to balance load.
Attachment Database – Low concurrency, only table partitioning is needed because a single database can handle the I/O.
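The order‑database layout above reduces to a two‑level routing function: enterprise ID picks the database, user ID picks the table. The database and table counts below are made up for illustration:

```python
DB_COUNT = 4      # order databases, keyed by enterprise ID
TABLE_COUNT = 8   # order tables per database, keyed by user ID

def locate_order(enterprise_id, user_id):
    """Pick the database by enterprise ID and the table by user ID,
    mirroring the two-level split described in the case study."""
    db = f"order_db_{enterprise_id % DB_COUNT}"
    table = f"t_order_{user_id % TABLE_COUNT}"
    return db, table

print(locate_order(enterprise_id=42, user_id=1001))  # ('order_db_2', 't_order_1')
```

The consequence of this layout is that any query carrying both keys hits exactly one table, while a query with only one of them must fan out across databases or tables.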
Practical Problems Encountered
Distributed Transactions: Complex and costly; will be covered in a dedicated article.
Distributed ID Generation: UUID (simple but long), Redis INCR (requires persistence), Snowflake (64‑bit IDs), Meituan Leaf (segment or snowflake mode).
Cross‑Shard Queries: Index‑table method, gene method (embedding part of the user ID into the enterprise ID), wide‑table method (NoSQL wide table) – each with trade‑offs.
Shard Expansion: Growing from 1 billion to 50 billion rows may require resharding from 64 tables to 128, incurring a massive migration cost.
Consistent Hashing: Places nodes on a hash ring (0 to 2ⁿ−1, commonly with n = 32); adding or removing a node only affects its neighbors on the ring, reducing data movement during scaling.
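One way to read the gene method mentioned above: embed the low bits of one key (the "gene") into a generated ID, so that both keys route to the same shard without a lookup table. This is a hypothetical sketch, with an illustrative bit width:

```python
GENE_BITS = 4  # the low 4 bits of the user ID become the "gene"
MASK = (1 << GENE_BITS) - 1

def make_order_id(sequence, user_id):
    """Append the user's gene to a sequence number, so sharding the
    order ID by its low bits co-locates it with the user's data."""
    return (sequence << GENE_BITS) | (user_id & MASK)

def shard_of(key):
    """16 shards, chosen purely by the gene bits of any key."""
    return key & MASK

order_id = make_order_id(sequence=12345, user_id=42)
assert shard_of(order_id) == shard_of(42)  # same shard, no index table
```

The trade-off is that the gene width fixes the maximum shard count up front: with 4 gene bits you can never exceed 16 shards without regenerating IDs.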
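The consistent‑hash ring can be sketched with the standard library. Virtual nodes (the `vnodes` parameter, an assumption of this sketch) smooth out the load each physical node receives:

```python
import bisect
import hashlib

def _hash(key):
    """Map a key onto the 0 .. 2^32 - 1 ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes: each key goes to
    the first node position clockwise from the key's hash, so adding or
    removing a node only remaps keys near that node's positions."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key):
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["db-a", "db-b", "db-c"])
print(ring.node_for("user:1001"))  # one of db-a / db-b / db-c
```

Contrast with plain modulo: dropping from 3 nodes to 2 under `hash % N` remaps roughly two thirds of all keys, while on the ring only the keys that previously landed on the removed node move.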
Summary
Database architectures include master‑slave, vertical, and horizontal designs, each addressing high availability, intensive I/O, or massive storage needs.
Every architecture introduces specific challenges such as data latency, distributed transactions, cross‑shard queries, and scaling costs.
Complex solutions are not always better; choose the architecture that matches the current stage and pain points of the application.
Middleware falls into JDBC‑based (high performance, Java‑only) and Proxy‑based (language‑agnostic, lower connection usage) categories.
ShardingSphere provides a powerful, open‑source suite (JDBC, Proxy, future Sidecar) to implement sharding, read‑write separation, and distributed transactions.
Real‑world deployments must address distributed transactions, ID generation, cross‑shard queries, and shard expansion.
Architect's Journey
E‑commerce, SaaS, AI architect; DDD enthusiast; SKILL enthusiast