Practical Guide to Application and Database Splitting for Scalable Systems
This article outlines why and how to split monolithic applications and databases, covering preparation, boundary definition, migration steps, DB sharding strategies, consistency guarantees, and operational safeguards to ensure a smooth transition to a more modular, scalable architecture.
1 Why Split?
The article starts with a dialogue illustrating five main reasons for splitting: severe inter‑application coupling, poor business extensibility, legacy code that is hard to maintain, limited system scalability, and a vicious cycle of accumulating technical debt.
2 Preparation Before Splitting
2.1 Understanding Business Complexity
It stresses multi‑dimensional analysis of business complexity, involving technical discussions with product and development teams, and practical exploration of existing domain models, code, and architecture.
2.2 Defining Boundaries – High Cohesion, Low Coupling, Single Responsibility
Good service boundaries are illustrated with the "Huluwa" (Calabash Brothers) cartoon analogy: each brother has an independent, well-defined ability, yet they combine when needed — mirroring services that own their responsibilities independently while still integrating into a whole.
2.3 Setting Post‑Split Application Goals
Clear, incremental goals (e.g., separating DB and application first, redesigning data models later) prevent endless deepening of the split.
2.4 Assessing Current Architecture, Code, and Dependencies
Identifying hidden complexities before work begins reduces later troubleshooting costs.
2.5 Preparing a “pocket‑guide”
Encourages keeping concise checklists for alternative solutions, problem decomposition, and contingency plans.
2.6 Relax and Start
Motivational reminder to stay calm before execution.
3 Practice
3.1 Database Splitting
Explains vertical (table‑level) and horizontal (sharding) splitting, the need for a global ID generator, and migration steps including new table creation, full data sync, binlog incremental sync, and handling of primary‑key conflicts.
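The horizontal (sharding) split above hinges on a deterministic routing rule from a shard key to a physical table. A minimal sketch, assuming a user-ID-keyed orders table split into N shards (class and table names are hypothetical):

```java
// Minimal sketch of horizontal-shard routing: rows are spread across
// N physical tables (order_0 .. order_{N-1}) keyed by user ID.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    // Route a user ID to its physical table name, e.g. order_2.
    public String tableFor(long userId) {
        long shard = Math.floorMod(userId, (long) shardCount); // floorMod guards against negative IDs
        return "order_" + shard;
    }
}
```

Keeping the rule a pure function of the shard key is what lets full data sync and later cut-over target each shard independently.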
3.1.1 Global ID Generation
Discusses Snowflake, MySQL auto‑increment tables, dual‑table approaches, and Alibaba’s tddl‑sequence, noting non‑monotonic IDs and required query adjustments.
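The Snowflake approach mentioned here can be sketched as follows — a 41-bit timestamp, 10-bit worker ID, and 12-bit per-millisecond sequence packed into one long. The custom epoch and class name are assumptions, not the article's code:

```java
// Minimal sketch of a Snowflake-style ID generator.
// Bit layout: | 41-bit timestamp offset | 10-bit worker ID | 12-bit sequence |
public class SnowflakeId {
    private static final long EPOCH = 1640995200000L; // 2022-01-01 UTC, arbitrary custom epoch
    private final long workerId;       // 0..1023, must be unique per node
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeId(long workerId) {
        if (workerId < 0 || workerId > 1023) throw new IllegalArgumentException("workerId out of range");
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts < lastTimestamp) throw new IllegalStateException("clock moved backwards");
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;  // 12-bit sequence wraps within one millisecond
            if (sequence == 0) {                // exhausted: spin until the next millisecond
                while (ts <= lastTimestamp) ts = System.currentTimeMillis();
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

IDs from a single node are strictly increasing, but across nodes they are only roughly time-ordered — the non-monotonicity the section warns about, which is why queries sorted by auto-increment ID need adjusting.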
3.1.2 New Table Creation & Data Migration
Recommends utf8mb4 charset, careful index planning, full‑load tools, low‑traffic windows, and binlog‑based incremental sync (e.g., Canal, Otter).
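Because the full load and the binlog tail overlap, the same event can be delivered twice, so the apply step on the target side must be idempotent — this is also how primary-key conflicts are absorbed. A sketch using an in-memory map as a stand-in for the target table (the event shape is hypothetical, not Canal's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: applying possibly-replayed binlog-style events idempotently.
// INSERT and UPDATE both become an upsert keyed by primary key, so a
// re-delivered event converges to the same final state instead of
// raising a primary-key conflict.
public class IdempotentApplier {
    public enum Type { UPSERT, DELETE }

    public record Event(Type type, long pk, String row) {}

    private final Map<Long, String> targetTable = new HashMap<>();

    public void apply(Event e) {
        switch (e.type()) {
            case UPSERT -> targetTable.put(e.pk(), e.row()); // insert-or-overwrite: safe to replay
            case DELETE -> targetTable.remove(e.pk());       // deleting a missing row is a no-op
        }
    }

    public Map<Long, String> snapshot() { return Map.copyOf(targetTable); }
}
```

In MySQL terms the upsert corresponds to `INSERT ... ON DUPLICATE KEY UPDATE` (or `REPLACE INTO`) on the new table.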
3.1.3 Refactoring Cross‑Database Joins
Provides four strategies: business avoidance, global tables, redundant fields, and in‑memory composition (RPC or local cache).
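Of the four strategies, in-memory composition replaces a cross-database join with two single-source queries joined in application code. A sketch with a stubbed lookup standing in for the RPC call or local cache (all names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch: replacing "SELECT o.*, u.name FROM orders o JOIN users u ..."
// with two single-database reads composed in memory.
public class OrderAssembler {
    public record Order(long orderId, long userId) {}
    public record OrderView(long orderId, String userName) {}

    // userLookup stands in for an RPC call or local-cache hit against the user service.
    public static List<OrderView> assemble(List<Order> orders, Function<Long, String> userLookup) {
        List<OrderView> views = new ArrayList<>();
        for (Order o : orders) {
            views.add(new OrderView(o.orderId(), userLookup.apply(o.userId())));
        }
        return views;
    }
}
```

In practice the lookup should be batched (one RPC carrying all distinct user IDs) to avoid an N+1 call pattern.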
3.1.4 Cut‑over Schemes
Two cut-over approaches are presented: (a) stop-write on the old DB — fast and cheap, but risky to roll back once traffic has moved; and (b) dual-write — safer and rollback-friendly, at the cost of a longer migration window and extra write latency.
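The dual-write scheme can be sketched as a write path gated by a switch: every write goes to the old store (still the source of truth), and to the new store as well once the switch is on; reads stay on the old store until verification passes. A minimal sketch with in-memory maps standing in for the two databases (all names hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the dual-write cut-over: the old store remains authoritative;
// the new store is mirrored behind a switch so it can be verified — and
// the switch rolled back — without losing data.
public class DualWriteDao {
    private final Map<Long, String> oldStore = new HashMap<>();
    private final Map<Long, String> newStore = new HashMap<>();
    private final AtomicBoolean dualWriteOn = new AtomicBoolean(false);

    public void enableDualWrite() { dualWriteOn.set(true); }

    public void save(long id, String row) {
        oldStore.put(id, row);          // old store is always written
        if (dualWriteOn.get()) {
            newStore.put(id, row);      // mirrored once the switch is on
        }
    }

    public String read(long id) { return oldStore.get(id); } // reads cut over separately, later

    // Spot-check used before flipping reads to the new store.
    public boolean consistent(long id) {
        return String.valueOf(oldStore.get(id)).equals(String.valueOf(newStore.get(id)));
    }
}
```

Rolling back is just flipping the switch off, which is why this path is the safer of the two.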
3.1.5 Switch Management
Emphasizes initializing feature switches to null so that a restarted instance waits for the config push rather than silently running with a stale compiled-in default.
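This point can be sketched as a three-valued flag: on restart the switch is null until the config center pushes a value, and callers treat null as "unknown" and take the conservative path (config-center wiring and names are hypothetical):

```java
// Sketch: a feature switch that distinguishes "explicitly off" from
// "not yet loaded". Declared as Boolean (nullable), not boolean, so a
// freshly restarted instance cannot run with a stale default.
public class CutoverSwitch {
    private volatile Boolean readFromNewDb = null; // null = no config received yet

    // Called by the config-center listener when a value arrives.
    public void onConfigPush(boolean value) { this.readFromNewDb = value; }

    // Callers stay on the old, proven path until the switch is known.
    public boolean shouldReadNewDb() {
        Boolean v = readFromNewDb;
        return v != null && v;
    }

    public boolean isLoaded() { return readFromNewDb != null; }
}
```

Had the field been a primitive `boolean`, a restart mid-cut-over would default it to `false` (or a stale baked-in value) and silently route traffic the wrong way.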
3.2 Ensuring Consistency After Split
Lists options: distributed transactions (poor performance), message‑based compensation, and scheduled‑task compensation for eventual consistency.
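The scheduled-task compensation option can be sketched as a reconciliation loop: writes record an intent first, a periodic job re-drives any intent that never completed, and success marks it done so retries stay bounded and idempotent (all names are hypothetical, and the downstream call is stubbed):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch: eventual consistency via scheduled-task compensation.
// Pending intents are retried on each pass of a periodic job until the
// downstream write succeeds, then removed.
public class CompensationJob {
    private final Map<Long, String> pending = new LinkedHashMap<>(); // intentId -> payload

    public void recordIntent(long id, String payload) { pending.put(id, payload); }

    // One pass of the scheduled job; downstreamWrite stands in for the
    // remote call, which may fail and be retried on the next pass.
    public List<Long> runOnce(Predicate<String> downstreamWrite) {
        List<Long> completed = new ArrayList<>();
        for (Map.Entry<Long, String> e : pending.entrySet()) {
            if (downstreamWrite.test(e.getValue())) {
                completed.add(e.getKey());
            }
        }
        completed.forEach(pending::remove);
        return completed;
    }

    public int pendingCount() { return pending.size(); }
}
```

Message-based compensation follows the same shape, with the retry loop driven by redelivery from the broker instead of a timer.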
3.3 Maintaining Stability
Three pillars: distrust third‑party services (defensive coding, timeouts, async fallback), protect consumers (minimal, well‑designed APIs, rate limiting), and improve the service itself (single responsibility, cleaning legacy bugs, SOP‑driven ops, resource predictability).
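"Distrust third-party services" translates concretely into a hard timeout plus a fallback on every outbound call. A sketch using only the JDK (service names and the fallback value are hypothetical):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Sketch: never let a third-party call block the caller indefinitely.
// The call runs on its own executor with a hard deadline; on timeout or
// any error we return a degraded fallback instead of propagating the failure.
public class GuardedClient {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public String callWithFallback(Supplier<String> thirdParty, long timeoutMs, String fallback) {
        CompletableFuture<String> f = CompletableFuture.supplyAsync(thirdParty, pool);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);      // stop waiting; the slow call's result no longer matters
            return fallback;
        } catch (Exception e) {
            return fallback;     // defensive: any third-party failure degrades gracefully
        }
    }

    public void shutdown() { pool.shutdownNow(); }
}
```

A dedicated pool also isolates the blast radius: a hung third party exhausts its own threads, not the service's request threads.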
4 Summary
Key takeaways: prepare for pressure, decompose complex problems into testable steps, adopt SOPs for rapid failure handling, and continuously refine the system through disciplined engineering practices.