How to Refactor a Live Backend System Without Stopping the Business
This article shares practical lessons from three large‑scale backend architecture refactorings. It explains why making business‑critical changes while services stay online is the hardest kind of refactoring, and how to identify and solve the most critical problems first.
Preface
What is the most painful thing for a programmer? Some say changing requirements, unreadable code, elusive bugs, or even finding a partner. The author argues that the real pain is "refactoring the architecture while the business keeps running", which is like swapping a Ferrari's engine in the middle of a race.
The three key constraints are: the business must never stop, the system must remain functional, and the solution must replace the old engine rather than merely patch it.
Since joining UC, the author has led three distinct system refactorings and shares that experience here.
Targeted Approach
M System: a backend that manages game data. Shared game data was tightly coupled with business‑specific data, which limited scalability. The goal was to separate the two data domains.
Result: after the refactor, version releases increased fourfold.
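The article does not describe how the two data domains were separated. As a minimal sketch, assuming one common pattern: shared game data lives in a single canonical store, while each business line keeps only its own fields, keyed by game ID, and a view is composed at read time. All class and function names here (GameInfo, SharedGameStore, BusinessStore, render) are hypothetical, and in-memory dicts stand in for real databases.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GameInfo:
    """Shared game data: one canonical record per game (hypothetical schema)."""
    game_id: int
    name: str


@dataclass
class SharedGameStore:
    """Hypothetical core store for data that every business line reads."""
    _games: dict[int, GameInfo] = field(default_factory=dict)

    def put(self, game: GameInfo) -> None:
        self._games[game.game_id] = game

    def get(self, game_id: int) -> GameInfo:
        return self._games[game_id]


@dataclass
class BusinessStore:
    """Hypothetical per-business store: references games only by game_id."""
    business: str
    _extras: dict[int, dict[str, str]] = field(default_factory=dict)

    def set_extra(self, game_id: int, key: str, value: str) -> None:
        self._extras.setdefault(game_id, {})[key] = value

    def extras(self, game_id: int) -> dict[str, str]:
        # Return a copy so callers cannot mutate the store's internal state.
        return dict(self._extras.get(game_id, {}))


def render(shared: SharedGameStore, biz: BusinessStore, game_id: int) -> dict[str, str]:
    # Compose the view at read time instead of writing business fields
    # into the shared record, so the two domains can evolve independently.
    view = {"name": shared.get(game_id).name}
    view.update(biz.extras(game_id))
    return view


shared = SharedGameStore()
shared.put(GameInfo(1, "Demo Game"))
promo = BusinessStore("promotion")
promo.set_extra(1, "banner", "summer-sale")
print(render(shared, promo, 1))  # {'name': 'Demo Game', 'banner': 'summer-sale'}
```

The design point is that adding a field for one business touches only that business's store, so releases for different business lines no longer have to change the shared schema.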
S System: the core game‑access system, whose primary database was a single point of failure. The goal was a dual‑center deployment in which the failure of either data center would be fully covered by the other.
Result: availability rose from three nines to four nines, and major incidents no longer impacted the business.
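To make "three nines" versus "four nines" concrete, the availability target can be converted into a yearly downtime budget: downtime = (1 − availability) × hours per year. A short sketch, assuming a 365‑day year and ignoring leap years:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored for simplicity


def annual_downtime_minutes(availability: float) -> float:
    """Maximum downtime per year, in minutes, allowed by an availability target."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


for label, target in [("three nines", 0.999), ("four nines", 0.9999)]:
    print(f"{label}: {annual_downtime_minutes(target):.1f} min/year")
# three nines: 525.6 min/year
# four nines: 52.6 min/year
```

In other words, moving from three nines to four nines shrinks the allowed downtime from roughly 8.8 hours a year to under an hour.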
X System: an innovative business platform in which all features were crammed into one monolith, causing scalability and reliability issues.
The problem was not data coupling but functional overload in a single system, making any single issue affect the entire site.
Solution: split functions into separate subsystems, reducing complexity and improving development speed.
After the refactor there were more inter‑system interfaces, but each system became simpler, development became faster, and a failure now stayed within the subsystem where it occurred.
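The article does not show how failures were isolated after the split. One common pattern, sketched here as an assumption, is for the calling layer to invoke each subsystem independently and degrade only the affected section when one of them fails. The function `aggregate` and the sample subsystems are hypothetical; in production each call would be a network request that also needs a timeout.

```python
from collections.abc import Callable


def aggregate(sections: dict[str, Callable[[], str]],
              fallback: str = "unavailable") -> dict[str, str]:
    """Call each subsystem independently; a failure degrades only its own section."""
    page: dict[str, str] = {}
    for name, fetch in sections.items():
        try:
            page[name] = fetch()
        except Exception:
            # Fault isolation: one broken subsystem does not take down the whole page.
            page[name] = fallback
    return page


def news() -> str:
    return "3 new posts"


def ranking() -> str:
    raise ConnectionError("ranking subsystem is down")  # simulated outage


print(aggregate({"news": news, "ranking": ranking}))
# {'news': '3 new posts', 'ranking': 'unavailable'}
```

In a monolith, the same ranking bug could crash the shared process and affect the entire site; here it only blanks one section.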
Identifying the real problems to solve through refactoring is crucial; trying to fix everything at once leads to wasted effort and burnout, especially for new architects.
After the refactor, targeted optimizations can be made quickly within the team, without coordinating with many external stakeholders, which dramatically increases efficiency.
Source: 云栖社区 Original article: https://yq.aliyun.com/articles/42321
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
