Key Considerations for Large‑Scale System Refactoring: Lessons from Dada JD.com’s Double‑11 Experience
The article shares practical insights on planning, designing, developing, testing, and rolling out a large‑scale backend system refactor, emphasizing resource limits, pain points of legacy code, unit‑test protection, layering, decoupling, monitoring, and staged deployment to ensure stability during high‑traffic events.
Author Bio: Wei Zhibo, a Carnegie Mellon graduate with several entrepreneurial ventures and six to seven years of work in Silicon Valley, now leads business architecture and backend refactoring at Dada JD.com.
Large-scale events such as Double-11 serve as stress tests for internal systems. Having taken part in several rebuilds of systems that serve on the order of a billion visits per day, the author uses the occasion to discuss the key factors that determine whether a refactoring project succeeds.
Design Phase – Resource Constraints and Legacy Pain Points
The refactor must be planned realistically given limited time, manpower, and budget. Over-ambitious schedules often cause burnout. Understanding the legacy system’s shortcomings (tight coupling, complex business logic, and poor scalability under traffic spikes) is essential to set clear goals for maintainability, extensibility, and performance.
Development Phase – Unit-Test Shield and Layered Decoupling
The baseline for a successful refactor is that the new system supports all core functionalities of the old one. Unit tests act as a protective scaffold, ensuring that critical logic remains intact throughout the rewrite. Additionally, establishing clear layers and avoiding strong coupling (e.g., using interfaces and data-transfer objects) allows engineers to work independently and maintain productivity.
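As a rough illustration of the unit-test shield and the interface/DTO layering described above, the following Python sketch pins a refactored class to the behavior of the legacy one. Everything named here (`OrderDTO`, `FeeCalculator`, the fee rule itself) is hypothetical and chosen only for the example; the original article does not show its code or test setup.

```python
import unittest
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OrderDTO:
    """Data-transfer object passed between layers (hypothetical fields)."""
    order_id: str
    distance_km: float


class FeeCalculator(Protocol):
    """Interface callers depend on, instead of a concrete implementation."""

    def delivery_fee(self, order: OrderDTO) -> float: ...


class LegacyFeeCalculator:
    """Stand-in for old logic: base 5.0 plus 1.5 per km beyond 3 km (assumed rule)."""

    def delivery_fee(self, order: OrderDTO) -> float:
        fee = 5.0
        if order.distance_km > 3:
            fee = fee + (order.distance_km - 3) * 1.5
        return round(fee, 2)


class RefactoredFeeCalculator:
    """Rewritten version of the same rule with named constants."""

    BASE_FEE = 5.0
    FREE_KM = 3.0
    PER_KM = 1.5

    def delivery_fee(self, order: OrderDTO) -> float:
        extra_km = max(0.0, order.distance_km - self.FREE_KM)
        return round(self.BASE_FEE + extra_km * self.PER_KM, 2)


class FeeParityTest(unittest.TestCase):
    """Parity test: the new code must reproduce the legacy results."""

    CASES = [
        OrderDTO("a", 0.0),
        OrderDTO("b", 3.0),
        OrderDTO("c", 3.5),
        OrderDTO("d", 12.0),
    ]

    def test_new_matches_legacy(self) -> None:
        legacy: FeeCalculator = LegacyFeeCalculator()
        new: FeeCalculator = RefactoredFeeCalculator()
        for order in self.CASES:
            with self.subTest(order=order):
                self.assertEqual(new.delivery_fee(order), legacy.delivery_fee(order))
```

Saved as a module, the test can be run with `python -m unittest`. Because both implementations satisfy the same `FeeCalculator` interface and exchange only `OrderDTO` values, the team rewriting the calculator and the teams calling it can work in parallel.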
Testing & Release Phase – Monitoring and Controlled Rollout
After coding, comprehensive monitoring at both macro and micro levels is required to quickly detect anomalies. A two-stage rollout helps minimize user impact and surface performance issues early: first an "invisible mode" where requests are mirrored to the new system for comparison, then a "gray-scale mode" that gradually shifts traffic.
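The two rollout stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used at Dada JD.com: the handler signature, the `stats` counters, and the hash-based bucketing are all hypothetical, and the mirrored call is made synchronously for brevity, whereas a production system would more likely mirror traffic asynchronously to avoid adding latency.

```python
import hashlib
import logging
from typing import Callable

logger = logging.getLogger("refactor.rollout")

# Hypothetical handler type: takes a request dict, returns a response dict.
Handler = Callable[[dict], dict]


def shadow_call(request: dict, legacy: Handler, new: Handler, stats: dict) -> dict:
    """Invisible mode: answer from legacy, mirror to new, count differences."""
    legacy_response = legacy(request)
    try:
        new_response = new(request)
        if new_response == legacy_response:
            stats["match"] = stats.get("match", 0) + 1
        else:
            stats["mismatch"] = stats.get("mismatch", 0) + 1
            logger.warning("shadow mismatch for request %s", request.get("id"))
    except Exception:
        # A failure in the new system must never reach the user.
        stats["error"] = stats.get("error", 0) + 1
        logger.exception("shadow call failed for request %s", request.get("id"))
    return legacy_response


def in_gray_bucket(user_id: str, percent: int) -> bool:
    """Gray-scale mode: map a user to one of 100 stable buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def route(request: dict, legacy: Handler, new: Handler, percent: int) -> dict:
    """Send `percent` of users to the new system, the rest to legacy."""
    handler = new if in_gray_bucket(request["user_id"], percent) else legacy
    return handler(request)
```

Hashing the user ID keeps each user on the same side as `percent` is raised from 0 to 100, so a given user does not flip between old and new behavior mid-rollout. The `match`, `mismatch`, and `error` counters are the kind of micro-level signal that could feed the monitoring described above.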
In conclusion, while refactoring does not deliver immediate functional benefits, it dramatically improves long‑term maintainability and provides a robust foundation for future business growth.
Dada Group Technology
Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.