How We Rebuilt a 15‑Year‑Old Review Platform: From Monolithic Code to a Scalable DDD‑Driven Architecture
This article details the complete redesign of a fifteen‑year‑old e‑commerce review system, covering its legacy pain points, the strategic choice of a full‑stack reconstruction using Domain‑Driven Design, the new layered micro‑service architecture, data migration tactics, operational challenges, organizational safeguards, and the measurable performance gains achieved after launch.
Background
The review subsystem of a large e‑commerce platform collects text, images and videos after a transaction and serves them on product detail pages. It handles billions of requests per day, so it has to operate as a high‑concurrency, high‑availability service.
Legacy Pain Points
Codebase of more than 1 M LOC with tangled procedural logic and no clear layering; single classes ran to tens of thousands of lines.
Low delivery efficiency – each change required weeks of impact analysis and QA effort.
Stability issues – more than 100 JSF interfaces, missing idempotency and circular service dependencies caused frequent outages and a heavy on‑call load.
Re‑architecture Objectives
Support new business lines (stores, couriers, cars, pharmacists) with a generic domain model.
Reduce coupling and improve code maintainability.
Eliminate single points of failure, add idempotent APIs, and lower operational cost.
Strategic Design
We adopted Domain‑Driven Design (DDD) and performed event‑storming workshops to identify business scenarios and bounded contexts. The review domain was split into seven sub‑domains (creator, production, disposal, recommendation, consumption, conversion, revenue); the five core sub‑domains (creator, production, disposal, consumption, revenue) became the focus of reconstruction.
Micro‑service Layering
BFF layer: HTTP façade for PC, App and Mini‑Program front‑ends.
Application layer: Input validation, context assembly and orchestration.
Domain layer: Core business logic encapsulated in domain services and entities.
Infrastructure layer: Persistence, external gateways and anti‑corruption wrappers.
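As a rough illustration of how one request could pass through these four layers, the sketch below wires a façade function, an application service, a domain entity and an in-memory repository together. Every name in it (Review, InMemoryReviewRepository, PublishReviewService, review_bff_handler) is hypothetical, and a plain dict stands in for real persistence; it is not the platform's actual code.

```python
from dataclasses import dataclass


# Domain layer: the business rule lives on the entity (hypothetical model).
@dataclass
class Review:
    review_id: str
    order_id: str
    text: str
    published: bool = False

    def publish(self) -> None:
        if not self.text.strip():
            raise ValueError("a review needs non-empty text")
        self.published = True


# Infrastructure layer: persistence hidden behind a small class.
class InMemoryReviewRepository:
    def __init__(self) -> None:
        self._rows = {}  # review_id -> Review

    def save(self, review: Review) -> None:
        self._rows[review.review_id] = review

    def get(self, review_id: str) -> Review:
        return self._rows[review_id]


# Application layer: input validation, context assembly, orchestration.
class PublishReviewService:
    def __init__(self, repository: InMemoryReviewRepository) -> None:
        self._repository = repository

    def publish(self, review_id: str, order_id: str, text: str) -> Review:
        if not review_id or not order_id:
            raise ValueError("review_id and order_id are required")
        review = Review(review_id=review_id, order_id=order_id, text=text)
        review.publish()               # domain logic
        self._repository.save(review)  # infrastructure call
        return review


# BFF layer: turns the outcome into a front-end friendly response.
def review_bff_handler(service: PublishReviewService, payload: dict) -> dict:
    try:
        review = service.publish(
            payload.get("review_id", ""),
            payload.get("order_id", ""),
            payload.get("text", ""),
        )
    except ValueError as error:
        return {"status": 400, "error": str(error)}
    return {"status": 200, "review_id": review.review_id,
            "published": review.published}
```

In this sketch the BFF function never touches the repository and the entity knows nothing about HTTP, which is the kind of separation the layering aims for.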
Package & Domain Separation
Within each layer we introduced fine‑grained packages that follow domain boundaries, achieving high cohesion and low coupling.
Tactical Design
Application Architecture – Cloud Ladder (LPD) Framework
The Cloud Ladder framework enforces the LPD principle (Layer, Package, Domain). It draws inspiration from the open‑source COLA framework but removes unnecessary abstractions. POJOs are simplified, and strict layer dependencies are enforced through compile‑time checks.
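The mechanism Cloud Ladder uses for its checks is not described here, so the following is only a sketch of the general idea of a layer dependency rule. The allow-list, the `<layer>.<package>` module naming scheme and the function names are all assumptions made for illustration.

```python
# Hypothetical rule set: each layer may depend only on the layers listed for it.
ALLOWED_DEPENDENCIES = {
    "bff": {"application"},
    "application": {"domain"},
    "domain": set(),
    "infrastructure": {"domain"},
}


def layer_of(module_name: str) -> str:
    """Return the layer prefix of a dotted module name such as 'domain.review'."""
    return module_name.split(".", 1)[0]


def find_violations(imports: dict) -> list:
    """Return (importer, imported) pairs that break the layering rules.

    `imports` maps a module name to the module names it imports.
    """
    violations = []
    for importer, imported_modules in sorted(imports.items()):
        source_layer = layer_of(importer)
        for imported in sorted(imported_modules):
            target_layer = layer_of(imported)
            if target_layer == source_layer:
                continue  # imports inside one layer are always allowed here
            if target_layer not in ALLOWED_DEPENDENCIES.get(source_layer, set()):
                violations.append((importer, imported))
    return violations
```

A check of this shape could run in a build step and fail the build when the returned list is non-empty.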
Data Storage Architecture
Online storage (CQRS): The write path stores primary data in JED (a MySQL‑compatible store). The read side uses JIMDB for high‑QPS traffic and Elasticsearch for lower‑QPS scenarios. Flink/MQ pipelines project the data into view models.
Offline storage: Dozens of legacy FDM tables were consolidated into a unified GDM model. The GDM schema contains a primary review table with JSON‑encoded extension fields, a voucher table for qualification, and auxiliary tables for audit and tagging. This reduces the table count from >100 to <10 and eliminates redundant columns.
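The online CQRS split can be pictured with a small in-process model. In the sketch below a dict stands in for the write store, a deque stands in for the change pipeline, and a per-SKU summary stands in for a read-side view model. All class names, field names and the summary shape are assumptions; none of this reflects the real JED, JIMDB or Flink interfaces.

```python
import json
from collections import deque


class ReviewWriteStore:
    """Stands in for the write-side store; emits one change event per write."""

    def __init__(self, change_queue: deque) -> None:
        self._rows = {}
        self._queue = change_queue

    def upsert(self, review_id: str, sku: str, score: int, ext: dict) -> None:
        row = {
            "review_id": review_id,
            "sku": sku,
            "score": score,
            "ext": json.dumps(ext, sort_keys=True),  # JSON-encoded extension fields
        }
        self._rows[review_id] = row
        self._queue.append(dict(row))


class SkuSummaryProjection:
    """Stands in for a read-side view model: review count and mean score per SKU."""

    def __init__(self) -> None:
        self._scores = {}  # sku -> {review_id: score}

    def apply(self, event: dict) -> None:
        # Keyed by review_id, so a replayed or updated event is not double counted.
        self._scores.setdefault(event["sku"], {})[event["review_id"]] = event["score"]

    def summary(self, sku: str) -> dict:
        scores = list(self._scores.get(sku, {}).values())
        average = sum(scores) / len(scores) if scores else None
        return {"sku": sku, "count": len(scores), "average_score": average}


def drain(change_queue: deque, projection: SkuSummaryProjection) -> int:
    """Apply all pending change events to the projection; return how many ran."""
    applied = 0
    while change_queue:
        projection.apply(change_queue.popleft())
        applied += 1
    return applied
```

Reads only ever call `summary`, so the read model can be shaped for the query without constraining how the write side stores its rows.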
Data Migration & Consistency
Bulk migration: Export historical data from JED to Hive, compute diffs with Spark jobs, and re‑import corrected rows.
Incremental sync: Deploy Binlog listeners that replay changes to the new stores and perform real‑time cross‑system validation.
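A minimal sketch of the incremental step, assuming each change event carries a key, an operation and a per-key version number that only increases. The event shape and both function names are hypothetical and do not correspond to a real binlog client API. Under that assumption, replay is safe against duplicate and out-of-order deliveries, and a simple comparison gives the cross-system check.

```python
def apply_change_event(target: dict, versions: dict, event: dict) -> bool:
    """Apply one change event to `target`; return True if it changed anything.

    Events are assumed to carry an increasing `version` per key, so
    duplicates and stale deliveries are skipped.
    """
    key = event["key"]
    if event["version"] <= versions.get(key, -1):
        return False  # stale or duplicate event
    versions[key] = event["version"]
    if event["op"] == "delete":
        target.pop(key, None)
    else:  # "insert" and "update" are both treated as upserts
        target[key] = event["row"]
    return True


def validate(source: dict, target: dict) -> list:
    """Return the sorted keys whose rows differ between the two stores."""
    keys = source.keys() | target.keys()
    return sorted(key for key in keys if source.get(key) != target.get(key))
```

Treating inserts and updates as the same upsert keeps replay idempotent: applying an event twice leaves the target unchanged.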
We built a diff‑reduction pipeline that collects metrics, clusters diff cases, prioritises storage‑layer fixes, and iterates until the diff rate is at or below the defined threshold (≤0.1 %).
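One iteration of such a pipeline might look like the sketch below: compute the diff rate, cluster the diff cases by layer and cause with storage-layer clusters first, and test the rate against the 0.1 % threshold. The case fields (`layer`, `cause`) and the function names are assumptions made for illustration.

```python
from collections import Counter

DIFF_RATE_THRESHOLD = 0.001  # 0.1 %, the threshold quoted in the text


def diff_rate(compared: int, mismatched: int) -> float:
    """Fraction of compared rows that did not match."""
    if compared <= 0:
        raise ValueError("compared must be positive")
    return mismatched / compared


def cluster_diffs(diff_cases: list) -> list:
    """Count cases per (layer, cause); storage clusters first, then largest first."""
    counts = Counter((case["layer"], case["cause"]) for case in diff_cases)
    return sorted(
        counts.items(),
        key=lambda item: (item[0][0] != "storage", -item[1], item[0]),
    )


def meets_threshold(compared: int, mismatched: int) -> bool:
    """True once the diff rate is at or below the threshold."""
    return diff_rate(compared, mismatched) <= DIFF_RATE_THRESHOLD
```

The ordered cluster list gives a work queue for the next round of fixes, and `meets_threshold` is the stop condition for the loop.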
Business‑Logic Consistency
We used traffic replay to compare the old and new APIs on a 0.1 % sample of requests. Detected mismatches were grouped by root cause (storage versus code). Over 80 % of mismatches originated from storage inconsistencies, so we focused remediation on the data layer. After successive fixes, interface‑level consistency exceeded 99 % and key business metrics (review count, positive‑review rate) were within acceptable variance.
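The comparison step of traffic replay can be sketched as follows, assuming requests have a stable id and responses are flat dicts. Sampling by hashing the request id, the ignored `trace_id` field and all function names are assumptions made for illustration, not details taken from the project.

```python
import hashlib


def in_sample(request_id: str, rate: float = 0.001) -> bool:
    """Deterministically select roughly `rate` of requests by hashing the id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < rate * 2**64


def compare_responses(old: dict, new: dict,
                      ignore: frozenset = frozenset({"trace_id"})) -> list:
    """Return the sorted field names whose values differ, skipping volatile fields."""
    fields = (old.keys() | new.keys()) - ignore
    return sorted(name for name in fields if old.get(name) != new.get(name))


def consistency_rate(pairs: list) -> float:
    """Share of (old, new) response pairs with no differing fields."""
    if not pairs:
        raise ValueError("no replayed requests to compare")
    matching = sum(1 for old, new in pairs if not compare_responses(old, new))
    return matching / len(pairs)
```

Hashing the request id rather than drawing a random number means the same request is always in or out of the sample, which makes a mismatch reproducible when it is replayed again after a fix.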
Organizational Guarantees
A dedicated reconstruction team was formed with the following roles:
Project lead (overall responsibility)
Architects (technical selection, design review)
Core developers (code migration)
QA engineers (automated diff verification)
SRE (deployment, monitoring)
Product owners (business alignment)
PMO (schedule, risk register)
Governance included daily stand‑ups, weekly cross‑team syncs, monthly retrospectives and a transparent communication channel.
Key Principles (Five “Must‑Do” Rules)
Control technical complexity by separating data‑layer and code‑layer milestones.
Make milestones verifiable and deliverable.
Invest heavily in upfront domain analysis and modeling.
Assign an owner to each domain and function so that staffing commitments are explicit.
Conduct lightweight, regular retrospectives to adapt quickly.
Benefits
Throughput increased by ~2.5×.
Code size reduced by >80 % (from >1 M LOC to ~180 K LOC).
System remained stable during the 2025 Double‑11 peak (no major incidents).
Deployment time dropped from ~5 hours to <30 minutes per service.
New review types can be added with minimal code changes.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.