Evolution of a Simple MVP Monolithic Architecture to a Complex Distributed System: A Taobao Case Study
This article uses a simulated Taobao example to show how a simple MVP‑style monolithic architecture gradually becomes a complex distributed system as traffic grows from hundreds to millions of concurrent users. It walks through fourteen stages, from separating Tomcat and the database through caching, load balancing, database sharding, microservices, an ESB, containerization, and cloud platforms, and describes the technologies and design considerations involved in each transition.
Basic concepts
Distributed: multiple modules deployed on different servers (e.g., Tomcat and database on separate machines).
High availability: the system continues to serve requests when some nodes fail.
Cluster: a group of servers providing a unified service, with automatic failover and load sharing.
Load balancing: evenly distributing incoming requests across multiple nodes.
Forward and reverse proxy: a forward proxy relays outbound requests from clients inside the network to external servers on their behalf; a reverse proxy receives inbound requests from outside and forwards them to internal servers.
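The cluster, high availability, and load balancing concepts above can be sketched together in a few lines. This is a minimal illustration, not how Nginx or LVS work internally; the class name `RoundRobinBalancer` and the backend names are hypothetical.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any that are marked as down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0  # index of the backend to try first on the next pick

    def mark_down(self, backend):
        # A health check (not shown) would call this when a node stops responding.
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, in rotation order.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend in self._healthy:
                return backend
        raise RuntimeError("no healthy backend available")
```

With three backends, successive calls to `pick()` cycle through them in order; after `mark_down` is called for one of them, the rotation continues over the remaining two, which is the failover behaviour the cluster definition describes.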
Architecture evolution steps
Separate Tomcat and database : Deploy Tomcat and the database on different machines, eliminating resource contention.
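Introduce local and distributed cache : Place a local cache on the Tomcat host or inside the JVM (for example, memcached on the same machine) and a shared distributed cache such as Redis in front of the database, so that most reads are served without touching the database.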
Add reverse‑proxy load balancing : Deploy Nginx or HAProxy to distribute requests across multiple Tomcat instances, so that concurrent capacity scales with the number of instances instead of being capped by a single one.
Database read/write separation : Split the database into a write master and multiple read replicas that are kept in sync by replication; database middleware such as MyCAT can route writes to the master and reads to the replicas.
Business‑based database sharding : Allocate different business domains to separate databases, reducing contention and allowing independent scaling.
Split large tables into smaller tables : Route rows to smaller tables by hashing a key or by time (e.g., hourly tables), so that data and load can scale horizontally; MyCAT can manage routing and access.
Use LVS or F5 to load‑balance multiple Nginx instances : Layer‑4 load balancers (LVS in software, F5 in hardware) sit in front of several Nginx servers; because they work at the transport layer, they offer higher throughput than Nginx and are not tied to a specific application protocol.
DNS round‑robin across data centers : Configure DNS to return multiple IPs, enabling traffic distribution at the data‑center level.
Introduce NoSQL and search engines : Adopt HBase, MongoDB, Elasticsearch, Kylin, Druid, etc., for massive data storage, full‑text search, and analytical workloads.
Split a large application into smaller applications : Separate business modules into independent services, using a distributed configuration center (e.g., ZooKeeper) for shared settings.
Extract common functions into microservices : Isolate user management, order, payment, authentication, etc., into independent services accessed via HTTP, TCP, or RPC; frameworks such as Dubbo or Spring Cloud provide governance.
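Introduce an Enterprise Service Bus (ESB) : Route calls between applications and services through an ESB that handles protocol conversion in one place, which reduces coupling between them; this is the SOA (service‑oriented architecture) approach.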
Containerization : Package services into Docker images and orchestrate them with Kubernetes for dynamic deployment and scaling.
Move to a cloud platform : Deploy to public‑cloud IaaS/PaaS/SaaS, leveraging elastic resources, managed services, and cost‑effective scaling.
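As an illustration of the table‑splitting step above, the following sketch shows what hash‑based and time‑based routing might look like at the application level. It is an assumption‑laden example, not MyCAT's actual routing logic: the function names, the `orders` and `comments` table names, and the table count of 16 are all hypothetical.

```python
import zlib
from datetime import datetime


def route_by_hash(base_name: str, shard_key: str, table_count: int) -> str:
    """Map a shard key to one of `table_count` physical tables."""
    if table_count <= 0:
        raise ValueError("table_count must be positive")
    # zlib.crc32 is stable across processes and machines, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    index = zlib.crc32(shard_key.encode("utf-8")) % table_count
    return f"{base_name}_{index:02d}"


def route_by_hour(base_name: str, ts: datetime) -> str:
    """Map a timestamp to an hourly table, e.g. comments_2024010203."""
    return f"{base_name}_{ts:%Y%m%d%H}"


if __name__ == "__main__":
    print(route_by_hash("orders", "user-42", 16))
    print(route_by_hour("comments", datetime(2024, 1, 2, 3, 4)))
```

The same key always maps to the same table, so a lookup by key touches a single table. The trade‑off of plain modulo hashing is that changing `table_count` remaps most keys, which is one reason such routing is usually delegated to middleware.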
Design summary
The evolution path is not mandatory; real‑world systems may address multiple bottlenecks simultaneously or follow a different order based on business priorities. Architecture depth should match performance requirements while leaving room for future expansion. Service‑side architecture focuses on application organization, whereas big‑data architecture provides the underlying storage, processing, and analytics capabilities.
Key architectural principles (12 items)
N+1 design – no single point of failure.
Rollback capability – ensure forward compatibility and easy version rollback.
Feature toggle – configurable enable/disable for rapid fault isolation.
Monitoring – design monitoring from the start.
Multi‑active data‑center – high availability across locations.
Use mature technologies – avoid untested or unsupported solutions.
Resource isolation – prevent a single business from monopolizing resources.
Horizontal scalability – architecture must support scaling out.
Buy non‑core components – leverage commercial products for non‑core functions.
Commercial hardware – reduces hardware failure risk.
Rapid iteration – develop small features quickly for early validation.
Stateless design – service interfaces should not rely on previous request state.
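Of the principles above, the feature toggle is the easiest to show in code. The sketch below is a minimal in‑memory version under stated assumptions: `FeatureToggles`, `load_recommendations`, and the `recommendations` flag are hypothetical names, and a production system would more likely read flags from a configuration center than from a local dictionary.

```python
class FeatureToggles:
    """In-memory flag store used to switch features on or off at runtime."""

    def __init__(self, defaults=None):
        self._flags = dict(defaults or {})

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)

    def is_enabled(self, name):
        # Unknown flags default to off, so new code paths stay dark until enabled.
        return self._flags.get(name, False)


def load_recommendations(product_id):
    # Hypothetical stand-in for a call to a non-critical downstream service.
    return [f"related-to-{product_id}"]


def render_product_page(toggles, product_id):
    page = {"product_id": product_id}
    # If the recommendation service misbehaves, operators can turn the flag
    # off and the page still renders without that section.
    if toggles.is_enabled("recommendations"):
        page["recommendations"] = load_recommendations(product_id)
    return page
```

Turning the flag off isolates the fault without a redeploy, which is the "rapid fault isolation" the principle refers to.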
Author
Shi Hua – experienced in big‑data technologies, architecture design, high‑concurrency and distributed systems.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
