Evolution of Architecture for Large-Scale Websites
The article outlines the key characteristics of large-scale websites and traces their architectural evolution from single‑server setups to multi‑tier, cache‑enhanced, clustered, and distributed systems, highlighting strategies such as load balancing, database read/write separation, CDN usage, NoSQL adoption, and service‑oriented decomposition.
1. Characteristics of Large Websites
High concurrency and heavy traffic: many simultaneous users and massive page views (PV).
24/7 high availability.
Massive data storage and management across many servers.
Globally distributed users with diverse network conditions.
Harsh security environment with frequent hacking attempts.
Rapid requirement changes and frequent releases.
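Gradual, progressive development: large sites grow out of small ones through incremental operation rather than being designed at full scale up front.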
2. Evolution of Large‑Website Architecture
Initial single‑server stage : All resources (application, database, files) reside on one server, typical of LAMP‑based PHP sites.
Separation of application and data services : Application server, file server, and database server become three distinct machines, each optimized for its workload (CPU‑intensive app server, fast‑disk and large‑memory DB server, storage‑heavy file server).
Cache introduction : Local and remote (distributed) caches are added to reduce database load and improve response speed; remote caches can scale horizontally.
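As a minimal sketch of how a cache reduces database load, the following uses the common cache-aside pattern: look in the cache first and query the database only on a miss. The class name `CacheAside`, the `load_from_db` callback, and the `db_reads` counter are hypothetical names for illustration; a plain dictionary stands in for a local or remote cache.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Check the cache first; fall back to the database only on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]):
        # A dict stands in for a local or remote (distributed) cache.
        self._cache: Dict[str, str] = {}
        self._load_from_db = load_from_db
        self.db_reads = 0  # counts how often the database is actually hit

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]      # cache hit: no database access
        self.db_reads += 1               # cache miss: go to the database
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value     # populate the cache for later reads
        return value

    def invalidate(self, key: str) -> None:
        # Call after a write so the next read fetches fresh data.
        self._cache.pop(key, None)
```

Repeated reads of the same key reach the database once; after a write, calling `invalidate` forces the next read to reload the value. A real deployment would also need expiry and an eviction policy, which this sketch leaves out.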
Application server clustering : Multiple app servers sit behind a load balancer that distributes requests among them; adding servers to the cluster raises the concurrency the site can handle, so the application tier scales horizontally.
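Request distribution across a cluster can be illustrated with round-robin scheduling, one of several possible balancing policies (the article does not say which one is used). The class `RoundRobinBalancer` and the server names are hypothetical.

```python
import itertools
from typing import Iterable, List


class RoundRobinBalancer:
    """Hand out app servers in a fixed rotating order."""

    def __init__(self, servers: Iterable[str]):
        self._servers: List[str] = list(servers)
        if not self._servers:
            raise ValueError("at least one server is required")
        # itertools.cycle repeats the server list endlessly.
        self._cycle = itertools.cycle(self._servers)

    def pick(self) -> str:
        # Each call returns the next server in the rotation.
        return next(self._cycle)
```

With servers `app1`, `app2`, and `app3`, successive calls to `pick()` return `app1`, `app2`, `app3`, `app1`, and so on. Production load balancers typically add health checks and weighting, which are omitted here.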
Database read/write separation : Writes go to a primary database, which replicates its data to one or more replicas that serve reads; when a single database still becomes a bottleneck, large tables are further split by horizontal or vertical sharding.
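A rough sketch of the routing decision behind read/write separation, plus a simple modulo rule for horizontal sharding. The names `ReadWriteRouter` and `shard_for` are hypothetical, and treating every `SELECT` as safe to send to a replica is a simplifying assumption: it ignores replication lag and reads inside transactions.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send reads to replicas (in rotation) and everything else to the primary."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


def shard_for(user_id: int, shard_count: int) -> int:
    """Horizontal sharding by modulo: rows for one user always land on one shard."""
    return user_id % shard_count
```

For example, `route("SELECT ...")` alternates between the replicas while `route("INSERT ...")` always returns the primary. Modulo sharding is easy to reason about but forces data movement when `shard_count` changes, which is one reason other schemes are often preferred in practice.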
Reverse proxy and CDN acceleration : Both act as caches; CDNs are deployed at ISP data centers while reverse proxies sit in the core data center, delivering content faster and offloading backend servers.
Distributed file system and distributed databases : As file and data volume grows, a single file server or database server is no longer sufficient; distributed storage systems provide the needed capacity and reliability.
NoSQL and search engine integration : NoSQL stores and search engines are adopted because they scale out across machines more easily; a unified data‑access layer hides the multiple data sources from the application.
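One way to picture a unified data-access layer is a thin facade that routes each lookup to whichever backend owns that kind of data. Everything here is a hypothetical sketch: `DataSource`, `InMemorySource`, and `DataAccessLayer` are illustrative names, and the in-memory source stands in for a real relational database, NoSQL store, or search engine.

```python
from typing import Dict, Optional, Protocol


class DataSource(Protocol):
    """Minimal interface every backend adapter is assumed to provide."""

    def get(self, key: str) -> Optional[dict]: ...


class InMemorySource:
    """Stand-in for a real backend (SQL database, NoSQL store, search index)."""

    def __init__(self, rows: Dict[str, dict]):
        self._rows = dict(rows)

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class DataAccessLayer:
    """Single entry point: callers name the kind of data, not the backend."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, kind: str, source: DataSource) -> None:
        self._sources[kind] = source

    def get(self, kind: str, key: str) -> Optional[dict]:
        if kind not in self._sources:
            raise KeyError(f"no data source registered for {kind!r}")
        return self._sources[kind].get(key)
```

Application code calls `dal.get("order", "o1")` without knowing which store holds orders, so a backend can be swapped by registering a different adapter under the same kind.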
Business splitting : Monolithic applications are divided into independent services (e.g., news, images, search) that communicate via messaging or shared databases.
Distributed services : Core services such as user, order, payment, and security are extracted and built as reusable micro‑services using a service‑oriented framework.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.