Fundamentals and Evolution of Large-Scale Website Architecture Design
This article explains the essence of software architecture as a process of reducing system entropy through splitting and merging, outlines the capabilities required of architects, and details the step‑by‑step evolution of large‑scale website infrastructures including caching, CDN, database sharding, and messaging systems.
Large‑scale website architecture involves many aspects and is far more complex than a simple site; understanding its design correctly is essential.
The essence of architecture is to reorganize a system so that its entropy decreases, turning disorder into order, much as living organisms maintain their internal order by drawing negative entropy from their environment.
Architectural transformation relies on splitting a system into subsystems or components and then recombining them, making the initial decomposition the more challenging part.
Key capabilities for architects include strong abstraction (deduplication and reuse), classification (decoupling objects, services, and modules), and algorithmic performance optimization affecting CPU, memory, I/O, and network.
Evolution of Architecture
1. Physical separation of web servers and databases
2. Introduction of page caching
3. Fragment caching
4. Data caching
5. Web server clustering
6. Database sharding (the first option to consider when the database becomes the bottleneck)
7. Table sharding, DAL, and distributed caching
8. Adding more web servers
9. Read/write separation and cheap storage solutions
10. Transition to large distributed applications and cheap server farms
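As one illustration of step 9, read/write separation can be sketched as a small router that sends SELECT statements to replicas in round-robin order and everything else to the primary. This is a minimal sketch: the class name ReadWriteRouter and the server names are hypothetical, and a real setup would also have to deal with replication lag and transactions.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Hypothetical router: reads go to replicas, all other statements to the primary."""

    def __init__(self, primary: str, replicas: List[str]):
        self._primary = primary
        # Round-robin iterator over replicas; None when no replica is configured
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        parts = sql.split()
        if not parts:
            raise ValueError("empty SQL statement")
        verb = parts[0].upper()
        # Only plain SELECTs are treated as reads; anything else is routed to the primary
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self._primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM users"),
    router.route("UPDATE users SET name = 'a'"),
    router.route("select 1"),
    router.route("SELECT 2"),
]
```

Routing anything that is not a plain SELECT to the primary is the conservative choice here, since a misrouted write would fail or be lost on a read-only replica.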
The related body of knowledge covers horizontal layering (application, service, and data layers), vertical splitting (by function and by service), and distribution of services, static resources, data, and storage.
Key infrastructure elements: clustering for concurrency and availability, caching (local, distributed, CDN), asynchronous processing to reduce coupling, redundancy (cold/hot standby), automation (deployment, testing, monitoring, failover), and security.
Distributed Caching
In high‑concurrency environments, adding a cache layer before the database reduces load and improves response time; distributed caches avoid the limitations of single‑node memory and prevent data duplication across nodes.
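The idea of placing a cache in front of the database can be sketched with the cache-aside pattern. This is a minimal sketch, not a production client: the CacheAside class, the fake_db_lookup function, and the in-process dictionary standing in for a remote cache node are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the database and store the result."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        # key -> (value, expiry time); stands in for a remote cache node
        self._store: Dict[str, Tuple[str, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: query the database and populate the cache
        self.misses += 1
        value = self._load_from_db(key)
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh data."""
        self._store.pop(key, None)


db_calls: List[str] = []


def fake_db_lookup(key: str) -> str:
    # Hypothetical database access; records each call so the saved load is visible
    db_calls.append(key)
    return f"row-for-{key}"


cache = CacheAside(fake_db_lookup, ttl_seconds=60.0)
first = cache.get("user:1")   # miss: goes to the database
second = cache.get("user:1")  # hit: served from the cache
```

Only the first read reaches the database; the second is answered from memory, which is where the load reduction comes from.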
Common technologies: Memcached, Redis, with concepts like consistent hashing, distributed sessions, data replication, and automatic failover.
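Consistent hashing can be sketched as a ring of hashed virtual nodes, where each key belongs to the first node found clockwise from its hash. The following is a minimal sketch: the ConsistentHashRing class, the node names, and the choice of SHA-256 with 100 virtual nodes per server are assumptions for illustration, not what Memcached or Redis clients actually use.

```python
import bisect
import hashlib
from typing import Dict, List


class ConsistentHashRing:
    """Hash ring with virtual nodes; keys map to the next node clockwise."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._ring: Dict[int, str] = {}    # ring position -> node name
        self._sorted_keys: List[int] = []  # sorted ring positions
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            h = self._hash(f"{node}#{i}")
            if h in self._ring:
                continue  # extremely unlikely collision; keep the existing owner
            self._ring[h] = node
            bisect.insort(self._sorted_keys, h)

    def remove_node(self, node: str) -> None:
        for i in range(self._vnodes):
            h = self._hash(f"{node}#{i}")
            if self._ring.get(h) == node:
                del self._ring[h]
                self._sorted_keys.remove(h)

    def get_node(self, key: str) -> str:
        if not self._sorted_keys:
            raise ValueError("ring is empty")
        h = self._hash(key)
        # First ring position after the key's hash, wrapping around at the end
        idx = bisect.bisect_right(self._sorted_keys, h) % len(self._sorted_keys)
        return self._ring[self._sorted_keys[idx]]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"session:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("cache-c")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When cache-c is removed, only the keys it owned are remapped; keys on cache-a and cache-b stay where they were, which is the property that makes this scheme attractive compared with plain modulo hashing.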
Content Delivery Network (CDN)
A CDN distributes content through a network of edge nodes and directs each user request to the most suitable node, chosen by load, latency, and proximity. This accelerates access, reduces origin bandwidth, and provides resilience against attacks.
Features include local cache acceleration, mirroring across carriers (ISPs), remote acceleration via DNS load balancing, bandwidth optimization, and cluster-based DDoS mitigation.
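Node selection by load, latency, and proximity can be sketched as a scoring function over candidate edge nodes. This is only a sketch under stated assumptions: the EdgeNode fields, the weights in score, and the 0.9 overload threshold are invented for illustration and do not describe how any particular CDN makes its decision.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EdgeNode:
    name: str
    load: float         # fraction of capacity in use, 0.0 to 1.0
    latency_ms: float   # measured round-trip time to the user
    distance_km: float  # rough geographic distance to the user


def score(node: EdgeNode) -> float:
    """Lower is better. The weights are illustrative assumptions only."""
    return 100.0 * node.load + 1.0 * node.latency_ms + 0.01 * node.distance_km


def pick_edge(nodes: List[EdgeNode], max_load: float = 0.9) -> EdgeNode:
    if not nodes:
        raise ValueError("no edge nodes available")
    # Skip overloaded nodes unless every node is overloaded
    candidates = [n for n in nodes if n.load < max_load] or nodes
    return min(candidates, key=score)


nodes = [
    EdgeNode("edge-near", load=0.95, latency_ms=8.0, distance_km=20.0),
    EdgeNode("edge-mid", load=0.40, latency_ms=15.0, distance_km=300.0),
    EdgeNode("edge-far", load=0.10, latency_ms=80.0, distance_km=2000.0),
]
chosen = pick_edge(nodes)
```

In this example the closest node is skipped because it is overloaded, so the request goes to edge-mid rather than to the geographically nearest node.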
Persistent Storage
Traditional IOE stacks (IBM minicomputers, Oracle databases, EMC storage) are costly; modern approaches consider cheaper commodity storage and distributed databases.
Database Sharding
Start with database sharding; if queries remain slow, proceed to table sharding. Distributed databases aggregate data across nodes, while distributed file systems handle large‑scale growth. Read/write separation and horizontal scaling become necessary as business expands.
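Combining database sharding with table sharding comes down to mapping a sharding key to a database and a table. The sketch below assumes a hypothetical orders table sharded by user ID with simple modulo routing; the function name locate_order, the naming scheme, and the shard counts are illustrative assumptions.

```python
from typing import NamedTuple


class ShardLocation(NamedTuple):
    database: str
    table: str


def locate_order(user_id: int, db_count: int = 4, tables_per_db: int = 8) -> ShardLocation:
    """Map a user ID to one of db_count * tables_per_db physical tables."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    slot = user_id % (db_count * tables_per_db)
    db_index = slot // tables_per_db    # which database holds the slot
    table_index = slot % tables_per_db  # which table inside that database
    return ShardLocation(f"orders_db_{db_index}", f"orders_{table_index}")


location = locate_order(42)
```

Modulo routing is easy to reason about but forces data migration whenever the shard count changes, which is one reason a routing layer is usually placed between the application and the shards.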
Examples like Taobao illustrate vertical business sharding (transactions, users, products, shops) and horizontal scaling using tools such as TDDL for routing without impacting applications.
Message Systems
Message brokers and messaging libraries such as ActiveMQ, RabbitMQ, ZeroMQ, Kafka, and RocketMQ decouple applications, enable asynchronous processing, and absorb traffic spikes. Used well, they support high performance, availability, and scalability, with eventual consistency between producers and consumers.
Typical scenarios include asynchronous tasks, application decoupling, traffic shaping, and inter‑service communication.
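Asynchronous processing and traffic shaping can be sketched with an in-process bounded queue. This is a minimal sketch that uses Python's standard queue and threading modules as a stand-in for a real broker; the run_demo function, the order names, and the queue size are assumptions for illustration.

```python
import queue
import threading
from typing import List


def run_demo(order_count: int = 20, queue_size: int = 5) -> List[str]:
    # Bounded queue: producers block when it is full, which smooths bursts
    orders: queue.Queue = queue.Queue(maxsize=queue_size)
    processed: List[str] = []

    def consumer() -> None:
        while True:
            order = orders.get()
            if order is None:  # sentinel: no more work
                break
            processed.append(f"handled {order}")

    worker = threading.Thread(target=consumer)
    worker.start()

    # The producer only enqueues; it never calls the consumer directly
    for i in range(order_count):
        orders.put(f"order-{i}")
    orders.put(None)
    worker.join()
    return processed


result = run_demo()
```

The producer and consumer share nothing but the queue, so either side can be replaced or scaled without changing the other, which is the decoupling the text describes.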
Beyond these, comprehensive operations design must address distributed parallel computing, monitoring, security, service management, storage, business partitioning, and data center disaster recovery to successfully run a large website.
Mike Chen's Internet Architecture
Over ten years of BAT architecture experience, shared generously!