Technical Summary of Large‑Scale Distributed Website Architecture
This article provides a comprehensive technical summary of large‑scale distributed website architecture, outlining characteristics, architectural goals, patterns, and detailed strategies for high performance, high availability, scalability, extensibility, security, and agility, supplemented with illustrative examples and practical insights.
This article is a technical summary written while studying large‑scale distributed website architecture. It gives an overview of what makes a distributed site high‑performance, highly available, scalable, and extensible, and offers a reference architecture that combines reading notes with personal experience.
Outline
Characteristics of large websites
Architectural goals for large websites
Architecture patterns for large websites
High‑performance architecture
High‑availability architecture
Scalable architecture
Extensible architecture
Security architecture
Agile architecture
Examples of large‑scale architectures
1. Characteristics of Large Websites
Massive user base, geographically distributed
High traffic and concurrency
Huge data volume, requiring high availability
Harsh security environment, prone to network attacks
Rich functionality, rapid changes, frequent releases
Growth from small to large, incremental development
User‑centric design
Free core services, with paid premium experiences
2. Architectural Goals
High performance: fast user experience
High availability: continuous service access
Scalability: ability to add or remove hardware to adjust capacity
Security: secure access, data encryption, safe storage
Extensibility: easy addition/removal of features or modules
Agility: rapid response to changing business needs
3. Architecture Patterns
Layered: application, service, data, management, analytics layers
Segmentation: divide by business/module/function (e.g., homepage, user center)
Distributed deployment across multiple physical machines
Cluster: multiple instances of a component behind load balancers
Cache: place data close to the application or user to speed access
Asynchronous: decouple request and response using notification or polling
Redundancy: replicate components to improve availability, security, performance
Security: solutions for known issues and mechanisms for unknown threats
Automation: use tools to replace manual repetitive tasks
Agility: embrace requirement changes and respond quickly
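The cluster and redundancy patterns above can be illustrated with a small sketch: several instances of one component sit behind a balancer, and a failed instance is skipped rather than taking the service down. The class name `RoundRobinBalancer` and the node names are hypothetical, chosen only for illustration; a real deployment would normally rely on a dedicated load balancer with health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate requests across cluster nodes, skipping nodes marked down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def pick(self):
        # Look at each node at most once per call, so a fully failed
        # cluster raises an error instead of looping forever.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node not in self.down:
                return node
        raise RuntimeError("no healthy node available")


# Hypothetical three-node web cluster.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_round = [balancer.pick() for _ in range(3)]

# Simulate a failure: traffic keeps flowing to the remaining nodes.
balancer.mark_down("web-2")
after_failure = [balancer.pick() for _ in range(4)]
```

Because the nodes are interchangeable, losing one reduces capacity but not availability, which is the point of combining clustering with redundancy.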
4. High‑Performance Architecture
High‑performance architecture is user‑centric: the goal is fast page access, measured by short response time, high concurrency, high throughput, and stable performance. The work can be divided into front‑end optimization, application‑layer optimization, code‑level optimization, and storage‑layer optimization.
Front‑end optimization includes reducing HTTP requests, leveraging browser cache, enabling compression, proper placement of CSS/JS, asynchronous JS, and minimizing cookie transmission, as well as using CDN acceleration and reverse proxies.
Application‑layer optimization involves caching, asynchronous processing, and clustering.
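As an illustration of application‑layer caching, here is a minimal cache‑aside sketch with a time‑to‑live: the application checks the cache first and only calls the slower backing store on a miss or after expiry. The names `TTLCache` and `load_product` are hypothetical, and the in‑process dictionary stands in for what would usually be a distributed cache.

```python
import time


class TTLCache:
    """Cache-aside helper: return a cached value or load and remember it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = loader(key)  # cache miss or expired: go to the slow source
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        # Call this after a write so readers do not see stale data.
        self._store.pop(key, None)


calls = []


def load_product(product_id):
    """Hypothetical slow lookup; records each call to show how often it runs."""
    calls.append(product_id)
    return {"id": product_id, "name": f"product-{product_id}"}
```

The time‑to‑live bounds how stale a cached entry can become, while `invalidate` covers the case where the application itself performs the write.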
Code optimization covers proper architecture, multithreading, resource reuse (object pools, thread pools), good data structures, JVM tuning, singleton patterns, and caches.
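Resource reuse through a thread pool can be sketched as follows. Instead of creating one thread per request, a fixed set of worker threads is reused for every task. The function names `handle_request` and `serve`, and the pool size of 4, are assumptions made for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Hypothetical request handler; reports which pooled thread ran it."""
    return request_id, threading.current_thread().name


def serve(request_ids, pool_size=4):
    # The pool creates at most `pool_size` threads and reuses them for
    # every request, avoiding per-request thread creation cost.
    with ThreadPoolExecutor(max_workers=pool_size,
                            thread_name_prefix="worker") as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(handle_request, request_ids))


results = serve(range(20))
threads_used = {thread_name for _, thread_name in results}
```

Twenty requests are served here by no more than four threads. The same idea applies to object pools and database connection pools.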
Storage optimization includes caching, SSDs, fiber‑optic links, read/write tuning, disk redundancy, distributed storage (for example HDFS), and NoSQL stores.
5. High‑Availability Architecture
Large sites must remain accessible at all times. Because such sites are complex, distributed, and built on commodity servers, open‑source databases, and a mix of operating systems, high availability is hard to achieve, and individual failures are inevitable.
Improving availability starts at the architectural planning stage. Availability targets are often expressed as “nines”: for example, 99.99% uptime allows about 53 minutes of downtime per year (0.01% of 525,600 minutes ≈ 52.6 minutes). Strategies differ by layer, typically using redundancy and failover.
Application layer: design stateless services and use load balancing (with session synchronization if needed).
Service layer: load balancing, hierarchical management, fast failure (timeouts), asynchronous calls, service degradation, idempotent design.
Data layer: redundant backups (cold, hot, warm), failover, and CAP theorem considerations (consistency, availability, partition tolerance).
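Two of the service‑layer techniques listed above, fast failure through timeouts and idempotent design, can be sketched together. The helper `call_with_timeout`, the function `slow_inventory_lookup`, and the `PaymentService` class are hypothetical names for illustration. This is a simplified single‑process sketch, not a full degradation or deduplication framework.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_seconds, fallback):
    """Fail fast: return `fallback` if `func` does not finish in time."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # The slow call keeps running in the background; the caller is
        # simply no longer blocked on it and gets a degraded answer.
        return fallback


def slow_inventory_lookup():
    """Hypothetical dependency that responds too slowly."""
    time.sleep(0.3)
    return "fresh"


class PaymentService:
    """Idempotent deposits: a retried request ID is applied only once."""

    def __init__(self):
        self.balance = 0
        self._processed = {}  # request_id -> balance after that request
        self._lock = threading.Lock()

    def deposit(self, request_id, amount):
        with self._lock:
            if request_id in self._processed:
                # Duplicate delivery or client retry: repeat the old answer.
                return self._processed[request_id]
            self.balance += amount
            self._processed[request_id] = self.balance
            return self.balance
```

Timeouts make retries likely, and retries are only safe when the operation is idempotent, which is why the two techniques tend to appear together.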
6. Scalable Architecture
Scalability means adjusting processing capacity by adding or removing hardware without redesigning the architecture.
Application layer: vertical or horizontal partitioning, load balancing via DNS, HTTP reverse proxy, IP, or link‑layer methods.
Service layer: similar to application layer.
Data layer: sharding, partitioned tables, NoSQL, and common routing algorithms such as modulo hashing and consistent hashing.
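Consistent hashing, mentioned above for the data layer, can be sketched in a few lines. Nodes and keys are placed on the same hash ring, and a key belongs to the first node position at or after its own hash. The class name `ConsistentHashRing`, the shard names, and the choice of 100 virtual nodes per server are assumptions for this example. SHA‑256 is used here only to spread keys evenly, not for security.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self._positions = []      # sorted ring positions
        self._owners = {}         # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            pos = _hash(f"{node}#{i}")
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        for i in range(self.replicas):
            pos = _hash(f"{node}#{i}")
            del self._owners[pos]
            self._positions.remove(pos)

    def get_node(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # First position at or after the key's hash, wrapping past the end.
        idx = bisect.bisect_left(self._positions, _hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]


ring = ConsistentHashRing(["db-1", "db-2", "db-3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("db-4")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When a node is added, every key that changes owner moves to the new node, and on average only about a quarter of the keys move here. With modulo hashing, changing the node count from three to four would remap most keys.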
7. Extensible Architecture
An extensible architecture allows functional modules to be added or removed easily, and provides good extensibility at the code level.
Techniques include modularization, componentization (high cohesion, low coupling), stable interfaces, design patterns, message queues for decoupling, and service‑oriented distributed modules.
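Decoupling through a message queue can be sketched as below. The publisher only puts an event on the queue, and it has no knowledge of which modules react to it, so new subscribers can be added without touching the publisher. The topic name `order.created` and the helper `start_consumer` are hypothetical. The standard‑library `queue.Queue` stands in for a real message broker.

```python
import queue
import threading


def start_consumer(events, handlers):
    """Run a background worker that dispatches events to subscribers.

    `handlers` maps a topic name to a list of callables. Putting `None`
    on the queue tells the worker to stop.
    """
    def run():
        while True:
            event = events.get()
            if event is None:
                break
            topic, payload = event
            for handler in handlers.get(topic, []):
                handler(payload)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


# Two independent modules subscribe to the same event.
emails_sent = []
stock_reserved = []
handlers = {
    "order.created": [emails_sent.append, stock_reserved.append],
}

events = queue.Queue()
worker = start_consumer(events, handlers)

# The order module only publishes; it never calls the other modules.
events.put(("order.created", {"order_id": 1}))
events.put(None)
worker.join(timeout=2)
```

Adding a third subscriber, for example an analytics module, means appending one more handler to the list while the publishing code stays unchanged.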
8. Security Architecture
A security architecture provides effective solutions for known issues and mechanisms to detect and defend against unknown threats. Security measures span infrastructure, application, and data confidentiality. Infrastructure security includes password policies, regular scans, firewalls, DDoS protection, intrusion detection, and network segmentation.
Application security: prevent XSS, injection, CSRF, information leakage, insecure file uploads, path traversal, and use Web Application Firewalls.
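Data confidentiality: secure storage, encrypted backups, access controls, and secure transmission. The algorithms commonly cited fall into three families: hash functions (MD5, SHA) for integrity checks, symmetric ciphers (DES, 3DES, the RC family such as RC4) for bulk encryption, and asymmetric ciphers (RSA) for key exchange and signatures. MD5, SHA‑1, DES, 3DES, and RC4 are now considered weak, so current designs generally prefer SHA‑256 or stronger hashes and AES.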
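Two of the measures above, injection prevention and secure storage, can be illustrated with the Python standard library. Parameterized queries keep user input out of the SQL text, and passwords are stored as salted PBKDF2 hashes rather than plain text. The table layout, function names, and iteration count are assumptions for this sketch, not a recommended production configuration.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed work factor; tune for real hardware


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)"
    )
    return conn


def hash_password(password, salt):
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )


def create_user(conn, name, password):
    salt = os.urandom(16)  # unique random salt per user
    # The "?" placeholders pass values separately from the SQL text,
    # so input is never interpreted as SQL.
    conn.execute(
        "INSERT INTO users (name, salt, pw_hash) VALUES (?, ?, ?)",
        (name, salt, hash_password(password, salt)),
    )


def verify_user(conn, name, password):
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return False
    salt, expected = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_password(password, salt), expected)
```

A classic injection string such as `alice' OR '1'='1` is treated as an ordinary (nonexistent) user name, because it reaches the database as a bound value rather than as part of the query.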
9. Agility
Architecture and operations must adapt to change, offering high scalability and extensibility to meet rapid business growth and traffic spikes, supported by agile management and development practices.
10. Example of a Large‑Scale Architecture
The example adopts a seven‑layer logical architecture: client layer, front‑end optimization layer, application layer, service layer, data storage layer, big‑data storage layer, and big‑data processing layer.
Client layer supports PC browsers and mobile apps; mobile apps can reach the site directly by IP address through the reverse proxy.
Front‑end layer uses DNS load balancing, CDN acceleration, and reverse proxy services.
Application layer consists of clustered web applications, vertically split by business (e.g., product, user center).
Service layer provides common services such as user, order, and payment services.
Data layer includes relational DB clusters (with read/write separation), NoSQL clusters, distributed file system clusters, and distributed caches.
Big‑data storage layer collects logs and structured/unstructured data from application and service layers.
Big‑data processing layer performs offline analysis via MapReduce and real‑time analysis via Storm, storing results in relational databases for downstream use.
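The offline analysis step in the big‑data processing layer follows the MapReduce model, which can be sketched in plain Python: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The three‑field log format used here (`METHOD PATH STATUS`) is a made‑up example. A real job would run these phases in parallel on a cluster rather than in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (path, 1) for each well-formed access-log line."""
    parts = line.split()
    if len(parts) == 3:  # hypothetical format: METHOD PATH STATUS
        yield parts[1], 1


def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Aggregate one group into a single result."""
    return key, sum(values)


def count_hits(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


logs = [
    "GET /product/42 200",
    "GET /user/center 200",
    "GET /product/42 200",
    "malformed line",
    "POST /order 201",
]
hits = count_hits(logs)
```

The resulting per‑path counts are the kind of aggregate that would then be written to a relational database for downstream use.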
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
