
Technical Summary of Large‑Scale Distributed Website Architecture

This article provides a comprehensive technical summary of large‑scale distributed website architecture, outlining characteristics, architectural goals, patterns, and detailed strategies for high performance, high availability, scalability, extensibility, security, and agility, supplemented with illustrative examples and practical insights.

Qunar Tech Salon

Mar 31, 2016

This article is a technical summary for learning large‑scale distributed website architecture. It gives an overview of a high‑performance, highly‑available, scalable, and extensible distributed site and offers a reference architecture, combining reading notes and personal experience.

Outline

Characteristics of large websites

Architectural goals for large websites

Architecture patterns for large websites

High‑performance architecture

High‑availability architecture

Scalable architecture

Extensible architecture

Security architecture

Agile architecture

Example of a large‑scale architecture

1. Characteristics of Large Websites

Massive user base, geographically distributed

High traffic and concurrency

Huge data volume, requiring high availability

Harsh security environment, prone to network attacks

Rich functionality, rapid changes, frequent releases

Growth from small to large, incremental development

User‑centric design

Free basic services, with paid premium experiences

2. Architectural Goals

High performance: fast user experience

High availability: continuous service access

Scalability: ability to add or remove hardware to adjust capacity

Security: secure access, data encryption, safe storage

Extensibility: easy addition/removal of features or modules

Agility: rapid response to changing business needs

3. Architecture Patterns

Layered: application, service, data, management, analytics layers

Segmentation: divide by business/module/function (e.g., homepage, user center)

Distributed deployment across multiple physical machines

Cluster: multiple instances of a component behind load balancers

Cache: place data close to the application or user to speed access

Asynchronous: decouple request and response using notification or polling

Redundancy: replicate components to improve availability, security, performance

Security: solutions for known issues and mechanisms for unknown threats

Automation: use tools to replace manual repetitive tasks

Agility: embrace requirement changes and respond quickly

4. High‑Performance Architecture

The focus is fast page access for users: short response times, high concurrency, high throughput, and stable performance. The work can be divided into front‑end optimization, application‑layer optimization, code‑level optimization, and storage‑layer optimization.

Front‑end optimization includes reducing HTTP requests, leveraging browser cache, enabling compression, proper placement of CSS/JS, asynchronous JS, and minimizing cookie transmission, as well as using CDN acceleration and reverse proxies.

Application‑layer optimization involves caching, asynchronous processing, and clustering.

Code optimization covers proper architecture, multithreading, resource reuse (object pools, thread pools), good data structures, JVM tuning, singleton patterns, and caches.

Storage optimization includes caching, SSDs, fiber‑optic links, read/write tuning, disk redundancy, distributed storage (e.g., HDFS), and NoSQL.
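
Caching appears at almost every layer above, so a small illustration may help. The following is a minimal sketch of the cache‑aside pattern with a time‑to‑live, assuming an in‑process dictionary as the cache; `TTLCache` and `load_product_from_db` are hypothetical names, and a real site would more likely use a distributed cache.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache-aside helper with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fall through to the slower backing store.
        self.misses += 1
        value = loader(key)
        self._store[key] = (now + self._ttl, value)
        return value


def load_product_from_db(product_id: str) -> dict:
    # Hypothetical stand-in for a slow database query.
    return {"id": product_id, "name": f"product-{product_id}"}


cache = TTLCache(ttl_seconds=30.0)
first = cache.get_or_load("42", load_product_from_db)   # miss: loads from the "database"
second = cache.get_or_load("42", load_product_from_db)  # hit: served from memory
```

The TTL bounds how stale a cached value can get; shorter values trade hit rate for freshness.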

5. High‑Availability Architecture

Large sites must remain accessible at all times. Because such systems are complex and distributed, and typically run on inexpensive servers, open‑source databases, and a mix of operating systems, high availability is hard to achieve and failures are inevitable.

Improving availability starts at the architectural planning stage, often expressed as “nines” (e.g., 99.99% uptime allows about 53 minutes of downtime per year). Strategies differ by layer, typically using redundancy and failover.
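
The downtime budget behind each "nines" target is simple arithmetic. A small sketch, assuming a 365‑day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

For 99.99% this works out to roughly 52.6 minutes per year, which matches the figure of about 53 minutes quoted above.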

Application layer: design stateless services and use load balancing (with session synchronization if needed).

Service layer: load balancing, hierarchical management, fast failure (timeouts), asynchronous calls, service degradation, idempotent design.

Data layer: redundant backups (cold, warm, hot), failover, and the CAP trade‑off: a distributed data store cannot guarantee consistency, availability, and partition tolerance all at once.
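
Of the service‑layer techniques listed above, idempotent design is the one most easily shown in code. The sketch below assumes each request carries a caller‑supplied request ID, so that a retry after a timeout is not applied twice; `PaymentService` and its fields are hypothetical, and a real service would keep the processed IDs in durable storage.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PaymentService:
    """Hypothetical service: calls repeated with the same request ID are applied once."""

    balance: int = 0
    # request_id -> balance returned by the first successful call
    _results: Dict[str, int] = field(default_factory=dict)

    def deposit(self, request_id: str, amount: int) -> int:
        if request_id in self._results:
            # Duplicate delivery (e.g. a client retry): replay the stored result.
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


svc = PaymentService()
svc.deposit("req-1", 100)
svc.deposit("req-1", 100)  # retried call: no second deposit
svc.deposit("req-2", 50)
```

Because duplicates are harmless, callers can combine this with short timeouts and automatic retries without risking double charges.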

6. Scalable Architecture

Scalability means adjusting processing capacity by adding or removing hardware without redesigning the architecture.

Application layer: vertical or horizontal partitioning, load balancing via DNS, HTTP reverse proxy, IP, or link‑layer methods.

Service layer: similar to application layer.

Data layer: sharding, partitioned tables, NoSQL, common algorithms such as hash and consistent hash.
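
To illustrate why consistent hashing suits a data layer that grows and shrinks, here is a minimal sketch of a hash ring with virtual nodes. The class name, the node names, and the choice of 100 virtual nodes per server are assumptions for illustration, not a reference implementation.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # MD5 is used only to spread keys evenly over the ring, not for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._ring: Dict[int, str] = {}  # ring position -> physical node
        self._positions: List[int] = []  # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            del self._ring[pos]
            self._positions.remove(pos)

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise ValueError("ring has no nodes")
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._ring[self._positions[idx]]


ring = ConsistentHashRing(["db-0", "db-1", "db-2"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("db-3")  # scale out by one server
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now lives on `db-3`; keys that stayed put did not change servers at all. With plain `hash(key) % N` sharding, by contrast, changing `N` remaps most keys.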

7. Extensible Architecture

Extensibility means functional modules can be added or removed easily, with good extensibility at the code level as well.

Techniques include modularization, componentization (high cohesion, low coupling), stable interfaces, design patterns, message queues for decoupling, and service‑oriented distributed modules.
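
The decoupling effect of a message queue can be sketched with a toy in‑memory broker. Everything here (`InMemoryBroker`, the `order.created` topic, the handlers) is hypothetical; a production system would use a dedicated message broker with consumers running in separate processes.

```python
import queue
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Toy stand-in for a message queue, for illustration only."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # The producer only enqueues; it never calls consumers directly.
        self._queue.put((topic, message))

    def drain(self) -> None:
        # Deliver everything queued so far to the handlers of each topic.
        while not self._queue.empty():
            topic, message = self._queue.get()
            for handler in self._subscribers[topic]:
                handler(message)


broker = InMemoryBroker()
emails: List[str] = []
points: List[int] = []
broker.subscribe("order.created", lambda m: emails.append(f"receipt for order {m['order_id']}"))
broker.subscribe("order.created", lambda m: points.append(m["amount"] // 10))
broker.publish("order.created", {"order_id": 7, "amount": 250})
broker.drain()
```

The order module knows nothing about email or loyalty points; adding a third consumer requires only another `subscribe` call, which is the extensibility benefit described above.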

8. Security Architecture

Provide effective solutions for known issues and mechanisms to detect and defend against unknown threats. Security measures span infrastructure, application, and data confidentiality, including password policies, regular scans, firewalls, DDoS protection, intrusion detection, and network segmentation.

Application security: prevent XSS, injection, CSRF, information leakage, insecure file uploads, path traversal, and use Web Application Firewalls.

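Data confidentiality: secure storage, encrypted backups, access controls, and secure transmission. Common building blocks are hash functions for integrity checks (MD5, SHA), symmetric ciphers (DES, 3DES, RC), and asymmetric ciphers (RSA). MD5 and DES are now considered broken and should be avoided in new designs.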
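
As one concrete case of secure storage, passwords are usually stored as salted, deliberately slow hashes rather than as plaintext or a bare MD5/SHA digest. The sketch below uses PBKDF2 from Python's standard library; the iteration count and function names are assumptions, and the original article does not prescribe this particular scheme.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; the plaintext is never stored."""
    salt = os.urandom(16)  # a fresh random salt per password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(key, expected_key)


salt, stored = hash_password("correct horse battery staple")
```

At login, `verify_password(candidate, salt, stored)` recomputes the key from the submitted password and the stored salt.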

9. Agility

Architecture and operations must adapt to change, offering high scalability and extensibility to meet rapid business growth and traffic spikes, supported by agile management and development practices.

10. Example of a Large‑Scale Architecture

The example adopts a seven‑layer logical architecture: client layer, front‑end optimization layer, application layer, service layer, data storage layer, big‑data storage layer, and big‑data processing layer.

Client layer supports PC browsers and mobile apps; mobile apps can reach the site directly by IP address through the reverse proxy.

Front‑end layer uses DNS load balancing, CDN acceleration, and reverse proxy services.

Application layer consists of clustered web applications, vertically split by business (e.g., product, user center).

Service layer provides common services such as user, order, and payment services.

Data layer includes relational DB clusters (with read/write separation), NoSQL clusters, distributed file system clusters, and distributed caches.

Big‑data storage layer collects logs and structured/unstructured data from application and service layers.

Big‑data processing layer performs offline analysis via MapReduce and real‑time analysis via Storm, storing results in relational databases for downstream use.
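
The read/write separation mentioned for the relational clusters in the data layer can be sketched as a small router that sends writes to the primary and spreads reads across replicas in round‑robin order. The endpoint names and the naive `SELECT` check are assumptions for illustration; real deployments usually rely on a database proxy or driver support and must account for replication lag.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Chooses a database endpoint per statement; endpoint names are hypothetical."""

    def __init__(self, primary: str, replicas: List[str]):
        self._primary = primary
        # Fall back to the primary when no replica is configured.
        self._replica_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Naive classification: anything that starts with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._replica_cycle) if is_read else self._primary


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
targets = [
    router.route("SELECT * FROM orders WHERE id = 1"),
    router.route("UPDATE orders SET status = 'paid' WHERE id = 1"),
    router.route("select name from users"),
]
```

Reads that must observe a write made moments earlier may still need to go to the primary, since replicas can lag behind it.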

