How to Build Scalable Internet Architecture for Massive Traffic Spikes

This article outlines a layered architecture (load balancing, CDN, caching, micro‑services, read/write separation, and database clustering) for handling the massive data volume and high concurrency that e‑commerce platforms face during peak events such as Double‑11 and 618.

dbaplus Community

Internet Layer

Load Balancing : Distribute inbound traffic among multiple server instances to avoid a single point of overload. Typical implementations include:

Reverse‑proxy load balancers such as Nginx (define the backend servers in an upstream block and forward requests to it with proxy_pass).

Cloud provider load‑balancing services (e.g., Alibaba SLB, AWS ELB) that provide health‑checking, session‑persistence, and SSL termination.
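To make the distribution step concrete, here is a minimal sketch of round‑robin selection with health marking. It is not how Nginx or any cloud load balancer is implemented; the class name, method names, and backend labels are assumptions made for illustration.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Cycle through backends, skipping any that are marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._healthy: Dict[str, bool] = {b: True for b in self._backends}
        self._cycle = itertools.cycle(self._backends)

    def mark(self, backend: str, healthy: bool) -> None:
        """Record the result of a health check (hypothetical hook)."""
        self._healthy[backend] = healthy

    def next_backend(self) -> str:
        """Return the next healthy backend in rotation."""
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy backends available")
```

A caller would invoke next_backend() once per request and call mark(backend, False) when a health probe fails, so that traffic flows only to the remaining nodes.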

Content Delivery Network (CDN) : Deploy edge nodes that cache static assets (HTML, CSS, JS, images, video) close to end users. The CDN selects the optimal edge based on latency, load, and network conditions, reducing backbone traffic and improving response time. Commercial CDN products (Akamai, Verizon EdgeCast, ChinaCache) or cloud‑native CDN services can be used.

Internet architecture layer diagram

Web Server Layer

Session → Cookie / Shared Cache : In distributed deployments, storing session state in server memory leads to synchronization problems. Two common solutions are:

Encode necessary session data into signed, encrypted client‑side cookies, eliminating server‑side state.

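Persist session data in a shared in‑memory store (e.g., Redis or Memcached) that every web node can reach. Redis can also persist data to disk (RDB snapshots or an append‑only file) for durability; Memcached keeps data in memory only.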
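The cookie‑based option can be sketched with the Python standard library. This example only signs the cookie (HMAC‑SHA256) so that tampering is detected; it does not encrypt it, so the contents remain readable by the client, and encryption would need an additional library. The function names and the secret key are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: in a real deployment the key comes from secure configuration.
SECRET_KEY = b"replace-with-a-random-secret"


def encode_session(data: dict, key: bytes = SECRET_KEY) -> str:
    """Serialize session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(data, sort_keys=True).encode("utf-8")
    )
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature


def decode_session(cookie: str, key: bytes = SECRET_KEY) -> Optional[dict]:
    """Return the session data, or None if the signature does not match."""
    payload, sep, signature = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because any web node holding the same key can verify the cookie, no session state has to be synchronized between servers.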

Static Page Generation : Convert frequently accessed dynamic pages into static HTML files. The static files are served directly by the web server, dramatically reducing database load and improving SEO. Typical workflow:

# Example using a build script
python generate_static.py --source templates/ --output static/
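The source does not show what such a build script contains. One possible shape, as a sketch only: render each page from a template once and write the result to disk. The template, the function name generate_static, and the page fields are all hypothetical, and the sketch does not escape HTML in the field values.

```python
from pathlib import Path
from string import Template
from typing import Dict, List

# Hypothetical page template; real sites would load this from a file.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def generate_static(pages: Dict[str, Dict[str, str]], output_dir: Path) -> List[Path]:
    """Render each page once and write it to output_dir/<slug>.html."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, fields in pages.items():
        target = output_dir / f"{slug}.html"
        target.write_text(PAGE.substitute(fields), encoding="utf-8")
        written.append(target)
    return written
```

The web server can then serve the generated files directly, and the script is rerun whenever the underlying data changes.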

Caching (browser, CDN, in‑memory):

Browser cache: set Cache‑Control and ETag headers for static resources.

CDN cache: configure edge TTLs to keep copies of assets close to users.

Application‑level cache: deploy a Redis instance and cache expensive query results or rendered fragments.
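For the browser‑cache item, the following sketch shows how Cache‑Control and ETag headers work together with a conditional request. It is framework‑independent and simplified: a real If‑None‑Match header may carry several values or weak validators, which this example ignores. The function names and the default max‑age are assumptions.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(
    body: bytes, if_none_match: Optional[str], max_age: int = 3600
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body); answer 304 if the client copy is current."""
    etag = make_etag(body)
    headers = {"Cache-Control": f"public, max-age={max_age}", "ETag": etag}
    if if_none_match == etag:
        # The client already holds this version, so send no body.
        return 304, headers, b""
    return 200, headers, body
```

Within max‑age the browser reuses its copy without contacting the server; afterwards it revalidates with the ETag and downloads the body again only if the content changed.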

Gzip Compression : Enable HTTP compression on the web server (e.g., gzip on; in Nginx) so that HTML, CSS, JavaScript, and JSON payloads are sent in a compressed form, typically reducing size by 60‑80%.
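The effect of compression is easy to measure with Python's gzip module. The sample markup below is deliberately repetitive and therefore compresses far better than a typical page, so treat the measured saving as an illustration rather than a benchmark; the helper names are assumptions.

```python
import gzip


def compress_payload(payload: bytes, level: int = 6) -> bytes:
    """Gzip-compress a response body."""
    return gzip.compress(payload, compresslevel=level)


def compression_ratio(payload: bytes) -> float:
    """Fraction of bytes saved by compression (0.0 means no saving)."""
    if not payload:
        return 0.0
    return 1.0 - len(compress_payload(payload)) / len(payload)


# Illustrative input only: 200 near-identical list items.
html = b"<li class='item'><a href='/product'>Product</a></li>\n" * 200
saved = compression_ratio(html)
```

Already compressed formats such as JPEG or MP4 gain little from gzip, which is why compression is normally enabled only for text content types.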

Asset Bundling (One File) : Merge multiple small files (icons, CSS, JavaScript) into a single file to reduce HTTP request count. Tools such as webpack, gulp, or grunt can automate this process.

Server Cluster : Deploy a group of identical web‑server nodes behind a load balancer. Clustering improves throughput and provides failover; nodes communicate via a private LAN and can be managed with orchestration tools (e.g., Kubernetes, Docker Swarm).

Application / Business Server Layer

Distributed / Micro‑services Architecture : Decompose a monolithic application into independent services, each responsible for a bounded context. Benefits include independent scaling, fault isolation, and easier continuous deployment. Container runtimes (Docker) and orchestration platforms (Kubernetes) simplify packaging, networking, and lifecycle management.

In‑Memory Caching : Use Memcached or Redis to cache hot data (e.g., product catalogs, user profiles) and intermediate computation results. Example pattern:

# Pseudocode (cache-aside)
value = cache.get(key)
if value is None:                    # cache miss; a cached 0 or "" still counts as a hit
    value = db.query(...)
    cache.set(key, value, ttl=300)   # expire after 300 seconds

Sync‑to‑Async via Message Queues : Replace blocking RPC calls with asynchronous processing. Producers place tasks on a queue; consumers process them and optionally push results back via callbacks or notification topics. Popular MQs include:

Apache Kafka – high‑throughput, partitioned log.

RabbitMQ – flexible routing with exchanges.

IBM MQ (formerly WebSphere MQ) – enterprise‑grade reliability.

Typical usage:

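# Producer (Python, pika client for RabbitMQ)
# Assumptions: broker on localhost; order_json and process_order come from the application.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='order', durable=True)
channel.basic_publish(exchange='', routing_key='order', body=order_json)

# Consumer (acknowledge only after the order has been processed)
for method, properties, body in channel.consume('order'):
    process_order(json.loads(body))
    channel.basic_ack(method.delivery_tag)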
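The same producer/consumer shape can be tried without a broker. The sketch below uses Python's in‑process queue.Queue and a worker thread as a stand‑in for a real message queue; it illustrates only the decoupling, not durability or delivery across machines. The function name run_worker and the order fields are assumptions.

```python
import queue
import threading
from typing import Callable, List


def run_worker(tasks: "queue.Queue", handler: Callable[[dict], None]) -> threading.Thread:
    """Start a background consumer that handles tasks until it sees None."""

    def loop() -> None:
        while True:
            task = tasks.get()
            try:
                if task is None:  # sentinel: stop the worker
                    return
                handler(task)
            finally:
                tasks.task_done()

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker


processed: List[int] = []
tasks: "queue.Queue" = queue.Queue()
worker = run_worker(tasks, lambda order: processed.append(order["id"]))

# The "request handler" only enqueues and returns immediately.
for order_id in (1, 2, 3):
    tasks.put({"id": order_id})

tasks.put(None)  # ask the worker to stop
tasks.join()     # wait until every queued item has been handled
```

The caller's latency is limited to the cost of put(); the slow work runs in the consumer, which is the property the message queue provides at system scale.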

Data Access, File Access, Internal Network Layer

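Read/Write Separation : Direct write operations (INSERT/UPDATE/DELETE) to a primary (master) database and route SELECT queries to one or more replicas (slaves). This spreads read load across the replicas and leaves the primary free to handle writes. Because replication is usually asynchronous, replicas can lag behind the primary, so reads that must see a just‑committed write should still go to the primary. Implementation often relies on a proxy layer (e.g., MySQL‑Proxy, ProxySQL) that routes each query according to its type.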
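The routing decision itself is small. The following sketch classifies a statement by its first keyword and picks a server name; it is an illustration, not how MySQL‑Proxy or ProxySQL work internally, and the class name, server labels, and in_transaction flag are assumptions. Anything that is not a plain SELECT is sent to the primary, which is the conservative choice.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Pick a target database by statement type (names are hypothetical)."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str, in_transaction: bool = False) -> str:
        """Return the name of the server that should run this statement."""
        stripped = sql.strip()
        verb = stripped.split(None, 1)[0].upper() if stripped else ""
        # SELECT ... FOR UPDATE takes locks, so it belongs on the primary.
        is_read = verb == "SELECT" and "FOR UPDATE" not in stripped.upper()
        if in_transaction or not is_read:
            return self.primary
        return next(self._replicas)
```

Statements inside a transaction stay on the primary so that a transaction never reads stale data from a lagging replica.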

Database Cluster : Combine multiple DB servers into a single logical instance that provides transparent failover and load balancing. Clustering solutions (MySQL Group Replication, Galera Cluster, Oracle RAC) maintain data consistency across nodes and present a single endpoint to applications.

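Distributed Storage (DAS/NAS/SAN) : Choose storage according to how it must be shared. DAS (direct‑attached storage) is local to a single server, NAS exposes shared file systems over the network, and SAN provides shared block devices over a dedicated storage network. For large volumes of unstructured files such as images and video, object storage scales further: cloud services such as Alibaba OSS and Amazon S3, or self‑hosted Ceph, provide high durability and can be accessed via HTTP APIs or native SDKs.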

Cache at the Data Layer : Leverage CPU L2/L3 caches, OS page cache, and application‑level caches to accelerate data retrieval. For large analytical workloads, pre‑aggregate results into materialized views or cache tables.

NoSQL / Key‑Value Stores : Adopt non‑relational databases when schema flexibility, horizontal scalability, or low‑latency access is required. Options include:

Document stores (MongoDB, Couchbase)

Wide‑column stores (Cassandra, HBase)

Key‑Value stores (Redis, DynamoDB)

Graph databases (Neo4j, JanusGraph)

Partitioning / Sharding : Split large tables into smaller, more manageable pieces.

Range partitioning : Divide rows by a date or numeric range.

Hash partitioning : Distribute rows based on a hash of a key column.

Horizontal sharding : Place each partition on a separate physical node.

Vertical sharding : Separate columns into different tables or databases.

Example: a table with 1 000 000 rows can be split into 10 shards of roughly 100 000 rows each; queries that include the shard key can be routed directly to the relevant shard, reducing I/O, whereas queries without it must be sent to every shard.
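Shard routing for the 10‑shard example can be sketched as two small functions, one for hash partitioning and one for range partitioning. The function names and the use of SHA‑256 are assumptions; a stable hash is used because Python's built‑in hash() for strings varies between processes.

```python
import hashlib

NUM_SHARDS = 10  # matches the 10-shard example above


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(row_id: int, rows_per_shard: int = 100_000) -> int:
    """Map a 1-based row id to a shard index by contiguous range."""
    return (row_id - 1) // rows_per_shard
```

With range partitioning, row 100 001 lands on shard 1 and row 1 000 000 on shard 9. Hash partitioning spreads keys more evenly but gives up locality for range scans, and changing num_shards remaps most keys, which is why resharding needs planning.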

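Border Gateway Protocol (BGP) : Use BGP to announce the same IP prefix through multiple ISPs, enabling multi‑line connectivity and automatic failover. Enterprises obtain an Autonomous System Number (ASN) and configure BGP sessions with their upstream providers; if one provider fails, its route is withdrawn and traffic shifts to the remaining paths. Note that BGP selects routes by policy and path attributes such as AS‑path length, not by measured latency, so the chosen path is often, but not always, the fastest one.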

Conclusion

For high‑volume, high‑concurrency internet platforms, the most effective architectural levers are:

Caching at every layer (browser, CDN, application, database) to eliminate redundant work.

Asynchronous processing via message queues to decouple request handling from long‑running tasks.

Data partitioning (read/write separation, sharding, clustering) to scale storage and query throughput.

When designing a system, identify the natural boundaries for splitting workloads, choose appropriate cache granularity, and isolate services so that capacity scales close to linearly as nodes are added under peak traffic.
