Scalable Internet Architecture: DNS, Load Balancing, API Gateways & Microservices
This article outlines how modern internet companies design a scalable architecture by integrating DNS resolution, load balancing strategies, persistent connections, API gateways, push notification systems, microservice communication, distributed transactions, and supporting infrastructure services.
Overall Architecture
Clients (mobile apps, PC browsers, third‑party services) first resolve the service domain. Traditional DNS resolution goes through the ISP‑provided LocalDNS, while mobile apps can use HttpDNS to obtain the IP address of the load balancer in real time. The request then reaches a unified access layer, which maintains long‑lived TCP connections with clients and forwards traffic to an API gateway. The gateway is the entry point for all microservices and handles protocol conversion, routing, authentication, traffic control and caching. Business servers push real‑time notifications (e.g., instant messages, alerts) through a dedicated PUSH system. Internal services communicate over a proprietary RPC protocol and may call external third‑party services through a NAT gateway.
Domain Name Resolution
Traditional DNS
DNS is a distributed directory that maps domain names to IP addresses. A client sends a recursive query to its LocalDNS (usually the ISP's edge DNS server). If the answer is not cached, the LocalDNS resolves it iteratively: a root server refers it to the top‑level‑domain servers, which refer it to the domain's authoritative server, which returns the final IP address.
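To make the recursive/iterative split concrete, here is a minimal Python simulation. The server names (`root`, `com-tld`, `example-auth`), the domain and the address are invented. Each server also answers per full name instead of per zone, so treat this as a sketch of the referral chain rather than a real resolver.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory name servers. Each maps a query name to either a
# final address ("A", ip) or a referral ("NS", next_server). Real servers
# delegate by zone and return NS/glue records; this is heavily simplified.
Answer = Tuple[str, str]
SERVERS: Dict[str, Dict[str, Answer]] = {
    "root": {"api.example.com": ("NS", "com-tld")},
    "com-tld": {"api.example.com": ("NS", "example-auth")},
    "example-auth": {"api.example.com": ("A", "203.0.113.10")},
}


class LocalDNS:
    """Answers a client's recursive query by querying other servers iteratively."""

    def __init__(self, servers: Dict[str, Dict[str, Answer]]):
        self.servers = servers
        self.cache: Dict[str, str] = {}  # real resolvers also honour record TTLs
        self.trace: List[str] = []       # servers contacted, for illustration

    def resolve(self, name: str) -> str:
        if name in self.cache:
            return self.cache[name]
        server = "root"
        for _ in range(10):  # guard against referral loops
            self.trace.append(server)
            kind, value = self.servers[server][name]
            if kind == "A":
                self.cache[name] = value
                return value
            server = value  # follow the referral one level down the hierarchy
        raise RuntimeError("too many referrals")


resolver = LocalDNS(SERVERS)
print(resolver.resolve("api.example.com"))  # 203.0.113.10
print(resolver.trace)  # ['root', 'com-tld', 'example-auth']
```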
HttpDNS
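HttpDNS sends DNS queries over HTTP(S) directly to a dedicated DNS service, bypassing the ISP's LocalDNS. This avoids DNS hijacking and cross‑network scheduling errors. With LocalDNS, the authoritative server sees the LocalDNS address rather than the client's, so it may return an IP address on a different carrier's network. HttpDNS therefore gives more reliable resolution for mobile Internet services.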
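Here is a minimal sketch of the client side of HttpDNS, assuming a provider that returns a list of IPs plus a TTL. The `fetch` callable, the cache layout and the fallback order (fresh cache, then the HTTP query, then stale cache, then the system resolver) are illustrative choices. They are not any specific vendor's SDK.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: fetch(domain) -> (ips, ttl_seconds). A real client would
# issue an HTTP(S) GET to its HttpDNS provider; it is injected here so the sketch runs offline.
Fetcher = Callable[[str], Tuple[List[str], int]]


class HttpDnsClient:
    def __init__(self, fetch: Fetcher, clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.clock = clock
        self.cache: Dict[str, Tuple[List[str], float]] = {}  # domain -> (ips, expires_at)

    def lookup(self, domain: str) -> List[str]:
        now = self.clock()
        cached = self.cache.get(domain)
        if cached is not None and cached[1] > now:
            return cached[0]  # still within TTL
        try:
            ips, ttl = self.fetch(domain)
        except Exception:
            if cached is not None:
                return cached[0]  # assumption: prefer a stale answer over failing
            # Last resort: the system resolver, i.e. the ISP's LocalDNS path.
            infos = socket.getaddrinfo(domain, None, socket.AF_INET)
            return sorted({info[4][0] for info in infos})
        self.cache[domain] = (ips, now + ttl)
        return ips


calls: List[str] = []


def fake_fetch(domain: str) -> Tuple[List[str], int]:
    calls.append(domain)
    return ["198.51.100.7"], 60


now = [0.0]
client = HttpDnsClient(fake_fetch, clock=lambda: now[0])
client.lookup("api.example.com")  # fetched over "HTTP"
client.lookup("api.example.com")  # served from cache
now[0] = 61.0
client.lookup("api.example.com")  # TTL expired, fetched again
print(len(calls))  # 2
```

Falling back to `socket.getaddrinfo` keeps the app working when the HttpDNS service itself is unreachable. The cost is that the LocalDNS issues come back for that lookup.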
Load Balancing
To eliminate single‑machine bottlenecks and single points of failure, traffic is distributed across multiple backend servers. The load balancer performs periodic health checks and removes unhealthy nodes from the pool.
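Health checking can be sketched as a small state machine per backend. The `fall` and `rise` thresholds below are hypothetical parameters, though common load balancers expose similar settings. Requiring several consecutive results before flipping a server's state avoids flapping on a single lost probe.

```python
from typing import Callable, Dict, List


class BackendPool:
    """Marks a server down after `fall` consecutive failed probes and up again after
    `rise` consecutive successful ones (illustrative thresholds)."""

    def __init__(self, servers: List[str], fall: int = 3, rise: int = 2):
        self.fall, self.rise = fall, rise
        self.healthy: Dict[str, bool] = {s: True for s in servers}
        self.streak: Dict[str, int] = {s: 0 for s in servers}  # results contradicting current state

    def record_probe(self, server: str, ok: bool) -> None:
        if ok == self.healthy[server]:
            self.streak[server] = 0
            return
        self.streak[server] += 1
        if self.streak[server] >= (self.rise if ok else self.fall):
            self.healthy[server] = ok
            self.streak[server] = 0

    def run_checks(self, probe: Callable[[str], bool]) -> None:
        # In practice this runs on a timer and probe() is a TCP connect or HTTP GET.
        for server in self.healthy:
            self.record_probe(server, probe(server))

    def available(self) -> List[str]:
        return [s for s, up in self.healthy.items() if up]


pool = BackendPool(["10.0.0.1", "10.0.0.2"])
for _ in range(3):
    pool.run_checks(lambda s: s != "10.0.0.2")  # 10.0.0.2 keeps failing
print(pool.available())  # ['10.0.0.1']
```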
Layer 4 vs Layer 7
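L4 (transport‑layer) load balancing forwards packets based only on transport‑layer information (IP addresses and ports). When the first packet of a connection (the TCP SYN) arrives, the balancer selects a backend server, and all later packets of that connection go to the same server. The balancer does not terminate the connection; it only rewrites the MAC or IP headers as needed.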
L7 (Application‑layer) load balancing terminates the client connection, parses the HTTP request, and then opens a separate connection to the chosen backend server. This enables richer routing decisions (e.g., URL‑based routing, header inspection).
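The difference in what each layer can see shows up directly in the routing function's inputs. In this sketch the function names, routes and pool names are hypothetical. The L4 picker receives only the connection's 5‑tuple, while the L7 picker receives the parsed URL path and applies longest‑prefix routing.

```python
import hashlib
from typing import Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


def l4_pick(conn: FiveTuple, backends: List[str]) -> str:
    """L4: only addresses and ports are visible, so every packet of a connection
    hashes to the same backend. Real balancers usually record the choice in a
    connection table when the SYN arrives; hashing is a stateless stand-in."""
    digest = hashlib.md5("|".join(map(str, conn)).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def l7_pick(path: str, routes: Dict[str, str], default: str) -> str:
    """L7: the HTTP request has been parsed, so the URL path can drive routing.
    Uses the longest matching prefix."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    return routes[max(matches, key=len)] if matches else default


backends = ["rs1", "rs2", "rs3"]
conn = ("192.0.2.5", 51514, "203.0.113.10", 443, "tcp")
print(l4_pick(conn, backends))
routes = {"/api/": "api-pool", "/api/orders": "order-service", "/static/": "static-pool"}
print(l7_pick("/api/orders/42", routes, "web-pool"))  # order-service
print(l7_pick("/index.html", routes, "web-pool"))     # web-pool
```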
LVS Forwarding Modes
DR (Direct Routing)
NAT (Network Address Translation)
TUNNEL (IP‑in‑IP tunneling)
FULL NAT (double NAT with SNAT)
Each mode rewrites packet headers differently and imposes specific network topology requirements (e.g., DR requires the scheduler and real servers to share a physical network segment).
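To illustrate how the four modes differ, the sketch below models only the fields each mode rewrites on the inbound packet from client to VIP. Addresses are symbolic placeholders (`CIP`, `VIP`, `DIP`, `LIP`, `RIP`, `MAC_RS`). The function is a teaching model of the behaviour, not LVS code.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Illustrative labels: CIP = client IP, VIP = virtual IP on the director,
# DIP/LIP = director's own / local addresses, RIP = real server IP.
CIP, VIP, DIP, LIP, RIP = "CIP", "VIP", "DIP", "LIP", "RIP"


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str
    inner: Optional["Packet"] = None  # set when the packet is IP-in-IP encapsulated


def director_forward(pkt: Packet, mode: str) -> Packet:
    """Simplified inbound (client -> VIP) rewrite per LVS mode. Ordinary per-hop
    MAC rewriting done by any router is omitted except where it is the mechanism."""
    if mode == "DR":
        # Only the destination MAC changes, so director and real server must share
        # an L2 segment; the real server also holds the VIP (e.g. on loopback)
        # and replies straight to the client.
        return replace(pkt, dst_mac="MAC_RS")
    if mode == "NAT":
        # DNAT to the real server; replies must route back through the director,
        # which rewrites the source from RIP back to VIP.
        return replace(pkt, dst_ip=RIP)
    if mode == "TUNNEL":
        # The untouched packet is wrapped in an outer IP header DIP -> RIP; the real
        # server decapsulates it and replies directly to the client.
        return Packet(src_ip=DIP, dst_ip=RIP, dst_mac=pkt.dst_mac, inner=pkt)
    if mode == "FULLNAT":
        # SNAT + DNAT: the real server sees LIP as the source, so replies return to
        # the director by normal routing, without special topology.
        return replace(pkt, src_ip=LIP, dst_ip=RIP)
    raise ValueError(f"unknown mode: {mode}")


incoming = Packet(src_ip=CIP, dst_ip=VIP, dst_mac="MAC_DIRECTOR")
for mode in ("DR", "NAT", "TUNNEL", "FULLNAT"):
    print(mode, director_forward(incoming, mode))
```

One practical consequence of FULLNAT is visible in the output. The real server sees `LIP`, not the client address, in the IP header, so client IPs must be passed along by other means.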
Scheduling Algorithms
Round‑Robin – distributes requests sequentially without considering server load.
Weighted Round‑Robin – sends proportionally more requests to servers with larger weights, which is useful when backend capacities differ.
Least Connections – sends traffic to the server with the fewest active connections.
Hash – maps a request key (e.g., client IP or URL) to a server using a hash function; consistent hashing minimizes disruption when nodes are added or removed.
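The four algorithms fit in a few lines each. This is a single‑process sketch with hypothetical class names. For weighted round‑robin it uses the "smooth" variant known from nginx, which interleaves picks instead of sending a heavy server several requests in a row. The consistent hash ring uses virtual nodes (100 per server, an arbitrary choice) to spread load more evenly.

```python
import bisect
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class SmoothWeightedRoundRobin:
    """The 'smooth' variant used by nginx: picks are interleaved rather than bunched."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = weights
        self.current = {s: 0 for s in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        for s, w in self.weights.items():
            self.current[s] += w
        best = max(self.current, key=self.current.__getitem__)
        self.current[best] -= self.total
        return best


class LeastConnections:
    def __init__(self, servers: List[str]):
        self.active = {s: 0 for s in servers}

    def acquire(self) -> str:
        server = min(self.active, key=self.active.__getitem__)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Each server owns `vnodes` points on a ring; a key goes to the next point clockwise."""

    def __init__(self, servers: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self.points: List[int] = []
        self.owner: Dict[int, str] = {}
        for s in servers:
            self.add(s)

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            h = _hash(f"{server}#{i}")
            bisect.insort(self.points, h)
            self.owner[h] = server

    def remove(self, server: str) -> None:
        self.points = [h for h in self.points if self.owner[h] != server]
        self.owner = {h: s for h, s in self.owner.items() if s != server}

    def pick(self, key: str) -> str:
        i = bisect.bisect(self.points, _hash(key)) % len(self.points)
        return self.owner[self.points[i]]


swrr = SmoothWeightedRoundRobin({"a": 5, "b": 1, "c": 1})
print([swrr.pick() for _ in range(7)])  # ['a', 'a', 'b', 'a', 'c', 'a', 'a']
```

With consistent hashing, removing a server only remaps the keys that server owned. A plain `hash(key) % N` would remap most keys whenever `N` changes.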
API Gateway
The API gateway is a clustered service that acts as the single external entry point. It encapsulates internal microservices and exposes REST/HTTP APIs while providing non‑functional capabilities such as authentication, monitoring, caching, rate limiting and traffic control.
API Management
API management covers the full lifecycle of an API: creation, versioning, publishing, rollback and deprecation. The front‑end configuration defines the HTTP method, path and parameters that clients call; the back‑end configuration binds that route to a specific microservice name and its parameters.
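The front‑end/back‑end split and the lifecycle operations can be modelled as a small in‑memory registry. Everything here (`ApiRegistry`, the `get-order` API, the `order-service` names) is hypothetical. A real gateway would typically persist this configuration and distribute it to the gateway cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ApiDefinition:
    # Front-end configuration: what clients call.
    method: str
    path: str
    params: Tuple[str, ...]
    # Back-end configuration: which microservice handles it (names are hypothetical).
    service: str
    service_method: str


@dataclass
class ApiEntry:
    versions: List[ApiDefinition] = field(default_factory=list)
    history: List[int] = field(default_factory=list)  # published versions, newest last
    deprecated: bool = False


class ApiRegistry:
    def __init__(self) -> None:
        self.apis: Dict[str, ApiEntry] = {}

    def create_version(self, api_id: str, definition: ApiDefinition) -> int:
        entry = self.apis.setdefault(api_id, ApiEntry())
        entry.versions.append(definition)
        return len(entry.versions) - 1

    def publish(self, api_id: str, version: int) -> None:
        entry = self.apis[api_id]
        if not 0 <= version < len(entry.versions):
            raise ValueError(f"{api_id} has no version {version}")
        entry.history.append(version)

    def rollback(self, api_id: str) -> None:
        entry = self.apis[api_id]
        if len(entry.history) < 2:
            raise ValueError(f"{api_id} has no earlier published version")
        entry.history.pop()  # the previously published version becomes live again

    def deprecate(self, api_id: str) -> None:
        self.apis[api_id].deprecated = True

    def route(self, method: str, path: str) -> Optional[ApiDefinition]:
        for entry in self.apis.values():
            if entry.deprecated or not entry.history:
                continue
            live = entry.versions[entry.history[-1]]
            if (live.method, live.path) == (method, path):
                return live
        return None


reg = ApiRegistry()
v0 = reg.create_version("get-order", ApiDefinition("GET", "/orders", ("id",), "order-service", "getOrder"))
reg.publish("get-order", v0)
v1 = reg.create_version("get-order", ApiDefinition("GET", "/orders", ("id",), "order-service-v2", "getOrder"))
reg.publish("get-order", v1)
reg.rollback("get-order")
print(reg.route("GET", "/orders").service)  # order-service
```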
Asynchronous Processing
Because the gateway mainly handles network I/O, non‑blocking I/O (e.g., Netty with Java NIO) and event‑driven frameworks (e.g., Spring 5 WebFlux) let a small thread pool serve a massive number of concurrent connections, reducing context‑switch overhead and increasing throughput.
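The same idea can be shown with Python's `asyncio` standing in for Netty or WebFlux. This is an analogy, not the Java stack the text refers to. A single event‑loop thread keeps 1,000 requests in flight while each one waits on simulated downstream I/O.

```python
import asyncio
import threading
import time
from typing import List, Set, Tuple


async def handle_request(i: int, thread_ids: Set[int]) -> int:
    thread_ids.add(threading.get_ident())
    # Stands in for awaiting a downstream service: the event loop serves other
    # requests instead of blocking a thread while this one waits.
    await asyncio.sleep(0.1)
    return i


async def serve(n: int) -> Tuple[List[int], Set[int], float]:
    thread_ids: Set[int] = set()
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request(i, thread_ids) for i in range(n)))
    return list(results), thread_ids, time.perf_counter() - start


results, thread_ids, elapsed = asyncio.run(serve(1000))
# 1000 overlapping 100 ms waits on one thread take roughly 0.1 s, not 100 s.
print(len(results), len(thread_ids), f"{elapsed:.2f}s")
```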
Chain Processing
The gateway implements a filter chain (Chain of Responsibility pattern). Typical filters handle routing, protocol conversion, caching, rate limiting, monitoring and logging. Each request passes through the pre‑filters, is forwarded to the downstream service, and then passes through the post‑filters before the response is returned to the client.
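Here is a minimal filter chain, assuming a synchronous gateway and dictionary‑based requests and responses. The filter classes and the `valid-token` check are hypothetical. A pre‑filter may short‑circuit the chain by returning a response. The post‑filters of the filters that ran are then unwound in reverse order.

```python
from typing import Callable, Dict, List, Optional

Request = Dict[str, str]
Response = Dict[str, object]


class Filter:
    def pre(self, req: Request) -> Optional[Response]:
        return None  # returning a Response short-circuits the chain

    def post(self, req: Request, resp: Response) -> Response:
        return resp


class LoggingFilter(Filter):
    def __init__(self, log: List[str]) -> None:
        self.log = log

    def pre(self, req: Request) -> Optional[Response]:
        self.log.append(f"-> {req['path']}")
        return None

    def post(self, req: Request, resp: Response) -> Response:
        self.log.append(f"<- {resp['status']}")
        return resp


class AuthFilter(Filter):
    def pre(self, req: Request) -> Optional[Response]:
        if req.get("token") != "valid-token":  # hypothetical credential check
            return {"status": 401}
        return None


class Gateway:
    def __init__(self, filters: List[Filter], forward: Callable[[Request], Response]) -> None:
        self.filters = filters
        self.forward = forward  # call to the downstream service

    def handle(self, req: Request) -> Response:
        ran: List[Filter] = []
        resp: Optional[Response] = None
        for f in self.filters:
            ran.append(f)
            resp = f.pre(req)
            if resp is not None:
                break
        if resp is None:
            resp = self.forward(req)
        for f in reversed(ran):  # post-filters unwind in reverse order
            resp = f.post(req, resp)
        return resp


log: List[str] = []
gw = Gateway([LoggingFilter(log), AuthFilter()], forward=lambda req: {"status": 200})
print(gw.handle({"path": "/orders", "token": "valid-token"})["status"])  # 200
print(gw.handle({"path": "/orders"})["status"])  # 401
print(log)  # ['-> /orders', '<- 200', '-> /orders', '<- 401']
```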
Rate Limiting
Rate limiting protects the system from overload. Implementations can be cluster‑wide (using a shared store such as Redis) or single‑node (in‑memory). Common algorithms are:
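Counter – counts requests in a fixed time window and rejects them once the limit is reached; simple, but a burst straddling a window boundary can briefly let through up to twice the limit.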
Leaky Bucket – smooths burst traffic by processing requests at a constant rate.
Token Bucket – allows bursts up to a configurable token capacity and then refills tokens at a steady rate (generally recommended).
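Two of the algorithms are sketched below for a single node. Each takes an injectable clock so it can be tested deterministically. A cluster‑wide limiter would keep the same state in a shared store such as Redis, which is not shown here.

```python
import time
from typing import Callable

Clock = Callable[[], float]


class FixedWindowCounter:
    """At most `limit` requests per aligned window of `window` seconds."""

    def __init__(self, limit: int, window: float, clock: Clock = time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.window_start
        if elapsed >= self.window:
            self.window_start = now - (elapsed % self.window)  # start of the current window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, rate: float, capacity: float, clock: Clock = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity  # start full, so an initial burst of `capacity` is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


t = [0.0]
bucket = TokenBucket(rate=2, capacity=5, clock=lambda: t[0])
print([bucket.allow() for _ in range(6)])  # five True, then False
t[0] = 1.0  # one second later: 2 tokens have been refilled
print([bucket.allow() for _ in range(3)])  # [True, True, False]
```

The fixed‑window counter's weakness is easy to reproduce. With a limit of 3 per second, three requests at t = 0.99 s and three more at t = 1.0 s all pass. The token bucket, by contrast, bounds any burst by its capacity.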
Circuit Breaker & Service Degradation
Service Circuit Breaker
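When a downstream service becomes unavailable or too slow (for example, its error rate or latency exceeds a threshold), the circuit breaker opens. The upstream service then returns an error immediately instead of waiting, which frees its threads and connections. After a cool‑down period the breaker moves to a half‑open state and lets trial requests through. If they succeed, the circuit closes and normal calls resume; otherwise it opens again.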
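Here is a minimal, single‑threaded circuit breaker with closed, open and half‑open states. The consecutive‑failure trigger, the parameter names and the `CircuitOpenError` type are assumptions for illustration. Production libraries often use failure rates over a sliding window instead, and they limit how many trial calls run in the half‑open state.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised immediately while the circuit is open (fail fast)."""


class CircuitBreaker:
    """closed -> open after `failure_threshold` consecutive failures;
    open -> half_open once `reset_timeout` has elapsed;
    half_open -> closed if the trial call succeeds, back to open if it fails.
    Not thread-safe, and does not limit concurrent half-open trials."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("downstream marked unavailable")
            self.state = "half_open"  # cool-down over: allow a trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "closed"
        return result


t = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: t[0])


def flaky() -> str:
    raise ConnectionError("downstream timeout")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print(breaker.state)  # open
try:
    breaker.call(lambda: "ok")
except CircuitOpenError as exc:
    print("rejected:", exc)  # rejected without calling downstream
t[0] = 10.0
print(breaker.call(lambda: "ok"), breaker.state)  # ok closed
```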
Service Degradation
If overall load exceeds capacity, non‑critical functionality can be degraded or disabled. Degradation can be applied at the API level, feature level, or system level, often by returning cached data or a simplified response.
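Feature‑level degradation can be as simple as a switch checked before the expensive call. In this sketch the `Degrader` class and the feature names are hypothetical. When the switch is on, it serves the last good result, or a simplified default if no cached result exists.

```python
from typing import Callable, Dict, Hashable, Set, Tuple


class Degrader:
    """Per-feature degradation switches with a last-known-good cache (illustrative)."""

    def __init__(self) -> None:
        self.degraded: Set[str] = set()  # flipped by operators or by load signals
        self.cache: Dict[Tuple[str, Hashable], object] = {}

    def serve(self, feature: str, key: Hashable,
              compute: Callable[[], object], fallback: object) -> object:
        if feature in self.degraded:
            # Skip the expensive call entirely: stale data if we have it, else a simplified default.
            return self.cache.get((feature, key), fallback)
        value = compute()
        self.cache[(feature, key)] = value
        return value


def expensive_recommendations() -> object:
    raise RuntimeError("should not be called while degraded")


d = Degrader()
print(d.serve("recommendations", "user-1", lambda: ["item-9", "item-4"], fallback=[]))
d.degraded.add("recommendations")  # load is too high: switch the feature off
print(d.serve("recommendations", "user-1", expensive_recommendations, fallback=[]))  # cached
print(d.serve("recommendations", "user-2", expensive_recommendations, fallback=[]))  # []
```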
Business Isolation
To prevent cross‑impact between different business domains, isolation can be achieved by separating thread pools or, preferably for Java, by deploying separate clusters (processes or containers) for each business line.
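Thread‑pool isolation (the "bulkhead" pattern) can be sketched with one `ThreadPoolExecutor` per business line. The business names and pool sizes are hypothetical. The point is that a backlog in `search` does not delay `orders`.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class Bulkheads:
    """One bounded thread pool per business line, so a slow domain can exhaust
    only its own workers. Pool sizes are illustrative."""

    def __init__(self, sizes: Dict[str, int]) -> None:
        self.pools = {
            name: ThreadPoolExecutor(max_workers=n, thread_name_prefix=name)
            for name, n in sizes.items()
        }

    def submit(self, business: str, fn: Callable[..., object], *args: object) -> Future:
        return self.pools[business].submit(fn, *args)

    def shutdown(self) -> None:
        for pool in self.pools.values():
            pool.shutdown(wait=True)


bulkheads = Bulkheads({"search": 2, "orders": 2})
# Search is slow: its four tasks occupy both search workers for about 0.6 s.
slow = [bulkheads.submit("search", time.sleep, 0.3) for _ in range(4)]
# An order request still runs immediately on the orders pool.
order = bulkheads.submit("orders", lambda: "order ok")
print(order.result(timeout=0.25))  # order ok
bulkheads.shutdown()
```

`ThreadPoolExecutor` queues excess tasks without limit, so a real bulkhead would also bound the queue or reject work once a pool is saturated. Process‑ or cluster‑level isolation, as the text recommends for Java, goes further by isolating memory and crashes as well.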
PUSH Notification System
The push system supports multiple vendor channels (Apple APNs, Huawei, Xiaomi, Firebase Cloud Messaging). Device registration, user binding and message delivery follow these steps:
Device connects and registers its token.
Device binds to a user identifier.
When a business event occurs, the server creates a message and stores it persistently.
The push service attempts delivery via the appropriate vendor channel or a custom TCP channel.
If the device is offline, the message remains in the queue; delivery is retried on the next device connection.
Clients acknowledge receipt and the server updates the message status. Because retries can deliver the same message more than once, duplicates are filtered out, for example by message ID.
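The six steps can be traced in a small in‑memory model. The `channel` callable stands in for APNs, FCM or a custom TCP connection, and the dictionary stands in for persistent storage. Deduplication by `msg_id` on the device is an assumed strategy. All names here are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Message:
    msg_id: int
    user: str
    body: str
    status: str = "pending"  # pending -> sent -> acked


class PushService:
    def __init__(self, channel: Callable[[str, Message], bool]) -> None:
        self.channel = channel               # vendor or custom TCP channel; False = device offline
        self.device_of: Dict[str, str] = {}  # user -> device token
        self.store: Dict[int, Message] = {}  # stands in for persistent storage
        self._ids = itertools.count(1)

    def register(self, token: str, user: str) -> None:
        self.device_of[user] = token  # steps 1-2: register token and bind it to a user

    def notify(self, user: str, body: str) -> Message:
        msg = Message(next(self._ids), user, body)
        self.store[msg.msg_id] = msg  # step 3: persist before attempting delivery
        self._deliver(msg)            # step 4
        return msg

    def on_connect(self, user: str) -> None:
        # Step 5: resend everything not yet acknowledged, which may duplicate "sent" ones.
        for msg in self.store.values():
            if msg.user == user and msg.status != "acked":
                self._deliver(msg)

    def ack(self, msg_id: int) -> None:
        self.store[msg_id].status = "acked"  # step 6

    def _deliver(self, msg: Message) -> None:
        token = self.device_of.get(msg.user)
        if token is not None and self.channel(token, msg):
            msg.status = "sent"


class DeviceInbox:
    def __init__(self) -> None:
        self.seen: Set[int] = set()
        self.shown: List[str] = []

    def receive(self, msg: Message) -> None:
        if msg.msg_id in self.seen:  # duplicate from a retry: drop it
            return
        self.seen.add(msg.msg_id)
        self.shown.append(msg.body)


online = {"tok-1": False}
inbox = DeviceInbox()


def channel(token: str, msg: Message) -> bool:
    if not online[token]:
        return False
    inbox.receive(msg)
    return True


svc = PushService(channel)
svc.register("tok-1", "alice")
m = svc.notify("alice", "hello")  # device offline: stays pending
online["tok-1"] = True
svc.on_connect("alice")           # delivered
svc.on_connect("alice")           # re-sent (not acked yet), deduplicated by the inbox
svc.ack(m.msg_id)
print(inbox.shown, m.status)      # ['hello'] acked
```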
Microservice Ecosystem
Typical microservice deployments place services behind load balancers, expose them through an API gateway, and enable inter‑service RPC calls. This architecture provides horizontal scalability, fault isolation, and centralized management of cross‑cutting concerns such as security, monitoring and traffic control.
Architects' Tech Alliance
