
Scalable Internet Architecture: DNS, Load Balancing, API Gateways & Microservices

This article outlines how modern internet companies design a scalable architecture by integrating DNS resolution, load balancing strategies, persistent connections, API gateways, push notification systems, microservice communication, distributed transactions, and supporting infrastructure services.

Architects' Tech Alliance

Overall Architecture

Clients (mobile apps, PC browsers, third‑party services) first resolve the service domain. Traditional DNS uses the ISP‑provided LocalDNS, while mobile apps can use HttpDNS to obtain the IP address of the load balancer in real time. The request reaches a unified access layer that keeps long‑lived TCP connections, then forwards traffic to an API gateway. The gateway is the entry point for all microservices and handles protocol conversion, routing, authentication, traffic control and caching. Business servers push real‑time notifications (e.g., instant messaging, alerts) via a dedicated PUSH system. Internal services communicate through a proprietary RPC protocol and may call external third‑party services through a NAT gateway.

Domain Name Resolution

Traditional DNS

DNS is a distributed directory that maps domain names to IP addresses. A client sends a recursive query to its LocalDNS (usually the ISP’s edge DNS). The LocalDNS performs iterative queries to upstream name servers until the authoritative server returns the final IP address.
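The referral-following step can be sketched with a toy model. The server names, zone data and addresses below are invented for illustration; a real resolver speaks the DNS wire protocol and caches answers by TTL.

```python
# Toy name-server data: each server either refers the resolver to a more
# specific server or holds the final record. All names are made up.
SERVERS = {
    "root": {"referrals": {"com": "tld-com"}},
    "tld-com": {"referrals": {"example.com": "ns-example"}},
    "ns-example": {"records": {"www.example.com": "192.0.2.10"}},
}


def iterative_resolve(name, servers=SERVERS, start="root"):
    """Mimic what a LocalDNS does: follow referrals until a server answers."""
    server, path = start, []
    while True:
        path.append(server)
        zone = servers[server]
        if name in zone.get("records", {}):
            return zone["records"][name], path
        # Choose the longest zone suffix this server can refer us to.
        matches = [
            suffix for suffix in zone.get("referrals", {})
            if name == suffix or name.endswith("." + suffix)
        ]
        if not matches:
            raise LookupError(f"{name}: no such name")
        server = zone["referrals"][max(matches, key=len)]
```

The client only ever talks to its LocalDNS (the recursive part); the loop above is the iterative part that the LocalDNS performs on the client's behalf.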

HttpDNS

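HttpDNS sends DNS queries over HTTP(S) directly to a dedicated DNS service, bypassing the ISP’s LocalDNS. This avoids DNS hijacking, and because the service sees the client’s own IP address rather than a LocalDNS address, it can return a server in the client’s carrier network instead of one that forces cross‑network access. The result is more reliable resolution for mobile Internet services.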
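A minimal sketch of the client-side logic, assuming the HTTP(S) request to the provider is wrapped in a hypothetical `httpdns_lookup` callable. The class name, the TTL cache and the fallback to the system resolver are illustrative assumptions, not any vendor's SDK.

```python
import socket
import time


class HttpDnsResolver:
    """Resolve via HttpDNS first, cache by TTL, fall back to LocalDNS."""

    def __init__(self, httpdns_lookup, system_lookup=None, clock=time.monotonic):
        # httpdns_lookup is hypothetical: host -> (list of IPs, TTL in seconds).
        # A real client would issue an HTTPS request to its provider here.
        self._httpdns_lookup = httpdns_lookup
        self._system_lookup = system_lookup or self._default_system_lookup
        self._clock = clock
        self._cache = {}  # host -> (ips, expires_at)

    @staticmethod
    def _default_system_lookup(host):
        # The OS resolver, which normally goes through the ISP's LocalDNS.
        infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
        return sorted({info[4][0] for info in infos})

    def resolve(self, host):
        cached = self._cache.get(host)
        if cached and cached[1] > self._clock():
            return cached[0]
        try:
            ips, ttl = self._httpdns_lookup(host)
            if not ips:
                raise LookupError("empty HttpDNS answer")
        except Exception:
            # HttpDNS unreachable or empty: degrade to traditional DNS.
            return self._system_lookup(host)
        self._cache[host] = (ips, self._clock() + ttl)
        return ips
```

Keeping the traditional path as a fallback means an outage of the HttpDNS service degrades resolution quality rather than breaking the app.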

Load Balancing

To eliminate single‑machine bottlenecks and single points of failure, traffic is distributed across multiple backend servers. The load balancer performs periodic health checks and removes unhealthy nodes from the pool.
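One way to model the health-check behavior is a pool that removes a node after several consecutive failed probes and readmits it after several consecutive successes. The `probe` callable and both thresholds are assumptions made for this sketch; real balancers expose similar knobs under their own names.

```python
class BackendPool:
    """Tracks backend health; nodes leave rotation after repeated failures."""

    def __init__(self, backends, probe, fail_threshold=3, rise_threshold=2):
        self._probe = probe  # hypothetical callable: backend -> bool
        self._fail_threshold = fail_threshold
        self._rise_threshold = rise_threshold
        self._healthy = {b: True for b in backends}
        self._fails = {b: 0 for b in backends}
        self._passes = {b: 0 for b in backends}

    def run_health_checks(self):
        """Intended to be called on a timer, for example every few seconds."""
        for backend in self._healthy:
            try:
                ok = self._probe(backend)
            except Exception:
                ok = False  # a probe that raises counts as a failure
            if ok:
                self._fails[backend] = 0
                self._passes[backend] += 1
                if self._passes[backend] >= self._rise_threshold:
                    self._healthy[backend] = True
            else:
                self._passes[backend] = 0
                self._fails[backend] += 1
                if self._fails[backend] >= self._fail_threshold:
                    self._healthy[backend] = False

    def healthy_backends(self):
        return [b for b, ok in self._healthy.items() if ok]
```

Requiring consecutive results in both directions keeps a single dropped probe from flapping a node in and out of the pool.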

Layer 4 vs Layer 7

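L4 (transport‑layer) load balancing decides using network‑ and transport‑layer information only: source and destination IP addresses and ports. When the first packet of a connection (the TCP SYN) arrives, the balancer selects a backend server, rewrites the MAC or IP headers as the forwarding mode requires, and sends later packets of the same connection to the same server. It does not terminate the connection or inspect the payload.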

L7 (Application‑layer) load balancing terminates the client connection, parses the HTTP request, and then opens a separate connection to the chosen backend server. This enables richer routing decisions (e.g., URL‑based routing, header inspection).
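Because an L7 balancer has parsed the request, its routing decision can be an ordinary function of the path and headers. The routing table, pool names and the `x-canary` header below are hypothetical, and header names are assumed to be lowercased already.

```python
# Hypothetical routing table: (path prefix, backend pool name).
ROUTES = [
    ("/api/orders", "order-service"),
    ("/api", "default-api"),
    ("/static", "static-files"),
]


def choose_pool(path, headers, routes=ROUTES, default="web-frontend"):
    """Pick a backend pool from the parsed HTTP request."""
    # Header inspection: send flagged traffic to a separate pool.
    if headers.get("x-canary", "").lower() == "true":
        return "canary"
    # URL-based routing: the longest matching prefix wins.
    best = None
    for prefix, pool in routes:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, pool)
    return best[1] if best else default
```

None of this is available to an L4 balancer, which never sees the request line or headers.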

LVS Forwarding Modes

DR (Direct Routing)

NAT (Network Address Translation)

TUNNEL (IP‑in‑IP tunneling)

FULL NAT (double NAT with SNAT)

Each mode rewrites packet headers differently and imposes specific network topology requirements (e.g., DR requires the scheduler and real servers to share a physical network segment).
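The header rewriting in the four modes can be compared side by side with a simplified packet model. The dictionary fields are invented for this sketch, and ports and checksums are ignored.

```python
def forward(packet, mode, director, real_server):
    """Return the packet as it leaves the director toward the real server."""
    out = dict(packet)
    if mode == "DR":
        # Only the destination MAC changes; the IP header still carries the
        # virtual IP, so the real server can reply to the client directly.
        out["dst_mac"] = real_server["mac"]
    elif mode == "NAT":
        # Destination IP is rewritten; replies must return through the
        # director so the source address can be translated back.
        out["dst_ip"] = real_server["ip"]
    elif mode == "TUNNEL":
        # The original packet is wrapped in an outer IP header (IP-in-IP).
        out = {
            "src_ip": director["ip"],
            "dst_ip": real_server["ip"],
            "inner": dict(packet),
        }
    elif mode == "FULLNAT":
        # Both addresses are rewritten, so the real server sees the director
        # as the client and replies to it.
        out["src_ip"] = director["local_ip"]
        out["dst_ip"] = real_server["ip"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out
```

The topology constraints follow from the rewrite: DR works only where a MAC rewrite can reach the real server (the same layer‑2 segment), while TUNNEL and FULL NAT can cross routed networks.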

Scheduling Algorithms

Round‑Robin – distributes requests sequentially without considering server load.

Weighted Round‑Robin – gives servers with larger weights a proportionally larger share of requests, useful when backend capacities differ.

Least Connections – sends traffic to the server with the fewest active connections.

Hash – maps a request key (e.g., client IP or URL) to a server using a hash function; consistent hashing minimizes disruption when nodes are added or removed.
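The algorithms above are small enough to sketch directly. The weighted variant shown is the "smooth" form, which interleaves picks instead of sending a run of requests to the heaviest server. The class names, the choice of SHA‑256 and the virtual-node count are assumptions made for illustration.

```python
import bisect
import hashlib


class SmoothWeightedRoundRobin:
    """Interleaves picks so each server gets a share matching its weight."""

    def __init__(self, weights):
        self._weights = dict(weights)
        self._current = {name: 0 for name in weights}
        self._total = sum(weights.values())

    def pick(self):
        for name, weight in self._weights.items():
            self._current[name] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


class ConsistentHashRing:
    """Hash ring with virtual nodes; a key maps to the next node clockwise."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key):
        # Assumes the ring is non-empty.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With weights `{"a": 5, "b": 1, "c": 1}`, seven consecutive picks yield `a a b a c a a`. Removing a node from the ring remaps only the keys that node owned; every other key keeps its server.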

API Gateway

The API gateway is a clustered service that acts as the single external entry point. It encapsulates internal microservices and exposes REST/HTTP APIs while providing non‑functional capabilities such as authentication, monitoring, caching, rate limiting and traffic control.

API Management

Supports the full lifecycle of an API: creation, versioning, publishing, rollback and deprecation. Front‑end configuration defines HTTP methods, paths and parameters; back‑end configuration binds the route to a specific microservice name and its parameters.
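As a rough illustration of that lifecycle, the sketch below keeps a version history per API and a pointer to the live version. `ApiVersion`, `ApiRegistry` and all field names are hypothetical and do not reflect any particular gateway product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiVersion:
    # Front-end configuration: what the client sees.
    method: str
    path: str
    params: tuple
    # Back-end configuration: which microservice handles the call.
    service: str
    service_params: tuple


class ApiRegistry:
    def __init__(self):
        self._versions = {}   # api name -> list of ApiVersion
        self._published = {}  # api name -> index of live version, or None

    def create_version(self, name, version):
        self._versions.setdefault(name, []).append(version)
        return len(self._versions[name]) - 1

    def publish(self, name, index):
        if not 0 <= index < len(self._versions.get(name, [])):
            raise ValueError("unknown version")
        self._published[name] = index

    def rollback(self, name):
        index = self._published.get(name)
        if index is None or index == 0:
            raise ValueError("nothing to roll back to")
        self._published[name] = index - 1

    def deprecate(self, name):
        self._published[name] = None

    def live(self, name):
        index = self._published.get(name)
        return None if index is None else self._versions[name][index]
```

Keeping old versions immutable and moving only the live pointer makes rollback a constant-time operation.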

Asynchronous Processing

Because the gateway mainly handles network I/O, non‑blocking I/O (e.g., Netty + NIO) and event‑driven frameworks (Spring 5 WebFlux) allow a small thread pool to serve massive concurrent connections, reducing context‑switch overhead and increasing throughput.
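The same idea can be demonstrated with Python's `asyncio`, used here as a stand-in for Netty or WebFlux rather than a description of them. The sleep simulates a downstream call, and the function names are illustrative.

```python
import asyncio
import threading


async def handle_request(request_id, io_delay=0.01):
    # Simulated downstream call. While this request waits, the event loop
    # is free to run other requests on the same thread.
    await asyncio.sleep(io_delay)
    return request_id, threading.get_ident()


async def serve_many(count):
    return await asyncio.gather(*(handle_request(i) for i in range(count)))


def run_demo(count=1000):
    """Return (requests served, number of distinct threads used)."""
    results = asyncio.run(serve_many(count))
    thread_ids = {thread_id for _, thread_id in results}
    return len(results), len(thread_ids)
```

Calling `run_demo(1000)` serves all requests from a single thread, because a request that is waiting on I/O holds no thread of its own.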

Chain Processing

The gateway implements a filter chain (the chain‑of‑responsibility pattern). Typical filters include routing, protocol conversion, caching, rate limiting, monitoring and logging. Each request passes through the pre filters, is forwarded to the downstream service, then passes through the post filters before the response is returned to the client.
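A compact way to express such a chain is to wrap the downstream call in one function per filter. The filter names, the request and response dictionaries and the token check are placeholders for this sketch.

```python
def build_chain(filters, downstream):
    """Compose filters so they run in order on the way in and in reverse
    order on the way out."""
    handler = downstream
    for flt in reversed(filters):
        # Bind flt and the current handler eagerly to avoid late binding.
        handler = (lambda f, nxt: lambda request: f(request, nxt))(flt, handler)
    return handler


def auth_filter(request, next_handler):
    # Pre filter that can short-circuit the chain.
    if request.get("token") != "secret":  # hypothetical credential check
        return {"status": 401, "trace": ["auth:rejected"]}
    return next_handler(request)


def logging_filter(request, next_handler):
    request.setdefault("trace", []).append("log:pre")    # pre step
    response = next_handler(request)
    response["trace"] = request["trace"] + ["log:post"]  # post step
    return response


def downstream_service(request):
    request.setdefault("trace", []).append("service")
    return {"status": 200, "body": "ok"}
```

With `build_chain([auth_filter, logging_filter], downstream_service)`, an authenticated request records `log:pre`, `service`, `log:post` in that order, while a rejected request never reaches the downstream service.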

Rate Limiting

Rate limiting protects the system from overload. Implementations can be cluster‑wide (using a shared store such as Redis) or single‑node (in‑memory). Common algorithms are:

Counter – simple fixed‑window counting.

Leaky Bucket – smooths burst traffic by processing requests at a constant rate.

Token Bucket – tokens are added at a steady rate up to a configurable capacity and each request consumes one; saved‑up tokens allow short bursts while the refill rate caps the long‑term average (generally recommended).
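A single-node token bucket fits in a few lines. This is a sketch with an injectable clock for testing; it is not thread-safe, and a cluster-wide limiter would keep the same state in a shared store such as Redis.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill lazily for the time that has passed, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With `rate=2` and `capacity=3`, three back-to-back requests pass, the fourth is rejected, and one second later two more are admitted.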

Circuit Breaker & Service Degradation

Service Circuit Breaker

When a downstream service becomes unavailable or slow, the circuit breaker opens: the upstream service stops calling it and returns an error immediately, which frees the threads and connections that would otherwise wait on timeouts. After a cool‑down period the breaker lets a trial request through (the half‑open state); if it succeeds, the circuit closes and normal calls resume, otherwise it opens again.
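The breaker's behavior can be written as a small state machine. The thresholds, state names and `CircuitOpenError` are choices made for this sketch; production libraries add sliding windows, slow-call detection and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("failing fast")
            self.state = "half_open"  # cool-down elapsed: allow a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            trial_failed = self.state == "half_open"
            if trial_failed or self._failures >= self._failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        # Success: reset the failure count and close the circuit.
        self._failures = 0
        self.state = "closed"
        return result
```

While the circuit is open, callers get `CircuitOpenError` immediately instead of tying up a thread until the downstream timeout fires.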

Service Degradation

If overall load exceeds capacity, non‑critical functionality can be degraded or disabled. Degradation can be applied at the API level, feature level, or system level, often by returning cached data or a simplified response.
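A feature-level version of this policy might look like the sketch below. The load metric, the 0.8 limit and the fallback message are assumptions; a real system would take them from monitoring and configuration.

```python
def handle(feature, load, compute, cache, critical_features, load_limit=0.8):
    """Return (payload, degraded) for one request.

    load is a 0..1 utilization figure supplied by the caller; compute is
    the normal, expensive way to produce the payload.
    """
    if load > load_limit and feature not in critical_features:
        # Degraded path: cached data if present, else a simplified response.
        fallback = {"message": "temporarily unavailable"}
        return cache.get(feature, fallback), True
    payload = compute(feature)
    cache[feature] = payload  # remember the last good answer
    return payload, False
```

Critical features such as checkout always take the normal path, so capacity freed by degrading everything else goes to them.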

Business Isolation

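To prevent cross‑impact between different business domains, isolation can be achieved by giving each business line its own thread pool or, preferably for Java, its own cluster (separate processes or containers). Thread pools inside one JVM still share heap and CPU, so a memory leak or a long GC pause in one business line can still affect the others.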
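Thread-pool isolation, the lighter of the two options, can be sketched with one executor per business line. The `Bulkheads` class and the pool names in the usage note are illustrative.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class Bulkheads:
    """One bounded thread pool per business line."""

    def __init__(self, pool_sizes):
        self._pools = {
            name: ThreadPoolExecutor(max_workers=size, thread_name_prefix=name)
            for name, size in pool_sizes.items()
        }

    def submit(self, business, func, *args) -> Future:
        # Work queues behind its own pool; it cannot take another
        # business line's threads.
        return self._pools[business].submit(func, *args)

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown(wait=True)
```

If every thread in an `orders` pool is stuck on a slow dependency, calls submitted to a `payments` pool still run. This limits thread exhaustion only; it does not isolate memory or CPU.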

PUSH Notification System

The push system supports multiple vendor channels (Apple APNs, Huawei, Xiaomi, Firebase Cloud Messaging). Device registration, user binding and message delivery follow these steps:

1. Device connects and registers its token.

2. Device binds to a user identifier.

3. When a business event occurs, the server creates a message and stores it persistently.

4. The push service attempts delivery via the appropriate vendor channel or a custom TCP channel.

5. If the device is offline, the message remains in the queue; delivery is retried on the next device connection.

6. Clients acknowledge receipt; the server updates the message status. Duplicate deliveries are filtered using deduplication logic.
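The delivery flow can be modeled in memory as follows. `PushService`, `PushClient` and the `send` callable are hypothetical, the dictionaries stand in for persistent storage, and deduplication is shown on the client by message ID, which is one of several places it could live.

```python
class PushService:
    def __init__(self, send):
        # send is a hypothetical channel: (token, msg_id, payload) -> None.
        self._send = send
        self._tokens = {}        # device_id -> token
        self._user_devices = {}  # user_id -> set of device_id
        self._online = set()
        self._pending = {}       # device_id -> {msg_id: payload}
        self._next_id = 0

    def register(self, device_id, token):
        self._tokens[device_id] = token
        self._pending.setdefault(device_id, {})

    def bind(self, device_id, user_id):
        if device_id not in self._tokens:
            raise KeyError("device must register before binding")
        self._user_devices.setdefault(user_id, set()).add(device_id)

    def connect(self, device_id):
        self._online.add(device_id)
        self._flush(device_id)  # retry everything still unacknowledged

    def disconnect(self, device_id):
        self._online.discard(device_id)

    def publish(self, user_id, payload):
        self._next_id += 1
        msg_id = self._next_id
        for device_id in sorted(self._user_devices.get(user_id, ())):
            self._pending[device_id][msg_id] = payload  # store before sending
            self._flush(device_id)
        return msg_id

    def _flush(self, device_id):
        if device_id not in self._online:
            return  # offline: messages stay queued
        for msg_id, payload in list(self._pending[device_id].items()):
            self._send(self._tokens[device_id], msg_id, payload)

    def ack(self, device_id, msg_id):
        self._pending[device_id].pop(msg_id, None)


class PushClient:
    """Client side: drop messages whose ID was already seen."""

    def __init__(self):
        self._seen = set()
        self.inbox = []

    def receive(self, msg_id, payload):
        if msg_id in self._seen:
            return False
        self._seen.add(msg_id)
        self.inbox.append(payload)
        return True
```

Because unacknowledged messages are resent on every flush, delivery is at-least-once; the ID check on the client is what turns that into exactly-once display.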

Microservice Ecosystem

Typical microservice deployments place services behind load balancers, expose them through an API gateway, and enable inter‑service RPC calls. This architecture provides horizontal scalability, fault isolation, and centralized management of cross‑cutting concerns such as security, monitoring and traffic control.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
