
Ensuring High Availability in Internet Services: Stateless Design, Service Discovery, Idempotency, Rate Limiting, and Microservices

The article discusses how to achieve high availability for large‑scale internet services by adopting stateless architecture, service discovery and registration, heartbeat monitoring, idempotent design, retry mechanisms, rate limiting, caching, and micro‑service decomposition to handle machine failures, network glitches, and high concurrency.

Architecture Digest

Oct 22, 2017


In an era of massive internet traffic and high concurrency, services must remain highly available. Otherwise user experience suffers, the company's reputation is damaged, and the financial losses can be severe: the 2015 Ctrip outage reportedly cost over $1 million per hour.

High availability is complex, and the author shares personal insights on achieving it.

How to Make a System Highly Available?

We cannot keep servers or services from failing; instead we must design so that failures do not affect overall service availability.

When a service runs on many machines, the failure of a few of them can be tolerated, provided no individual machine holds state that the others lack. The key is to avoid storing state on individual machines: a stateless service lets any instance replace another.
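As a minimal sketch of the stateless idea, assume a shopping-cart service whose per-user data lives in a shared store rather than in the memory of any one instance. The names `SharedSessionStore` and `CartService` are hypothetical, and the in-memory dictionary only stands in for an external database or cache cluster.

```python
class SharedSessionStore:
    """Stand-in for an external store shared by all instances (assumed)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class CartService:
    """Keeps no per-user state of its own; each request goes through the store."""

    def __init__(self, store):
        self._store = store

    def add_item(self, session_id, item):
        session = self._store.get(session_id)
        items = list(session.get("items", []))
        items.append(item)
        self._store.put(session_id, {"items": items})
        return items


store = SharedSessionStore()
instance_a = CartService(store)
instance_b = CartService(store)

instance_a.add_item("user-1", "book")
# Suppose instance_a fails here; instance_b can serve the same user unchanged.
cart = instance_b.add_item("user-1", "pen")
```

Because neither instance remembers anything between requests, a load balancer is free to send each request to whichever instance is still alive.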

If services are already stateless, how can the system detect a failed instance and redirect traffic? The answer is service discovery and registration.

Beyond machine or service failures, temporary network partitions must also be handled. Heartbeat checks between services detect unreachable nodes, and service registries help reroute traffic.
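Registration and heartbeat checking can be sketched together as a registry that forgets instances whose heartbeats stop arriving. This is an illustrative in-process model, not the API of any particular registry product; `ServiceRegistry`, its `ttl_seconds` parameter, and the addresses are assumptions made for the example.

```python
import time


class ServiceRegistry:
    """Tracks instances and treats one as failed once its heartbeat is stale."""

    def __init__(self, ttl_seconds=3.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_seen = {}  # (service name, address) -> time of last heartbeat

    def register(self, service, address):
        self._last_seen[(service, address)] = self._clock()

    def heartbeat(self, service, address):
        # A heartbeat simply refreshes the timestamp recorded at registration.
        self._last_seen[(service, address)] = self._clock()

    def healthy_instances(self, service):
        now = self._clock()
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == service and now - seen <= self._ttl
        )
```

Callers ask `healthy_instances("orders")` before each request (or cache the answer briefly), so an instance that crashes or becomes unreachable drops out of rotation once its time-to-live expires.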

When a service crashes after processing a request but before responding, the caller cannot tell whether the operation took effect and will typically retry. Idempotent design ensures that retrying the operation does not cause duplicate effects (e.g., double payment).
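One common way to get idempotency, shown here only as a sketch, is to have the caller attach a unique key to each logical operation and have the service remember the result per key. `PaymentService` and `idempotency_key` are hypothetical names. A real service would keep the key table in durable storage and make the check-and-record step atomic, which this single-threaded example does not attempt.

```python
class PaymentService:
    """Applies each payment at most once per idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first attempt
        self.total_charged = 0

    def pay(self, idempotency_key, amount):
        if idempotency_key in self._results:
            # A retry of a request already handled: return the stored result.
            return self._results[idempotency_key]
        self.total_charged += amount
        result = {"status": "paid", "amount": amount}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
first = service.pay("order-42", 100)
retry = service.pay("order-42", 100)  # e.g. the first response was lost
```

The retry returns the same result as the first call, and the customer is charged once.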

With high concurrency, how can the system increase capacity?

Adding more machines and services is a straightforward way to scale, but load‑balancing strategies, black‑/white‑lists, and rate‑limiting are needed to distribute traffic effectively.
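Round-robin is one of the simplest load-balancing strategies and is enough to illustrate the idea. The sketch below assumes a fixed list of instance addresses; `RoundRobinBalancer` is a hypothetical name, and production balancers usually add weights and health checks on top.

```python
import itertools


class RoundRobinBalancer:
    """Hands out instances in turn so each receives an equal share of requests."""

    def __init__(self, instances):
        if not instances:
            raise ValueError("at least one instance is required")
        self._cycle = itertools.cycle(list(instances))

    def pick(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
targets = [balancer.pick() for _ in range(4)]
```

After the third instance the balancer wraps around to the first, so adding a machine to the list immediately gives it a share of the traffic.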

Simply adding machines is not always sufficient; bottlenecks such as blocking calls or service timeouts must be addressed. Services should enforce timeouts on the calls they make, and should be idempotent so that a call that timed out can be retried safely.
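Timeouts and retries usually go together, and the sketch below shows one way to combine them. It assumes the wrapped `operation` accepts a timeout in seconds and raises Python's built-in `TimeoutError` when it expires; `call_with_retry` and its parameters are illustrative names, and the exponential backoff is one reasonable choice rather than the only one.

```python
import time


def call_with_retry(operation, attempts=3, timeout_seconds=1.0,
                    backoff_seconds=0.1, sleep=time.sleep):
    """Call operation(timeout_seconds), retrying on TimeoutError with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation(timeout_seconds)
        except TimeoutError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait longer after each failure: 0.1s, 0.2s, 0.4s, ...
                sleep(backoff_seconds * (2 ** attempt))
    raise last_error
```

Retrying like this is only safe when the remote operation is idempotent, since a call that timed out may still have been executed on the other side.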

Synchronous versus asynchronous designs affect concurrency. Asynchronous processing, often implemented with message queues, can improve CPU utilization and overall throughput, though it introduces challenges like guaranteeing true asynchrony and avoiding message loss, especially under large‑scale data loads.
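The asynchronous pattern can be sketched with Python's standard `queue` and `threading` modules: the caller enqueues work and returns, and a background worker drains the queue. This in-process queue is an assumption made to keep the example runnable and is not a substitute for a real message broker. In particular, anything still in the queue is lost if the process dies, which is exactly the message-loss risk noted above.

```python
import queue
import threading


def start_worker(tasks, handle):
    """Start a background thread that handles messages until it sees None."""

    def run():
        while True:
            message = tasks.get()
            try:
                if message is None:  # sentinel: stop the worker
                    return
                handle(message)
            finally:
                tasks.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


tasks = queue.Queue(maxsize=100)  # bounded, so a slow consumer slows producers
processed = []
worker = start_worker(tasks, processed.append)

for order_id in (1, 2, 3):
    tasks.put(order_id)  # the producer does not wait for the work to finish

tasks.join()     # block until every queued message has been handled
tasks.put(None)  # ask the worker to stop
worker.join()
```

With a single worker the messages are handled in the order they were enqueued; adding workers increases throughput at the cost of that ordering guarantee.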

When traffic exceeds a service’s capacity, how should it be handled?

The solution is rate limiting and service degradation: rate limiting rejects the excess requests, while degradation temporarily shuts down non-essential services. The former is the preferred approach.
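A token bucket is one widely used way to implement rate limiting, and is shown here as a sketch rather than as the method the original author had in mind. Tokens refill at a steady rate up to a fixed capacity, and each request spends one token. `TokenBucket`, `rate_per_second`, and `capacity` are names chosen for this example.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self._rate = rate_per_second
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Refill for the time that has passed, never beyond capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # caller should reject the request, e.g. with HTTP 429
```

A request handler would call `allow()` first and reject the request immediately when it returns `False`, which keeps the overloaded service from doing any further work for that request.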

Caching is another essential tool; because internet workloads are read-heavy, serving repeated reads from a cache dramatically reduces load on backend systems.
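The read path through a cache can be sketched as a small cache-aside wrapper with a time-to-live. `TTLCache` and its `loader` callback (standing in for a database query) are hypothetical, and the sketch ignores eviction and concurrent access.

```python
import time


class TTLCache:
    """Serves repeated reads from memory and reloads an entry after it expires."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]            # cache hit: the backend is not touched
        value = self._loader(key)      # cache miss: fall back to the backend
        self._entries[key] = (value, now + self._ttl)
        return value
```

The time-to-live bounds how stale a cached value can be, so it should be chosen according to how quickly the underlying data changes.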

Further decomposition of services can improve scalability.

Micro‑services enable vertical and horizontal splitting of business logic, allowing independent scaling and easier fault isolation.

After applying these techniques, services can survive machine failures, service crashes, and network issues while handling higher concurrency. New challenges arise in turn: maintaining eventual consistency across distributed transactions, monitoring call chains, setting up alerts, aggregating distributed logs, and diagnosing problems with tools such as jstack and BTrace.

Source: http://www.cnblogs.com/lirenzuo/p/7637984.html




