How to Build Highly Available Systems: 8 Essential Strategies

This article outlines eight practical high‑availability techniques—multiple replicas, isolation, rate limiting, circuit breaking, degradation, gray releases with rollback, comprehensive monitoring, and proactive log alerting—to help engineers design systems that are both efficient and reliable under heavy load.

Java High-Performance Architecture

Jul 2, 2019

1. Multiple Replicas

Avoid single points of failure: do not put all your eggs in one basket. Gateways, application servers, cache servers, and databases are typically deployed with multiple replicas. Stateless services are easy to replicate; stateful services require data synchronization, for example via publish‑subscribe, Redis Cluster, or MySQL master‑slave replication. Synchronization introduces a trade‑off between consistency and availability: with asynchronous replication, recent writes that have not yet reached a replica may be lost if the master fails.
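As a minimal sketch of why replicas help, the caller below tries each replica in turn and returns the first successful answer. The class `ReplicaFailover`, the method `callAny`, and the host names are hypothetical and chosen only for illustration; real deployments usually delegate this to a load balancer or a client library.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: fail over across replicas of a stateless service.
final class ReplicaFailover {

    // Tries each replica in order and returns the first successful result.
    static <R> R callAny(List<String> replicas, Function<String, R> call) {
        RuntimeException last = null;
        for (String replica : replicas) {
            try {
                return call.apply(replica);
            } catch (RuntimeException e) {
                last = e; // this replica failed; move on to the next one
            }
        }
        throw new IllegalStateException("all replicas failed", last);
    }

    public static void main(String[] args) {
        // Assumed host names; "app-1" simulates a replica that is down.
        List<String> replicas = List.of("app-1", "app-2", "app-3");
        String result = callAny(replicas, host -> {
            if (host.equals("app-1")) {
                throw new RuntimeException("connection refused");
            }
            return "handled by " + host;
        });
        System.out.println(result);
    }
}
```

With a single replica the simulated outage would fail the request; with three, the caller never notices it.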

2. Isolation

Isolation partitions system resources so that a failure in one part is contained. Common forms are data isolation (physically separating core and non‑core data), machine isolation (dedicated machines for VIP callers), thread‑pool isolation (a separate thread pool per service), and semaphore isolation (a semaphore caps concurrent requests; excess requests are queued or handed to a fallback).
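Semaphore isolation can be sketched with `java.util.concurrent.Semaphore`. The class name `SemaphoreIsolation` and its `call` method are assumptions for illustration. This version rejects excess requests immediately instead of queueing them, which is a simplification.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical wrapper: at most maxConcurrent calls run at once; the rest get the fallback.
final class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // limit reached: fail fast instead of waiting
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    public static void main(String[] args) {
        SemaphoreIsolation inventory = new SemaphoreIsolation(10);
        String answer = inventory.call(() -> "live stock level", () -> "cached stock level");
        System.out.println(answer);
    }
}
```

To let callers wait briefly rather than fail fast, `tryAcquire(long, TimeUnit)` could replace the no-argument `tryAcquire()`.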

3. Rate Limiting

Rate limiting protects a service by capping either concurrency or request rate. Technical limits include connection pools, thread pools, and Nginx limit_conn for concurrency, and Guava RateLimiter and Nginx limit_req for request rate. Business limits control high‑traffic events such as flash sales by allowing only a subset of users to proceed.
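A request-rate limit is often built on a token bucket. The sketch below is a plain-Java illustration, not Guava's RateLimiter implementation; the class `TokenBucket` and its constructor parameters are assumptions. The clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical token bucket: allows bursts up to `capacity`, refilled at a steady rate.
final class TokenBucket {
    private final double capacity;
    private final double refillPerNano;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastRefill;

    TokenBucket(int capacity, double permitsPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerNano = permitsPerSecond / 1_000_000_000.0;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full
        this.lastRefill = nanoClock.getAsLong();
    }

    // Returns true if the request may proceed, false if it should be rejected.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2, 1.0, System::nanoTime);
        for (int i = 1; i <= 3; i++) {
            System.out.println("request " + i + " allowed: " + bucket.tryAcquire());
        }
    }
}
```

Rejected requests can be dropped, queued, or answered with a "try again later" response, depending on the business rule.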

4. Circuit Breaker

Like an electrical fuse, a circuit breaker stops calls to a failing service once its error rate or response time crosses a threshold, then tries the service again after a cooldown. If the service is still unhealthy, the breaker stays open.

Rate limiting protects the server itself; circuit breaking protects the client.
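The open, cooldown, and retry cycle can be sketched as follows. The class `CircuitBreaker` is hypothetical, and it trips on consecutive failures rather than on error rate or response time, which is a deliberate simplification. Holding the lock during the call also serializes callers, so this is an illustration rather than production code.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical breaker: opens after N consecutive failures, retries after a cooldown.
final class CircuitBreaker {
    private final int failureThreshold;
    private final long cooldownMillis;
    private final LongSupplier clockMillis;
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0L;

    CircuitBreaker(int failureThreshold, long cooldownMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
        this.clockMillis = clockMillis;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (open && clockMillis.getAsLong() - openedAt < cooldownMillis) {
            return fallback.get(); // open: skip the failing service entirely
        }
        try {
            T result = action.get(); // closed, or a trial call after the cooldown
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                open = true; // trip, or stay open after a failed trial call
                openedAt = clockMillis.getAsLong();
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3, 5_000, System::currentTimeMillis);
        String price = breaker.call(() -> "price from pricing service", () -> "cached price");
        System.out.println(price);
    }
}
```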

5. Degradation

Under heavy load, non‑core functions (e.g., the recommendation engine) can be disabled to preserve core business flows such as checkout.
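One simple way to implement degradation is a runtime switch in front of the non‑core feature. The class `DegradableFeature` and the sample recommendation data below are hypothetical; in practice the switch would typically be driven by a configuration center or an operations console.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical switch: serves a cheap default while the feature is degraded.
final class DegradableFeature<T> {
    private final AtomicBoolean degraded = new AtomicBoolean(false);
    private final Supplier<T> full;
    private final T degradedValue;

    DegradableFeature(Supplier<T> full, T degradedValue) {
        this.full = full;
        this.degradedValue = degradedValue;
    }

    void setDegraded(boolean on) {
        degraded.set(on);
    }

    T get() {
        // While degraded, the expensive supplier is never invoked.
        return degraded.get() ? degradedValue : full.get();
    }

    public static void main(String[] args) {
        DegradableFeature<List<String>> recommendations =
                new DegradableFeature<>(() -> List.of("item-1", "item-2"), List.of());
        System.out.println("normal:   " + recommendations.get());
        recommendations.setDegraded(true); // e.g. flipped by operators during a traffic peak
        System.out.println("degraded: " + recommendations.get());
    }
}
```

The page still renders with an empty recommendation list, and the capacity saved goes to checkout.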

6. Gray Release & Rollback

Gradually roll out new features to a small user segment, monitor the results, then expand. For system refactoring, run the old and new versions in parallel and shift traffic over gradually. If serious issues appear, roll back either the whole system or specific features via feature toggles.
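A percentage-based rollout can be sketched by hashing each user into one of 100 buckets. The class `GrayRelease` and the user IDs are assumptions for illustration. Setting the percentage back to 0 acts as the rollback toggle.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: sends a fixed percentage of users to the new version.
final class GrayRelease {
    private final AtomicInteger rolloutPercent = new AtomicInteger(0);

    void setRolloutPercent(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        rolloutPercent.set(percent);
    }

    boolean useNewVersion(String userId) {
        // The same user always lands in the same bucket, so the experience is stable.
        int bucket = Math.floorMod(userId.hashCode(), 100);
        return bucket < rolloutPercent.get();
    }

    public static void main(String[] args) {
        GrayRelease release = new GrayRelease();
        release.setRolloutPercent(10); // start with roughly 10% of users
        System.out.println("user-42 on new version: " + release.useNewVersion("user-42"));
        release.setRolloutPercent(0);  // rollback: everyone returns to the old version
        System.out.println("after rollback: " + release.useNewVersion("user-42"));
    }
}
```

Because buckets are stable, raising the percentage only adds users to the new version; nobody who already has it is moved back.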

7. Monitoring System

Observe system health through resource monitoring (CPU, memory, disk, network), system monitoring (URL failures, API latency, JVM GC), and business monitoring (e.g., order payment success rate) to detect anomalies.
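As an illustration of business monitoring, the sketch below tracks a payment success rate and flags it when it drops below a floor. The class `SuccessRateMonitor` and the 0.9 threshold are hypothetical; a real system would usually export such counters to a metrics platform and evaluate them per time window.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical business metric: success rate since the monitor was created.
final class SuccessRateMonitor {
    private final LongAdder successes = new LongAdder();
    private final LongAdder failures = new LongAdder();

    void record(boolean success) {
        (success ? successes : failures).increment();
    }

    double successRate() {
        long ok = successes.sum();
        long total = ok + failures.sum();
        return total == 0 ? 1.0 : (double) ok / total; // no data yet counts as healthy
    }

    boolean isAnomalous(double minRate) {
        return successRate() < minRate;
    }

    public static void main(String[] args) {
        SuccessRateMonitor payments = new SuccessRateMonitor();
        payments.record(true);
        payments.record(true);
        payments.record(true);
        payments.record(false);
        System.out.println("success rate: " + payments.successRate());
        System.out.println("anomalous at 0.9 floor: " + payments.isAnomalous(0.9));
    }
}
```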

8. Log Alerting

Logs help locate problems and can also trigger proactive alerts. Log anticipated error conditions explicitly (for example, where an assertion fails) and monitor those log entries so that alerts fire before the problem spreads.
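A log-driven alert can be as simple as counting error entries in a sliding time window. The class `ErrorLogAlerter`, its threshold, and its window size are assumptions for illustration; a production setup would normally rely on a log pipeline and an alerting tool rather than in-process code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical alert rule: fire when `threshold` errors are logged within `windowMillis`.
final class ErrorLogAlerter {
    private final int threshold;
    private final long windowMillis;
    private final Deque<Long> errorTimes = new ArrayDeque<>();

    ErrorLogAlerter(int threshold, long windowMillis) {
        this.threshold = threshold;
        this.windowMillis = windowMillis;
    }

    // Call once per error log entry; returns true when an alert should fire.
    synchronized boolean onErrorLog(long timestampMillis) {
        errorTimes.addLast(timestampMillis);
        // Drop entries that have fallen out of the window.
        while (timestampMillis - errorTimes.peekFirst() >= windowMillis) {
            errorTimes.removeFirst();
        }
        return errorTimes.size() >= threshold;
    }

    public static void main(String[] args) {
        ErrorLogAlerter alerter = new ErrorLogAlerter(3, 60_000);
        long t = System.currentTimeMillis();
        System.out.println(alerter.onErrorLog(t));
        System.out.println(alerter.onErrorLog(t + 1_000));
        System.out.println(alerter.onErrorLog(t + 2_000));
    }
}
```

The loop always terminates with a non-empty queue, because the entry just added has an age of zero and is never evicted.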
