Common Service Fault Tolerance Patterns
The article explains how Meituan‑Dianping applies classic fault‑tolerance patterns—timeout and retry, rate limiting/load shedding, circuit breaker, bulkhead isolation, and fallback—to design for failure, prevent cascading service outages, and enhance system stability and high‑availability in a service‑oriented architecture.
Background: As Meituan-Dianping's service framework matures, more and more systems are built as services. The growing number of dependencies between services increases the risk of cascading failures and similar incidents, where one failing service drags down the services that call it.
Design principle: "Design for Failure." The goal is to keep the failure of any single dependency from seriously degrading the user experience, and to let the system recover automatically once the dependency is healthy again.
Classic fault‑tolerance patterns covered:
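Timeout and Retry – set connection and RPC timeouts so that a slow dependency cannot hold the caller's threads indefinitely, and pair them with a small number of retries (usually 1-2). Unbounded retries multiply the load on an already struggling service and can exhaust resources.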
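The RetryCommand example in this section covers only the retry half of the pattern. As a minimal sketch of the timeout half using just the JDK, assuming the remote call can be wrapped in a Callable and run on a caller-supplied ExecutorService, each attempt can be bounded with Future.get(timeout, unit) and the total number of attempts capped. TimeoutRetryDemo and callWithTimeoutAndRetry are illustrative names, not an API from the original article.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class TimeoutRetryDemo {
    // Runs `call` on `executor`, waiting at most timeoutMillis per attempt and
    // making at most 1 + maxRetries attempts in total.
    static <T> T callWithTimeoutAndRetry(ExecutorService executor, Callable<T> call,
                                         long timeoutMillis, int maxRetries) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Future<T> future = executor.submit(call);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the slow attempt so its thread is freed
                last = e;
            } catch (ExecutionException e) {
                last = e; // the call itself threw; e.getCause() holds the original error
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        AtomicInteger attempts = new AtomicInteger();
        // Simulated remote call: the first attempt hangs, later attempts answer at once.
        Callable<String> flaky = () -> {
            if (attempts.incrementAndGet() == 1) {
                Thread.sleep(1_000);
            }
            return "ok";
        };
        String result = callWithTimeoutAndRetry(executor, flaky, 200, 1);
        System.out.println(result + " after " + attempts.get() + " attempts");
        executor.shutdownNow();
    }
}
```

Cancelling the timed-out Future matters: without it, a hung call would keep occupying a thread even though the caller has already given up on it.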
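import java.util.Map;

public abstract class RetryCommand<T> {
    private final int maxRetries = 2;      // keep small (usually 1-2)
    private final long retryInterval = 5;  // milliseconds to wait between attempts
    private final Map<String, Object> params;

    protected RetryCommand(Map<String, Object> params) {
        this.params = params;
    }

    // The remote call; implementations should set connection and RPC timeouts.
    public abstract T command(Map<String, Object> params);

    // Invokes command() up to maxRetries times, pausing retryInterval ms between attempts.
    public T retry() throws InterruptedException {
        int retryCounter = 0;
        while (retryCounter < maxRetries) {
            try {
                return command(params);
            } catch (RuntimeException e) {
                retryCounter++;
                if (retryCounter >= maxRetries) break;
                Thread.sleep(retryInterval);
            }
        }
        throw new RuntimeException("Command failed on all of " + maxRetries + " retries");
    }
}

Rate Limiting / Load Shedding – control either concurrency (e.g., with a Java Semaphore) or request rate (e.g., with the token-bucket algorithm, as in Guava's RateLimiter).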
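The token-bucket algorithm can be sketched in a few lines of plain Java. The TokenBucket class below is a hypothetical, simplified illustration rather than Guava's implementation: tokens accrue at a fixed rate up to a capacity, each request consumes one, and a request that finds the bucket empty is shed instead of queued. (Guava's RateLimiter.acquire() instead blocks until a permit is available.)

```java
// Hypothetical, simplified token bucket (not Guava's implementation).
class TokenBucket {
    private final double capacity;      // maximum burst size, in tokens
    private final double tokensPerNano; // refill rate
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double permitsPerSecond) {
        this.capacity = capacity;
        this.tokensPerNano = permitsPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full so an initial burst is admitted
        this.lastRefillNanos = System.nanoTime();
    }

    // Takes one token if available; otherwise returns false so the caller can shed the request.
    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * tokensPerNano);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        TokenBucket bucket = new TokenBucket(5, 2.0); // bursts of up to 5, then 2 requests/second
        int accepted = 0;
        for (int i = 0; i < 20; i++) {
            if (bucket.tryAcquire()) accepted++;
        }
        System.out.println("burst: accepted " + accepted + " of 20");
        Thread.sleep(1_000); // about 2 tokens accrue in one second
        System.out.println("after 1s: " + bucket.tryAcquire());
    }
}
```

Shedding (returning false) rather than waiting keeps callers from piling up behind an overloaded service; which behavior fits depends on whether the caller can tolerate extra latency.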
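import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class SemaphoreTest {
    private static final int THREAD_COUNT = 30;
    private static final ExecutorService threadPool = Executors.newFixedThreadPool(THREAD_COUNT);
    // Only 10 of the 30 threads may hold a permit (i.e., save data) at the same time.
    private static final Semaphore s = new Semaphore(10);

    public static void main(String[] args) {
        for (int i = 0; i < THREAD_COUNT; i++) {
            threadPool.execute(() -> {
                try {
                    s.acquire();
                    try {
                        System.out.println("save data");
                    } finally {
                        s.release(); // return the permit even if the work throws
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        threadPool.shutdown();
    }
}

import com.google.common.util.concurrent.RateLimiter;
import java.util.List;
import java.util.concurrent.Executor;

public class TaskSubmitter {
    // Guava RateLimiter issuing 2 permits per second (class name is illustrative).
    private final RateLimiter rateLimiter = RateLimiter.create(2.0);

    void submitTasks(List<Runnable> tasks, Executor executor) {
        for (Runnable task : tasks) {
            rateLimiter.acquire(); // blocks until a permit is available
            executor.execute(task);
        }
    }
}

Circuit Breaker – stops calling a dependency that keeps failing, so callers fail fast instead of waiting on it. A breaker has three states: Closed (requests pass through normally), Open (requests are rejected immediately), and Half-Open (after a cool-down, a trial request is let through to test whether the dependency has recovered). Netflix Hystrix is a common implementation.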
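To make the three states concrete, the following is a minimal, hypothetical breaker in plain Java, not Hystrix's implementation. It opens after a configurable number of consecutive failures, returns a fallback immediately while Open, and after a cool-down lets calls through in Half-Open, closing on the first success and re-opening on the first failure. For simplicity it counts consecutive failures, whereas Hystrix trips on an error percentage over a rolling window, and it does not limit Half-Open to a single trial request.

```java
import java.util.function.Supplier;

// Hypothetical minimal breaker: Closed -> Open -> Half-Open -> Closed (not Hystrix's implementation).
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long openMillis;      // cool-down before a trial call is allowed
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized State state() {
        return state;
    }

    // Open -> Half-Open once the cool-down has elapsed; only Open blocks calls.
    private synchronized boolean allowRequest() {
        if (state == State.OPEN && System.currentTimeMillis() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED; // a successful (trial) call closes the circuit
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // trip, or re-trip after a failed trial call
            openedAt = System.currentTimeMillis();
        }
    }

    // Runs `action` through the breaker, returning `fallback` when blocked or when the call fails.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // fail fast while Open
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 100);
        Supplier<String> broken = () -> { throw new IllegalStateException("dependency down"); };
        breaker.call(broken, () -> "fallback");
        breaker.call(broken, () -> "fallback");
        System.out.println(breaker.state() + " " + breaker.call(() -> "live", () -> "fallback"));
        Thread.sleep(150);
        System.out.println(breaker.call(() -> "live", () -> "fallback") + " " + breaker.state());
    }
}
```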
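// Netflix Hystrix circuit-breaker interface (excerpt; later Hystrix versions add more methods).
public interface HystrixCircuitBreaker {
    boolean allowRequest(); // asked before every execution; includes the half-open trial logic
    boolean isOpen();       // whether the circuit is currently open (tripped)
    void markSuccess();     // success feedback; in the half-open state it closes the circuit
}

Bulkhead Isolation – give each dependent service its own isolated resources (a dedicated thread pool or semaphore), so that a slowdown in one dependency can exhaust only its own resources and does not affect calls to other services.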
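Thread-pool isolation of this kind can be sketched with the JDK's ThreadPoolExecutor. In this hypothetical Bulkheads class, each dependency name maps to its own fixed-size pool with a short bounded queue, so a slow dependency can tie up only its own threads, and excess calls to it are rejected rather than taking capacity from other services. The pool sizes and dependency names are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical thread-pool bulkhead: every dependency gets its own small, bounded pool.
class Bulkheads {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerDependency;
    private final int queueCapacity; // must be >= 1 for ArrayBlockingQueue

    Bulkheads(int threadsPerDependency, int queueCapacity) {
        this.threadsPerDependency = threadsPerDependency;
        this.queueCapacity = queueCapacity;
    }

    private ExecutorService poolFor(String dependency) {
        return pools.computeIfAbsent(dependency, d -> new ThreadPoolExecutor(
                threadsPerDependency, threadsPerDependency, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy())); // reject once threads and queue are full
    }

    // Runs the call on the dependency's own pool. A saturated pool throws
    // RejectedExecutionException instead of borrowing threads from other dependencies.
    <T> Future<T> submit(String dependency, Callable<T> call) {
        return poolFor(dependency).submit(call);
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }

    public static void main(String[] args) throws Exception {
        Bulkheads bulkheads = new Bulkheads(2, 2);
        // Flood the slow dependency: 2 calls run, 2 wait in the queue, the rest are rejected.
        int rejected = 0;
        for (int i = 0; i < 10; i++) {
            try {
                bulkheads.submit("slow-service", () -> { Thread.sleep(5_000); return "slow"; });
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        // The fast dependency is unaffected because it has its own pool.
        String fast = bulkheads.submit("fast-service", () -> "fast").get(1, TimeUnit.SECONDS);
        System.out.println("rejected " + rejected + ", fast-service returned " + fast);
        bulkheads.shutdown();
    }
}
```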
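Fallback – return an alternative response when a call times out, exhausts its retries, is short-circuited by an open breaker, or is rejected by rate limiting. Strategies include custom handling (e.g., serving default, local, or cached data), fail-silent (returning an empty value), and fail-fast (throwing an exception immediately).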
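The three strategies can be sketched as small wrappers around a call. The Fallbacks class and its method names are hypothetical; in Hystrix, the corresponding hook is overriding a command's getFallback() method.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helpers, one per fallback strategy.
class Fallbacks {
    // Custom handling: on failure, serve an alternative result (e.g., cached or default data).
    static <T> T withFallback(Supplier<T> primary, Supplier<T> alternative) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return alternative.get();
        }
    }

    // Fail-silent: swallow the failure and return an empty result.
    static <T> List<T> failSilent(Supplier<List<T>> primary) {
        return withFallback(primary, Collections::emptyList);
    }

    // Fail-fast: no substitute result; report the failure to the caller immediately.
    static <T> T failFast(Supplier<T> primary) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            throw new IllegalStateException("dependency unavailable", e);
        }
    }

    public static void main(String[] args) {
        Supplier<List<String>> broken = () -> { throw new IllegalStateException("timeout"); };
        System.out.println(withFallback(broken, () -> Collections.singletonList("cached item")));
        System.out.println(failSilent(broken));
        try {
            failFast(broken);
        } catch (IllegalStateException e) {
            System.out.println("fail-fast: " + e.getMessage());
        }
    }
}
```

Fail-silent suits optional content (recommendations, badges), while fail-fast suits calls whose result the caller cannot do without, such as a payment check.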
Application example: the article shows how these patterns combine in a Hystrix command's execution flow, marking the decision points for timeout, retry, bulkhead isolation, circuit breaking, and fallback.
Conclusion: These fault-tolerance patterns are widely used at Meituan-Dianping to improve system stability and resilience. Understanding and applying them helps engineers build highly available services.
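As a rough sketch of how these decision points chain together (an illustration, not Hystrix's code), the hypothetical GuardedCall class checks a simple circuit breaker first, then hands the work to a small dedicated thread pool acting as a bulkhead, waits with a timeout, and returns the fallback whenever the circuit is open, the pool rejects the call, the call times out, or it throws. Rejections and timeouts both count toward opening the circuit. Retries are left out for brevity; a caller could wrap execute() in a bounded retry loop like the RetryCommand example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical guarded call for one dependency: circuit breaker -> bulkhead -> timeout -> fallback.
class GuardedCall<T> {
    private final ThreadPoolExecutor pool; // bulkhead: dedicated threads, no waiting queue
    private final long timeoutMillis;
    private final int failureThreshold;    // consecutive failures that open the circuit
    private final long openMillis;         // how long the circuit stays open before a trial call
    private int failures = 0;
    private long openedAt = -1;            // -1 means the circuit is closed

    GuardedCall(int threads, long timeoutMillis, int failureThreshold, long openMillis) {
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>()); // hand-off only: reject when every thread is busy
        this.timeoutMillis = timeoutMillis;
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    T execute(Callable<T> work, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();                                        // 1. circuit open: fail fast
        }
        Future<T> future;
        try {
            future = pool.submit(work);                                   // 2. bulkhead: may reject
        } catch (RejectedExecutionException e) {
            recordFailure();
            return fallback.get();
        }
        try {
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS); // 3. timeout
            recordSuccess();
            return result;
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // stop a hung call; no-op if it already finished
            recordFailure();
            return fallback.get();                                        // 4. fallback
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback.get();
        }
    }

    // After openMillis the circuit reports closed, so the next call acts as a half-open trial:
    // success resets it, failure re-opens it because `failures` is still at the threshold.
    private synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    private synchronized void recordSuccess() {
        failures = 0;
        openedAt = -1;
    }

    private synchronized void recordFailure() {
        if (++failures >= failureThreshold) {
            openedAt = System.currentTimeMillis();
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        GuardedCall<String> call = new GuardedCall<>(2, 100, 3, 1_000);
        for (int i = 0; i < 5; i++) {
            // A dependency that always hangs: after 3 failures the circuit opens and calls fail fast.
            String r = call.execute(() -> { Thread.sleep(2_000); return "live"; }, () -> "cached");
            System.out.println("call " + i + ": " + r);
        }
        call.shutdown();
    }
}
```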
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.