Common Service Fault Tolerance Patterns

This article explains how Meituan‑Dianping applies classic fault‑tolerance patterns (timeout and retry, rate limiting/load shedding, circuit breaker, bulkhead isolation, and fallback) to design for failure, prevent cascading service outages, and improve stability and availability in a service‑oriented architecture.

Meituan Technology Team

Background: As Meituan-Dianping's service framework has matured, the company has moved toward a service-oriented architecture. As the number of dependencies between services grows, a failure in one service is more likely to cascade to its callers and cause wider incidents.

Design principle: "Design for Failure". The goal is to keep the failure of a single dependency from severely degrading the user experience, and to let the system recover automatically.

Classic fault‑tolerance patterns covered:

Timeout and Retry – set connection and RPC timeouts; combine with limited retry attempts (usually 1‑2) to avoid resource exhaustion.

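import java.util.Map;

public abstract class RetryCommand<T> {
    private final int maxRetries = 2;      // total attempts before giving up
    private final long retryInterval = 5;  // pause between attempts; assumed to be milliseconds
    private final Map<String, Object> params;

    protected RetryCommand(Map<String, Object> params) { this.params = params; }

    // The remote call; implementations should enforce their own timeout.
    protected abstract T command(Map<String, Object> params) throws Exception;

    public T retry() {
        Exception lastError = null;
        int retryCounter = 0;
        while (retryCounter < maxRetries) {
            try { return command(params); }
            catch (Exception e) { lastError = e; retryCounter++; }
            if (retryCounter < maxRetries) {
                try { Thread.sleep(retryInterval); }  // back off before the next attempt
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); break; }
            }
        }
        throw new RuntimeException("Command failed on all of " + maxRetries + " attempts", lastError);
    }
}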
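The retry loop above leaves the timeout itself to the remote call. As a minimal sketch of how a call-level timeout could be enforced with the standard library, the hypothetical helper below (`TimeoutCall` is not from the original article) runs the call on a worker thread and stops waiting after a deadline:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of the original article.
final class TimeoutCall {
    // Daemon workers so that abandoned calls do not keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeout-call");
        t.setDaemon(true);
        return t;
    });

    private TimeoutCall() {}

    /** Runs the call on a worker thread and gives up after timeoutMs milliseconds. */
    static <T> T call(Callable<T> remoteCall, long timeoutMs) {
        Future<T> future = POOL.submit(remoteCall);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it can release its resources
            throw new RuntimeException("Call timed out after " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("Call failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new RuntimeException("Interrupted while waiting for the call", e);
        }
    }
}
```

A `command` implementation could wrap its remote call in `TimeoutCall.call(...)`, so that every retry attempt is bounded. Real RPC clients usually expose their own connection and read timeouts, which are preferable when available.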

Rate Limiting / Load Shedding – control concurrency (e.g., using Java Semaphore) or request rate (token‑bucket algorithm, Guava RateLimiter).

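import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class SemaphoreTest {
    private static final int THREAD_COUNT = 30;
    private static final ExecutorService threadPool = Executors.newFixedThreadPool(THREAD_COUNT);
    // At most 10 of the 30 threads may be inside the guarded section at once.
    private static final Semaphore s = new Semaphore(10);

    public static void main(String[] args) {
        for (int i = 0; i < THREAD_COUNT; i++) {
            threadPool.execute(() -> {
                try {
                    s.acquire();
                    try {
                        System.out.println("save data");
                    } finally {
                        s.release(); // always return the permit, even if the work throws
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // acquire() was interrupted; no permit is held
                }
            });
        }
        threadPool.shutdown();
    }
}

// Rate-based limiting with Guava's RateLimiter (fragment; belongs inside a class).
// Requires: com.google.common.util.concurrent.RateLimiter, java.util.List, java.util.concurrent.Executor
final RateLimiter rateLimiter = RateLimiter.create(2.0); // 2 permits per second
void submitTasks(List<Runnable> tasks, Executor executor) {
    for (Runnable task : tasks) {
        rateLimiter.acquire(); // blocks until a permit is available
        executor.execute(task);
    }
}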
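To make the token‑bucket idea concrete, here is a minimal hand‑written sketch. The class name `TokenBucket` and its parameters are hypothetical and chosen for illustration; it is not how Guava's `RateLimiter` is implemented. Tokens refill at a fixed rate up to a capacity, and a request is admitted only if a whole token is available:

```java
// Hypothetical minimal token bucket for illustration; prefer a library limiter in production.
final class TokenBucket {
    private final long capacity;         // maximum burst size
    private final double refillPerNano;  // tokens added per nanosecond
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;          // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    TokenBucket(long capacity, double tokensPerSecond) {
        this(capacity, tokensPerSecond, System.nanoTime());
    }

    /** Non-blocking: consumes one token and returns true if one is available at nowNanos. */
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed > 0) {
            // Refill in proportion to elapsed time, never beyond capacity.
            tokens = Math.min(capacity, tokens + elapsed * refillPerNano);
            lastRefillNanos = nowNanos;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // bucket empty: the caller should shed or delay this request
    }

    boolean tryAcquire() {
        return tryAcquire(System.nanoTime());
    }
}
```

Passing the clock value in explicitly keeps the refill logic easy to test. Unlike `rateLimiter.acquire()`, this sketch never blocks, which matches load shedding (reject immediately) rather than smoothing.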

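Circuit Breaker – stops callers from repeatedly invoking a dependency that keeps failing. It has three states: Closed (calls pass through), Open (calls are rejected immediately), and Half‑Open (a trial call probes whether the dependency has recovered). Often implemented with Netflix Hystrix.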

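// Core methods of Hystrix's circuit-breaker interface.
public interface HystrixCircuitBreaker {
    // Asked before each command executes: may this request proceed?
    boolean allowRequest();
    // True while the circuit is open (tripped) and requests are being rejected.
    boolean isOpen();
    // Called on a successful execution; in the Half-Open state this closes the circuit.
    void markSuccess();
}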
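The three states can be sketched as a small state machine. The class below is a hypothetical simplification, not Hystrix's implementation: it trips after a number of consecutive failures (an assumption made for brevity; Hystrix uses an error percentage over a rolling window), and the names `SimpleCircuitBreaker`, `failureThreshold`, and `openMillis` are illustrative.

```java
// Hypothetical, simplified breaker for illustration; this is not Hystrix's implementation.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long openMillis;      // time to stay Open before allowing a trial call
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMillis = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Returns true if the caller may invoke the dependency at time nowMillis. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= openMillis) {
            state = State.HALF_OPEN; // let exactly one trial request through
            return true;
        }
        return state == State.CLOSED; // Open and Half-Open reject everything else
    }

    /** Report a successful call: the dependency is healthy, so close the circuit. */
    synchronized void markSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** Report a failed call: trip on a failed trial or once the threshold is reached. */
    synchronized void markFailure(long nowMillis) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMillis = nowMillis;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A caller checks `allowRequest(...)` before each call and reports the outcome with `markSuccess()` or `markFailure(...)`. The sketch assumes every trial call eventually reports back; a production breaker also has to handle trial calls that never report.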

Bulkhead Isolation – give each dependent service its own resources (thread pools or semaphores) so that a slowdown in one dependency cannot exhaust the resources needed to call the others.
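As a rough sketch of thread‑pool isolation, the hypothetical `Bulkheads` class below keeps one small, bounded pool per dependency name. The class name and its sizing parameters are assumptions for illustration. When one dependency's pool and queue are full, further calls to it are rejected, while calls to other dependencies still run:

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one bounded thread pool per dependency.
final class Bulkheads {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threads;    // worker threads per dependency
    private final int queueSize;  // waiting calls allowed per dependency

    Bulkheads(int threads, int queueSize) {
        this.threads = threads;
        this.queueSize = queueSize;
    }

    /**
     * Submits a call to the pool reserved for the named dependency.
     * Throws RejectedExecutionException when that pool and its queue are full.
     */
    <T> Future<T> submit(String dependency, Callable<T> call) {
        ExecutorService pool = pools.computeIfAbsent(dependency, name -> new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),      // bounded queue: no unbounded backlog
                new ThreadPoolExecutor.AbortPolicy()));   // reject instead of blocking the caller
        return pool.submit(call);
    }

    /** Stops all pools and interrupts calls that are still running. */
    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }
}
```

The bounded queue with `AbortPolicy` is the important design choice here: a saturated dependency fails fast for its own callers and does not tie up threads that serve other dependencies. A semaphore‑based bulkhead achieves a similar cap without the extra thread hop, at the cost of running the call on the caller's thread.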

Fallback – provide an alternative response when a call times out, exhausts its retries, is rejected by an open circuit breaker, or is rate‑limited. Strategies include custom handling, fail‑silent, and fail‑fast.
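The three strategies can be sketched as small wrappers around a call. The `Fallbacks` class and its method names are hypothetical, chosen only to show how the strategies differ in what the caller receives:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helpers illustrating the three fallback strategies.
final class Fallbacks {
    private Fallbacks() {}

    /** Custom handling: on failure, answer from a backup source such as a cache or a default. */
    static <T> T withFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }

    /** Fail-silent: swallow the failure and return an empty result. */
    static <T> List<T> failSilent(Supplier<List<T>> primary) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return Collections.emptyList();
        }
    }

    /** Fail-fast: no substitute value; surface the failure to the caller immediately. */
    static <T> T failFast(Supplier<T> primary) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            throw new IllegalStateException("Dependency unavailable", e);
        }
    }
}
```

Fail‑silent tends to suit optional content, where an empty result is acceptable; fail‑fast suits operations where a wrong or missing answer would be worse than an error.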

Application example: the article shows how these patterns can be combined in a Hystrix command flow, illustrating the decision points for timeout, retry, bulkhead, circuit‑breaker, and fallback.
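One plausible ordering of those decision points is sketched below: check the circuit, then the bulkhead, then run the call under a timeout, and fall back on any rejection or failure. This is a hypothetical composition and not Hystrix's code. `GuardedCommand` and its parameters are illustrative, the circuit check is a crude consecutive‑failure counter with no Half‑Open recovery, and retry is omitted for brevity.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical composition for illustration; names and thresholds are assumptions.
final class GuardedCommand<T> {
    private static final ExecutorService WORKERS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "guarded-command");
        t.setDaemon(true);
        return t;
    });

    private final Semaphore bulkhead;  // caps concurrent calls to this dependency
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final int failureThreshold; // failures after which the circuit is treated as open
    private final long timeoutMs;

    GuardedCommand(int maxConcurrent, int failureThreshold, long timeoutMs) {
        this.bulkhead = new Semaphore(maxConcurrent);
        this.failureThreshold = failureThreshold;
        this.timeoutMs = timeoutMs;
    }

    T execute(Callable<T> call, Supplier<T> fallback) {
        // 1. Circuit breaker: short-circuit while the dependency looks unhealthy.
        if (consecutiveFailures.get() >= failureThreshold) return fallback.get();
        // 2. Bulkhead: reject instead of queueing when the dependency is saturated.
        if (!bulkhead.tryAcquire()) return fallback.get();
        try {
            // 3. Timeout: bound how long the caller waits for the result.
            Future<T> future = WORKERS.submit(call);
            try {
                T result = future.get(timeoutMs, TimeUnit.MILLISECONDS);
                consecutiveFailures.set(0);
                return result;
            } catch (Exception e) {
                if (e instanceof InterruptedException) Thread.currentThread().interrupt();
                future.cancel(true); // a call that ignores interruption may outlive its permit
                consecutiveFailures.incrementAndGet();
                // 4. Fallback: degrade instead of propagating the failure.
                return fallback.get();
            }
        } finally {
            bulkhead.release();
        }
    }
}
```

Checking the circuit before the bulkhead means a known‑bad dependency costs neither a permit nor a worker thread.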

Conclusion: These fault‑tolerance patterns are widely used in Meituan‑Dianping to improve system stability and resilience. Understanding and applying them helps engineers build high‑availability services.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
