Design and Implementation of a High‑Performance Message Notification System
This article presents a comprehensive design of a high‑performance, fault‑tolerant message notification system, covering service partitioning, system architecture, idempotent processing, dynamic error detection, thread‑pool management, retry mechanisms, and stability measures such as traffic‑spike handling, resource isolation, third‑party protection, monitoring, and active‑active deployment.
1. Service Partitioning
The system is divided into several layers: a configuration layer for managing sending configurations, an interface layer exposing RPC and MQ interfaces, a core service layer handling first‑time and retry sending, routing, and execution wrappers, a common component layer, and a storage layer consisting of a cache layer and a persistence layer (Redis, MySQL, Elasticsearch).
2. System Design
2.1 First‑time Message Sending
Requests are processed via RPC or MQ; RPC avoids message loss while MQ provides asynchronous decoupling and traffic shaping.
2.1.1 Idempotency Handling
Idempotency is achieved by using Redis keys to quickly detect duplicate messages within a short time window, avoiding costly database lookups.
private boolean isDuplicate(MessageDto messageDto) {
    String redisKey = getRedisKey(messageDto);
    boolean isDuplicate = false;
    try {
        // setNx succeeds only for the first writer within the 30-minute window
        if (!RedisUtils.setNx(redisKey, messageDto, 30 * 60L)) {
            isDuplicate = true;
        }
        if (isDuplicate) {
            // Key already exists: confirm the payloads really match before dropping
            MessageDto oldDTO = RedisUtils.getObject(redisKey);
            if (Objects.equals(messageDto, oldDTO)) {
                log.info("duplicate message detected");
            } else {
                isDuplicate = false;
            }
        }
    } catch (Exception e) {
        // Fail open: if Redis is unavailable, treat the message as new
        log.warn("duplicate check failed, treating message as new", e);
        isDuplicate = false;
    }
    return isDuplicate;
}
2.1.2 Dynamic Problem-Service Detector
Using Sentinel, the system monitors request counts and failure counts within a configurable time window; when thresholds are exceeded, the service is marked as problematic and routed to an exception executor. Automatic recovery is performed after a silent period followed by a half‑circuit‑breaker phase.
public void loadExecuteHandlerRules(Long durationInSec, Long requestCount, Long failCount) {
    List<ParamFlowRule> rules = new ArrayList<>();
    rules.add(ofParamFlowRule(REQUEST_RESOURCE, requestCount, durationInSec));
    rules.add(ofParamFlowRule(FAIL_RESOURCE, failCount, durationInSec));
    ParamFlowRuleManager.loadRules(rules);
}

public ParamFlowRule ofParamFlowRule(String resource, Long count, Long durationInSec) {
    ParamFlowRule rule = new ParamFlowRule();
    rule.setResource(resource);
    rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
    rule.setCount(count);
    rule.setDurationInSec(durationInSec);
    rule.setParamIdx(0);
    return rule;
}

public static boolean judge(String key, boolean reqSuc) {
    // The service is marked problematic only when both the request-count
    // and the failure-count thresholds are exceeded within the window
    return isBlock(REQUEST_RESOURCE, reqSuc, key) && isBlock(FAIL_RESOURCE, reqSuc, key);
}

public static boolean isBlock(String resource, boolean reqSuc, String key) {
    boolean block = false;
    Entry failEntry = null;
    try {
        // Count a failure (batch 1) only when the request did not succeed
        failEntry = entry(resource, EntryType.IN, reqSuc ? 0 : 1, key);
    } catch (BlockException e) {
        block = true;
    } finally {
        if (failEntry != null) {
            failEntry.exit();
        }
    }
    return block;
}
2.1.3 Sentinel Sliding-Window Implementation (Ring Buffer)
The sliding window divides the time window into multiple slots arranged as a ring buffer; each slot is identified by a window id and records its start time. Requests are counted into the slot selected by the current timestamp, and stale slots are reset and reused as time advances.
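The slot mechanics described above can be sketched as follows. This is a minimal, single-class illustration of the ring-buffer idea, not Sentinel's actual LeapArray implementation; class and field names are assumptions:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Minimal sketch of a sliding window built on a ring buffer of slots.
public class SlidingWindow {
    private final int slotCount;          // number of slots in the ring
    private final long slotLengthMs;      // duration covered by one slot
    private final long[] slotStart;       // start time of the data in each slot
    private final AtomicLongArray counts; // request count per slot

    public SlidingWindow(int slotCount, long windowLengthMs) {
        this.slotCount = slotCount;
        this.slotLengthMs = windowLengthMs / slotCount;
        this.slotStart = new long[slotCount];
        this.counts = new AtomicLongArray(slotCount);
    }

    // Record one request at the given timestamp.
    public synchronized void add(long nowMs) {
        int idx = (int) ((nowMs / slotLengthMs) % slotCount); // window id -> slot index
        long start = nowMs - nowMs % slotLengthMs;            // start time of this slot
        if (slotStart[idx] != start) {                        // slot holds stale data: reset and reuse
            slotStart[idx] = start;
            counts.set(idx, 0);
        }
        counts.incrementAndGet(idx);
    }

    // Sum of counts over all slots still inside the window.
    public synchronized long total(long nowMs) {
        long sum = 0;
        for (int i = 0; i < slotCount; i++) {
            if (nowMs - slotStart[i] < slotLengthMs * slotCount) {
                sum += counts.get(i);
            }
        }
        return sum;
    }
}
```

The key property is that a slot is never cleared by a timer; it is lazily reset the first time a request maps onto it after it has expired, which keeps bookkeeping O(1) per request.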
2.1.4 Dynamic Thread‑Pool Adjustment
Two separate thread pools are used for normal and abnormal services. The pool size can be adjusted at runtime via configuration (Apollo/Nacos). A custom busy‑check prevents task submission when the queue exceeds a configurable threshold, falling back to MQ persistence.
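The runtime resizing mentioned above can be sketched with `ThreadPoolExecutor`'s `setCorePoolSize`/`setMaximumPoolSize`. The `onPoolSizeChanged` hook is illustrative; Apollo and Nacos deliver such changes through their own listener APIs:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of runtime thread-pool resizing driven by a config-center callback.
public class ResizablePool {
    private final ThreadPoolExecutor pool =
            new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

    // Invoked when the config center pushes a new pool size.
    public void onPoolSizeChanged(int newSize) {
        // Grow max before core when enlarging, shrink core before max when reducing,
        // so the invariant core <= max holds at every intermediate step.
        if (newSize >= pool.getCorePoolSize()) {
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }

    public int size() {
        return pool.getCorePoolSize();
    }
}
```

Ordering the two setters matters: `setMaximumPoolSize` throws `IllegalArgumentException` if the new maximum would fall below the current core size.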
ThreadPoolExecutor pool = new ThreadPoolExecutor(poolSize, poolSize,
        0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

Notifier notifier = getNotifier();
if (!notifier.isBusy()) {
    notifier.execute(msgContent);
}

public boolean isBusy() {
    // Busy once the backlog exceeds twice the configured handler capacity
    return notifyPool.getQueue().size() >= config.getMaxHandlerSize() * 2;
}
2.2 Retry Message Sending
Failed or throttled messages are retried using a distributed scheduled task framework with shard‑broadcast execution. Locks ensure that the same task is not processed concurrently on multiple nodes.
public void init() {
    ScheduledExecutorService scheduledService = new ScheduledThreadPoolExecutor(taskRepository.size());
    for (Map.Entry<String, TaskHandler> entry : taskRepository.entrySet()) {
        final String taskName = entry.getKey();
        final TaskHandler handler = entry.getValue();
        scheduledService.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    if (handler.isBusy()) {
                        return;
                    }
                    handleTask(taskName, handler);
                } catch (Throwable e) {
                    logger.error(taskName + " task handler fail!", e);
                }
            }
        }, 30, 5, TimeUnit.SECONDS);
    }
}

public void handleTask(String taskName, TaskHandler handler) {
    Lock lock = LockFactory.getLock(taskName);
    List<Task> taskList = null;
    if (lock.tryLock()) {
        try {
            taskList = getTaskList(taskName, handler);
        } finally {
            // Release only when the lock was actually acquired;
            // unlocking an unheld lock throws IllegalMonitorStateException
            lock.unlock();
        }
    }
    if (taskList == null) {
        return;
    }
    handler.handleData(taskList);
}
2.2.1 ES and MySQL Data Synchronization
Message send records are stored in MySQL and indexed in Elasticsearch. Synchronization uses the update_time field as a version to ensure consistency; ES indices are rolled monthly with hot/cold tagging for lifecycle management.
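The last-writer-wins rule keyed on update_time can be sketched as below. The in-memory map stands in for the ES index, and the record fields are assumptions; with the real Elasticsearch client the same effect is typically achieved with external versioning, where a write carrying a version no greater than the indexed one is rejected:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of update_time-versioned synchronization from MySQL change events into ES.
public class EsSyncSketch {
    public static class SendRecord {
        final String id;
        final long updateTime; // MySQL update_time, used as the version
        final String status;
        public SendRecord(String id, long updateTime, String status) {
            this.id = id;
            this.updateTime = updateTime;
            this.status = status;
        }
    }

    private final Map<String, SendRecord> index = new ConcurrentHashMap<>();

    // Apply an incoming record only if it is newer than what is indexed.
    // Note: the check-then-act is not atomic; this is a sketch, not production code.
    public boolean upsert(SendRecord incoming) {
        SendRecord current = index.get(incoming.id);
        if (current != null && current.updateTime >= incoming.updateTime) {
            return false; // stale change event: drop it, a newer version is already indexed
        }
        index.put(incoming.id, incoming);
        return true;
    }

    public String statusOf(String id) {
        SendRecord r = index.get(id);
        return r == null ? null : r.status;
    }
}
```

Rejecting stale events rather than blindly overwriting is what keeps ES consistent when change events arrive out of order, e.g. when a retry and its original both replay during a consumer restart.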
3. Stability Guarantees
3.1 Traffic Spikes
Two‑level degradation is applied: gradual increase triggers thread‑pool busy detection, then MQ buffering and delayed consumption; sudden spikes are handled directly by Sentinel, routing to MQ for delayed processing after resources free up.
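The first degradation level described above can be sketched as a simple routing decision: messages run inline while the pool has capacity and are parked in MQ for delayed consumption once it is busy. The `Notifier` and `MqProducer` interfaces here are illustrative stand-ins, not the article's actual types:

```java
// Sketch of busy-aware degradation: inline execution with MQ fallback.
public class DegradingSender {
    interface Notifier {
        boolean isBusy();
        void execute(String msg);
    }

    interface MqProducer {
        void sendDelayed(String msg);
    }

    private final Notifier notifier;
    private final MqProducer mq;

    public DegradingSender(Notifier notifier, MqProducer mq) {
        this.notifier = notifier;
        this.mq = mq;
    }

    // Returns true if handled inline, false if degraded to MQ.
    public boolean send(String msg) {
        if (!notifier.isBusy()) {
            notifier.execute(msg); // normal path: run on the in-process pool
            return true;
        }
        mq.sendDelayed(msg);       // degraded path: buffer in MQ, consume later
        return false;
    }
}
```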
3.2 Problem‑Service Resource Isolation
Separate thread pools for problematic and normal services prevent long‑running error‑service calls from starving normal traffic.
3.3 Third‑Party Service Protection
Rate‑limiting and circuit‑breaker mechanisms shield downstream third‑party services from overload and protect the system from cascading failures.
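The rate-limiting half of this protection is commonly a token bucket. The sketch below is a hand-rolled illustration with assumed capacity and refill parameters; in practice Sentinel or Guava's RateLimiter would be used rather than writing this by hand:

```java
// Sketch of a token-bucket limiter guarding calls to a third-party channel.
public class TokenBucket {
    private final long capacity;     // burst size
    private final double refillPerMs;
    private double tokens;
    private long lastRefillMs;

    public TokenBucket(long capacity, double refillPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
        this.tokens = capacity;
        this.lastRefillMs = nowMs;
    }

    // Try to take one token; false means the call should be rejected or queued.
    public synchronized boolean tryAcquire(long nowMs) {
        // Lazily refill based on elapsed time, capped at bucket capacity
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
        lastRefillMs = nowMs;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A token bucket tolerates short bursts up to its capacity while enforcing the average rate, which matches third-party quotas better than a hard per-second counter.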
3.4 Middleware Fault Tolerance
Design accounts for temporary middleware outages (e.g., MQ restart) by ensuring graceful degradation and fallback strategies.
3.5 Monitoring System
A comprehensive monitoring suite tracks request counts, failure rates, and resource usage to enable early detection and rapid response.
3.6 Active‑Active Deployment and Elastic Scaling
Services are deployed across multiple data centers with automatic scaling based on load metrics, ensuring high availability and cost‑effective resource utilization.
4. Conclusion
Effective message notification systems require careful consideration of service architecture, functional modules, and stability mechanisms; there is no one‑size‑fits‑all solution, and designs must be tailored to specific business scenarios.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.