
Understanding Message Queues: Concepts, Benefits, Challenges, and Real-World Practices

This article explains what message queues are, why they are essential for decoupling, asynchronous processing, and traffic shaping, examines common architectural issues, and presents real‑world implementations such as RocketMQ, Kafka, and CMQ in high‑traffic scenarios like Double‑11, TikTok, and WeChat red‑packet payments.

Architect

Dec 13, 2021


Part 1 – What? Why?

1. What is a Message Queue

A message queue is distributed middleware that stores messages and forwards them between services. It resembles an in-process Java queue (FIFO or double-ended access, blocking reads and writes) but is built for inter-service communication and adds capabilities such as load balancing.
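The FIFO and blocking behavior can be illustrated with a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a broker; a real message queue runs as a separate, networked service, and the `order-N` payloads and the `None` sentinel are assumptions made for this example only.

```python
import queue
import threading

# In-process stand-in for a broker: a thread-safe, blocking FIFO queue.
broker = queue.Queue()
received = []

def consumer():
    while True:
        msg = broker.get()   # blocks until a message is available
        if msg is None:      # sentinel (illustrative): the producer is done
            break
        received.append(msg)

worker = threading.Thread(target=consumer)
worker.start()

for order_id in ["order-1", "order-2", "order-3"]:
    broker.put(order_id)     # the producer does not wait for processing

broker.put(None)
worker.join()
print(received)
```

With a single consumer, messages are handled in the order they were produced.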

2. Why Use a Message Queue

Message queues serve three main purposes: system decoupling, asynchronous processing, and peak-shaving. Together they reduce tight coupling between services and relieve the pressure of handling every request synchronously.

System Decoupling

By publishing events and allowing interested services to subscribe, services can evolve independently without direct API calls, lowering integration complexity and deployment risk.
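The publish/subscribe idea can be sketched in a few lines. The `EventBus` class, the `order.created` topic, and the two handlers below are hypothetical names for illustration; delivery here is synchronous and in-process, whereas a real queue would deliver over the network and persist messages.

```python
from collections import defaultdict

class EventBus:
    """Toy topic-based publish/subscribe registry (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher knows the topic, not the services behind it.
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(event)
        return len(handlers)

bus = EventBus()
inventory_log, email_log = [], []

# Each downstream service registers itself; the order service is unchanged.
bus.subscribe("order.created", lambda e: inventory_log.append(e["order_id"]))
bus.subscribe("order.created", lambda e: email_log.append(e["order_id"]))

delivered = bus.publish("order.created", {"order_id": 42})
```

Adding a third subscriber requires no change to the publishing code, which is the decoupling benefit described above.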

Service Asynchrony

Non‑critical operations such as payment notifications can be off‑loaded to a queue, ensuring the core transaction flow remains fast while downstream processes handle the work asynchronously.
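As a rough sketch of this off-loading, the hypothetical `pay` function below completes the critical step synchronously and only enqueues the notification. The `downstream_ready` event simulates a slow notification service; all names are assumptions for the example, not part of any particular queue's API.

```python
import queue
import threading

notifications = queue.Queue()
sent = []
downstream_ready = threading.Event()   # simulates a slow notification service

def notification_worker():
    downstream_ready.wait()            # stalled until the service is available
    while True:
        payment_id = notifications.get()
        if payment_id is None:         # sentinel (illustrative): stop the worker
            break
        sent.append(payment_id)

def pay(payment_id, ledger):
    ledger.append(payment_id)          # critical step, done synchronously
    notifications.put(payment_id)      # non-critical step, deferred to the queue
    return "ok"

ledger = []
worker = threading.Thread(target=notification_worker)
worker.start()

results = [pay(p, ledger) for p in ("p1", "p2")]
pending_after_pay = notifications.qsize()   # notifications still waiting
sent_before = list(sent)                    # nothing has been sent yet

downstream_ready.set()                 # the slow service comes back
notifications.put(None)
worker.join()
```

Both payments return before any notification is sent, so a slow downstream service does not slow the core transaction.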

Peak-Shaving and Valley-Filling

During traffic spikes (e.g., flash sales, live‑event red packets), queues buffer excess load and release it at a sustainable rate, protecting downstream systems from overload.
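The buffering effect can be shown with a small tick-based simulation. The function name `shave_peaks`, the arrival pattern, and the drain rate are assumptions chosen for illustration.

```python
from collections import deque

def shave_peaks(arrivals_per_tick, drain_rate):
    """Buffer bursty arrivals and release at most `drain_rate` per tick.

    Returns the number of requests the downstream system handles in each tick.
    """
    backlog = deque()
    processed_per_tick = []
    next_id = 0
    tick = 0
    while tick < len(arrivals_per_tick) or backlog:
        arrivals = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        for _ in range(arrivals):
            backlog.append(next_id)    # the queue absorbs the burst
            next_id += 1
        done = min(drain_rate, len(backlog))
        for _ in range(done):
            backlog.popleft()          # downstream consumes at its own pace
        processed_per_tick.append(done)
        tick += 1
    return processed_per_tick

# A burst of 10 requests, two quiet ticks, then 2 more requests.
load = shave_peaks([10, 0, 0, 2], drain_rate=3)
print(load)  # [3, 3, 3, 3]
```

The downstream system never sees more than three requests per tick, and the quiet ticks are used to work off the backlog.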

Other Features

Queues also support broadcast, transactional messaging, and eventual consistency patterns.

3. Problems Introduced by Message Queues

Increased Latency

Because messages must travel through the queue before consumption, there is an inherent delay that can affect time‑sensitive business logic.

Architectural Complexity

Introducing a queue adds a new component that must be highly available and performant, raising challenges such as high‑availability deployment, retry mechanisms, broker synchronization, idempotent processing, and consumer error handling.
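Two of these concerns, retries and idempotent processing, can be sketched together. The `IdempotentConsumer` class below is hypothetical; it keeps processed message IDs in memory, whereas a production system would need durable storage for them, and the "dead letter" list is just a Python list.

```python
class IdempotentConsumer:
    """Toy consumer: retries a failing handler and ignores redelivered messages."""

    def __init__(self, handler, max_attempts=3):
        self._handler = handler
        self._max_attempts = max_attempts
        self._processed_ids = set()    # assumption: would be durable in production
        self.dead_letters = []

    def on_message(self, message_id, payload):
        if message_id in self._processed_ids:
            return "duplicate"         # already handled: skip side effects
        for _ in range(self._max_attempts):
            try:
                self._handler(payload)
            except Exception:
                continue               # transient failure: try again
            self._processed_ids.add(message_id)
            return "processed"
        self.dead_letters.append((message_id, payload))
        return "dead-lettered"

balance = {"total": 0}
calls = {"n": 0}

def credit(amount):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    balance["total"] += amount

def always_fail(_payload):
    raise ValueError("permanent failure")

consumer = IdempotentConsumer(credit)
first = consumer.on_message("m-1", 100)    # fails once, succeeds on retry
second = consumer.on_message("m-1", 100)   # redelivery is ignored

failing = IdempotentConsumer(always_fail, max_attempts=2)
third = failing.on_message("m-2", 1)       # gives up after two attempts
```

The balance is credited once even though the message arrives twice, which is what idempotent processing is meant to guarantee.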

Part 2 – How?

4. RocketMQ’s Zero‑Failure Double‑11 Experience

During the 2020 Double-11 peak (583,000 transactions per second), RocketMQ used pull-based consumption, with queues distributed across clients by load balancing. In that model a hung client could cause a backlog of unconsumed messages. The newer POP consumption model eliminates rebalance delays by having clients request messages directly from brokers, so consumption continues even if some clients hang.
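The difference between the two models can be illustrated with a deliberately simplified toy. This is not RocketMQ's API or implementation: the function names, the queue-to-client assignment, and the notion of a "hung" client are all assumptions made to show why a fixed owner can stall a queue while a request-based model does not.

```python
def consume_with_assignment(queues, assignment, hung_clients):
    """Each queue has exactly one owning client; a hung owner stalls its queue."""
    consumed, stuck = [], []
    for queue_name, messages in queues.items():
        owner = assignment[queue_name]
        if owner in hung_clients:
            stuck.extend(messages)     # backlog until a rebalance reassigns the queue
        else:
            consumed.extend(messages)
    return consumed, stuck

def consume_with_pop(queues, clients, hung_clients):
    """Any healthy client may request messages from any queue."""
    healthy = [c for c in clients if c not in hung_clients]
    consumed, stuck = [], []
    for messages in queues.values():
        if healthy:
            consumed.extend(messages)  # some healthy client picks these up
        else:
            stuck.extend(messages)
    return consumed, stuck

queues = {"q0": ["a", "b"], "q1": ["c", "d"]}
assignment = {"q0": "client-1", "q1": "client-2"}
hung = {"client-2"}

assigned_result = consume_with_assignment(queues, assignment, hung)
pop_result = consume_with_pop(queues, ["client-1", "client-2"], hung)
```

In the assignment model the messages in `q1` wait for their hung owner; in the request-based model the remaining healthy client drains both queues.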

5. Kafka Smooth Scaling at Kuaishou

To scale without disruption, Kuaishou copies data from the partitions being moved to the new partitions while consumers continue reading from the originals, then switches routing once the copy has caught up.
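The sync-then-switch sequence can be sketched as a toy model. The `MigratingPartition` class and its methods are hypothetical and bear no relation to Kafka's or Kuaishou's actual code; the sketch only shows that routing changes after the copy has fully caught up, including records written during the migration.

```python
class MigratingPartition:
    """Toy model: copy a log to a new location, switch routing when caught up."""

    def __init__(self, records):
        self.source = list(records)
        self.target = []
        self.active = "source"

    def _active_log(self):
        return self.source if self.active == "source" else self.target

    def append(self, record):
        self._active_log().append(record)   # writes keep flowing during migration

    def read(self, offset):
        return self._active_log()[offset]   # consumers follow the current routing

    def sync_step(self, batch_size):
        start = len(self.target)
        self.target.extend(self.source[start:start + batch_size])
        return len(self.source) - len(self.target)   # remaining lag

    def try_switch(self):
        if len(self.target) == len(self.source):
            self.active = "target"          # switch only once the copy caught up
        return self.active

p = MigratingPartition(["r0", "r1", "r2"])
lag1 = p.sync_step(2)        # copies r0, r1; one record behind
p.append("r3")               # a new write arrives mid-migration
before = p.try_switch()      # still lagging, so routing stays on the source
lag2 = p.sync_step(2)        # copies r2, r3; fully caught up
after = p.try_switch()       # routing moves to the new partition
last = p.read(3)
```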

6. Kafka Cache Pollution Mitigation at Kuaishou & Meituan

Meituan’s Real‑Time/Delay Consumption Isolation

Real‑time data is cached on SSD while delayed data stays on HDD, preventing page‑cache contention; reads for delayed data never pollute the SSD cache.

Kuaishou’s Flush‑Queue Design

Producers write messages to a flush queue before they reach the block cache, and asynchronous threads then flush them to disk. Consumers read from the block cache, and a cache miss does not write data back into the cache, which avoids pollution.
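A toy model can show the key property: reads of old data do not displace recent data in the cache. Everything below is an assumption for illustration (the `FlushQueueLog` class, the capacity-based eviction, the dictionary standing in for the disk); it is not Kuaishou's implementation.

```python
from collections import OrderedDict, deque

class FlushQueueLog:
    """Toy log with a flush queue, a bounded block cache, and a 'disk' dict."""

    def __init__(self, cache_capacity):
        self.flush_queue = deque()
        self.block_cache = OrderedDict()   # offset -> message, oldest first
        self.disk = {}
        self.cache_capacity = cache_capacity
        self.next_offset = 0

    def produce(self, message):
        offset = self.next_offset
        self.next_offset += 1
        self.flush_queue.append((offset, message))
        self.block_cache[offset] = message
        return offset

    def flush(self):
        # Stand-in for the asynchronous flush threads.
        while self.flush_queue:
            offset, message = self.flush_queue.popleft()
            self.disk[offset] = message
        # Evict only after flushing, so evicted data is always on disk.
        while len(self.block_cache) > self.cache_capacity:
            self.block_cache.popitem(last=False)

    def consume(self, offset):
        if offset in self.block_cache:
            return self.block_cache[offset], "cache"
        # Cache miss: serve from disk and do NOT write back into the cache.
        return self.disk[offset], "disk"

log = FlushQueueLog(cache_capacity=2)
for m in ["m0", "m1", "m2"]:
    log.produce(m)
log.flush()

old = log.consume(0)                   # lagging consumer reads old data
cached_after = list(log.block_cache)   # cache still holds the newest offsets
recent = log.consume(2)                # real-time consumer still hits the cache
```

The lagging read of offset 0 is served from disk and leaves the cached offsets untouched, so real-time consumers keep their cache hits.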

Summary of Cache‑Pollution Solutions

Separating workloads by speed or source and applying “divide‑and‑conquer” strategies effectively prevents cache interference.

7. CMQ in WeChat Red‑Packet Payment Scenario

CMQ buffers failed accounting requests, allowing the high‑availability message service to guarantee eventual consistency without immediate rollback, simplifying the payment flow under heavy load.
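The buffer-instead-of-rollback pattern can be sketched as follows. This is not CMQ's API: `pay_red_packet`, `drain_retries`, the in-memory `deque`, and the use of `ConnectionError` to signal an overloaded ledger are all assumptions for illustration.

```python
from collections import deque

def pay_red_packet(request, post_to_ledger, retry_queue):
    """Try to book the entry; on failure, buffer it instead of rolling back."""
    try:
        post_to_ledger(request)
        return "booked"
    except ConnectionError:
        retry_queue.append(request)    # the queue now owns eventual delivery
        return "pending"

def drain_retries(retry_queue, post_to_ledger):
    """One pass over the buffered requests; failures are queued again."""
    for _ in range(len(retry_queue)):
        request = retry_queue.popleft()
        try:
            post_to_ledger(request)
        except ConnectionError:
            retry_queue.append(request)

ledger = []
ledger_state = {"up": False}

def post_to_ledger(request):
    if not ledger_state["up"]:
        raise ConnectionError("ledger overloaded")
    ledger.append(request)

retry_queue = deque()
status = pay_red_packet({"id": "rp-1", "amount": 5}, post_to_ledger, retry_queue)
buffered = len(retry_queue)

drain_retries(retry_queue, post_to_ledger)   # ledger still down: stays buffered
still_buffered = len(retry_queue)

ledger_state["up"] = True                    # the ledger recovers
drain_retries(retry_queue, post_to_ledger)
```

Because a buffered request may be delivered more than once in a real system, the accounting side would also need to process entries idempotently.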

Part 3 – Conclusion

Through architectural optimizations and case studies from Alibaba, Kuaishou, Meituan, and WeChat, this article shows how message queues help high-concurrency systems achieve decoupling, asynchronous processing, peak-shaving, and reliable data consistency.

