How We Migrated Our Self‑Built Message Queue to Tencent Cloud CKafka
This article explains why the e‑commerce platform Mushroom Street built its own message queue, Corgi; the operational and cost drawbacks that prompted a move to Tencent Cloud CKafka; and the three‑phase migration (dual‑write, cut‑read, cut‑write) that preserved message safety and low latency.
Background
When a company needs message‑queue middleware, the typical choices are a cloud provider's managed service or a self‑hosted open‑source system such as Kafka or RabbitMQ. In 2015, Mushroom Street (蘑菇街) instead built its own queue, Corgi, for two reasons: data‑security concerns, and a need for many topics and partitions, at which scale public cloud latency was unacceptable.
Why Migrate
By 2020, cloud‑based queues had become mature and cost‑effective, while maintaining Corgi carried two major disadvantages: operational pain points (master‑slave failover and expansion were both manual) and high server costs, driven by a three‑node master‑slave architecture and high CPU requirements.
Migration Requirements
The migration had to lose no messages (matching Corgi’s AtLeastOnce semantics), preserve partial ordering for ordered consumers, and cause no service downtime.
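At‑least‑once delivery means a message can arrive more than once (for example after a retry), so consumers have to tolerate duplicates. The following is a minimal sketch of that idea, assuming each message carries a unique ID. `IdempotentHandler` and `msg_id` are hypothetical names for illustration, not part of Corgi or CKafka.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by a unique message ID."""

    def __init__(self):
        self.seen = set()      # IDs already applied (a real system would persist this)
        self.applied = []      # side effects, recorded here for illustration

    def handle(self, msg_id, payload):
        if msg_id in self.seen:
            return False       # duplicate delivery: skip the side effect
        self.applied.append(payload)
        self.seen.add(msg_id)
        return True


handler = IdempotentHandler()
# At-least-once delivery: message "m1" is redelivered after a retry.
deliveries = [("m1", "create order"), ("m2", "pay order"), ("m1", "create order")]
results = [handler.handle(msg_id, payload) for msg_id, payload in deliveries]
```

With a handler like this, redelivery during the migration changes nothing observable, which is why duplicates are an acceptable price for never losing a message.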
Migration Plan
The process was divided into three stages: dual‑write, cut‑read, and cut‑write.
Dual‑Write Phase
Two approaches were considered: modifying the producer to write to both Corgi and Kafka, or using a MirrorMaker to sync messages from Corgi to Kafka. The team selected MirrorMaker, which consumes from Corgi and produces to Kafka while ensuring atomic commits, preserving order, and handling failures gracefully.
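The core of such a mirror can be sketched as a loop that advances the source offset only after the target has accepted the message. This is an illustration under stated assumptions, not the team's implementation: the source is a plain list, and `FlakyTarget` and `mirror` are hypothetical in‑memory stand‑ins for the Kafka producer and the mirroring process.

```python
class FlakyTarget:
    """In-memory stand-in for the Kafka side; fails the first send of chosen payloads."""

    def __init__(self, fail_once):
        self.fail_once = set(fail_once)
        self.log = []

    def send(self, payload):
        if payload in self.fail_once:
            self.fail_once.discard(payload)
            raise IOError("simulated produce failure")
        self.log.append(payload)


def mirror(source, target, committed=0, max_attempts=10):
    """Copy source[committed:] to target in order; return the new committed offset.

    The source offset advances only after the target accepted the message,
    so a failure causes a retry of the same message rather than a skip.
    """
    attempts = 0
    while committed < len(source) and attempts < max_attempts:
        attempts += 1
        try:
            target.send(source[committed])
        except IOError:
            continue           # do not commit; retry the same message
        committed += 1         # commit only after a successful produce
    return committed


source = ["a", "b", "c"]
target = FlakyTarget(fail_once={"b"})
offset = mirror(source, target)
```

Because the loop never moves past a message that has not been produced, order is kept and nothing is skipped. A crash between the produce and the commit would replay that message, which is consistent with at‑least‑once semantics.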
Cut‑Read Phase
Consumers were split into non‑ordered and ordered groups. Non‑ordered consumers required no special handling, because their business logic already ensured idempotency. For ordered consumers, an early offset was prepared in Kafka in advance, and a switch in the NewClient paused consumption until every instance had been upgraded. Consumption then resumed from the prepared offset, so order was maintained.
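The switch‑then‑resume behaviour for ordered consumers can be sketched as follows. This is a simplified model: `GatedConsumer` is a hypothetical class, the partition is a list, and the prepared offset is assumed to sit at or before the last message already handled through Corgi, so a short replay is possible but no message is skipped or reordered.

```python
class GatedConsumer:
    """Ordered consumer that stays idle until the switch is opened."""

    def __init__(self, log, prepared_offset):
        self.log = log                          # stand-in for one Kafka partition
        self.position = None                    # unset until consumption is enabled
        self.prepared_offset = prepared_offset
        self.enabled = False                    # the switch; off during rollout

    def enable(self):
        # Flipped once every instance runs the new client.
        self.enabled = True
        self.position = self.prepared_offset    # analogous to a seek

    def poll(self):
        if not self.enabled or self.position >= len(self.log):
            return None
        msg = self.log[self.position]
        self.position += 1
        return msg


partition = ["o1", "o2", "o3", "o4"]
consumer = GatedConsumer(partition, prepared_offset=1)
before = consumer.poll()          # switch is off, nothing is consumed
consumer.enable()
after = [consumer.poll() for _ in range(4)]
```

Keeping every upgraded instance idle until the switch flips avoids a window in which old and new clients process the same ordered stream at the same time.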
Cut‑Write Phase
After all consumers had migrated and MirrorMaker was the sole reader of Corgi, producers were switched. The NewClient’s partition selector mimicked MirrorMaker’s logic, so messages that would have shared a Corgi partition always landed in the same Kafka partition, which preserved production order.
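One way to picture a selector that mimics the mirror is to compose the old key‑to‑Corgi routing with the mirror's Corgi‑to‑Kafka mapping. The article does not describe either function, so both below are assumptions chosen for illustration (a CRC32 hash and a modulo mapping), and all three function names are hypothetical.

```python
import zlib


def corgi_partition_for(key, corgi_partitions):
    """Hypothetical stand-in for the original Corgi client's key routing."""
    return zlib.crc32(key.encode("utf-8")) % corgi_partitions


def mirror_partition(corgi_partition, kafka_partitions):
    """Hypothetical mapping applied by the mirror: fixed and deterministic."""
    return corgi_partition % kafka_partitions


def new_client_partition(key, corgi_partitions, kafka_partitions):
    """Route as the old path did: key -> Corgi partition -> Kafka partition."""
    return mirror_partition(corgi_partition_for(key, corgi_partitions), kafka_partitions)


keys = ["order-1", "order-2", "user-7"]
# Path before the cut: producer -> Corgi -> mirror -> Kafka.
via_mirror = [mirror_partition(corgi_partition_for(k, 8), 4) for k in keys]
# Path after the cut: producer -> Kafka directly.
direct = [new_client_partition(k, 8, 4) for k in keys]
```

Since both paths send a given key to the same Kafka partition, messages produced just before and just after the switch stay in one partition and keep their relative order.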
NewClient Enhancements
The NewClient is a drop‑in replacement for the original Corgi client. It adds online rewind: each consumer reports its IP and port to Redis, and Kafka’s seek() is invoked on that consumer on demand. It also embeds monitoring for production latency, success rates, and end‑to‑end processing time.
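The rewind flow can be sketched with a dictionary standing in for Redis. This is a simplified model under stated assumptions: `RewindableClient` and `rewind` are hypothetical names, the address is made up, and the registry holds the client object directly, which collapses the network hop a real deployment would need to reach the consumer at its reported IP and port.

```python
class RewindableClient:
    """Consumer that registers itself so an operator can rewind it later."""

    def __init__(self, address, registry):
        self.address = address
        self.position = 0
        registry[address] = self       # stand-in for reporting IP/port to Redis

    def consume(self, n):
        self.position += n

    def seek(self, offset):
        # A real client would call the Kafka consumer's seek() here.
        self.position = offset


def rewind(registry, address, offset):
    """Operator-side helper: look up a live consumer and move it back."""
    client = registry.get(address)
    if client is None:
        return False                   # unknown or unregistered consumer
    client.seek(offset)
    return True


registry = {}
client = RewindableClient("10.0.0.5:9000", registry)
client.consume(100)
ok = rewind(registry, "10.0.0.5:9000", 40)
missing = rewind(registry, "10.0.0.9:9000", 0)
```

Registering live consumers is what makes the rewind "online": the operator can target a running instance instead of restarting it with a new starting offset.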
Conclusion
Thorough investigation of the two systems, clear migration boundaries, and a phased approach enabled a successful transition from a self‑built queue to a cloud service, reducing latency and cutting costs while maintaining reliability.
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
