How We Migrated Our Self‑Built Message Queue to Tencent Cloud CKafka

This article explains why the e‑commerce platform built its own message queue, Corgi, which operational and cost drawbacks prompted a move to Tencent Cloud CKafka, and how a three‑phase migration (dual‑write, cut‑read, cut‑write) preserved message safety and low latency.

Tencent Cloud Middleware

Background

When a company needs message‑queue middleware, the typical choices are a cloud provider’s managed service or a self‑hosted open‑source system such as Kafka or RabbitMQ. In 2015, Mogujie (蘑菇街, literally "Mushroom Street") took a third route and built its own queue, Corgi. Two concerns drove the decision: data security, and a workload with many topics and partitions, for which the latency of public cloud queues was unacceptable.

Why Migrate

By 2020, cloud‑hosted queues had become mature and cost‑effective, while maintaining Corgi carried two major disadvantages. The first was operational: master‑slave failover and capacity expansion were both manual. The second was server cost, driven by the three‑node master‑slave architecture and high CPU requirements.

Migration Requirements

The migration had to meet three requirements: lose no messages, matching Corgi’s at‑least‑once (AtLeastOnce) delivery semantics; preserve partial (per‑partition) ordering for consumers that depend on order; and cause no service downtime.
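At‑least‑once delivery means a message can arrive more than once, so consumers have to tolerate duplicates. The sketch below shows one common way to do that, assuming each message carries a unique business‑level id. The `IdempotentHandler` class and its in‑memory `seen_ids` set are hypothetical illustrations, not part of Corgi or the NewClient.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentHandler:
    """Applies each message at most once, keyed by a business-level message id."""

    seen_ids: set = field(default_factory=set)    # hypothetical in-memory dedupe store
    applied: list = field(default_factory=list)   # side effects, recorded for illustration

    def handle(self, message_id: str, payload: str) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.applied.append(payload)
        self.seen_ids.add(message_id)
        return True


handler = IdempotentHandler()
# Simulated at-least-once delivery: "order-1" is redelivered after a retry.
deliveries = [("order-1", "create"), ("order-2", "create"), ("order-1", "create")]
results = [handler.handle(message_id, body) for message_id, body in deliveries]
```

In a real service the dedupe store would need to be durable and shared across instances; the in‑memory set only keeps the example self‑contained.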

Migration Plan

The process was divided into three stages: dual‑write, cut‑read, and cut‑write.

Dual‑Write Phase

Two approaches were considered: modifying producers to write to both Corgi and Kafka, or running a MirrorMaker service to sync messages between the two. The team chose MirrorMaker. It consumes from Corgi and produces to Kafka, commits atomically, preserves message order, and handles failures gracefully.
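The source does not describe how MirrorMaker implements its commits. One plausible reading is that the Corgi offset advances only after Kafka has accepted the message, so a failure replays from the last committed position instead of skipping ahead. The sketch below illustrates that idea with in‑memory stand‑ins; `FlakySink` and `mirror_partition` are hypothetical names, and no real Corgi or Kafka API is used.

```python
from typing import Dict, List, Set


class FlakySink:
    """Hypothetical stand-in for a Kafka producer that fails on chosen attempts."""

    def __init__(self, fail_on_attempts: Set[int]):
        self.fail_on_attempts = fail_on_attempts
        self.attempts = 0
        self.log: Dict[int, List[str]] = {}

    def send(self, partition: int, value: str) -> None:
        self.attempts += 1
        if self.attempts in self.fail_on_attempts:
            raise ConnectionError("simulated broker timeout")
        self.log.setdefault(partition, []).append(value)


def mirror_partition(source: List[str], committed: int, partition: int,
                     sink: FlakySink, max_retries: int = 3) -> int:
    """Copy source[committed:] to the sink in order and return the new offset.

    The offset advances only after the sink accepts a message, so an
    exhausted retry stops the loop at the failed message and nothing
    after it is sent out of order.
    """
    offset = committed
    while offset < len(source):
        for _ in range(max_retries):
            try:
                sink.send(partition, source[offset])
                break
            except ConnectionError:
                continue
        else:
            return offset  # retries exhausted; resume from here next time
        offset += 1  # "commit" only after a successful send
    return offset


corgi_partition = ["m0", "m1", "m2"]
sink = FlakySink(fail_on_attempts={2})
new_offset = mirror_partition(corgi_partition, committed=0, partition=0, sink=sink)
```

In this simulation a failed send writes nothing, so no duplicates appear. With a real broker, a timeout can hide a write that actually succeeded, which is why this pattern gives at‑least‑once rather than exactly‑once delivery.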

Cut‑Read Phase

Consumers were split into two groups: non‑ordered and ordered. Non‑ordered consumers required no special handling, because their business logic was already idempotent. For ordered consumers, the team prepared an early offset in Kafka ahead of time and added a switch to the NewClient that paused consumption until every instance had been upgraded. Consumption then resumed from the prepared offset, which kept messages in order.
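The pause‑then‑resume switch for ordered consumers can be sketched as a small gate. This is a minimal illustration under assumptions: `OrderedCutoverGate`, `poll`, and the instance ids are hypothetical, and a plain list stands in for a Kafka partition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class OrderedCutoverGate:
    """Keeps ordered consumers paused until every instance runs the new client."""

    expected_instances: Set[str]
    start_offset: int                      # offset prepared in Kafka before the cut
    upgraded: Set[str] = field(default_factory=set)

    def report_upgraded(self, instance_id: str) -> None:
        if instance_id in self.expected_instances:
            self.upgraded.add(instance_id)

    def consumption_enabled(self) -> bool:
        return self.upgraded == self.expected_instances


def poll(gate: OrderedCutoverGate, kafka_log: List[str],
         position: Optional[int]) -> Tuple[List[str], Optional[int]]:
    """Return (messages, new_position); yields nothing while the gate is closed."""
    if not gate.consumption_enabled():
        return [], position
    # The first poll after the gate opens starts from the prepared offset.
    start = gate.start_offset if position is None else position
    return kafka_log[start:], len(kafka_log)


kafka_log = ["a", "b", "c", "d"]
gate = OrderedCutoverGate(expected_instances={"i1", "i2"}, start_offset=1)

gate.report_upgraded("i1")
before = poll(gate, kafka_log, None)   # still paused: one instance not upgraded

gate.report_upgraded("i2")
after = poll(gate, kafka_log, None)    # resumes from the prepared offset
```

Starting from an early offset can redeliver messages that were already processed through Corgi, so this approach relies on at‑least‑once semantics being acceptable to the consumer.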

Cut‑Write Phase

Once all consumers had migrated and MirrorMaker was the only remaining reader of Corgi, producers were switched to write to Kafka directly. The NewClient’s partition selector mimicked MirrorMaker’s logic, so messages that would have gone to the same Corgi partition always landed in the same Kafka partition, which preserved production order.
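The key property is that the producer and MirrorMaker share one mapping from Corgi partition to Kafka partition. The source does not give the actual rule, so the sketch below assumes a CRC32 key hash for the Corgi side and a simple modulo mapping to Kafka. All function names are hypothetical.

```python
import zlib


def corgi_partition_for(key: str, corgi_partitions: int) -> int:
    """Assumed Corgi-side rule: stable hash of the message key."""
    return zlib.crc32(key.encode("utf-8")) % corgi_partitions


def kafka_partition_for(corgi_partition: int, kafka_partitions: int) -> int:
    """Assumed shared mapping from a Corgi partition to a Kafka partition."""
    return corgi_partition % kafka_partitions


def mirror_route(corgi_partition: int, kafka_partitions: int) -> int:
    """Route used by the mirror: it already knows the Corgi partition."""
    return kafka_partition_for(corgi_partition, kafka_partitions)


def producer_route(key: str, corgi_partitions: int, kafka_partitions: int) -> int:
    """Route used by the new producer: derive the Corgi partition, then map it."""
    corgi_partition = corgi_partition_for(key, corgi_partitions)
    return kafka_partition_for(corgi_partition, kafka_partitions)


CORGI_PARTITIONS = 8    # illustrative sizes, not taken from the source
KAFKA_PARTITIONS = 4
keys = ["user-1", "user-2", "order-42", "order-43"]

# Both write paths agree for every key, so order per Corgi partition survives the switch.
routes_agree = all(
    producer_route(k, CORGI_PARTITIONS, KAFKA_PARTITIONS)
    == mirror_route(corgi_partition_for(k, CORGI_PARTITIONS), KAFKA_PARTITIONS)
    for k in keys
)
```

Because both paths call the same `kafka_partition_for`, messages mirrored before the switch and messages produced after it end up in the same Kafka partition for a given Corgi partition.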

NewClient Enhancements

The NewClient is a drop‑in replacement for the original Corgi client with two additions. First, it supports online rewind: each consumer reports its IP and port to Redis, and Kafka’s seek() is invoked on demand to reposition it. Second, it embeds monitoring for production latency, success rates, and end‑to‑end processing time.
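The rewind flow can be sketched as a registry of consumer endpoints plus a command that repositions each one. In this minimal illustration a dictionary stands in for Redis, and `FakeConsumer` stands in for a consumer exposing a seek‑like method. All names are hypothetical, and the real NewClient presumably reaches each instance over the network rather than through a direct call.

```python
from typing import Dict, Tuple


class FakeConsumer:
    """Stand-in for a consumer instance with seek-like positioning."""

    def __init__(self, position: int):
        self.positions: Dict[int, int] = {0: position}  # partition -> next offset

    def seek(self, partition: int, offset: int) -> None:
        self.positions[partition] = offset


class RewindRegistry:
    """In-memory stand-in for the Redis registry of consumer endpoints."""

    def __init__(self) -> None:
        self.endpoints: Dict[Tuple[str, int], FakeConsumer] = {}

    def register(self, ip: str, port: int, consumer: FakeConsumer) -> None:
        """Called by each consumer at startup to report where it can be reached."""
        self.endpoints[(ip, port)] = consumer

    def rewind(self, partition: int, offset: int) -> int:
        """Reposition every registered consumer; return how many were moved."""
        for consumer in self.endpoints.values():
            consumer.seek(partition, offset)
        return len(self.endpoints)


registry = RewindRegistry()
c1, c2 = FakeConsumer(position=100), FakeConsumer(position=100)
registry.register("10.0.0.1", 9000, c1)   # illustrative addresses
registry.register("10.0.0.2", 9000, c2)

moved = registry.rewind(partition=0, offset=40)
```

Registering endpoints centrally is what makes the rewind "online": an operator can reposition running consumers without redeploying them.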

Conclusion

Three things made the transition from a self‑built queue to a cloud service successful: a thorough investigation of both systems, clear migration boundaries, and a phased approach. The migration reduced latency and cut costs while maintaining reliability.

