How We Migrated a Massive Tag System in Two Weeks Without Downtime
This article details a step‑by‑step migration of a content‑community tag system from a monolithic design to separate classification and attribute services, covering storage synchronization, isolation‑layer construction, read/write migration, dependency handling, and final rollout while ensuring speed, stability, and data accuracy.
Background
In a content community, tags are essential for content distribution and understanding, helping deliver the right content to the right users. The existing system mixed classification tags and attribute tags in a single tag repository, causing mis‑tagging and making independent expansion of classifications and attributes impossible.
Current Situation
The community heavily relies on attribute tags not only in its own business systems but also in algorithms, search, and data pipelines. The migration must be fast, stable, and accurate, with minimal impact on upstream services and no user‑visible disruption.
Migration Plan
The migration was broken into five ordered steps based on impact scope: 1) underlying storage synchronization, 2) building an isolation layer, 3) business read/write migration, 4) dependency migration, and 5) enabling the new attribute system. Each step had to succeed before the next began, and a failed step could be rolled back to the previous stable state. A data-verification service ran throughout to confirm that the two systems stayed consistent.
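The gating rule described above (run steps in order, stop at the first failure, and undo the failing step) can be sketched as follows. This is only an illustration: the `run_steps` helper, the step names, and the in-memory `log` are hypothetical and do not come from the original project.

```python
from typing import Callable, List, Tuple

# (name, apply, rollback); all three are illustrative placeholders
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order; on the first failure, undo that step and stop."""
    completed: List[str] = []
    for name, apply, rollback in steps:
        try:
            apply()
        except Exception:
            rollback()  # return to the state left by the previous step
            break
        completed.append(name)
    return completed


log: List[str] = []


def fail() -> None:
    raise RuntimeError("simulated failure")


steps: List[Step] = [
    ("storage sync", lambda: log.append("sync on"), lambda: log.append("sync off")),
    ("isolation layer", lambda: log.append("layer on"), lambda: log.append("layer off")),
    ("read/write migration", fail, lambda: log.append("traffic back on old system")),
    ("dependency migration", lambda: log.append("deps on"), lambda: log.append("deps off")),
]

completed = run_steps(steps)
print(completed)
print(log)
```

Later steps never run once an earlier one fails, which is the property that keeps the blast radius of each stage small.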
Storage Migration
We started with full and incremental synchronization between the old tag system and the new attribute system. Full sync copies all existing tags, while incremental sync mirrors new tag writes in real time. The synchronization jobs run on Alibaba Cloud's SchedulerX distributed task scheduler. Important considerations include throttling sync speed to avoid overloading dependent services, configuring alerts for anomalies, logging exceptions, and recording progress so that a failed run can resume from the last checkpoint.
After the full sync, an incremental pipeline streams changes from the old system to the new one as soon as they occur.
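A minimal sketch of a throttled, resumable full sync is shown here, assuming tags have increasing numeric ids and that the checkpoint is a simple dictionary. The `full_sync` function, its parameters, and the in-memory stores are hypothetical stand-ins; the real jobs ran on a distributed scheduler and would persist the checkpoint durably.

```python
import time
from typing import Callable, Dict, List


def full_sync(
    old_tags: List[dict],
    write_new: Callable[[dict], None],
    checkpoint: Dict[str, int],
    batch_size: int = 2,
    pause_seconds: float = 0.0,
) -> int:
    """Copy tags in id order, skipping everything at or below the checkpoint."""
    last_id = checkpoint.get("last_id", 0)
    pending = sorted(
        (t for t in old_tags if t["id"] > last_id), key=lambda t: t["id"]
    )
    copied = 0
    for start in range(0, len(pending), batch_size):
        for tag in pending[start:start + batch_size]:
            write_new(tag)
            checkpoint["last_id"] = tag["id"]  # record progress after each row
            copied += 1
        time.sleep(pause_seconds)  # throttle between batches
    return copied


old = [{"id": i, "name": f"tag-{i}"} for i in range(1, 6)]
new_store: Dict[int, str] = {}
checkpoint: Dict[str, int] = {}
fail_once = {3}  # simulate one transient failure on tag 3


def flaky_write(tag: dict) -> None:
    if tag["id"] in fail_once:
        fail_once.discard(tag["id"])
        raise RuntimeError("simulated write failure")
    new_store[tag["id"]] = tag["name"]


try:
    full_sync(old, flaky_write, checkpoint)
except RuntimeError:
    pass  # a real job would log the exception and raise an alert here

resumed = full_sync(old, flaky_write, checkpoint)  # picks up after id 2
print(resumed, sorted(new_store))
```

The second run copies only the rows after the checkpoint, so a failure late in a long sync does not force a restart from the beginning.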
Isolation Layer Construction
To hide migration details from upstream services, we packaged all read/write calls to the old tag system into a single JAR. This JAR acts as an isolation layer, exposing unified methods that internally decide whether to query the old system or the new attribute system. By routing all tag accesses through this layer, business code remains unchanged while the underlying data source can be switched gradually.
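The idea of the isolation layer can be illustrated with a small facade. The original was packaged as a JAR; the Python sketch below only shows the shape, and `TagClient`, `InMemoryTagStore`, and the `read_from_new` switch are hypothetical names. The two stores hold different demo data purely so the switch is visible.

```python
from typing import Dict, List, Optional


class InMemoryTagStore:
    """Stand-in for either backend (old tag system or new attribute system)."""

    def __init__(self, data: Optional[Dict[int, List[str]]] = None) -> None:
        self._data: Dict[int, List[str]] = dict(data or {})

    def get_tags(self, content_id: int) -> List[str]:
        return list(self._data.get(content_id, []))


class TagClient:
    """Single entry point that callers use; the data source is chosen internally."""

    def __init__(self, old_store: InMemoryTagStore, new_store: InMemoryTagStore,
                 read_from_new: bool = False) -> None:
        self._old = old_store
        self._new = new_store
        self.read_from_new = read_from_new

    def get_tags(self, content_id: int) -> List[str]:
        store = self._new if self.read_from_new else self._old
        return store.get_tags(content_id)


# Demo-only data: in a real migration both stores would hold the same tags.
client = TagClient(
    InMemoryTagStore({1: ["from-old"]}),
    InMemoryTagStore({1: ["from-new"]}),
)
before = client.get_tags(1)
client.read_from_new = True  # flip the source without touching caller code
after = client.get_tags(1)
print(before, after)
```

Callers only ever see `get_tags`, so switching the backend is a configuration change rather than a code change in every dependent service.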
Interface Adaptation
Because classification and attribute tags have different semantics and data models, the content middle‑platform maintains separate domain models and service interfaces for each. To keep upper‑level services unaware of the change, we kept the original service APIs unchanged and performed the necessary adaptation inside the isolation layer.
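As a rough illustration of adaptation inside the isolation layer, the sketch below converts a new-style attribute record back into the shape an old API might have returned. The field names (`attr_id`, `label`, `tag_id`, `tag_name`, `tag_type`) are invented for the example; the article does not describe the actual models.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attribute:
    """Hypothetical domain model of the new attribute system."""
    attr_id: int
    label: str


def to_legacy_tag(attr: Attribute) -> Dict[str, object]:
    """Map a new attribute onto the (assumed) legacy tag payload."""
    return {"tag_id": attr.attr_id, "tag_name": attr.label, "tag_type": "attribute"}


def legacy_get_tags(attrs: List[Attribute]) -> List[Dict[str, object]]:
    """What the unchanged old API would return when backed by the new system."""
    return [to_legacy_tag(a) for a in attrs]


legacy = legacy_get_tags([Attribute(7, "outdoor"), Attribute(9, "budget")])
print(legacy)
```

Keeping the conversion in one place means the old response contract is honored even though the data now comes from a different model.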
Business Read/Write Migration
Read Migration
Read migration is relatively safe. We gradually shifted traffic to the new attribute system using percentage‑based routing. If any issue surfaced, traffic could instantly revert to the old system.
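One common way to implement percentage-based routing is to hash a stable key into 100 buckets and compare the bucket with the current rollout percentage. The sketch below assumes that approach; the article does not say how the routing was actually done, and `bucket_for` and `use_new_system` are hypothetical names.

```python
import zlib


def bucket_for(key: str) -> int:
    """Map a stable key (for example a content id) to a bucket in 0..99."""
    return zlib.crc32(key.encode("utf-8")) % 100


def use_new_system(key: str, rollout_percent: int) -> bool:
    """Route to the new attribute system if the key falls inside the rollout."""
    return bucket_for(key) < rollout_percent


keys = [f"content-{i}" for i in range(1000)]
routed_at_10 = {k for k in keys if use_new_system(k, 10)}
routed_at_50 = {k for k in keys if use_new_system(k, 50)}
print(len(routed_at_10), len(routed_at_50))
```

Because the bucket depends only on the key, a given item is routed consistently between requests, raising the percentage only adds keys to the new path, and setting the percentage back to 0 returns all reads to the old system.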
Write Migration
Write migration is more complex. Some dependent services (e.g., the content middle-platform) still write to the old system, so the two systems can diverge. To avoid inconsistency, we set up two one-way sync pipelines, old → new and new → old, which together keep both systems in sync in both directions. To prevent infinite sync loops, each change carries a source flag naming the system where the write originated; when a message arrives back at its origin, the update is ignored, breaking the loop.
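The source-flag technique can be sketched with two in-memory stores that mirror every applied change to each other. The `Change` and `TagStore` classes are hypothetical, and the direct method call stands in for whatever message pipeline carried the changes in the real system.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Change:
    key: str
    value: str
    origin: str  # name of the system where the write first happened


class TagStore:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.peer: Optional["TagStore"] = None
        self.ignored = 0  # how many echoed messages were dropped

    def write(self, key: str, value: str) -> None:
        """A business write that originates in this system."""
        self._apply(Change(key, value, origin=self.name))

    def receive(self, change: Change) -> None:
        """A change delivered by the sync pipeline from the peer."""
        if change.origin == self.name:
            self.ignored += 1  # the message came back to its origin: drop it
            return
        self._apply(change)

    def _apply(self, change: Change) -> None:
        self.data[change.key] = change.value
        if self.peer is not None:
            self.peer.receive(change)  # mirror with the source flag intact


old, new = TagStore("old"), TagStore("new")
old.peer, new.peer = new, old

old.write("content-42", "outdoor")  # write lands on the old system
new.write("content-43", "budget")   # write lands on the new system
print(old.data == new.data, old.ignored, new.ignored)
```

Each write crosses to the peer once, is echoed back once, and is dropped on arrival at its origin, so the loop ends after a single round trip.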
Dependency Migration
Once the business services are stable, we migrate dependent systems such as algorithms, search, and data pipelines. Real‑time dependencies reuse the existing read migration pipeline, while offline dependencies receive prepared data tables for a coordinated cut‑over.
Enabling the New Attribute System
After a monitoring period confirms no issues, the new attribute system becomes the sole source for tagging. The old‑to‑new sync link is disabled, and later, after further observation, the new‑to‑old link is also cut, completing the migration.
Data Verification Tasks
Throughout the process, a reconciliation job continuously checks that the data in both systems matches. Any discrepancy triggers an alert, prompting investigation and correction.
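A reconciliation pass of this kind can be reduced to a comparison of two key-value snapshots. The sketch below assumes each system can be exported as a mapping from content id to a tag value; the `reconcile` function and its report keys are hypothetical.

```python
from typing import Dict, List


def reconcile(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the keys that are missing on either side or whose values differ."""
    shared = old.keys() & new.keys()
    return {
        "missing_in_new": sorted(old.keys() - new.keys()),
        "missing_in_old": sorted(new.keys() - old.keys()),
        "mismatched": sorted(k for k in shared if old[k] != new[k]),
    }


old_snapshot = {"c1": "travel", "c2": "food", "c3": "sports"}
new_snapshot = {"c1": "travel", "c2": "drink", "c4": "music"}

report = reconcile(old_snapshot, new_snapshot)
needs_alert = any(report.values())  # any non-empty list means a discrepancy
print(report, needs_alert)
```

An empty report on every run is what backs a claim of zero data differences; any non-empty list is the signal to alert and investigate.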
Summary and Reflections
The entire migration, excluding the time needed for dependent system migration, was completed in two weeks. Results: zero user complaints, zero data differences, and uninterrupted service.
When many services depend on a component, an isolation layer can shield them from migration impact.
Bidirectional sync loops can be avoided by tagging update messages with a source identifier.
