How We Migrated a Massive Tag System in Two Weeks Without Downtime

This article details a step‑by‑step migration of a content‑community tag system from a monolithic design to separate classification and attribute services, covering storage synchronization, isolation‑layer construction, read/write migration, dependency handling, and final rollout while ensuring speed, stability, and data accuracy.

Xianyu Technology

Background

In a content community, tags are essential for content distribution and understanding, helping deliver the right content to the right users. The existing system mixed classification tags and attribute tags in a single tag repository, causing mis‑tagging and making independent expansion of classifications and attributes impossible.

Current Situation

The community heavily relies on attribute tags not only in its own business systems but also in algorithms, search, and data pipelines. The migration must be fast, stable, and accurate, with minimal impact on upstream services and no user‑visible disruption.

Migration Plan

We broke the migration into five steps, ordered by impact scope: 1) underlying storage synchronization, 2) building an isolation layer, 3) business read/write migration, 4) dependency migration, and 5) enabling the new attribute system. Each step had to succeed before the next began, and a failure at any step could be rolled back to the previous stable state. A data-verification service ran throughout to confirm that the two systems stayed consistent.

Storage Migration

We started with full and incremental synchronization between the old tag system and the new attribute system. Full sync copies all existing tags, while incremental sync mirrors new tag writes in real time. The synchronization jobs run on SchedulerX, Alibaba Cloud's distributed task scheduling service. Four things matter when running them: throttle the sync rate so dependent services are not overloaded, configure alerts for anomalies, log every exception, and record progress so that a failed run can resume from the last checkpoint.
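The throttling and checkpoint-resume points can be illustrated with a minimal Python sketch. It is not the team's SchedulerX job: the row shape, the `checkpoint` dict (a stand-in for a durable progress store), and the `write_new` callback are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List, Optional


def full_sync(
    old_rows: List[dict],               # hypothetical: old tag rows sorted by ascending "id"
    write_new: Callable[[dict], None],  # hypothetical writer for the new attribute system
    checkpoint: Dict[str, int],         # stand-in for a durable checkpoint store
    batch_size: int = 2,
    max_batches_per_sec: float = 0.0,   # 0 disables throttling
    log: Optional[List[str]] = None,
) -> int:
    """Copy rows with id greater than the checkpoint; return how many were copied."""
    copied = 0
    last_id = checkpoint.get("last_id", 0)
    pending = [row for row in old_rows if row["id"] > last_id]
    for start in range(0, len(pending), batch_size):
        for row in pending[start:start + batch_size]:
            try:
                write_new(row)
            except Exception as exc:
                # Log the failure, then stop; the checkpoint still points
                # at the last row that was written successfully.
                if log is not None:
                    log.append(f"sync failed at id={row['id']}: {exc}")
                raise
            checkpoint["last_id"] = row["id"]
            copied += 1
        if max_batches_per_sec > 0:
            # Throttle: pause between batches to cap load on dependent services.
            time.sleep(1.0 / max_batches_per_sec)
    return copied
```

Calling `full_sync` again with the same `checkpoint` after a failure skips the rows that were already copied, which is the resume behavior described above.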

After the full sync, an incremental pipeline streams changes from the old system to the new one as soon as they occur.

Isolation Layer Construction

To hide migration details from upstream services, we packaged all read/write calls to the old tag system into a single JAR. This JAR acts as an isolation layer, exposing unified methods that internally decide whether to query the old system or the new attribute system. By routing all tag accesses through this layer, business code remains unchanged while the underlying data source can be switched gradually.
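As a rough illustration of the idea, the sketch below shows a facade that callers use for every tag access while two switches decide which backend serves reads and writes. The real layer ships as a JAR; this Python version, along with the class names `InMemoryTagStore` and `TagFacade` and the method names, is hypothetical.

```python
from typing import Dict, List


class InMemoryTagStore:
    """Stand-in for a client of the old tag system or the new attribute system."""

    def __init__(self) -> None:
        self._tags: Dict[str, List[str]] = {}

    def get_tags(self, content_id: str) -> List[str]:
        return list(self._tags.get(content_id, []))

    def set_tags(self, content_id: str, tags: List[str]) -> None:
        self._tags[content_id] = list(tags)


class TagFacade:
    """Single entry point for callers; routing is decided internally."""

    def __init__(self, old_store: InMemoryTagStore, new_store: InMemoryTagStore) -> None:
        self._old = old_store
        self._new = new_store
        # Both switches start on the old system and are flipped during migration.
        self.read_from_new = False
        self.write_to_new = False

    def get_tags(self, content_id: str) -> List[str]:
        store = self._new if self.read_from_new else self._old
        return store.get_tags(content_id)

    def set_tags(self, content_id: str, tags: List[str]) -> None:
        store = self._new if self.write_to_new else self._old
        store.set_tags(content_id, tags)
```

Because callers only ever see `TagFacade`, flipping `read_from_new` or `write_to_new` changes the data source without any change to business code.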

Interface Adaptation

Because classification and attribute tags have different semantics and data models, the content middle‑platform maintains separate domain models and service interfaces for each. To keep upper‑level services unaware of the change, we kept the original service APIs unchanged and performed the necessary adaptation inside the isolation layer.
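A small Python sketch of what such adaptation might look like follows. The article does not give the actual data models, so `LegacyTag`, `Attribute`, their fields, and the `key:value` naming rule are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LegacyTag:
    """Shape that existing callers already expect (hypothetical fields)."""
    tag_id: int
    name: str


@dataclass(frozen=True)
class Attribute:
    """Record returned by the new attribute service (hypothetical fields)."""
    attribute_id: int
    key: str
    value: str


def to_legacy(attr: Attribute) -> LegacyTag:
    # Adapt the new model to the old response shape inside the isolation layer.
    return LegacyTag(tag_id=attr.attribute_id, name=f"{attr.key}:{attr.value}")


def legacy_query(
    content_id: str,
    fetch_attributes: Callable[[str], List[Attribute]],
) -> List[LegacyTag]:
    """Keep the old API signature while reading from the new system."""
    return [to_legacy(attr) for attr in fetch_attributes(content_id)]
```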

Business Read/Write Migration

Read Migration

Read migration is relatively safe. We gradually shifted traffic to the new attribute system using percentage‑based routing. If any issue surfaced, traffic could instantly revert to the old system.
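One common way to implement percentage-based routing is to hash a stable key into 100 buckets and compare the bucket with the current rollout percentage. The Python sketch below assumes that approach; the article does not say how the team chose which requests to route, and the function names are hypothetical.

```python
import zlib


def bucket(key: str) -> int:
    """Map a stable key (for example a user or content id) to a bucket in 0..99."""
    return zlib.crc32(key.encode("utf-8")) % 100


def use_new_system(key: str, rollout_percent: int) -> bool:
    """Return True when this key should be read from the new attribute system."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return bucket(key) < rollout_percent
```

Because the bucket depends only on the key, a given key stays on the same side as the percentage grows, and setting the percentage to 0 sends all reads back to the old system.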

Write Migration

Write migration is more complex. Some dependent services (e.g., the content middle-platform) still write to the old system, which can cause the two systems to diverge. To avoid inconsistency, we set up bidirectional synchronization as two one-way pipelines: old → new and new → old. To prevent infinite sync loops, every change message carries a flag naming the system where the write originated; when a message arrives back at its origin, the update is ignored, which breaks the loop.
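The source-flag technique can be sketched in a few lines of Python. The `Store` class, its in-memory `outbox`, and the `sync` function are hypothetical stand-ins for the real systems and their message pipelines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    key: str
    value: str
    source: str  # system where the write originally happened: "old" or "new"


class Store:
    """Toy system that records every write as a change event in an outbox."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.outbox: List[Change] = []

    def write(self, key: str, value: str, source: Optional[str] = None) -> None:
        self.data[key] = value
        # A direct write is flagged with this store's own name;
        # a replicated write keeps the original source flag.
        self.outbox.append(Change(key, value, source or self.name))


def sync(src: Store, dst: Store) -> int:
    """Drain src's outbox into dst; return the number of changes forwarded."""
    forwarded = 0
    while src.outbox:
        change = src.outbox.pop(0)
        if change.source == dst.name:
            continue  # the message is back at its origin: ignore it
        dst.write(change.key, change.value, source=change.source)
        forwarded += 1
    return forwarded
```

A write to `old` is forwarded once by `sync(old, new)`. The echo that `new` then emits still carries `source="old"`, so `sync(new, old)` drops it instead of writing it back.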

Dependency Migration

Once the business services are stable, we migrate dependent systems such as algorithms, search, and data pipelines. Real‑time dependencies reuse the existing read migration pipeline, while offline dependencies receive prepared data tables for a coordinated cut‑over.

Enabling the New Attribute System

After a monitoring period confirms no issues, the new attribute system becomes the sole source for tagging. The old‑to‑new sync link is disabled, and later, after further observation, the new‑to‑old link is also cut, completing the migration.

Data Verification Tasks

Throughout the process, a reconciliation job continuously checks that the data in both systems matches. Any discrepancy triggers an alert, prompting investigation and correction.
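A reconciliation pass of this kind can be as simple as comparing the tag sets per content id on both sides. The Python sketch below assumes both systems can be read as a mapping from content id to tags; the function name and the report format are hypothetical.

```python
from typing import Dict, Iterable, List


def reconcile(
    old: Dict[str, Iterable[str]],
    new: Dict[str, Iterable[str]],
) -> List[str]:
    """Return one human-readable line per discrepancy; an empty list means a match."""
    problems: List[str] = []
    for content_id in sorted(old.keys() | new.keys()):
        if content_id not in new:
            problems.append(f"{content_id}: missing in new system")
        elif content_id not in old:
            problems.append(f"{content_id}: missing in old system")
        elif set(old[content_id]) != set(new[content_id]):
            # Tag order is ignored; only membership is compared.
            problems.append(f"{content_id}: tag sets differ")
    return problems
```

In a real job, a non-empty result would feed the alerting described above rather than being returned to a caller.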

Summary and Reflections

The entire migration, excluding the time needed for dependent system migration, was completed in two weeks. Results: zero user complaints, zero data differences, and uninterrupted service.

When many services depend on a component, an isolation layer can shield them from migration impact.

Bidirectional sync loops can be avoided by tagging update messages with a source identifier.
