
Data Consistency Verification Practices and Implementation at Xiaohongshu

Xiaohongshu built a lock‑free, non‑disruptive data‑consistency verification tool. It automatically selects the optimal verification method, handles heterogeneous sources and dynamically changing data, performs full and incremental checks via chunked checksums or row‑by‑row comparison, quickly isolates mismatched records, and supports automatic remediation, ensuring reliable migrations and sharding.

Xiaohongshu Tech REDtech

This article introduces how Xiaohongshu applies data consistency verification to its business and the concrete benefits obtained.

1.1 What is Data Consistency Verification

In scenarios such as data migration, synchronization, and multi‑data‑center deployment, strict data consistency is required. Errors such as mis‑writes, lost writes, replication lag, dirty reads, double‑writes, and human mistakes can cause inconsistencies.

Building a data‑consistency verification capability enables timely and accurate detection and resolution of inconsistencies, reducing business impact.

1.2 Required Capabilities

Handle continuously changing data volume and content without false alarms.

Perform lock‑free, non‑disruptive verification while services remain online.

Control impact on database performance despite high‑concurrency reads.

Adapt to heterogeneous data sources.

Deal with uneven data distribution between source and target.

Quickly locate inconsistent records.

Provide automatic correction scripts for fast remediation.

2. Business Scenarios

MySQL cluster capacity growth, table migration, and heterogeneous data‑source migration require reliable data migration. Existing industry solutions do not fully meet Xiaohongshu’s complex needs, prompting the development of a custom verification tool.

3. Features of the Verification Tool

Supports mismatched data distribution (single table, sharding, etc.).

Automatically selects the optimal verification method based on schema and data distribution.

Adapts to dynamic data changes.

Performs non‑interruptive, lock‑free verification.

Configurable parameters for speed and batch size.

Fast identification of inconsistent content.

Custom column mapping and transformation rules.

3.1 Verification Types

Two main types: full‑data verification and incremental verification.

Full verification checks all data at a point in time. It is usually run periodically and can be homogeneous (same schema) or heterogeneous (different schema).

Incremental verification validates only newly changed data based on change events. It provides near‑real‑time detection of inconsistencies but does not cover historical data.

3.2 Implementation Architecture

The system is built on a real‑time data‑transfer service and abstracts three components:

Reader: retrieves data from the source. A Selector extracts full data; a Replicator parses the binlog for incremental data.

Processor: applies user‑defined column mapping or transformation before comparison.

Writer: writes verification results, updates checkpoints, stores digests, and persists inconsistent records.
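The three components above can be modeled as a minimal pipeline. The sketch below is illustrative only; the interface names (`Reader`, `Processor`, `Writer`, `run_pipeline`) are assumptions, not the tool's actual API:

```python
from abc import ABC, abstractmethod
from typing import Iterable

class Reader(ABC):
    """Retrieves data from the source (full rows or parsed binlog events)."""
    @abstractmethod
    def read(self) -> Iterable[dict]: ...

class Processor(ABC):
    """Applies user-defined column mapping/transformation before comparison."""
    @abstractmethod
    def process(self, row: dict) -> dict: ...

class Writer(ABC):
    """Persists verification results, checkpoints, and inconsistent records."""
    @abstractmethod
    def write(self, result: dict) -> None: ...

def run_pipeline(reader: Reader, processor: Processor, writer: Writer) -> None:
    """Wire the three stages together: read, transform, then record."""
    for row in reader.read():
        writer.write(processor.process(row))
```

Separating the stages this way lets the same Processor and Writer serve both full verification (a Selector-backed Reader) and incremental verification (a Replicator-backed Reader).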

3.3 Full‑Data Verification

Full verification runs after the full‑sync task completes and the incremental lag is caught up. Data is processed in chunks; each chunk’s checksum (e.g., CRC32) is compared between source and target. If a chunk mismatches, a binary‑search split narrows the inconsistent range until the exact rows are identified. This chunk‑based approach improves efficiency and limits impact on the production database.
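The chunk‑and‑split idea can be sketched as follows. This is a simplified illustration under the assumption that both sides are ordered by primary key and row‑count aligned; `find_mismatched_rows` is a hypothetical name, not the tool's implementation:

```python
import zlib

def chunk_checksum(rows):
    """CRC32 over a chunk; rows is an ordered list of (pk, serialized_bytes)."""
    crc = 0
    for _pk, data in rows:
        crc = zlib.crc32(data, crc)
    return crc

def find_mismatched_rows(src_rows, dst_rows):
    """Binary-search split: narrow a mismatched chunk to the exact rows.

    Returns a list of (src_slice, dst_slice) pairs for the smallest
    inconsistent units found.
    """
    if chunk_checksum(src_rows) == chunk_checksum(dst_rows):
        return []  # chunk matches; nothing to inspect row by row
    if len(src_rows) <= 1 or len(dst_rows) <= 1:
        return [(src_rows, dst_rows)]  # narrowed down to a single row
    mid_s, mid_d = len(src_rows) // 2, len(dst_rows) // 2
    return (find_mismatched_rows(src_rows[:mid_s], dst_rows[:mid_d])
            + find_mismatched_rows(src_rows[mid_s:], dst_rows[mid_d:]))
```

Because matching chunks are dismissed with a single checksum comparison, only the (usually rare) mismatched ranges incur per‑row reads, which is what keeps the load on the production database low.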

Homogeneous verification uses checksum; heterogeneous verification falls back to row‑by‑row comparison with configurable retry policies.
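A heterogeneous row‑by‑row comparison with a retry policy might look like the following sketch. The column‑mapping convention and the function names are assumptions for illustration:

```python
import time

def compare_row(src_row, dst_row, column_map):
    """Heterogeneous comparison: rename source columns via column_map
    (source name -> target name) before comparing dicts."""
    mapped = {column_map.get(k, k): v for k, v in src_row.items()}
    return mapped == dst_row

def verify_with_retry(fetch_src, fetch_dst, column_map, retries=3, delay=0.1):
    """Re-check a mismatching row a few times to absorb replication lag
    before declaring it inconsistent."""
    for _attempt in range(retries):
        if compare_row(fetch_src(), fetch_dst(), column_map):
            return True
        time.sleep(delay)  # give the target time to catch up, then re-read
    return False
```

The retry loop is what distinguishes a genuine inconsistency from a row that is merely lagging behind on the target side.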

3.4 Incremental Verification

Incremental verification monitors source binlog events, retrieves the corresponding target rows via primary or unique keys, and compares them. To handle latency and frequent changes, the system includes delayed‑point checks and re‑verification mechanisms to ensure current consistency.
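The delayed‑check and re‑verification mechanism can be sketched with a simple due‑time queue. This is a minimal model, not the production design; the class and method names are hypothetical:

```python
import heapq
import itertools
import time

class IncrementalVerifier:
    """On each binlog change event, schedule a delayed check of the target
    row; re-queue on mismatch up to max_retries times to absorb lag."""

    def __init__(self, fetch_target, delay=1.0, max_retries=3):
        self.fetch_target = fetch_target  # pk -> current target row
        self.delay = delay
        self.max_retries = max_retries
        self._seq = itertools.count()  # tie-breaker for the heap
        self.queue = []  # (due_time, seq, pk, expected_row, attempt)

    def on_binlog_event(self, pk, row_after):
        """Called for each source change event with the post-image row."""
        heapq.heappush(self.queue,
                       (time.time() + self.delay, next(self._seq), pk, row_after, 0))

    def poll(self, now=None):
        """Process all due checks; return pks confirmed inconsistent."""
        now = time.time() if now is None else now
        confirmed_bad = []
        while self.queue and self.queue[0][0] <= now:
            _due, _seq, pk, expected, attempt = heapq.heappop(self.queue)
            if self.fetch_target(pk) == expected:
                continue  # target caught up; consistent
            if attempt + 1 < self.max_retries:
                # mismatch may just be lag: re-verify later
                heapq.heappush(self.queue,
                               (now + self.delay, next(self._seq), pk, expected, attempt + 1))
            else:
                confirmed_bad.append(pk)
        return confirmed_bad
```

Only rows that still mismatch after several delayed re‑checks are reported, which prevents ordinary replication lag from generating false alarms.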

4. Summary and Outlook

Since its launch, the Xiaohongshu MySQL data‑consistency verification tool has been successfully applied to migration, sharding, and other critical scenarios, providing strong data‑integrity guarantees.

Future work includes enhancing product maturity, expanding supported data sources, extending to data lake/warehouse, cache updates, and providing one‑click repair SQL generation, as well as building a data‑quality dashboard for root‑cause analysis.

Tags: distributed systems, data consistency, MySQL, database migration, data validation
Written by

Xiaohongshu Tech REDtech

Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.
