How to Build a Scalable Real‑Time & Offline Data Consistency Platform for Billions of Records
This article details the design and implementation of a high‑performance data‑consistency verification platform that supports both real‑time and 10‑minute offline checks on tens of billions of records, covering background challenges, system architecture, processing pipelines, RocksDB integration, performance results, and practical lessons learned.
Background and Challenges
Large‑scale business systems often lack a lightweight tool that can instantly detect data inconsistencies and provide real‑time validation. Manual cross‑system checks and human error make timely detection difficult.
Platform Overview
The Business Consistency Verification Platform supports both offline and real‑time data checks with a focus on simplicity of integration.
Real‑Time Data Verification System Design
Design Goals
Low intrusion: easy integration, minimal code changes, negligible impact on existing services.
Low latency: data reporting delay kept at the second level.
High throughput: able to satisfy peak‑hour verification demand.
Real‑time alerting: highly sensitive anomaly detection.
Proactive circuit‑break: immediate fault signals sent to business platforms.
Tenant isolation: prevent cross‑tenant data interference.
Processing Stages
Upstream data is collected by a Flume agent, sent via MQ, and stored in a Redis ZSET (member = message, score = timestamp).
Downstream data follows the same path; it first attempts a direct match in the upstream cache, then falls back to a ZSCAN based on a business‑defined unique key.
Based on the match result, the system emits one of three outcomes: "verification passed", "suspected error", or "confirmed error".
According to error‑alert rules, the system decides whether to notify the user.
Exception Handling Scenarios
Case 1 – Downstream arrives before upstream: When timestamps are close, a synchronized lock iterates the ZSET to avoid massive ZSCAN operations that would spike Redis CPU.
Case 2 – Upstream consumption faster than downstream: The upstream Redis queue may overflow; the solution is to throttle the Flume agent or pause upstream Kafka consumption.
Case 3 – Expiration removal: Stale entries that never match are periodically evicted, persisted to an error table, and later re‑verified.
Performance
Throughput remains stable under peak load.
With 1 million upstream records buffered, downstream verification completes in ~2 seconds for a clean run and ~4 seconds when errors are present.
10‑Minute Million‑Scale Offline Verification Design
Two core algorithms were evaluated:
Chunked checksum: Fast when data is identical but suffers from unstable performance and requires a reliable order‑independent checksum.
Merge comparison: Slightly slower but offers stable performance, simpler implementation, and requires sorting (O(N log N)) followed by linear comparison (O(N)).
Considering implementation difficulty and time stability, the merge‑comparison approach was selected.
Offline Verification 2.0
Goals
Break the limitation of specific data‑source types.
Minimize impact on source systems.
High‑Level Approach
Data is first pulled to a local node, sorted locally, and then merged‑compared using the ordered data. This eliminates the need for source‑side ORDER BY and reduces memory pressure on business databases.
Key Improvements
Plugin‑based query interface to support custom DataX stream readers.
Standardized workflow: data extraction → field‑rule conversion → key‑value construction.
Local sorting replaces source‑side sorting, leveraging RocksDB’s ordered SST files for two‑way merge.
RocksDB Overview and Usage in 2.0
Characteristics
Key‑Value store with default ascending key order.
Column families provide data isolation.
Supports GET/ITERATOR/PUT/DELETE operations and can store terabytes of data.
Design Details for Verification
Each node runs a shared RocksDB instance; different tasks use separate column families for isolation.
Source and target data are stored in distinct column families named {taskId}_0 and {taskId}_1.
Rows are transformed into KV pairs, e.g., key = k1,k2,… ; value = o1,o2,….
Duplicate‑key records are stored in a dedicated column family with an auto‑increment suffix to avoid collisions.
An in‑memory Bloom filter quickly discards definitely non‑duplicate keys; suspected duplicates are re‑checked in RocksDB.
RocksDB Optimizations
WAL disabled to reduce disk I/O and CPU overhead.
Custom application‑level Bloom filter replaces the built‑in one for higher accuracy.
Maximum write cache limited to avoid off‑heap memory overflow.
50 Billion Data Sync Experience
Background
~50 billion rows after joining two tables, ~70 fields to migrate.
Daily incremental volume: 5–15 million rows.
Source database: Oracle.
Synchronization Plan
Full‑load migration with recorded start (T1) and end (T2) timestamps.
After full load, perform incremental migration using update_time as the filter, scheduled every 10 minutes (window: t‑15 min to t‑5 min).
Run data‑consistency checks after each load.
Task Partitioning
Use business date budat (yyyyMMdd) to split the full load into yearly jobs (2005‑2021, 7 jobs) and then into daily tasks (~356 per job).
Incremental jobs query by update_time and write to a 10‑minute window.
Key Lessons
Divide‑and‑conquer: choose a field that balances data distribution and further split until each task’s runtime is acceptable.
Process order: full load first, then incremental, to avoid version conflicts.
Incremental window should start a few minutes earlier than the full‑load window and end before the current time to avoid missing data due to source lag.
Regular consistency checks are essential to guarantee data integrity.
Q&A Highlights
Multi‑tenant resource isolation: Real‑time queues are tenant‑specific; offline jobs share resources with a custom load‑balancing algorithm. Yarn integration is under evaluation.
Real‑time consistency for MySQL→MongoDB sync: Not recommended for hot data; filter stable status records (e.g., SUCCESS, FAILED) and use offline verification for heterogeneous sources.
Impact on production databases: Each verification creates a single task; scheduling is user‑controlled, so it does not overload the source.
Handling long‑missing data on one side: Triggers expiration policy, marks as "suspected error", persists, and later re‑verifies.
RocksDB memory usage: Local sorting and merge comparison rely on RocksDB’s LSM ordering, not on large in‑memory structures.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
