Millisecond-Level Counting for Billion-Scale Data via Offline Batch and Online Incremental Statistics
To achieve millisecond-level counting on billion-scale data, the Xianyu team replaced slow MySQL count queries with an offline batch job that snapshots the relational tables and computes totals. The job is combined with incremental statistics kept in a KV store for online updates, and the result is sub-10 ms responses with a near-100% success rate.
Count queries on relational databases become inefficient at billion-scale data, yet the Xianyu team needed millisecond-level counts.
Traditional MySQL count operations cannot meet online service requirements because their latency is too high.
The proposed design replaces costly count queries with an offline batch processing step combined with online incremental statistics stored in a KV store. It achieves sub-10 ms responses and a near-100% success rate.
Offline, a batch job copies all relational data to an offline store (e.g., ODPS) as of a snapshot time. It computes the total count per source (offlineTotal) and records the latest modification timestamp in the snapshot.
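The aggregation step can be sketched in plain Python. This is a minimal illustration, not the team's ODPS job. It assumes, hypothetically, that each snapshot row is one counted record identified by a `source_id` and carrying a `modified_time` in epoch seconds. The function name `offline_batch` is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (source_id, modified_time in epoch seconds).
# Each row is assumed to be one counted record for its source.
Row = Tuple[str, int]


def offline_batch(rows: Iterable[Row]) -> Dict[str, Tuple[int, int]]:
    """Return {source_id: (offlineTotal, latest modification time)} for a snapshot."""
    totals: Dict[str, int] = defaultdict(int)
    latest: Dict[str, int] = {}
    for source_id, modified_time in rows:
        totals[source_id] += 1
        # Newest modification seen for this source; the online side later uses
        # it to tell which increments the snapshot already contains.
        latest[source_id] = max(latest.get(source_id, modified_time), modified_time)
    return {s: (totals[s], latest[s]) for s in totals}


snapshot = [("seller-1", 100), ("seller-1", 250), ("seller-2", 180)]
print(offline_batch(snapshot))  # {'seller-1': (2, 250), 'seller-2': (1, 180)}
```

In a real deployment this would be a GROUP BY with COUNT and MAX over the snapshot table. The loop only makes the two outputs per source explicit.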
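Because the shards of a sharded table are copied at different moments, their snapshot times are inconsistent. The solution therefore uses the batch start time as the single snapshot reference, so that data changed while the copy is running does not pollute the offline totals.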
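The batch-start cutoff can be sketched under some hypothetical assumptions. Each shard is a list of `(source_id, modified_time)` rows copied at its own moment. Rows are append-only. Any row modified after the batch start is left for the online increments. The function name `snapshot_at_batch_start` is illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each shard is a list of (source_id, modified_time) rows,
# copied to the offline store at its own moment, possibly well after batch start.
Shard = List[Tuple[str, int]]


def snapshot_at_batch_start(shards: List[Shard], batch_start: int) -> Dict[str, Tuple[int, int]]:
    """Count rows per source as of batch_start, whatever each shard's copy time.

    Rows modified after batch_start show up only in shards whose copy ran after
    it; they are skipped here and left to the online increments, so every shard
    is cut at the same instant and batch_start is recorded as the reference.
    """
    totals: Dict[str, int] = {}
    for shard in shards:
        for source_id, modified_time in shard:
            if modified_time > batch_start:
                continue  # would otherwise be counted both offline and online
            totals[source_id] = totals.get(source_id, 0) + 1
    return {s: (n, batch_start) for s, n in totals.items()}


early_shard = [("seller-1", 10), ("seller-1", 20)]                   # copied at t=25
late_shard = [("seller-1", 15), ("seller-2", 30), ("seller-1", 40)]  # copied at t=45
print(snapshot_at_batch_start([early_shard, late_shard], batch_start=25))
# {'seller-1': (3, 25)}
```

The late shard's rows at t=30 and t=40 would have been missed by the early shard. Cutting both shards at the batch start keeps the offline total consistent across shards.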
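Online, the KV store records the total increment for each day (dailyIncrTotal) and the per-event increments keyed by modification time (modifiedTimeIncr). The final count is offlineTotal + ΣdailyIncrTotal − overlap. The overlap is the sum of increments the offline snapshot already contains; they are identified by comparing their modification times with the snapshot's latest modification time and then subtracted.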
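The combination can be written out as follows. This is a sketch that interprets the description above; it is not a formula quoted from the team. W is the snapshot reference time (the latest modification time, or the batch start for sharded tables), d_W is the day containing W, and D is the current day. The daily sum starts at d_W itself, because that day's total also includes events after W. The overlap term removes the part of d_W that the snapshot already counted.

```latex
\[
\mathrm{count} = \mathrm{offlineTotal}
  + \sum_{d = d_W}^{D} \mathrm{dailyIncrTotal}(d)
  - \underbrace{\sum_{\substack{t \le W \\ \mathrm{day}(t) = d_W}} \mathrm{modifiedTimeIncr}(t)}_{\text{overlap}}
\]
% Worked example: three +1 events on day d_W at t = 100, 200, 300,
% one +1 event on day d_W + 1, snapshot taken with W = 200, offlineTotal = 2.
\[
\mathrm{count} = 2 + (3 + 1) - (1 + 1) = 4
\]
```

The result matches the four events that actually occurred.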
This approach reduces a single count request to a few KV reads, delivering real-time performance while retaining the ability to re-run offline jobs for correction.
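The read path can be sketched with a dict-backed stand-in for the KV store that counts `get` calls. The key names (`offline:{source}`, `daily:{source}:{day}`, `mt:{source}:{time}`) and helper functions are hypothetical.

One assumption does real work here. At each modification time, the sketch stores the day's running increment rather than the bare per-event delta, so the overlap costs a single read. The sketch also assumes:

- events are applied in modification-time order;
- the offline reference time is a modification time that the online side also recorded.

```python
from typing import Any, Dict, Optional, Tuple

DAY = 86_400  # seconds per day; day buckets are epoch-aligned (assumption)


class CountingKV:
    """Dict-backed stand-in for a KV store; counts read-path get() calls."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.reads = 0

    def get(self, key: str, default: Any = 0) -> Any:
        self.reads += 1
        return self.data.get(key, default)


def record_event(kv: CountingKV, source: str, t: int, delta: int = 1) -> None:
    """Online write path (writes bypass the read counter)."""
    daily_key = f"daily:{source}:{t // DAY}"
    kv.data[daily_key] = kv.data.get(daily_key, 0) + delta
    # Running increment of day(t) up to and including time t.
    kv.data[f"mt:{source}:{t}"] = kv.data[daily_key]


def publish_offline(kv: CountingKV, source: str, offline_total: int, watermark: int) -> None:
    """Store the batch output: offlineTotal and the snapshot reference time."""
    kv.data[f"offline:{source}"] = (offline_total, watermark)


def read_count(kv: CountingKV, source: str, today: int) -> int:
    entry: Optional[Tuple[int, int]] = kv.get(f"offline:{source}", None)
    if entry is None:
        raise KeyError(f"no offline snapshot published for {source!r}")
    offline_total, watermark = entry
    # One read per day from the snapshot day through today.
    incr = sum(kv.get(f"daily:{source}:{d}") for d in range(watermark // DAY, today + 1))
    # One read for the overlap already counted offline.
    overlap = kv.get(f"mt:{source}:{watermark}")
    return offline_total + incr - overlap


kv = CountingKV()
for t in (100, 200, 300, DAY + 50):  # three events on day 0, one on day 1
    record_event(kv, "seller-1", t)
publish_offline(kv, "seller-1", offline_total=2, watermark=200)
print(read_count(kv, "seller-1", today=1), kv.reads)  # 4 4
publish_offline(kv, "seller-1", offline_total=3, watermark=300)  # a later re-run
print(read_count(kv, "seller-1", today=1))  # 4
```

A later re-run of the offline job moves the watermark forward. Re-publishing its output leaves the count unchanged while shortening the range of daily keys that must be read.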
The technique demonstrates how offline-online hybrid processing can solve high-throughput counting problems in big-data environments.
Xianyu Technology
Official account of the Xianyu technology team