Millisecond-Scale Multi-Dimensional Data Filtering with HybridDB for MySQL
HybridDB for MySQL delivers millisecond‑scale, multi‑dimensional filtering on billions of rows with hundreds of metrics by combining a high‑performance columnar engine, automatic composite indexes, and a fused MPP‑DAG pipeline, turning half‑day push preparation into seconds while supporting full SQL, spatial, and JSON data.
With the rapid growth of Xianyu, user data has reached the hundred‑million level, containing hundreds of metrics per user. Traditional solutions need minutes or hours to return query results, which is unacceptable for fine‑grained user operations.
The scenario requires (1) arbitrary dimension combination (AND/OR) with nested queries and low latency, (2) data volume of at least billions and scalable growth, and (3) each record to have hundreds of dimensions – a classic OLAP workload.
Comparing OLTP and OLAP shows that row‑store databases (e.g., MySQL) hit performance bottlenecks on large data sets, so a column‑store solution is preferred.
HybridDB for MySQL provides four key capabilities: intelligent composite indexes that require no manual index creation, billion‑row queries returning in milliseconds, full SQL compatibility with the MySQL BI ecosystem, and support for spatial, full‑text, and complex data types (multi‑value columns, JSON).
Millisecond‑level responses are achieved through: (1) a high‑performance columnar engine with predicate push‑down, (2) smart indexing that automatically builds full indexes on wide tables, (3) a fused MPP + DAG engine delivering vectorized pipeline computation, and (4) data‑modeling techniques (star, snowflake, aggregation sorting) that further boost performance.
Using these features, we built a crowd‑selection engine and integrated it with the Xianyu push system to enable rapid, fine‑grained user outreach.
System architecture: an offline layer aggregates user dimensions into a wide table, a real‑time compute layer executes the selection queries, a UI lets operators define arbitrary AND/OR conditions (saved as SQL), and the push system consumes the resulting user IDs.
Example of a saved crowd query:
SELECT COUNT(*) FROM user_big_table WHERE column1 > 1 AND column2 IN ('a','b') AND (column31 = 1 OR column32 = 2);For pagination on tens of millions of rows, we add a batch number (timestamp) and a row number, then create a composite index on (crowd_id, batch_no, row_no) to keep query performance stable even on deep pages.
The redesign reduced push preparation time from half a day per task to seconds, freeing developer resources and improving operational efficiency.
Future work includes moving from T+1 user data to near‑real‑time product data, building a “Mach” real‑time selection engine that reacts instantly to inventory changes.
Xianyu Technology
Official account of the Xianyu technology team
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.