Achieving Sub‑Second Real‑Time Product Selection with Xianyu’s Mach and Blink
Xianyu’s Mach system tackles the e‑commerce challenge of instantly selecting high‑quality items from billions of products by leveraging Blink’s low‑latency stream computing, detailing its architecture—including state, windows, custom UDX functions, data merging, rule execution, and SQL‑to‑MVEL conversion—to achieve sub‑second processing at massive scale.
Mach System Overview
Mach is a real‑time, high‑performance product selection engine used by Xianyu. It can apply millions of rules to billions of products, marking items within 10 minutes for a 1 million‑rule scale and synchronizing rule hits within one second after a product change.
Stream Computing Fundamentals
Stream computing processes events continuously with low latency. Data is partitioned into windows (tumbling or hopping) and state is maintained across events, allowing immediate output without waiting for batch completion.
Mach uses Blink, Alibaba’s enhanced version of Apache Flink, as its stream‑computing engine. Blink provides high throughput, low latency, stateful computation, strong consistency, and advanced event‑time handling.
Blink State
State stores intermediate results (e.g., aggregation buffers, Kafka offsets). In Mach, state holds merged product data and rule execution outcomes. When a product changes, the new data is merged with the state, rules are re‑executed, and the diff between new and old results yields the effective output.
Blink Window
Windows group data by time. Tumbling windows have a fixed size and trigger once per period (e.g., per minute). Hopping windows have a fixed size but slide more frequently, causing overlapping windows and multiple assignments of the same event.
Blink UDX (User‑Defined Extensions)
UDF : scalar function, one input field, one output field.
UDTF : table‑valued function, one or more input columns, multiple output rows (e.g., STRINGSPLIT, JSONTUPLE).
UDAF : aggregate function, multiple input rows, single aggregated output (e.g., MAX, MIN, AVG).
Mach relies heavily on UDAFs for merging product data, executing rules, and diffing results.
Second‑Level Selection Scheme
Two technical proposals were evaluated:
PostgreSQL‑based solution using functions and triggers. It performed well on small datasets but degraded sharply with millions of rules and billions of products, failing the sub‑second requirement.
Blink‑based solution expressed data‑processing logic in SQL‑like statements and delivered superior performance. This option was selected for production.
Data Processing Module Architecture
Data Ingestion Layer
Connects to multiple channels and data sources.
Parses and validates incoming messages.
Aggregates channel‑wise data volume and monitors trends.
Retrieves field‑level metadata from a central metadata service.
Performs field‑level validation based on metadata.
Normalizes data into Mach’s standard schema.
This decouples source heterogeneity from Blink, allowing a single Kafka topic to feed all data.
Data Merging Layer
Combines the latest product update with the in‑memory snapshot, using timestamps to keep the most recent field values.
Example merge record format: {key:[timestamp,value]} Merge algorithm (illustrated in the diagram) selects the value with the greatest timestamp for each field.
Rule Execution Layer
Consumes merged product data.
Applies field‑level metadata for parsing.
Fetches active rules from the rule center.
Iterates over rules, evaluates each against the product, and stores boolean results in state.
Records exceptions and triggers alerts.
Typical rule: price > 50 AND status == "online". With ~300 rules (growing), each product change may trigger thousands of evaluations, requiring Blink’s massive parallelism.
Result Processing Layer
Retrieves rule execution results for a product.
Filters results based on hit/miss status.
Diffs current results against the previous snapshot stored in Blink state to emit only changed outcomes.
Example diff:
Previous: {"rule1":true,"rule2":true,"rule3":false,"rule4":false}
Current: {"rule1":true,"rule2":false,"rule3":false,"rule4":true}
Effective output: {"rule2":false,"rule4":true}Converting Offline SQL Rules to Online MVEL Expressions
Offline rules are stored as SQL WHERE clauses and executed in a relational database. Real‑time rules run as MVEL expressions (Java‑like syntax) inside Blink. The conversion must preserve semantics and performance.
Solution steps:
Parse SQL using Alibaba Druid to obtain an abstract syntax tree and extract the WHERE subtree.
Traverse the subtree in post‑order.
Replace SQL operators with Java equivalents (e.g., AND → &&, OR → ||, = → ==, <> → !=, LIKE → .contains()).
Transform IN clauses into a series of || comparisons.
Example conversion: A LIKE '%test%' becomes A.contains("test").
Performance and Impact
Since launch, Mach has supported nearly 400 campaigns, processing ~140 million messages daily with peak throughput of 50 000 TPS. The design is generic: input and output are MQ messages, making the solution applicable to other real‑time filtering scenarios beyond product selection.
References
Xianyu Real‑Time Selection System: https://mp.weixin.qq.com/s/8ROsZniYD7nIQssC14mn3w
Blink (Flink): https://github.com/apache/flink/tree/blink
PostgreSQL: https://www.postgresql.org/
Druid SQL Parser: https://github.com/alibaba/druid
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
