Design and Implementation of Meituan Hotel Real-Time Operation Reach System
The article describes Meituan’s hotel real‑time reach platform, which replaces numerous hard‑coded Storm topologies with a unified rule engine built on Storm and Aviator. The engine supports time‑window and delayed triggers, configurable scenes, custom functions, monitoring, and alerting, and it now processes nearly a billion events per day with improved conversion and scalability.
Background: Meituan Dianping’s hotel and travel operations want to reach users based on real‑time behavior data (e.g., browsing, ordering, refunds, search), but existing reach works on a T+1 (next‑day) basis. The T+1 delay limits timely interaction, so the business requires a system that can process events as they arrive and trigger pushes or other actions.
Typical business scenarios include:
User performs behavior A three or more times within 30 minutes.
User is a repeat hotel customer (has purchased a Meituan hotel product before).
User does not perform behavior B within 24 hours before behavior A.
User does not perform behavior B within 30 minutes after behavior A (to exclude self‑generated B events).
The article uses the first scenario as an example to discuss system design.
Early solution: Each real‑time reach request was handled by a dedicated Storm topology with hard‑coded rules (see Figure 1). This case‑by‑case approach violated the DRY principle, caused linear growth in maintenance cost, and eventually could not support the increasing number of activities.
Challenges identified:
Hard‑coded rules lead to massive duplicate code and high development cost.
Modifying business rules requires code changes and topology restarts.
Numerous Storm topologies result in low resource utilization and high maintenance overhead.
Lack of a comprehensive monitoring and alerting mechanism makes early detection of stability issues difficult.
Technical research highlighted the need for a rule engine to decouple business logic from system code. Rule engines are often combined with Complex Event Processing (CEP) to handle event dependencies and time‑based conditions.
Two open‑source rule engines were evaluated:
Esper: a lightweight CEP solution with SQL‑like EPL syntax, but limited to in‑memory, single‑node deployment; it cannot handle long time windows or scheduled triggers.
Drools: feature‑rich, with monitoring and a UI, but it has a steep learning curve, a complex DRL rule language, and similar in‑memory time‑window limitations.
Because the business requires time‑window and scheduled‑reach capabilities, the team selected Aviator, a lightweight open‑source Java expression engine, and integrated it with Storm. Aviator evaluates the rule expressions, while Storm handles stream processing and guarantees throughput.
System architecture (Figure 2) consists of the following modules:
Rule Engine (embedded in Storm topology)
Time‑Window Module (sliding windows with configurable span)
Timed Reach Module (executes rules after a specified delay)
Custom Functions (extensions on top of Aviator)
Alert Module (monitors message volume and sends alerts)
Rule Configuration Console (UI for adding scenes and rules)
Configuration Loader (periodically loads rule definitions)
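As a rough illustration of how a configured scene might be evaluated inside a topology, the Python sketch below models a scene as a list of rules, each with a condition and a response. The `Rule`, `Scene`, and `handle_event` names and the event fields are hypothetical, and the conditions are plain Python callables standing in for Aviator expressions; the article does not show the real data model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]  # stands in for an Aviator expression
    response: Callable[[Event], str]    # e.g. build a push message


@dataclass
class Scene:
    event_id: str                       # event type this scene listens to
    rules: List[Rule] = field(default_factory=list)


def handle_event(scenes: List[Scene], event: Event) -> List[str]:
    """Run every rule of every scene subscribed to this event type."""
    responses = []
    for scene in scenes:
        if scene.event_id != event["id"]:
            continue
        for rule in scene.rules:
            if rule.condition(event):
                responses.append(rule.response(event))
    return responses


# Hypothetical scene: push a coupon to repeat customers who browse a hotel.
scenes = [
    Scene(
        event_id="hotel_browse",
        rules=[
            Rule(
                name="repeat_customer_push",
                condition=lambda e: e.get("isRepeatCustomer", False),
                response=lambda e: f"push coupon to user {e['userId']}",
            )
        ],
    )
]

print(handle_event(scenes, {"id": "hotel_browse", "userId": 42, "isRepeatCustomer": True}))
```

Because scenes are plain data in this sketch, a configuration loader could swap the `scenes` list at runtime without restarting the topology, which is the property the hard‑coded approach lacked.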
Core components of the rule engine include scenes, rules, rule conditions, factors (basic, time‑window, third‑party), rule responses, and events (synchronous vs. asynchronous). The time‑window module provides factors such as count(timeWindow(event.id, event.userId, X * 60)) and first(timeWindow(event.id, event.userId, X * 60)). The window data is stored in Meituan’s KV store Cellar, with ~2 ms 99th‑percentile latency at 20K QPS.
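To make the `count` and `first` factors concrete, here is a minimal Python sketch of a per-user sliding window. It is an assumption-laden stand-in: `TimeWindowStore` is a hypothetical name, an in-memory dict replaces Cellar, and timestamps are plain seconds passed in by the caller.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class TimeWindowStore:
    """In-memory stand-in for the KV-backed time-window module."""

    def __init__(self) -> None:
        # (event_id, user_id) -> timestamps in arrival order
        self._events: Dict[Tuple[str, int], List[float]] = defaultdict(list)

    def record(self, event_id: str, user_id: int, ts: float) -> None:
        # Assumes timestamps arrive in non-decreasing order per key.
        self._events[(event_id, user_id)].append(ts)

    def window(self, event_id: str, user_id: int, span: float, now: float) -> List[float]:
        """Timestamps that fall inside the half-open window (now - span, now]."""
        stamps = self._events.get((event_id, user_id), [])
        return [t for t in stamps if now - span < t <= now]

    def count(self, event_id: str, user_id: int, span: float, now: float) -> int:
        return len(self.window(event_id, user_id, span, now))

    def first(self, event_id: str, user_id: int, span: float, now: float) -> Optional[float]:
        hits = self.window(event_id, user_id, span, now)
        return hits[0] if hits else None


store = TimeWindowStore()
for ts in (0, 600, 1500):
    store.record("A", 1, ts)

# "User performs behavior A three or more times within 30 minutes."
triggered = store.count("A", 1, span=30 * 60, now=1500) >= 3
print(triggered)
```

A production version would also need expiry of old entries and a shared store so that every Storm worker sees the same window, which is presumably why the article keeps this state in a KV store rather than in process memory.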
Custom functions (see Table 2) extend Aviator, e.g., equals(message.orderType, 0), filter(browseList, 'source', 'dp'), userPortrait(message.userId), and userBlackList(message.userId). These functions enable integration with user‑profile services and blacklist checks.
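The idea of registering named functions that rule expressions can call can be sketched in Python as follows. This is not Aviator's API: the `FUNCTIONS` registry and `register` decorator are hypothetical, the semantics of `filter` (keep items whose field equals the given value) are an assumption inferred from the example call, and the blacklist is a hard-coded set standing in for a remote service.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: function name as written in a rule -> implementation.
FUNCTIONS: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        FUNCTIONS[name] = fn
        return fn
    return decorator


@register("equals")
def equals(left: Any, right: Any) -> bool:
    return left == right


@register("filter")
def filter_by(items: List[dict], key: str, value: Any) -> List[dict]:
    # Assumed semantics: keep the items whose `key` field equals `value`.
    return [item for item in items if item.get(key) == value]


@register("userBlackList")
def user_black_list(user_id: int) -> bool:
    # Placeholder data; a real system would query a blacklist service.
    return user_id in {1001, 1002}


message = {"orderType": 0, "userId": 7}
browse_list = [{"source": "dp", "poi": 1}, {"source": "mt", "poi": 2}]

print(FUNCTIONS["equals"](message["orderType"], 0))
print(FUNCTIONS["filter"](browse_list, "source", "dp"))
print(FUNCTIONS["userBlackList"](message["userId"]))
```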
The timed‑reach module originally used an in‑memory DelayQueue. As activity volume grew, it switched to Meituan’s Mafka message queue, which supports delayed messages, so that pending triggers survive restarts.
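The delayed check behind a rule such as "no behavior B within 30 minutes after behavior A" can be sketched in Python as below. This mirrors the earlier in-memory approach, not Mafka: `DelayedTrigger`, `on_behavior_a`, and the `last_b` map are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class DelayedTrigger:
    """In-memory stand-in for delayed messages; not durable across restarts."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Callable[[], None]]] = []
        self._seq = 0  # tie-breaker so callables are never compared

    def schedule(self, due_at: float, action: Callable[[], None]) -> None:
        heapq.heappush(self._heap, (due_at, self._seq, action))
        self._seq += 1

    def run_due(self, now: float) -> int:
        """Run every action whose due time has passed; return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, action = heapq.heappop(self._heap)
            action()
            ran += 1
        return ran


DELAY = 30 * 60                 # 30 minutes, in seconds
last_b: Dict[int, float] = {}   # user_id -> timestamp of latest behavior B
pushed: List[int] = []          # users who would receive a push


def on_behavior_a(trigger: DelayedTrigger, user_id: int, ts: float) -> None:
    def check() -> None:
        b_ts = last_b.get(user_id)
        # Push only if no B happened in (ts, ts + DELAY].
        if b_ts is None or not (ts < b_ts <= ts + DELAY):
            pushed.append(user_id)

    trigger.schedule(ts + DELAY, check)


trigger = DelayedTrigger()
on_behavior_a(trigger, user_id=1, ts=0)
on_behavior_a(trigger, user_id=2, ts=0)
last_b[2] = 600                 # user 2 performs B ten minutes after A
ran = trigger.run_due(now=DELAY)
print(pushed)
```

Everything scheduled here is lost if the process dies, which is the durability gap that motivates moving the delay into a message queue.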
Monitoring and alerting: Real‑time event metrics are reported to Meituan’s data platform and visualized (Figure 6). An alerting service queries OpenTSDB via HTTP API, applies thresholds and ratio checks, and sends IM notifications when anomalies are detected (Figure 7).
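The threshold and ratio checks could look roughly like the Python sketch below. It deliberately leaves out the OpenTSDB query and the IM notification; `check_volume`, its parameters, and the choice of baseline (for example, the same interval on the previous day) are assumptions, since the article does not specify them.

```python
from typing import List


def check_volume(
    current: float,
    baseline: float,
    min_count: float,
    max_drop_ratio: float,
) -> List[str]:
    """Return alert messages for one metric sample; empty list means healthy."""
    alerts: List[str] = []

    # Absolute threshold: message volume fell below a fixed floor.
    if current < min_count:
        alerts.append(f"volume {current:.0f} below threshold {min_count:.0f}")

    # Ratio check: volume dropped too far relative to the baseline.
    if baseline > 0:
        drop = (baseline - current) / baseline
        if drop > max_drop_ratio:
            alerts.append(f"volume dropped {drop:.0%} vs baseline {baseline:.0f}")

    return alerts


for sample in (950, 500, 50):
    print(sample, check_volume(sample, baseline=1000, min_count=100, max_drop_ratio=0.3))
```

Combining both checks helps in this sketch because a fixed floor alone misses large relative drops on high-volume events, while a ratio alone is noisy on low-volume ones.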
Summary and outlook: The system has been stable for over a year, processing nearly 1 billion real‑time messages daily, with peak QPS of 14 K, and has significantly improved conversion, GMV, and user acquisition. Remaining pain points include non‑automated data ingestion, limited rule‑engine generalization, and lack of self‑service rule registration for product managers. Future work will focus on further simplifying the platform and expanding its applicability.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
