How Meituan Built Zeus: Inside a Scalable Security Rule Engine
This article examines Meituan's custom rule engine Zeus, detailing the security challenges of a massive multi‑service platform, the architectural decisions made to decouple risk logic, the implementation of reusable factors and rule groups, and the ongoing push toward automated, intelligent risk mitigation.
Background
Meituan's apps face a wide range of fraud, account theft, cheating, cash‑out abuse, and malicious activity that threaten both users and merchants. The security team must identify who performed an action, when, how, and what was done, then quickly validate and deploy the corresponding risk policies.
Early on, risk logic was hard‑coded within each business service. As Meituan expanded from group‑buying to food delivery, hotel booking, travel, finance, and more, this approach became unmanageable due to scattered policies, tight coupling with business code, slow iteration, and high integration cost.
Challenges and Solutions
1. High Business Diversity and Integration Cost
With hundreds of vertical services and multiple user roles (users, merchants, suppliers, channels), point‑to‑point integration meant that every new business line or risk scenario had to re‑engineer the interface between its service and the rule engine.
Solution: Define common entry points (e.g., user center, merchant center, order center, checkout) that invoke the rule engine, allowing any business to integrate by calling these shared nodes.
2. Numerous Risk Points and Complex Logic Reuse
Risk scenarios span cheating, fake orders, account takeover, and payment fraud, requiring expressive yet maintainable logic.
Solution: Introduce reusable factor templates:
Extension functions for data extraction and format conversion.
Accumulation factors for counting events (e.g., login frequency per IP‑UserID pair) with configurable time windows and aggregation modes such as sum or most‑recent‑N.
Decision‑table factors to compactly represent multi‑condition, multi‑action rules.
Blacklist factors for seamless integration with name‑list services.
Tool factors that package groups of rules and output a score for cross‑scenario reuse.
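As an illustration of the accumulation factor described in the list above, here is a minimal sketch of a sliding-window counter keyed by an (IP, user ID) pair. The class name `AccumulationFactor`, its methods, and the 60-second window are assumptions made for this example; the article does not describe Zeus's actual interfaces or storage.

```python
from collections import defaultdict, deque


class AccumulationFactor:
    """Hypothetical accumulation factor: counts events per key in a sliding window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # key -> event timestamps, oldest first (assumes events arrive in time order)
        self._events = defaultdict(deque)

    def record(self, key: tuple, timestamp: float) -> None:
        """Store one event for the key, e.g. (ip, user_id)."""
        self._events[key].append(timestamp)

    def count(self, key: tuple, now: float) -> int:
        """Return how many events for the key fall inside the window ending at `now`."""
        events = self._events[key]
        cutoff = now - self.window_seconds
        # Drop events that are at or before the start of the window.
        while events and events[0] <= cutoff:
            events.popleft()
        return len(events)


# Example: login frequency for one IP-UserID pair over the last 60 seconds.
login_freq = AccumulationFactor(window_seconds=60)
key = ("203.0.113.7", "user-42")
for ts in (0, 10, 50, 65):
    login_freq.record(key, ts)

recent_logins = login_freq.count(key, now=70)  # only the events at t=50 and t=65 remain
```

A rule could then compare `recent_logins` against a threshold. A production factor would presumably sit on a shared store rather than in-process memory, but that detail is outside what the article covers.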
To avoid duplicating identical rules across dozens of scenarios, the engine adopts a "rule group" concept, clustering related rules (e.g., crowdsourcing detection, fake device detection) for centralized management and selective application.
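The rule-group idea can be sketched as a named bundle of rules that scenes reference instead of copying. Every name here (`Rule`, `RuleGroup`, `SCENE_TO_GROUPS`, the factor keys) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # receives factor values, returns True on a hit


@dataclass
class RuleGroup:
    name: str
    rules: List[Rule] = field(default_factory=list)

    def hits(self, factors: dict) -> List[str]:
        """Return the names of all rules in this group that fire."""
        return [rule.name for rule in self.rules if rule.condition(factors)]


# One group, defined and maintained in a single place (illustrative rules).
fake_device = RuleGroup("fake_device_detection", [
    Rule("emulator", lambda f: f.get("is_emulator", False)),
    Rule("many_accounts_per_device", lambda f: f.get("accounts_on_device", 0) > 5),
])

GROUPS: Dict[str, RuleGroup] = {fake_device.name: fake_device}

# Scenes select groups by name, so a change to the group reaches every scene.
SCENE_TO_GROUPS: Dict[str, List[str]] = {
    "login": ["fake_device_detection"],
    "coupon_claim": ["fake_device_detection"],
}


def evaluate_scene(scene: str, factors: dict) -> List[str]:
    """Collect rule hits from every group attached to the scene."""
    result: List[str] = []
    for group_name in SCENE_TO_GROUPS.get(scene, []):
        result.extend(GROUPS[group_name].hits(factors))
    return result
```

With this layout, adding a rule to `fake_device` changes the behavior of both `login` and `coupon_claim` without touching either scene's configuration.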
3. Rapid Risk Evolution and Verification Speed
When external threats evolve, the engine must quickly adapt. Zeus supports three verification modes:
Mark: Gray‑run a rule without returning a decision, useful for monitoring.
Dual‑run: Execute both new and existing rules simultaneously during a rollout.
Backtrack: Replay historical traffic against a rule to assess impact.
These modes reduce deployment risk and shorten the time from policy creation to production.
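The three modes can be approximated in a few lines. This is only a sketch under assumptions: rules are modeled as `(name, mode, predicate)` tuples, decisions are the strings `"pass"` and `"reject"`, and the function names are invented for this example rather than taken from Zeus.

```python
from enum import Enum


class Mode(Enum):
    ENFORCE = "enforce"  # a hit changes the returned decision
    MARK = "mark"        # a hit is only recorded


def evaluate_rules(rules, event):
    """Return (decision, marked_hits) for one event."""
    decision = "pass"
    marked_hits = []
    for name, mode, predicate in rules:
        if not predicate(event):
            continue
        if mode is Mode.MARK:
            marked_hits.append(name)  # observed for monitoring, decision untouched
        else:
            decision = "reject"
    return decision, marked_hits


def dual_run(old_rules, new_rules, event):
    """Run both rule sets; the old set still decides, and any disagreement is flagged."""
    old_decision, _ = evaluate_rules(old_rules, event)
    new_decision, _ = evaluate_rules(new_rules, event)
    return old_decision, old_decision != new_decision


def backtrack(rules, historical_events):
    """Replay past events and return the fraction that would have been rejected."""
    if not historical_events:
        return 0.0
    rejected = sum(1 for e in historical_events if evaluate_rules(rules, e)[0] == "reject")
    return rejected / len(historical_events)


old = [("high_amount", Mode.ENFORCE, lambda e: e["amount"] > 1000)]
new = [("high_amount_v2", Mode.ENFORCE, lambda e: e["amount"] > 500)]
marked = [("high_amount_v2", Mode.MARK, lambda e: e["amount"] > 500)]

event = {"amount": 800}
mark_result = evaluate_rules(old + marked, event)  # decision stays "pass", hit is logged
dual_result = dual_run(old, new, event)            # old decides, disagreement flagged
history = [{"amount": 100}, {"amount": 600}, {"amount": 2000}, {"amount": 700}]
reject_rate = backtrack(new, history)
```

In this toy data the candidate rule would reject three of the four historical events, which is the kind of impact estimate a backtrack run is meant to surface before the rule is enforced.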
Summary of Design Thinking
Zeus evolved from a simple expression service to a configurable platform that separates execution and calculation layers. The execution layer selects rule sets based on scenes and returns decisions; the calculation layer computes factor values.
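The split between the two layers might look roughly like the sketch below, where the execution layer chooses rules by scene and the calculation layer computes factor values only when a rule asks for them. The class names, the lazy cache, and the sample factors are assumptions made for illustration, not details from the article.

```python
class FactorContext:
    """Calculation layer (sketch): lazily computes and caches factors for one event."""

    def __init__(self, event, factor_functions):
        self.event = event
        self._functions = factor_functions
        self._cache = {}

    def get(self, name):
        # Compute each factor at most once per event, and only if a rule needs it.
        if name not in self._cache:
            self._cache[name] = self._functions[name](self.event)
        return self._cache[name]


class ExecutionLayer:
    """Execution layer (sketch): selects the rule set for a scene and returns a decision."""

    def __init__(self, scene_rules, factor_functions):
        self._scene_rules = scene_rules
        self._factor_functions = factor_functions

    def decide(self, scene, event):
        ctx = FactorContext(event, self._factor_functions)
        for rule_name, predicate in self._scene_rules.get(scene, []):
            if predicate(ctx):
                return {"decision": "reject", "rule": rule_name}
        return {"decision": "pass", "rule": None}


# Hypothetical factors and a single checkout rule.
factor_functions = {
    "order_amount": lambda e: e["amount"],
    "is_new_account": lambda e: e["account_age_days"] < 7,
}
scene_rules = {
    "checkout": [
        ("new_account_large_order",
         lambda ctx: ctx.get("is_new_account") and ctx.get("order_amount") > 500),
    ],
}

engine = ExecutionLayer(scene_rules, factor_functions)
risky = engine.decide("checkout", {"amount": 900, "account_age_days": 2})
normal = engine.decide("checkout", {"amount": 900, "account_age_days": 30})
```

Because rules only see `ctx.get(...)`, how a factor is computed can change without the rule definitions or the scene routing changing with it.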
Efficiency improvements target different roles:
Risk users: Integrate with blacklist services to block repeat offenders.
Business teams: Provide a unified entry point and on‑demand data fetching to lower integration effort.
Product managers: Offer rule groups, factor tools, and real‑time analytics for faster policy iteration.
Engineers: Encapsulate common logic into reusable factors, freeing development resources.
Algorithm engineers: Wrap model outputs as data interfaces, enabling rapid model‑to‑engine handoff.
Future Development and Reflections
Zeus is moving from a configuration‑driven stage toward automation and intelligence, aiming to accelerate policy lifecycle and support more sophisticated AI‑based risk detection.
Remaining challenges include handling long‑term features (e.g., yearly trends) that require offline‑online hybrid processing, reducing integration overhead for simple use cases, and scaling stability as traffic and policy volume continue to grow.
Pitfalls Encountered
1. Achieving low coupling in a highly aggregated product architecture
Complex configurations across routing, scenes, rule groups, and data interfaces can create tight coupling between layers. Optimizations such as caching the configuration layer and incremental updates helped decouple execution and calculation layers.
2. Balancing system complexity with diverse business needs
Requests for custom features often look unique to one business, yet many of them can be anticipated. A three‑step approach (Define, Judge, Gap) helps classify each need and decide whether to build a generic module or a specialized extension.
3. Designing "fail‑safe" mechanisms
Constraints such as peak‑time lock‑downs, mandatory validation before rollout, and automated testing of rule logic guard against human error and reduce accidental misconfigurations.
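A publish guard along these lines could be sketched as follows. The peak hours, the rule fields `validated` and `tests_passed`, and the function names are all assumptions made for the example; the article does not specify how Zeus enforces these constraints.

```python
from datetime import datetime, time
from typing import Tuple

# Assumed peak windows (e.g. lunch and dinner ordering hours); purely illustrative.
PEAK_WINDOWS = [(time(11, 0), time(13, 0)), (time(17, 0), time(20, 0))]


def in_peak_window(now: datetime) -> bool:
    """True if `now` falls inside any configured lock-down window."""
    return any(start <= now.time() < end for start, end in PEAK_WINDOWS)


def can_publish(rule: dict, now: datetime) -> Tuple[bool, str]:
    """Check each fail-safe constraint in order and report the first one that blocks."""
    if in_peak_window(now):
        return False, "peak-time lock-down"
    if not rule.get("validated"):
        return False, "validation required before rollout"
    if not rule.get("tests_passed"):
        return False, "automated rule tests have not passed"
    return True, "ok"


ready_rule = {"name": "high_amount_v2", "validated": True, "tests_passed": True}
draft_rule = {"name": "new_idea", "validated": False, "tests_passed": False}

at_noon = can_publish(ready_rule, datetime(2024, 1, 1, 12, 0))
draft_off_peak = can_publish(draft_rule, datetime(2024, 1, 1, 15, 0))
ready_off_peak = can_publish(ready_rule, datetime(2024, 1, 1, 15, 0))
```

Returning the blocking reason alongside the boolean lets the configuration UI tell the operator why a rollout was refused, not only that it was.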
4. Unexpected benefits from product usage
Rule groups originally intended for management also enabled on‑the‑fly decision calculations. Accumulation factors proved effective for cross‑event risk detection in multiple business lines.
Regular retrospectives and monitoring are essential to maintain the health of the platform.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
