How HuoLala Built a Scalable Todo Center to Handle Billions of Requests
To support HuoLala’s massive driver workflow, the team designed a platform‑wide Todo Center that standardizes tasks, optimizes performance, decouples services, and provides both strong and eventual consistency guarantees. Traffic shaping, asynchronous processing, and robust monitoring allow it to sustain billions of daily queries with low latency.
Business Significance of Todo in HuoLala
Todo items are used throughout a driver’s lifecycle—onboarding, certification, order fulfillment, and platform exit—to guide required actions, protect driver rights, and ensure platform compliance.
Why Build a Todo Center
Rapid growth increased the number and complexity of Todo scenarios, exposing two core challenges.
Challenge 1: Interface stability and low‑latency requirements
Todo tasks are integrated into pricing and order‑grabbing flows; P99 latency must stay ≤100 ms.
The original architecture invoked more than 12 downstream services per request, pushing P99 latency above 2 s at peak QPS over 5 k and creating overload and avalanche risk.
Monitoring was fragmented and lacked a unified view or effective degradation mechanisms.
Challenge 2: Business flexibility and extensibility
The Todo system needed to accommodate diverse business needs quickly while reducing integration cost.
Repeated development for similar nodes and performance bottlenecks slowed rollout of new scenarios.
Design Philosophy
The Todo Center is built around three goals:
Standardization : Apply Domain‑Driven Design to extract common attributes and define a unified Todo model with a full lifecycle (register, update, verify, close).
Performance optimization : Combine a local database with caching to eliminate cross‑service latency, achieving millisecond‑level responses and reducing downstream pressure by >90 %.
Decoupling & extensibility : Introduce domain events (e.g., registration success, verification success) for asynchronous notification, enabling rapid response to new requirements without duplicate code.
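Taken together, the unified model described above might look like the following minimal Java sketch. All class, field, and method names here are hypothetical illustrations of the lifecycle (register, update, verify, close), not HuoLala's actual code:

```java
import java.time.Instant;

// Hypothetical unified Todo model: common attributes extracted across scenarios.
enum TodoStatus { REGISTERED, UPDATED, VERIFIED, CLOSED }

class Todo {
    final String todoId;     // unique task id
    final String driverId;   // owning driver
    final String sceneType;  // business scene, e.g. "CERTIFICATION"
    TodoStatus status;
    Instant updatedAt;

    Todo(String todoId, String driverId, String sceneType) {
        this.todoId = todoId;
        this.driverId = driverId;
        this.sceneType = sceneType;
        this.status = TodoStatus.REGISTERED;  // lifecycle starts at register
        this.updatedAt = Instant.now();
    }

    // Lifecycle transitions; in the real system each would also emit a domain event.
    void update() { status = TodoStatus.UPDATED;  updatedAt = Instant.now(); }
    void verify() { status = TodoStatus.VERIFIED; updatedAt = Instant.now(); }
    void close()  { status = TodoStatus.CLOSED;   updatedAt = Instant.now(); }
}
```

In this sketch, consumers of domain events would subscribe to the transitions rather than polling the Todo store.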
Key Design
Todo Lifecycle Management
Three management modes cover the entire lifecycle:
Standardized management : Business services call standard Todo APIs for registration, update, verification, and closure.
Asynchronous management : Convert synchronous calls to asynchronous processing when multiple system dependencies exist. The flow is:
Fetch dependent business data.
Build the target Todo set.
Compare with stored Todos and execute register, verify, or update as needed.
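The compare step above can be sketched as a simple set diff. This is an illustrative reconciliation, assuming target and stored Todos are keyed by a scene identifier with a comparable content value; the names are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative reconciliation: diff the target Todo set (built from business data)
// against stored Todos, then decide register / update / verify per key.
class TodoReconciler {

    // Returns the action per scene key: "REGISTER", "UPDATE", or "VERIFY" (close out).
    static Map<String, String> diff(Map<String, String> target, Map<String, String> stored) {
        Map<String, String> actions = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : target.entrySet()) {
            String key = e.getKey();
            if (!stored.containsKey(key)) {
                actions.put(key, "REGISTER");        // newly required -> create Todo
            } else if (!stored.get(key).equals(e.getValue())) {
                actions.put(key, "UPDATE");          // content changed -> update Todo
            }
        }
        for (String key : stored.keySet()) {
            if (!target.containsKey(key)) {
                actions.put(key, "VERIFY");          // no longer required -> verify/close
            }
        }
        return actions;
    }
}
```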
Collaborative management : Dynamically combine standard and async capabilities (e.g., async registration + standard verification) to support both new and legacy scenarios without service disruption.
Data Consistency Guarantees
Two mechanisms are provided based on sensitivity and resource considerations:
Strong consistency : On error, the user is forced to retry; the state transition occurs only after successful handling. Real‑time synchronization pulls downstream data to keep Todo data strongly consistent. Suitable for high‑sensitivity cases.
Eventual consistency : SDK‑integrated retry and message‑queue‑based re‑delivery; event‑driven or scheduled checks trigger automatic repair. Suitable for tasks tolerant of short delays.
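The eventual‑consistency path can be sketched as bounded retries followed by a message‑queue handoff for later re‑delivery. This is a minimal illustration of that shape, not the SDK's actual API; all names are assumptions:

```java
import java.util.function.Supplier;

// Illustrative eventual-consistency helper: retry a call a few times in-line,
// then hand the task to a message queue for asynchronous re-delivery and repair.
class EventualDelivery {
    interface RedeliveryQueue { void redeliver(String taskId); }

    // Returns true if the call eventually succeeded in-line; otherwise the task
    // is queued and a scheduled check will repair it later.
    static boolean deliver(String taskId, Supplier<Boolean> call, int maxRetries,
                           RedeliveryQueue mq) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (Boolean.TRUE.equals(call.get())) return true;  // state is now consistent
        }
        mq.redeliver(taskId);  // retries exhausted: schedule async repair via the queue
        return false;
    }
}
```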
Traffic‑Control Strategy
A three‑layer flow‑reduction mechanism balances peak load and latency:
Scenario control : Configure async handling for low‑traffic scenarios (e.g., login QPS ≤20, app start QPS ≤100) via domain events, reducing peak traffic by ~80 %.
Node filtering : Designate lifecycle nodes (register, verify, update) that can skip async processing, cutting ineffective calls by ~50 %.
Silent‑period control : Define a silent window after a Todo is processed; repeated events within the window are discarded, lowering call frequency by ~60 %.
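The silent‑period rule can be captured in a small filter: events for a key that arrive within the window after the last processed event are dropped. A minimal sketch, with hypothetical names and an in‑memory map standing in for whatever store the real system uses:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative silent-period filter: once a Todo event for a key is processed,
// repeated events for the same key inside the window are discarded.
class SilentWindow {
    private final long windowMillis;
    private final Map<String, Long> lastProcessed = new HashMap<>();

    SilentWindow(long windowMillis) { this.windowMillis = windowMillis; }

    // Returns true if the event should be processed, false if silenced.
    synchronized boolean shouldProcess(String key, long nowMillis) {
        Long last = lastProcessed.get(key);
        if (last != null && nowMillis - last < windowMillis) {
            return false;                    // inside the silent window: drop the event
        }
        lastProcessed.put(key, nowMillis);   // record processing time, window restarts
        return true;
    }
}
```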
Elastic Scenario Extension Design
Factory and Template Method patterns enable plug‑and‑play extensions. New Todo business logic is bound to scenes dynamically.
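As a concrete sketch of this factory + template‑method combination (class and method names are hypothetical, modeled on the pseudo‑code below, not HuoLala's actual classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the template method fixes the core flow, subclasses plug
// in scene-specific logic, and a factory map binds scene types to handlers.
abstract class TodoHandler {
    // Fixed core flow (template method): validate, process, optional expiry hook.
    final String handle(String todo) {
        validate(todo);          // required extension point
        String result = process(todo);
        todoExpireHandle(todo);  // optional hook, default no-op
        return result;
    }
    protected abstract void validate(String todo);
    protected abstract String process(String todo);
    protected void todoExpireHandle(String todo) { /* optional hook */ }
}

class CertificationHandler extends TodoHandler {
    protected void validate(String todo) {
        if (todo == null || todo.isEmpty()) throw new IllegalArgumentException("empty todo");
    }
    protected String process(String todo) { return "certified:" + todo; }
}

// Factory: handlers self-register at init time (analogous to afterPropertiesSet()
// in Spring), so new scenes plug in without touching the dispatch code.
class HandlerFactory {
    private static final Map<String, TodoHandler> strategyFactoryMap = new HashMap<>();
    static void register(String sceneType, TodoHandler h) { strategyFactoryMap.put(sceneType, h); }
    static TodoHandler get(String sceneType) { return strategyFactoryMap.get(sceneType); }
}
```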
Dynamic registration example:
// Pseudo‑code logic
sceneClass.init() → afterPropertiesSet() → register instance into strategyFactoryMap<SceneType, Handler>

The template method defines a fixed core flow with optional extension points:

protected abstract void validate(); // required extension (e.g., validation)
protected void todoExpireHandle() { /* optional hook */ }

Stability Design
Real‑time monitoring and alerting are built on business‑level metrics (interface success rate, response time) and full‑link tracing (including async tasks). Multi‑level alerts (P0‑P3) enable:
Minute‑level fault detection.
Rapid root‑cause localization via trace analysis.
Automatic fault‑tolerance actions (circuit breaking, service degradation, emergency plans).
Fault‑tolerance mechanisms include:
Circuit breaking : Automatic cut‑off of failing components based on monitoring thresholds.
Service degradation : Global or scenario‑specific fallback strategies.
Emergency response : Pre‑defined playbooks that trigger when critical metrics degrade, ensuring core business continuity.
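The circuit‑breaking behavior above can be sketched with a simple count‑based breaker: it opens after a threshold of consecutive failures, lets a trial request through after a cool‑down, and closes again on success. This is a generic illustration under those assumptions, not HuoLala's monitoring integration:

```java
// Illustrative count-based circuit breaker: cuts off a failing component once
// consecutive failures hit a threshold, then probes again after a cool-down.
class CircuitBreaker {
    private final int failureThreshold;
    private final long coolDownMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    CircuitBreaker(int failureThreshold, long coolDownMillis) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
    }

    synchronized boolean allowRequest(long nowMillis) {
        if (consecutiveFailures < failureThreshold) return true;  // closed: pass through
        return nowMillis - openedAt >= coolDownMillis;            // half-open trial allowed
    }

    synchronized void recordSuccess() { consecutiveFailures = 0; openedAt = -1; }

    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) openedAt = nowMillis;  // (re)open
    }
}
```

A caller that is denied by `allowRequest` would fall back to the degradation strategy for that scenario instead of invoking the failing component.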
Production Rollout
The Todo Center has been fully integrated into core quoting and order‑grabbing pipelines, supporting over 50 Todo scenarios. It handles an average of 2 million Todo generations per day and more than 200 million query requests, while providing agile onboarding for new business needs.
Conclusion
Iterative upgrades transformed the driver Todo system from a single‑function module into a comprehensive Todo Center capable of supporting tens of millions of daily interactions. The platform shifted from passive requirement response to proactive business empowerment, significantly improving conversion metrics across the ecosystem.