How HuoLala Built a Scalable Todo Center to Handle Billions of Requests

To support HuoLala’s massive driver workflow, the team designed a platform‑wide Todo Center that standardizes tasks, optimizes performance, decouples services, and ensures strong and eventual consistency, while employing traffic‑shaping, asynchronous processing, and robust monitoring to sustain billions of daily queries with low latency.


Business Significance of Todo in HuoLala

Todo items are used throughout a driver’s lifecycle—onboarding, certification, order fulfillment, and platform exit—to guide required actions, protect driver rights, and ensure platform compliance.

Todo business significance

Why Build a Todo Center

Rapid growth increased the number and complexity of Todo scenarios, exposing two core challenges.

Challenge 1: Interface stability and low‑latency requirements

Todo tasks are integrated into pricing and order‑grabbing flows; P99 latency must stay ≤100 ms.

The original architecture invoked more than 12 downstream services per request; at peak traffic above 5 k QPS, P99 latency exceeded 2 s, creating overload and avalanche risk.

Monitoring was fragmented and lacked a unified view or effective degradation mechanisms.

Challenge 2: Business flexibility and extensibility

The Todo system needed to accommodate diverse business needs quickly while reducing integration cost.

Repeated development for similar nodes and performance bottlenecks slowed rollout of new scenarios.

Design Philosophy

The Todo Center is built around three goals:

Standardization: apply Domain‑Driven Design to extract common attributes and define a unified Todo model with a full lifecycle (register, update, verify, close).

Performance optimization: combine a local database with caching to eliminate cross‑service latency, achieving millisecond‑level responses and reducing downstream pressure by >90%.

Decoupling & extensibility: introduce domain events (e.g., registration success, verification success) for asynchronous notification, enabling rapid response to new requirements without duplicate code.

Key Design

Todo Lifecycle Management

Three management modes cover the entire lifecycle:

Standardized management: business services call standard Todo APIs for registration, update, verification, and closure.

Asynchronous management: convert synchronous calls to asynchronous processing when multiple system dependencies exist. The flow is:

1. Fetch dependent business data.
2. Build the target Todo set.
3. Compare with stored Todos and execute register, verify, or update as needed.

Collaborative management: dynamically combine standard and async capabilities (e.g., async registration + standard verification) to support both new and legacy scenarios without service disruption.
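The asynchronous flow above boils down to a diff between the target Todo set and the stored set. A minimal sketch follows; the class and method names (`TodoDiff`, `diff`) and the mapping of "no longer required" to a verify-and-close action are illustrative assumptions, not HuoLala's actual API.

```java
import java.util.*;

// Illustrative sketch of the async diff step: compare the target Todo set
// (built from business data) with stored Todos and decide an action per id.
// REGISTER = new item, UPDATE = content changed, VERIFY = no longer required.
public class TodoDiff {
    public enum Action { REGISTER, UPDATE, VERIFY }

    // Maps each Todo id to the lifecycle action it needs; unchanged ids are skipped.
    public static Map<String, Action> diff(Map<String, String> target,
                                           Map<String, String> stored) {
        Map<String, Action> actions = new HashMap<>();
        for (Map.Entry<String, String> e : target.entrySet()) {
            String id = e.getKey();
            if (!stored.containsKey(id)) {
                actions.put(id, Action.REGISTER);   // newly required Todo
            } else if (!stored.get(id).equals(e.getValue())) {
                actions.put(id, Action.UPDATE);     // content changed
            }
        }
        for (String id : stored.keySet()) {
            if (!target.containsKey(id)) {
                actions.put(id, Action.VERIFY);     // requirement gone -> verify/close
            }
        }
        return actions;
    }
}
```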

Data Consistency Guarantees

Two mechanisms are provided based on sensitivity and resource considerations:

Strong consistency: on error, the user is forced to retry, and the state transition occurs only after the operation succeeds; real‑time synchronization pulls downstream data to keep Todo data strongly consistent. Suitable for high‑sensitivity scenarios.

Eventual consistency: SDK‑integrated retries and message‑queue re‑delivery, with event‑driven or scheduled checks that trigger automatic repair. Suitable for tasks that tolerate short delays.
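The eventual-consistency path can be sketched as bounded retries with a fallback to a repair queue that a scheduled check re-delivers later. Everything here (`RetryDelivery`, `repairQueue`, the attempt count) is an assumption for illustration, not the actual SDK.

```java
import java.util.*;
import java.util.function.Supplier;

// Hedged sketch of SDK-side retry for eventual consistency: attempt delivery
// a bounded number of times; on exhaustion, park the event for scheduled repair.
public class RetryDelivery {
    // Events awaiting re-delivery by a scheduled repair job (illustrative).
    public static final List<String> repairQueue = new ArrayList<>();

    // Returns true if delivered within maxAttempts; otherwise queues for repair.
    public static boolean deliverWithRetry(String eventId,
                                           Supplier<Boolean> send,
                                           int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (send.get()) {
                return true;               // delivered successfully
            }
        }
        repairQueue.add(eventId);          // fall back to scheduled repair
        return false;
    }
}
```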

Traffic‑Control Strategy

A three‑layer flow‑reduction mechanism balances peak load and latency:

Scenario control: configure async handling for low‑traffic scenarios (e.g., login QPS ≤20, app start QPS ≤100) via domain events, reducing peak traffic by ~80%.

Node filtering: designate lifecycle nodes (register, verify, update) that can skip async processing, cutting ineffective calls by ~50%.

Silent‑period control: define a silent window after a Todo is processed; repeated events within the window are discarded, lowering call frequency by ~60%.
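Silent-period control amounts to a per-key timestamp check. A minimal sketch, assuming an in-memory map and a caller-supplied clock (the class name and window length are illustrative):

```java
import java.util.*;

// Illustrative silent-period filter: after a Todo event is processed for a key,
// repeated events for the same key within the silent window are discarded.
public class SilentWindow {
    private final long windowMillis;
    private final Map<String, Long> lastProcessed = new HashMap<>();

    public SilentWindow(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns true if the event should be processed, false if it falls
    // inside the silent window and is dropped.
    public boolean admit(String key, long nowMillis) {
        Long last = lastProcessed.get(key);
        if (last != null && nowMillis - last < windowMillis) {
            return false;                  // still inside silent window -> discard
        }
        lastProcessed.put(key, nowMillis); // record processing time, open new window
        return true;
    }
}
```

In production this map would live in a shared cache with expiry rather than process memory, so the window holds across instances.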

Elastic Scenario Extension Design

Factory and Template Method patterns enable plug‑and‑play extensions. New Todo business logic is bound to scenes dynamically.

Dynamic registration example:

```java
// Pseudo-code logic
sceneClass.init() → afterPropertiesSet() → register instance into strategyFactoryMap<SceneType, Handler>
```

Template method defines a fixed core flow with optional extension points:

```java
protected abstract void validate();        // required extension (e.g., validation)
protected void todoExpireHandle() { /* optional hook */ }
```
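The two patterns can be combined in one sketch: each scene handler registers itself into a factory map on initialization, and a final template method fixes the core flow around the extension points. In the real system `afterPropertiesSet()` would be invoked by the Spring container; here it is called manually, and all names (`SceneFactory`, `SceneType`, `OnboardingHandler`) are illustrative.

```java
import java.util.*;

// Minimal Factory + Template Method sketch: handlers self-register into a
// shared map keyed by scene type, so new scenes plug in without new wiring.
public class SceneFactory {
    public enum SceneType { ONBOARDING, CERTIFICATION }

    public static final Map<SceneType, SceneHandler> strategyFactoryMap =
            new EnumMap<>(SceneType.class);

    public static abstract class SceneHandler {
        // Template method: fixed core flow, subclasses fill extension points.
        public final String handle(String todoId) {
            validate();                 // required extension point
            String result = doHandle(todoId);
            todoExpireHandle();         // optional hook, no-op by default
            return result;
        }

        protected abstract void validate();
        protected abstract String doHandle(String todoId);
        protected void todoExpireHandle() { /* optional hook */ }
        protected abstract SceneType sceneType();

        // Mimics Spring's InitializingBean.afterPropertiesSet() callback.
        public void afterPropertiesSet() {
            strategyFactoryMap.put(sceneType(), this);
        }
    }

    // Example concrete scene bound dynamically to ONBOARDING.
    public static class OnboardingHandler extends SceneHandler {
        protected void validate() { /* e.g., parameter checks */ }
        protected String doHandle(String todoId) { return "onboarded:" + todoId; }
        protected SceneType sceneType() { return SceneType.ONBOARDING; }
    }
}
```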

Stability Design

Real‑time monitoring and alerting are built on business‑level metrics (interface success rate, response time) and full‑link tracing (including async tasks). Multi‑level alerts (P0‑P3) enable:

Minute‑level fault detection.

Rapid root‑cause localization via trace analysis.

Automatic fault‑tolerance actions (circuit breaking, service degradation, emergency plans).

Fault‑tolerance mechanisms include:

Circuit breaking: automatic cut‑off of failing components based on monitoring thresholds.

Service degradation: global or scenario‑specific fallback strategies.

Emergency response: pre‑defined playbooks that trigger when critical metrics degrade, ensuring core business continuity.
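Threshold-based circuit breaking can be sketched as a failure-rate check over a sliding window of recent calls. The window size and threshold below are illustrative assumptions, not HuoLala's production values.

```java
import java.util.*;

// Simple threshold-based circuit breaker sketch: once the failure rate over
// the last windowSize calls exceeds the threshold, the breaker opens and
// callers should switch to the fallback (degradation) path.
public class Breaker {
    private final int windowSize;
    private final double failureThreshold;
    private final Deque<Boolean> recent = new ArrayDeque<>();

    public Breaker(int windowSize, double failureThreshold) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
    }

    // Record one call outcome, keeping only the last windowSize results.
    public void record(boolean success) {
        if (recent.size() == windowSize) {
            recent.removeFirst();
        }
        recent.addLast(success);
    }

    // Open (cut off) when the windowed failure rate exceeds the threshold.
    public boolean isOpen() {
        if (recent.size() < windowSize) {
            return false;               // not enough data to judge yet
        }
        long failures = recent.stream().filter(s -> !s).count();
        return (double) failures / windowSize > failureThreshold;
    }
}
```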

Production Rollout

The Todo Center has been fully integrated into core quoting and order‑grabbing pipelines, supporting over 50 Todo scenarios. It handles an average of 2 million Todo generations per day and more than 200 million query requests, while providing agile onboarding for new business needs.

Production rollout metrics

Conclusion

Iterative upgrades transformed the driver Todo system from a single‑function module into a comprehensive Todo Center capable of supporting tens of millions of daily interactions. The platform shifted from passive requirement response to proactive business empowerment, significantly improving conversion metrics across the ecosystem.

Tags: microservices, scalability, system design, traffic control, event-driven
Written by Huolala Tech
Technology reshapes logistics