How Youku Scales User Reach: Inside the Architecture of a Billion‑User Messaging Platform
This talk reveals how Youku built a flexible, universal user‑reach platform that leverages graph‑based task modeling, dynamic expressions, and a unified execution engine to deliver recall and activation campaigns to over a billion users with fast, precise targeting and experiment‑driven optimization.
Background
In user‑operation scenarios, Youku accumulated many marketing and recall strategies. To deploy them quickly, the team built a flexible, universal reach‑configuration platform that supports recall, activation, strategy linkage, and experiment comparison and optimization.
Architecture Overview
The platform consists of three core technical pillars:
Graph‑based task construction: Each reach task is represented as a node in a directed graph; edges define dependencies and flow. This model enables complex workflow definitions, reuse of sub‑tasks, and runtime evaluation of expressions attached to nodes.
Dynamic expression engine: A lightweight expression language is evaluated at execution time. Expressions can reference user attributes, real‑time metrics, or external feature flags, allowing personalized messaging without code redeployment.
Unified execution engine: A single runtime processes all task types (push, email, SMS, in‑app, etc.). It traverses the task graph, resolves expressions, and dispatches messages through a high‑throughput pipeline, guaranteeing consistent handling across billions of users.
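As a rough illustration of the graph model, the sketch below stores each task with the set of tasks it depends on, rejects graphs with missing dependencies or cycles, and derives an execution order. The task names and the validate_and_order helper are hypothetical and not taken from Youku's platform; the ordering uses Python's standard graphlib module as a stand-in for whatever traversal the real engine performs.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each key is a task node, each value is the set of
# tasks it depends on. The names are illustrative, not Youku's.
TASKS = {
    "select_audience": set(),
    "filter_inactive": {"select_audience"},
    "send_push": {"filter_inactive"},
    "send_sms_fallback": {"send_push"},
}


def validate_and_order(tasks):
    """Return an execution order, or raise ValueError for an invalid graph."""
    known = set(tasks)
    # TopologicalSorter silently adds unknown predecessors as new nodes,
    # so missing dependencies are checked explicitly first.
    for name, deps in tasks.items():
        missing = deps - known
        if missing:
            raise ValueError(
                f"task {name!r} depends on unknown tasks: {sorted(missing)}"
            )
    try:
        # static_order() yields every task after all of its dependencies.
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as exc:
        raise ValueError(f"task graph contains a cycle: {exc.args[1]}") from exc


order = validate_and_order(TASKS)
print(order)
```

Because this example graph is a simple chain, the order is fully determined; graphs with independent branches admit several valid orders, and a real engine could run such branches in parallel.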
Scalability and Performance
The system is built to handle batches of more than 100 million users with sub‑second latency. Key techniques include:
Sharding of user IDs and task state across a distributed key‑value store.
Stateless worker nodes that pull task slices from a message queue (e.g., Kafka) and execute them in parallel.
Batch‑level deduplication and rate‑limiting to protect downstream channels.
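The three techniques above can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions: the shard count, the shard_for and dedupe helpers, and the TokenBucket class are all hypothetical, and a token bucket is only one common way to rate-limit; the source does not say which algorithm Youku uses.

```python
import hashlib
import time

NUM_SHARDS = 8  # assumed shard count, for illustration only


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to keep the mapping stable across workers.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def dedupe(batch):
    """Drop repeated user IDs within one batch, keeping first-seen order."""
    return list(dict.fromkeys(batch))


class TokenBucket:
    """Allow about `rate` sends per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


unique_users = dedupe(["u1", "u2", "u1", "u3"])
print(unique_users)  # ['u1', 'u2', 'u3']
```

In a distributed deployment the bucket state would have to live in shared storage or be partitioned per worker; this sketch keeps it in memory for clarity.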
Metrics collected per batch include total users processed, success rate, average dispatch latency, and per‑channel throttling statistics, enabling rapid A/B testing and iterative optimization.
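One way to picture these per-batch metrics is a small accumulator that each worker updates as it dispatches messages. The BatchMetrics class and its field names below are assumptions made for illustration; the source lists the metrics but not how they are stored.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BatchMetrics:
    """Hypothetical per-batch accumulator for the metrics listed above."""

    total: int = 0
    succeeded: int = 0
    latency_ms_sum: float = 0.0
    throttled: Counter = field(default_factory=Counter)  # per-channel counts

    def record(self, channel, ok, latency_ms, was_throttled=False):
        """Record the outcome of one dispatch attempt."""
        self.total += 1
        if ok:
            self.succeeded += 1
        self.latency_ms_sum += latency_ms
        if was_throttled:
            self.throttled[channel] += 1

    @property
    def success_rate(self) -> float:
        return self.succeeded / self.total if self.total else 0.0

    @property
    def avg_latency_ms(self) -> float:
        return self.latency_ms_sum / self.total if self.total else 0.0


metrics = BatchMetrics()
metrics.record("push", ok=True, latency_ms=10.0)
metrics.record("push", ok=True, latency_ms=20.0)
metrics.record("sms", ok=False, latency_ms=30.0, was_throttled=True)
print(metrics.success_rate, metrics.avg_latency_ms, dict(metrics.throttled))
```

Comparing such records between experiment arms is what makes the A/B testing described above possible.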
Operational Workflow
Typical steps to launch a reach campaign:
Define the campaign in the configuration UI or via JSON/YAML. The definition includes a graph of tasks, expression strings, target audience filters, and channel parameters.
Submit the definition to the platform’s API; the platform validates the graph for cycles and missing dependencies.
The unified execution engine schedules the campaign, partitions the target audience, and starts workers.
During execution, the expression engine evaluates per‑user conditions (e.g., user.age > 18 && user.lastLoginDays < 7) to decide whether to send a message.
Results are streamed to a monitoring dashboard where success/failure ratios and latency are displayed.
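The per-user condition from the execution step can be evaluated without running arbitrary code. The sketch below is an assumption about how such an engine might work, not Youku's implementation: it rewrites && and || into Python's boolean operators, parses the result with the standard ast module, and walks only a small whitelist of node types. The evaluate function and the dictionary-based user record are hypothetical.

```python
import ast
import operator

# Comparison operators the sketch supports.
_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def evaluate(expression: str, user: dict) -> bool:
    """Evaluate an expression such as 'user.age > 18 && user.x < 7'."""
    # Naive rewrite of C-style operators; it would also rewrite these
    # tokens inside string literals, which is acceptable for a sketch.
    source = expression.replace("&&", " and ").replace("||", " or ").strip()
    tree = ast.parse(source, mode="eval")
    return bool(_eval(tree.body, user))


def _eval(node, user):
    if isinstance(node, ast.BoolOp):
        values = (_eval(v, user) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, user)
        for op, right_node in zip(node.ops, node.comparators):
            func = _COMPARE.get(type(op))
            if func is None:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            right = _eval(right_node, user)
            if not func(left, right):
                return False
            left = right
        return True
    # Only attribute access on the name 'user' is allowed, e.g. user.age.
    if (isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "user"):
        return user[node.attr]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
        return node.value
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


RULE = "user.age > 18 && user.lastLoginDays < 7"
should_send = evaluate(RULE, {"age": 30, "lastLoginDays": 3})
print(should_send)  # True
```

Anything outside the whitelist, such as a function call, raises ValueError instead of being executed, which is the main reason to walk the syntax tree rather than call eval on operator-supplied strings.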
Key Benefits
Rapid iteration: new strategies can be deployed by updating graph definitions or expressions without code changes.
High throughput: supports billions of users per day with low latency.
Fine‑grained control: dynamic expressions enable per‑user personalization.
Unified monitoring: a single engine provides consistent metrics across all channels.
