Rapid Event Notification System (RENO) at Netflix: Design, Architecture, and Lessons Learned
Netflix built the Rapid Event Notification (RENO) system to deliver real-time, prioritized notifications to millions of devices at scale. It covers use cases such as watch activity, personalization updates, plan changes, and diagnostics, and it handles high request-per-second (RPS) loads through a hybrid push-pull model and targeted delivery.
Netflix serves over 220 million active members who interact with the service across many devices and platforms. To ensure a seamless experience, Netflix created the Rapid Event Notification (RENO) system, a scalable, extensible service that enables server‑initiated communication with devices for a variety of use cases.
Motivation
Rapid membership growth and increasing system complexity pushed Netflix toward an asynchronous architecture that supports both online and offline computation. Traditional request‑response patterns could not satisfy the need to instantly inform devices about user‑driven changes or experience updates across iOS, Android, smart TVs, Roku, Fire Stick, browsers, and more.
User Scenarios
Watch activity – updating "continue watching" lists on all devices.
Personalized experience refresh – delivering timely recommendation updates.
Membership plan changes – reflecting plan modifications instantly.
My List updates – synchronizing additions or removals across devices.
Profile changes – propagating account‑setting updates.
System diagnostic signals – sending diagnostic signals to the app on a device to help troubleshoot problems.
Design Decisions
Single event source – using Manhattan, Netflix's internal distributed computation framework, as a layer of indirection that aggregates events from multiple microservices.
Event priority determination – assigning priorities to use cases and routing them to priority‑specific queues and processing clusters.
Hybrid communication model – combining push (immediate server-initiated notifications) and pull (device-initiated callbacks) to accommodate mobile devices that are usually online and TV devices that are online only intermittently.
Targeted delivery – filtering notifications per device type to limit traffic.
High RPS management – employing age‑based filtering, online‑device tracking via Zuul, aggressive auto‑scaling, deduplication, and batched sending to downstream push services (APNS, FCM, etc.).
Single Event Source
Netflix uses Manhattan's event management framework as a layer of indirection that serves as RENO's single event source, consolidating event streams from various internal systems.
Event Priority Determination
Events are categorized by source and importance; higher‑priority events (e.g., profile maturity changes) are processed before lower‑priority ones (e.g., diagnostic signals) by routing them to dedicated queues and clusters.
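The routing idea can be sketched in a few lines of Python. This is a minimal illustration, not RENO's implementation: the priority levels, the use-case names in `PRIORITY_RULES`, and the in-memory queues are all assumptions made for the example, since the article does not publish the actual rules.

```python
from collections import deque
from enum import IntEnum


class Priority(IntEnum):
    HIGH = 0
    MEDIUM = 1
    LOW = 2


# Hypothetical mapping from use case to priority (not taken from the article).
PRIORITY_RULES = {
    "profile_maturity_change": Priority.HIGH,
    "watch_activity": Priority.MEDIUM,
    "diagnostic_signal": Priority.LOW,
}

# One in-memory queue per priority, standing in for priority-specific queues.
queues = {priority: deque() for priority in Priority}


def route(event):
    """Put the event on the queue that matches its use case's priority."""
    # Unknown use cases fall back to LOW; this default is an assumption.
    priority = PRIORITY_RULES.get(event["use_case"], Priority.LOW)
    queues[priority].append(event)
    return priority


def process(priority):
    """Stand-in for a priority-specific cluster: consume only its own queue."""
    handled = []
    while queues[priority]:
        handled.append(queues[priority].popleft()["use_case"])
    return handled
```

Because each queue has its own consumer, a burst of low-priority diagnostic signals cannot delay a high-priority event that sits on a different queue.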
Hybrid Communication Model
Because mobile devices are usually online while smart TVs are often offline, RENO uses a mixed push‑and‑pull approach: servers push notifications when possible, and devices pull updates during appropriate lifecycle phases, ensuring coverage for devices that cannot receive pushes.
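A rough sketch of the push-or-pull decision follows. The class name `HybridNotifier`, the `push` callback, and the in-memory pending store are hypothetical; the article does not describe the real interfaces, so treat this as one possible shape under those assumptions.

```python
from collections import defaultdict


class HybridNotifier:
    """Push to devices that are online; hold notifications for the rest."""

    def __init__(self, push):
        # push is a hypothetical callable: push(device_id, notification)
        self._push = push
        self._online = set()
        self._pending = defaultdict(list)

    def mark_online(self, device_id):
        self._online.add(device_id)

    def mark_offline(self, device_id):
        self._online.discard(device_id)

    def notify(self, device_id, notification):
        """Return "pushed" if sent immediately, "stored" if held for pull."""
        if device_id in self._online:
            self._push(device_id, notification)
            return "pushed"
        self._pending[device_id].append(notification)
        return "stored"

    def pull(self, device_id):
        """Called by a device at a suitable lifecycle point, e.g. app start."""
        return self._pending.pop(device_id, [])
```

A phone that is online receives the notification right away, while a TV that is powered off collects everything held for it the next time it calls `pull`.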
Targeted Delivery
RENO evaluates the target device type for each actionable event and delivers notifications only to eligible devices, dramatically reducing unnecessary traffic.
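As an illustration, targeted delivery can be reduced to a lookup from event type to eligible device types. The table `ELIGIBLE_DEVICE_TYPES` and its contents are invented for this sketch; the actual eligibility rules are not given in the article.

```python
# Hypothetical eligibility table: which device types act on which event type.
ELIGIBLE_DEVICE_TYPES = {
    "my_list_update": {"ios", "android", "tv", "browser"},
    "diagnostic_signal": {"android"},
}


def eligible_devices(event_type, devices):
    """Keep only the devices whose type is eligible for this event type."""
    allowed = ELIGIBLE_DEVICE_TYPES.get(event_type, set())
    return [device for device in devices if device["type"] in allowed]


member_devices = [
    {"id": "d1", "type": "ios"},
    {"id": "d2", "type": "tv"},
    {"id": "d3", "type": "android"},
]
targets = eligible_devices("diagnostic_signal", member_devices)
```

In this example only `d3` is targeted, so the other two devices never see traffic they would ignore anyway.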
Managing High RPS
During peak periods RENO handles roughly 150,000 events per second. Optimizations include age-based filtering to discard stale events, sending only to devices that are currently online, aggressive auto-scaling policies, event deduplication, and parallel, batched sending to external push services.
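Three of these optimizations (age-based filtering, deduplication, and parallel batched sending) can be sketched as a small pipeline. The thresholds, the event fields (`device_id`, `type`, `created_at`), the choice to keep the first duplicate, and the `send_batch` callback are all assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

MAX_AGE_SECONDS = 60   # assumed staleness threshold
BATCH_SIZE = 3         # assumed batch size


def drop_stale(events, now, max_age=MAX_AGE_SECONDS):
    """Age-based filtering: discard events older than max_age seconds."""
    return [e for e in events if now - e["created_at"] <= max_age]


def deduplicate(events):
    """Keep the first event seen for each (device, type) pair."""
    seen = set()
    unique = []
    for event in events:
        key = (event["device_id"], event["type"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


def batches(events, size=BATCH_SIZE):
    """Yield consecutive chunks of at most `size` events."""
    iterator = iter(events)
    while chunk := list(islice(iterator, size)):
        yield chunk


def send_in_parallel(events, send_batch, workers=4):
    """Send each batch on a worker thread; results come back in batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_batch, batches(events)))
```

Filtering and deduplicating before batching means the outbound calls to push services are spent only on events that are still worth delivering.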
Architecture Diagram
RENO consists of the following components:
Event Triggers
Member actions and system‑driven updates that require a refreshed experience on devices.
Event Management Engine
Manhattan listens for specific events and forwards them to appropriate queues.
Priority‑Based Queues
AWS SQS queues populated according to priority rules defined in Manhattan.
Priority‑Based Clusters
AWS instance clusters subscribe to queues of matching priority, process events, and generate actionable notifications.
Outbound Messaging System
Netflix’s messaging layer (including a custom Zuul Push solution) delivers notifications to mobile, TV, and other streaming devices.
Persistent Storage
Cassandra stores all notifications per device, allowing devices to poll at their own pace.
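To make "poll at their own pace" concrete, here is a small in-memory stand-in for a per-device notification store. It is not a Cassandra schema: the class `NotificationStore`, the sequence-number cursor, and the method names are assumptions chosen to show how a device could ask only for what it has not yet seen.

```python
from collections import defaultdict
from itertools import count


class NotificationStore:
    """In-memory stand-in for a per-device notification table."""

    def __init__(self):
        self._rows = defaultdict(list)   # device_id -> [(seq, payload), ...]
        self._seq = count(1)             # monotonically increasing sequence

    def append(self, device_id, payload):
        """Store a notification and return its sequence number."""
        seq = next(self._seq)
        self._rows[device_id].append((seq, payload))
        return seq

    def fetch_since(self, device_id, last_seen=0):
        """Return notifications newer than the device's last-seen sequence."""
        return [(seq, payload)
                for seq, payload in self._rows.get(device_id, [])
                if seq > last_seen]
```

Each device remembers the highest sequence number it has processed and passes it as `last_seen` on its next poll, so slow and fast devices can read from the same store independently.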
Observability
Comprehensive metrics, alerts, and logs (including service‑edge metrics) are collected; Mantis processes real‑time streams to surface device‑level anomalies, and platform‑specific alerts aid rapid root‑cause analysis.
Benefits
Easy support for new use cases.
Horizontal scalability to higher throughput.
Initially focused on personalized experience updates, RENO quickly evolved into a centralized rapid-notification platform for all Netflix product areas, allowing new use cases to be plugged in easily and supporting hybrid (push and pull) delivery across platforms.
Future Outlook
As membership continues to grow, RENO will further enhance the Netflix experience by exploring guaranteed delivery, batch processing, and additional features that reduce the communication footprint with devices while opening up new use cases.
References
[1] Original article: https://netflixtechblog.com/rapid-event-notification-system-at-netflix-6deb1d2b57d1