Snapdeal Ads System Architecture: Scaling to 5 Billion Daily Requests
The article details how Snapdeal built a high‑performance, low‑latency ad‑serving platform that handles billions of daily requests by employing horizontal and vertical scaling, AP‑focused CAP choices, in‑memory data structures, and a suite of open‑source backend technologies.
Snapdeal, one of India’s largest e‑commerce platforms, designed the Snapdeal Ads system to process up to five billion requests per day; the case study explains how a team of fewer than ten engineers achieved web‑scale performance by selecting the right technologies.
Key Strategies:
- Scale both horizontally and vertically.
- Prioritize availability and partition tolerance (AP) over strict consistency.
- Avoid vendor-locked proprietary software.
- Apply mechanical sympathy to maximize hardware efficiency.
- Enforce sub-millisecond query latency using RocksDB and in-memory caches.
- Use SSDs and large RAM; avoid virtualization.
- Batch disk writes.
- Fine-tune Nginx and Netty.
- Keep critical data in memory.
- Adopt a shared-nothing architecture.
- Replicate data and tolerate stale or inconsistent reads.
- Build a fault-tolerant messaging system that never loses messages.
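The "batch disk writes" strategy can be sketched in a few lines: instead of issuing one write per event, events accumulate in memory and are flushed as a single block once a batch fills. This is a minimal illustrative sketch, not Snapdeal's actual code; the class name, batch size, and flush counter are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of batched disk writes: buffer events in memory
// and issue one physical write per full batch instead of one per event.
public class BatchWriter {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int flushCount = 0; // number of physical writes issued

    public BatchWriter(int batchSize) { this.batchSize = batchSize; }

    public void write(String event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        // In a real system this would be one sequential write to disk.
        flushCount++;
        buffer.clear();
    }

    public int flushCount() { return flushCount; }
}
```

With a batch size of 100, writing 1,000 events costs 10 flushes rather than 1,000 individual writes, which is the point of the strategy on spinning disks and even on SSDs.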
Current Infrastructure: 40-50 nodes across three data centers, including 30 high-compute machines (128-256 GB RAM, 24 cores, SSD) and smaller 32 GB RAM quad-core servers; 10 Gb private and public networks; small Cassandra, HBase, and Spark clusters.
Critical Requirements: support RTB 2.0 HTTP requests from multiple bidders, deliver Yes/No price decisions, handle hundreds of thousands of QPS and billions of daily events, and keep key data instantly available.
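The "Yes/No price decision" requirement boils down to a function evaluated per bid request. A minimal sketch, assuming a floor-price and budget check; the method name, micro-currency units, and pricing rule are illustrative assumptions, not Snapdeal's actual bidding logic.

```java
// Hedged sketch of a Yes/No price decision: given the request's floor
// price, the campaign's configured bid, and its remaining budget, decide
// whether to bid and at what price. -1 means "No bid".
public class BidDecider {
    public static long decide(long floorPriceMicros,
                              long campaignBidMicros,
                              long remainingBudgetMicros) {
        if (campaignBidMicros < floorPriceMicros) return -1;   // cannot meet the floor
        if (remainingBudgetMicros < campaignBidMicros) return -1; // budget exhausted
        return campaignBidMicros;                              // Yes: bid at campaign price
    }
}
```

At hundreds of thousands of QPS, this decision must run against in-memory state only, which is why the strategies above insist on keeping critical data in memory.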
Key Technologies:
- HBase and Cassandra for storage (high write throughput)
- Java backend (with legacy C++/Erlang experience)
- Google Protobuf for serialization
- Netty for high-performance networking
- RocksDB with Kafka for embedded storage sync
- Kafka for streaming
- CQEngine for fast in-memory queries
- Nginx as reverse proxy
- Spark for machine-learning tasks
- Jenkins for CI
- Nagios/New Relic for monitoring
- Zookeeper for coordination
- BitTorrent Sync for cross-node data replication
- a custom quota manager based on a Yahoo white paper
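CQEngine's contribution is attribute-indexed in-memory queries that avoid scanning every record. A standard-library stand-in for that idea, assuming a hypothetical Campaign record indexed by category (CQEngine itself offers richer query composition than this sketch shows):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

// Standard-library stand-in for the kind of attribute-indexed in-memory
// lookup CQEngine provides: a hash index from an attribute value to the
// set of matching records, so queries skip the full scan.
public class CampaignIndex {
    public record Campaign(String id, String category) {}

    private final Map<String, Set<Campaign>> byCategory = new ConcurrentHashMap<>();

    public void add(Campaign c) {
        byCategory.computeIfAbsent(c.category(), k -> new CopyOnWriteArraySet<>())
                  .add(c);
    }

    // O(1) index lookup instead of an O(n) scan over all campaigns.
    public Set<Campaign> findByCategory(String category) {
        return byCategory.getOrDefault(category, Set.of());
    }
}
```

Keeping such indexes entirely in memory is what makes the sub-millisecond query budget in the strategies above plausible.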
System Design and Results: the ad server uses non-blocking Netty to process HTTP requests and performs all lookups in memory via CQEngine, achieving 5-15 ms latency; results are written asynchronously to Kafka and consumed into HBase, while budget and campaign state are updated in Cassandra; Spark handles ad-hoc analytics.
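The hot path described above, answer from in-memory state, log the event asynchronously, can be sketched with standard-library concurrency primitives. The blocking queue here is only a stand-in for the Kafka producer, and all names are illustrative assumptions rather than Snapdeal's actual code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the described request path: compute the decision from
// in-memory state, hand the event to a queue (Kafka stand-in) off the
// hot path, and return the response without waiting for the log write.
public class AdServerSketch {
    private final BlockingQueue<String> eventLog = new LinkedBlockingQueue<>();
    private final ExecutorService logger = Executors.newSingleThreadExecutor();

    public String handleRequest(String requestId) {
        String decision = "bid:" + requestId;            // in-memory lookup on the hot path
        CompletableFuture.runAsync(() -> eventLog.add(decision), logger); // async event write
        return decision;                                 // respond immediately
    }

    // Blocks until the asynchronously logged event is available.
    public String takeLoggedEvent() throws InterruptedException {
        return eventLog.take();
    }
}
```

Decoupling the response from the durable write is what lets the server stay in the 5-15 ms range while still feeding every event downstream to Kafka and HBase.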