How Tumblr Scaled to 5 Billion Page Views: Inside Their Distributed Architecture
This article examines how Tumblr handled rapid growth (5 billion daily page views, 40 k requests per second, and terabytes of new data each day) by evolving from a LAMP stack to a JVM-based distributed system. The new system is built on Scala and Finagle, uses HBase, Redis, and Kafka for storage and messaging, and relies on a cell architecture to serve its real-time Dashboard.
Key Data
• 5 billion page views per day
• Over 150 billion page views per month
• Peak traffic of nearly 40 k requests per second
• 1 TB of new data ingested daily into Hadoop
• Over 1 TB of MySQL/HBase/Redis/memcache data generated daily
• 30% month-over-month growth
• Approximately 1 000 production servers
• About 20 engineers supporting the platform
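To give a feel for what 30% month-over-month growth means for capacity planning, the short sketch below compounds that rate. It assumes the rate stays constant, which the source does not claim; the function names are illustrative only.

```python
import math

MONTHLY_GROWTH = 0.30  # 30% month-over-month, taken from the figures above


def growth_factor(months: int, rate: float = MONTHLY_GROWTH) -> float:
    """Multiplier on traffic after `months` of compounding growth."""
    return (1.0 + rate) ** months


def months_to_double(rate: float = MONTHLY_GROWTH) -> float:
    """Months needed for traffic to double at a constant monthly rate."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    print(f"12-month multiplier: {growth_factor(12):.1f}x")
    print(f"doubling time: {months_to_double():.2f} months")
```

Under that assumption, load doubles in under three months and grows more than twentyfold in a year, which helps explain why a small team had to lean so heavily on automation.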
Software Environment
Development on OS X, production on Linux (CentOS/Scientific). Stack includes Apache, PHP, Scala, Ruby, MySQL, HBase, Redis, Varnish, HAProxy, nginx, memcache, Gearman, Kafka, Kestrel, Finagle, Thrift, HTTP, Func (remote‑control framework), Git, Capistrano, Puppet, Jenkins.
Hardware Environment
500 web servers, 200 database servers (47 pools, 20 shards), 30 memcache servers, 22 Redis servers, 15 Varnish servers, 25 HAProxy nodes, 8 nginx servers, 14 work‑queue servers (Kestrel + Gearman).
Architecture
Tumblr’s usage pattern differs from typical social sites: users publish tens of millions of posts daily, and because a typical user has hundreds of followers, each post fans out to hundreds of readers. The Dashboard receives the majority of traffic and must show real-time, consistent updates.
Old Architecture
Tumblr started as a classic LAMP deployment on Rackspace: a web server, a database server, and a PHP application. As traffic grew, the team added memcache, front-end caching, HAProxy, and MySQL sharding. Two back-end services were written in C: an ID generator and Staircar, which handles Dashboard notifications.
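As a rough illustration of what MySQL sharding involves at the application layer, the sketch below maps a key to a shard with a stable hash. The source does not describe Tumblr's actual sharding scheme, so the key (`blog_name`), the shard count, and the modulo strategy are all assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 20  # illustrative shard count, not a documented value


def shard_for(blog_name: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a blog name to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized
    per process and would route the same key to different shards.
    """
    digest = hashlib.md5(blog_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for name in ("staff", "engineering", "example-blog"):
        print(name, "-> shard", shard_for(name))
```

Plain modulo routing like this forces data to move whenever the shard count changes, which is one reason sharded systems often keep a lookup table or use consistent hashing instead.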
New Architecture
The system migrated to a JVM‑centric model, replacing PHP with services written in Scala and using Finagle for RPC, service discovery, and tracing. The stack now includes Scala, Finagle, HBase, Redis, Kafka, and Thrift. The migration enables better scalability and leverages the extensive Java ecosystem.
Internal Firehose
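The internal firehose is a high-throughput messaging pipeline that carries user actions (posts, likes, and so on) to downstream services. It uses Kafka, originally developed at LinkedIn, for storage and exposes the stream to consumers over Thrift and HTTP. The firehose retains a week of data, and multiple clients can connect at once; clients that connect with the same client ID do not see duplicate data.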
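The retention and client-ID behavior described above can be modeled with a small in-memory log. This is a toy sketch, not Kafka's API or Tumblr's implementation: the `Firehose` class, its method names, and the per-client-ID offset table are assumptions chosen to show the idea.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RETENTION_SECONDS = 7 * 24 * 3600  # one week of data, as described above


class Firehose:
    """Toy append-only log with one read position per client ID."""

    def __init__(self) -> None:
        self._log: List[Tuple[float, str]] = []  # (timestamp, event)
        self._base = 0  # absolute offset of the first retained event
        self._offsets: Dict[str, int] = defaultdict(int)  # next offset per client ID

    def publish(self, event: str, now: float) -> None:
        self._log.append((now, event))
        self._expire(now)

    def _expire(self, now: float) -> None:
        # Drop events older than the retention window.
        cutoff = now - RETENTION_SECONDS
        while self._log and self._log[0][0] < cutoff:
            self._log.pop(0)
            self._base += 1

    def poll(self, client_id: str, max_events: int = 10) -> List[str]:
        # Readers sharing a client ID share one offset, so each event is
        # handed to that ID once; a different ID gets its own full stream.
        start = max(self._offsets[client_id], self._base)
        end = min(start + max_events, self._base + len(self._log))
        events = [e for _, e in self._log[start - self._base:end - self._base]]
        self._offsets[client_id] = end
        return events


if __name__ == "__main__":
    fh = Firehose()
    for i, event in enumerate(["post:1", "like:2", "post:3"]):
        fh.publish(event, now=float(i))
    print(fh.poll("dashboard", max_events=2))
    print(fh.poll("dashboard", max_events=2))
    print(fh.poll("search"))
```

Keeping the read position on the server side, keyed by client ID, is what lets several workers share one logical subscription without coordinating among themselves.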
Cell Architecture for Dashboard Inbox
Dashboard data is stored in independent “cells.” Each cell contains its own HBase cluster, service cluster, and Redis cache, handling a subset of users. Cells process firehose events, write posts to databases, and update inboxes. This design provides parallelism, fault isolation, and easy scaling.
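A minimal sketch of the cell idea follows, with in-memory dictionaries standing in for each cell's HBase cluster and Redis cache. The class names, the CRC32-based assignment of users to cells, and the rule that every cell sees every post but writes only to inboxes of its own users are assumptions for illustration; the source states only that each cell handles a subset of users and processes firehose events.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set


class Cell:
    """One independent cell: owns a subset of users and their inboxes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.users: Set[str] = set()
        self.inboxes: Dict[str, List[int]] = defaultdict(list)

    def handle_post(self, post_id: int, followers: Set[str]) -> None:
        # Only followers that live in this cell get an inbox entry here.
        for user in followers & self.users:
            self.inboxes[user].append(post_id)


class Dashboard:
    def __init__(self, num_cells: int) -> None:
        self.cells = [Cell(f"cell-{i}") for i in range(num_cells)]

    def cell_for(self, user: str) -> Cell:
        # Stable hash so a user always maps to the same cell (assumed scheme).
        return self.cells[zlib.crc32(user.encode("utf-8")) % len(self.cells)]

    def add_user(self, user: str) -> None:
        self.cell_for(user).users.add(user)

    def publish(self, post_id: int, followers: Set[str]) -> None:
        # Every cell consumes the event independently, as if from the firehose.
        for cell in self.cells:
            cell.handle_post(post_id, followers)

    def inbox(self, user: str) -> List[int]:
        return list(self.cell_for(user).inboxes.get(user, []))


if __name__ == "__main__":
    dash = Dashboard(num_cells=3)
    for user in ("alice", "bob", "carol"):
        dash.add_user(user)
    dash.publish(post_id=1, followers={"alice", "bob"})
    print(dash.inbox("alice"), dash.inbox("carol"))
```

Because a Dashboard read touches only the reader's own cell, a failure in one cell affects only that cell's users, and capacity can be added by creating new cells.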
Launching in New York
New York offers abundant funding and talent, but hiring remains challenging for a fast‑growing startup.
Team Structure
Teams include Infrastructure, Platform, SRE, Product, Web Ops, and Services. Responsibilities span IP/DNS, hardware provisioning, core application development, SQL sharding, reliability engineering, and strategic service development.
Software Deployment
Deployment evolved from rsync scripts to Capistrano for multi‑server orchestration, then to a Func‑based remote‑control framework that groups hosts and executes commands without SSH. Deployments are coordinated, status‑reported, and can safely restart services.
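The coordinated, status-reporting rollout described above can be sketched as a batched loop over host groups. This does not use Func's real API: the `run` callable, which stands in for whatever transport executes a command on a host, and the batch size are hypothetical.

```python
from typing import Callable, Dict, List


def deploy(
    groups: Dict[str, List[str]],
    run: Callable[[str, str], bool],
    command: str,
    batch_size: int = 2,
) -> Dict[str, str]:
    """Run `command` on each group in batches and report per-host status.

    The rollout stops at the first failing batch, so a bad release never
    reaches the remaining hosts; those are reported as "skipped".
    """
    status: Dict[str, str] = {}
    all_hosts = [host for hosts in groups.values() for host in hosts]
    for hosts in groups.values():
        for i in range(0, len(hosts), batch_size):
            batch = hosts[i:i + batch_size]
            results = {host: run(host, command) for host in batch}
            for host, ok in results.items():
                status[host] = "ok" if ok else "failed"
            if not all(results.values()):
                for host in all_hosts:
                    status.setdefault(host, "skipped")
                return status
    return status


if __name__ == "__main__":
    groups = {"web": ["web1", "web2", "web3"], "queue": ["queue1"]}

    def fake_run(host: str, command: str) -> bool:
        # Stand-in executor: pretend the restart fails on web3.
        return host != "web3"

    print(deploy(groups, fake_run, "restart app"))
```

Restarting in small batches keeps most of a group serving traffic while the rest restart, which is one way a deployment can restart services safely.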
Development
The organization emphasizes tool standardization, automated testing, and agile processes similar to Scrum. Developers use VIM or TextMate, and continuous integration validates code before production.
Recruitment Process
Interviews focus on practical engineering skills rather than puzzles, aiming to find candidates with experience in reliability and scalability.
Experience and Lessons
Key takeaways include the importance of automation, the limits of MySQL sharding, the surprising performance of Redis, the benefits of Scala, and the need to retire uncertain projects early while building a strong, skilled team.
21CTO
