How Tumblr Scaled to 15 Billion Monthly Page Views: Inside Their Distributed Architecture

This article examines how Tumblr handled rapid growth (500 million page views per day, peaks of roughly 40k requests per second, and terabytes of new data daily). It did so by evolving from a LAMP stack to a Scala-based distributed system built on Finagle, HBase, Redis, and Kafka, including a cell architecture that serves its real-time Dashboard.

21CTO

Key Data

• 500 million page views per day
• Over 15 billion page views per month
• Peak traffic of nearly 40k requests per second
• 1 TB of new data ingested daily into Hadoop
• Over 1 TB of MySQL/HBase/Redis/memcache data generated daily
• 30% month-over-month growth
• Approximately 1,000 production servers
• About 20 engineers supporting the platform
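The growth and peak figures above compound quickly. The following is a minimal sketch of that arithmetic. It assumes growth holds at exactly 30% every month, and it treats the peak request rate as if it were sustained for a full day, which gives an upper bound because real traffic follows a daily cycle. The helper names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def compound_growth(monthly_rate: float, months: int) -> float:
    """Traffic multiplier after `months` of steady month-over-month growth."""
    return (1 + monthly_rate) ** months


def daily_ceiling(peak_rps: int) -> int:
    """Requests per day if the peak rate were held for all 24 hours."""
    return peak_rps * SECONDS_PER_DAY


def average_rps(requests_per_day: int) -> float:
    """Average request rate implied by a daily total."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"30% MoM for 12 months: {compound_growth(0.30, 12):.1f}x")  # 23.3x
    print(f"40k rps held all day: {daily_ceiling(40_000):,} requests")  # 3,456,000,000
```

At 30% a month, traffic grows roughly 23-fold in a year. Comparing `average_rps` for a daily total against the stated peak is also a quick sanity check, because the average must come out below the peak.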

Software Environment

Development on OS X, production on Linux (CentOS/Scientific). Stack includes Apache, PHP, Scala, Ruby, MySQL, HBase, Redis, Varnish, HAProxy, nginx, memcache, Gearman, Kafka, Kestrel, Finagle, Thrift, HTTP, Func (remote‑control framework), Git, Capistrano, Puppet, Jenkins.

Hardware Environment

500 web servers, 200 database servers (47 pools, 20 shards), 30 memcache servers, 22 Redis servers, 15 Varnish servers, 25 HAProxy nodes, 8 nginx servers, 14 work‑queue servers (Kestrel + Gearman).

Architecture

Tumblr's usage pattern differs from that of typical social sites. Users create tens of millions of posts daily, and an average post reaches hundreds of people, because ordinary users (not just a handful of celebrities) have hundreds of followers each. This high fan-out is what makes Tumblr hard to scale. The Dashboard receives the majority of traffic and requires real-time, consistent updates.
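High fan-out is the crux: when typical users have followers in the hundreds, each post must reach many Dashboards. The following is a minimal fan-out-on-write sketch, where every new post ID is copied into each follower's inbox. The class and method names are hypothetical, and in-memory dicts stand in for real storage. It is meant to show the write amplification, not Tumblr's implementation.

```python
from collections import defaultdict, deque


class FanoutOnWrite:
    """Toy fan-out-on-write model: each post ID is copied into every follower's inbox."""

    def __init__(self, inbox_limit: int = 1000) -> None:
        self.followers = defaultdict(set)  # author -> set of follower IDs
        # Bounded inboxes: when full, appendleft drops the oldest entry from the right.
        self.inboxes = defaultdict(lambda: deque(maxlen=inbox_limit))

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def publish(self, author: str, post_id: int) -> int:
        """Deliver a post to all followers; return the number of inbox writes."""
        targets = self.followers[author]
        for user in targets:
            self.inboxes[user].appendleft(post_id)  # newest first
        return len(targets)

    def dashboard(self, user: str, n: int = 20) -> list:
        """The read path is cheap: the first n entries of the user's inbox."""
        return list(self.inboxes[user])[:n]


if __name__ == "__main__":
    feed = FanoutOnWrite()
    for i in range(300):
        feed.follow(f"user{i}", "alice")
    print(feed.publish("alice", 101))  # 300 inbox writes for one post
    print(feed.dashboard("user42"))    # [101]
```

The trade-off is that writes multiply by follower count, while each Dashboard read becomes a single inbox lookup. That suits a page that receives most of the site's traffic.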

Old Architecture

Initially a classic LAMP deployment on Rackspace with a three‑server setup (web, database, PHP). Scaling introduced memcache, front‑end caching, HAProxy, and MySQL sharding. Custom C services were built for ID generation and Dashboard notifications (Staircar).
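The article does not describe how Tumblr's C ID generator formatted its IDs. As a swapped-in illustration, here is a common design for such a service: a Snowflake-style generator (the scheme Twitter popularized). It packs a millisecond timestamp, a worker ID, and a per-millisecond sequence into one 64-bit integer, so IDs are unique across workers and roughly time-ordered without a database round trip. The epoch, bit widths, and class name are all assumptions.

```python
import threading
import time

# Hypothetical custom epoch (2011-03-13 UTC); IDs count milliseconds from here.
EPOCH_MS = 1_300_000_000_000


class IdGenerator:
    """Snowflake-style 64-bit IDs: 41-bit timestamp | 10-bit worker | 12-bit sequence."""

    WORKER_BITS = 10
    SEQ_BITS = 12
    MAX_SEQ = (1 << SEQ_BITS) - 1

    def __init__(self, worker_id: int, epoch_ms: int = EPOCH_MS) -> None:
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id must fit in WORKER_BITS")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.seq = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to risk duplicate IDs")
            if now == self.last_ms:
                self.seq = (self.seq + 1) & self.MAX_SEQ
                if self.seq == 0:
                    # 4,096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.seq = 0
            self.last_ms = now
            return (
                ((now - self.epoch_ms) << (self.WORKER_BITS + self.SEQ_BITS))
                | (self.worker_id << self.SEQ_BITS)
                | self.seq
            )


if __name__ == "__main__":
    gen = IdGenerator(worker_id=3)
    a, b = gen.next_id(), gen.next_id()
    print(a < b, (a >> IdGenerator.SEQ_BITS) & 0x3FF)  # True 3
```

Uniqueness depends on each process having a distinct `worker_id` and on a clock that does not step backwards. The generator refuses to issue IDs rather than risk duplicates in that case.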

New Architecture

The system migrated to a JVM‑centric model, replacing PHP with services written in Scala and using Finagle for RPC, service discovery, and tracing. The stack now includes Scala, Finagle, HBase, Redis, Kafka, and Thrift. The migration enables better scalability and leverages the extensive Java ecosystem.
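Finagle's central idea is that an RPC endpoint is a function from a request to a future response, with cross-cutting concerns such as timeouts and retries layered on as composable filters. In Finagle's actual Scala API these are the `Service` and `Filter` types, joined with `andThen`. The following is a Python asyncio analog of that pattern, not Finagle's API. `chain`, `timeout_filter`, `retry_filter`, and the profile backend are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

# A "service" is an async function from request to response, echoing
# Finagle's notion that a Service[Req, Rep] is a function Req => Future[Rep].
Service = Callable[[str], Awaitable[str]]
Filter = Callable[[Service], Service]


def timeout_filter(seconds: float) -> Filter:
    """Fail any call that takes longer than `seconds`."""
    def wrap(service: Service) -> Service:
        async def call(req: str) -> str:
            return await asyncio.wait_for(service(req), timeout=seconds)
        return call
    return wrap


def retry_filter(attempts: int) -> Filter:
    """Retry transient failures (timeouts, connection errors) up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrap(service: Service) -> Service:
        async def call(req: str) -> str:
            for _ in range(attempts - 1):
                try:
                    return await service(req)
                except (asyncio.TimeoutError, ConnectionError):
                    pass  # transient: try again
            return await service(req)  # final attempt propagates any error
        return call
    return wrap


def chain(service: Service, *filters: Filter) -> Service:
    """Apply filters so that the first one listed is the outermost layer."""
    for f in reversed(filters):
        service = f(service)
    return service


async def demo() -> tuple:
    calls = 0

    async def profile_backend(user_id: str) -> str:  # hypothetical backend
        nonlocal calls
        calls += 1
        if calls == 1:
            raise ConnectionError("transient failure")
        return f"profile:{user_id}"

    # Retry is outermost, so each attempt gets its own 0.5 s timeout.
    client = chain(profile_backend, retry_filter(3), timeout_filter(0.5))
    return await client("42"), calls


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('profile:42', 2)
```

Keeping policies such as retries out of the service body lets every service reuse the same tested filters, which is much of Finagle's appeal.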

Internal Firehose

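A high-throughput messaging pipeline (the firehose) transports user actions such as posts and likes. It uses LinkedIn's Kafka for storage and Thrift/HTTP for communication. The firehose retains a week of data, so consumers can replay from an earlier point. Multiple consumers can also connect with the same client ID, each receiving a different subset of events, so no event is delivered twice within that ID.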
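The retention window and per-client-ID delivery can be illustrated with a toy in-memory log. The following is a single-process sketch under stated assumptions, not Kafka's API. `Firehose`, `publish`, and `poll` are hypothetical names. Each distinct client ID sees every retained event. Consumers sharing a client ID share one cursor, which simplifies how Kafka consumer groups split partitions among their members.

```python
import time
from collections import deque
from dataclasses import dataclass
from itertools import islice
from typing import Optional

RETENTION_SECONDS = 7 * 24 * 3600  # one week, matching the firehose's retention


@dataclass
class Event:
    offset: int    # position in the log, never reused
    ts: float      # publish time in seconds
    payload: dict  # e.g. {"type": "like", "post_id": 123}


class Firehose:
    """In-memory stand-in for a Kafka-backed activity log (hypothetical, single process)."""

    def __init__(self, retention: float = RETENTION_SECONDS) -> None:
        self.retention = retention
        self.log = deque()    # retained events, oldest first
        self.next_offset = 0
        self.cursors = {}     # client_id -> next offset to deliver

    def publish(self, payload: dict, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        self.log.append(Event(self.next_offset, now, payload))
        self.next_offset += 1
        # Drop events older than the retention window.
        while self.log and now - self.log[0].ts > self.retention:
            self.log.popleft()
        return self.next_offset - 1

    def poll(self, client_id: str, max_events: int = 10) -> list:
        """Consumers sharing a client_id share one cursor, so no event reaches two of them."""
        first = self.log[0].offset if self.log else self.next_offset
        # New client IDs start at the oldest retained event; lagging ones skip expired data.
        start = max(self.cursors.get(client_id, first), first)
        skip = start - first
        batch = list(islice(self.log, skip, skip + max_events))
        self.cursors[client_id] = start + len(batch)
        return batch
```

Two consumers polling as `"dashboard"` would split the stream between them, while a `"search"` consumer would independently receive every event still inside the week-long window.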

Cell Architecture for Dashboard Inbox

Dashboard data is stored in independent “cells.” Each cell contains its own HBase cluster, service cluster, and Redis cache, handling a subset of users. Cells process firehose events, write posts to databases, and update inboxes. This design provides parallelism, fault isolation, and easy scaling.
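The cell flow described above can be sketched as follows: every cell reads the same firehose events and stores each post, but only updates inboxes for the users homed in that cell. The article does not say how users are assigned to cells. The least-loaded directory below is an assumption chosen because explicit homing lets a new cell be added without rehashing every user. `Cell` and `CellDirectory` are hypothetical names, and plain dicts stand in for each cell's HBase tables and Redis cache.

```python
from collections import defaultdict
from typing import Callable, Iterable


class Cell:
    """One self-contained cell: dicts stand in for its HBase tables and Redis cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.homed_users = set()
        self.posts = {}                   # post_id -> event
        self.inboxes = defaultdict(list)  # user -> post IDs, oldest first

    def consume(self, event: dict, followers_of: Callable[[str], Iterable[str]]) -> None:
        """Apply one firehose event: store the post, then fill inboxes of this cell's users only."""
        self.posts[event["post_id"]] = event
        for user in followers_of(event["author"]):
            if user in self.homed_users:
                self.inboxes[user].append(event["post_id"])


class CellDirectory:
    """Hypothetical user -> cell lookup with explicit homing."""

    def __init__(self, cells: list) -> None:
        self.cells = list(cells)
        self.home = {}

    def assign(self, user: str) -> Cell:
        if user not in self.home:
            cell = min(self.cells, key=lambda c: len(c.homed_users))  # least-loaded cell
            cell.homed_users.add(user)
            self.home[user] = cell
        return self.home[user]

    def add_cell(self, cell: Cell) -> None:
        self.cells.append(cell)  # starts empty, so new users are steered to it first


if __name__ == "__main__":
    cells = [Cell("cell-a"), Cell("cell-b")]
    directory = CellDirectory(cells)
    for user in ("alice", "bob", "carol"):
        directory.assign(user)
    graph = {"alice": ["bob", "carol"]}  # author -> followers
    event = {"post_id": 1, "author": "alice"}
    for cell in cells:  # every cell reads the firehose
        cell.consume(event, lambda a: graph.get(a, []))
    print({c.name: dict(c.inboxes) for c in cells})
    # {'cell-a': {'carol': [1]}, 'cell-b': {'bob': [1]}}
```

Because a failed cell only affects the users homed in it, and cells share nothing but the firehose, this layout gives the fault isolation and parallelism the section describes.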

Launching in New York

New York offers abundant funding, but its talent pool leans toward finance and advertising. Engineers with startup experience are scarcer, so hiring remains challenging for a fast-growing startup.

Team Structure

Teams include Infrastructure, Platform, SRE, Product, Web Ops, and Services. Responsibilities span IP/DNS, hardware provisioning, core application development, SQL sharding, reliability engineering, and strategic service development.

Software Deployment

Deployment evolved from rsync scripts to Capistrano for multi‑server orchestration, then to a Func‑based remote‑control framework that groups hosts and executes commands without SSH. Deployments are coordinated, status‑reported, and can safely restart services.
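The article does not detail the deploy tool's interface. The following is a hypothetical sketch of the workflow it describes: select a host group, run a command on that group in parallel, record per-host status, and restart in batches so a bad build stops before reaching the whole fleet. `run_on_host` stands in for the remote-execution layer; it is an assumed hook, not Func's API. `select_group` and `rolling_deploy` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatch
from typing import Callable, Dict, List

# run_on_host(host, command) -> bool is a hypothetical hook standing in for the
# remote-execution transport; it returns True when the command succeeded.
RunOnHost = Callable[[str, str], bool]


def select_group(inventory: List[str], pattern: str) -> List[str]:
    """Pick a host group by glob pattern, e.g. "web*"."""
    return [h for h in inventory if fnmatch(h, pattern)]


def rolling_deploy(hosts: List[str], command: str, run_on_host: RunOnHost,
                   batch_size: int = 10) -> Dict[str, bool]:
    """Run `command` on hosts batch by batch, halting after the first failing batch."""
    status: Dict[str, bool] = {}
    for i in range(0, len(hosts), batch_size):
        batch = hosts[i:i + batch_size]
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            results = list(pool.map(lambda h: run_on_host(h, command), batch))
        status.update(zip(batch, results))  # per-host status report
        if not all(results):
            break  # stop so the remaining hosts keep running the old build
    return status


if __name__ == "__main__":
    inventory = [f"web{i:02d}" for i in range(25)] + ["db01", "db02"]
    web = select_group(inventory, "web*")

    def fake_run(host: str, command: str) -> bool:
        return host != "web13"  # simulate one failing host

    report = rolling_deploy(web, "restart app", fake_run, batch_size=10)
    print(len(report), report["web13"])  # 20 False
```

Halting after a failing batch trades deployment speed for safety. Only one batch of hosts is exposed to a broken release, and the status map shows exactly which hosts need attention.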

Development Process

The organization emphasizes tool standardization, automated testing, and a Scrum-like agile process. Developers use Vim or TextMate, and continuous integration validates code before it reaches production.

Recruitment Process

Interviews focus on practical engineering skills rather than puzzles, aiming to find candidates with experience in reliability and scalability.

Experience and Lessons

Key takeaways include the importance of automating everything. MySQL with sharding scales, while the applications on top of it are harder to scale. Redis performs surprisingly well, and Scala brings clear benefits. Uncertain projects should be retired early, and a strong, skilled team is essential.
