Backend Development 23 min read

How Pinterest Scaled to Billions of Page Views: Architecture & Sharding Secrets

Pinterest grew from zero to over 100 billion monthly page views by rapidly expanding its infrastructure, adopting simple, mature technologies like MySQL, Redis, and Memcache, and transitioning from clustering to sharding, offering practical lessons on scaling, tool selection, ID design, and migration strategies for massive growth.

21CTO

Jan 29, 2016

How Pinterest Scaled to Billions of Page Views: Architecture & Sharding Secrets

Introduction

Pinterest experienced exponential growth, doubling roughly every six weeks. In two years it went from 0 to 100 billion monthly page views, from two founders and one engineer to forty engineers, and from a single MySQL server to hundreds of web, API, MySQL, Redis, and Memcache instances.

Key Takeaways

Strong architecture handles growth by simply adding identical servers while preserving correctness.

When performance limits are reached, choose mature, simple, widely‑used tools (MySQL, Solr, Memcache, Redis) and avoid complex or immature technologies.

Basic Concepts

Pins are images linked to content.

Pinterest is a social network of users, boards, and pins.

The database stores boards, pins, users, and authentication data.

Timeline of Growth

Early 2010 (Self‑Discovery Phase)

Small team, Rackspace servers, a single web engine and MySQL instance.

January 2011

Amazon EC2 + S3 + CloudFront

1 Nginx, 4 web engines (redundancy)

1 MySQL master + 1 read replica

Task queue with two workers

MongoDB for counting

Two engineers

September 2011 (Rapid Growth)

Infrastructure exploded, adding many servers and diverse technologies, many of which later failed.

Amazon EC2 + S3 + CloudFront

2 Nginx, 16 web engines, 2 API engines

5 functional MySQL shards + 9 read replicas

4 Cassandra nodes

15 Membase nodes (3 clusters)

8 Memcache nodes

10 Redis nodes

3 task routers + 4 processors

4 Elasticsearch nodes

3 Mongo clusters

3 engineers

January 2012 (Mature Architecture)

Redesigned system:

Amazon EC2 + S3 + Akamai + ELB

90 web engines + 50 API engines

66 MySQL db.t2.large instances (each with a slave)

59 Redis instances

51 Memcache instances

1 Redis task manager + 25 processors

Sharded Solr

Later expanded to:

180 web engines + 240 API engines

88 MySQL cc2.8xlarge instances (each with a slave)

110 Redis instances

200 Memcache instances

4 Redis task managers + 80 processors

40 engineers

Why Amazon EC2/S3?

Reliability and rapid provisioning of new instances.

Strong support and peripheral services (load balancers, caching, etc.).

Ease of scaling without capacity planning for small teams.

Why MySQL?

Mature, durable, and widely known.

Linear request rate scaling.

Rich tooling (XtraBackup, Innotop, Maatkit) and community support.

Open‑source with strong vendor backing (Percona).

Why Memcache?

Simple hash‑table cache, highly mature.

Excellent performance, widely adopted, never crashes, free.

Why Redis?

Simple yet powerful data structures.

Supports persistence and replication with flexible options.

Good performance, widely liked, free.

Solr

Quick to install and use.

At the time, limited to a single node.

Considered Elasticsearch but faced scaling concerns.

Cluster vs. Sharding

Pinterest found that automatic clustering introduced many failure points and complexity, while sharding offered simpler, more reliable scaling.

Cluster Characteristics

Automated data distribution, rebalancing, and high availability.

Complex upgrade paths and numerous SPOF risks.

Sharding Advantages

Manual data placement reduces movement and failures.

Easy to add capacity by creating new shards.

Simple ID‑based routing.

Sharding Implementation

Adopted a 64‑bit ID: 16‑bit shard ID, 10‑bit type, remaining bits for auto‑incremented local ID.

Users are randomly assigned to shards; all of a user’s data resides in the same shard.

Started with 4 096 shards, scalable up to 65 536.

Lookup Process

Application uses a configuration mapping shard ranges to physical hosts, decomposes the 64‑bit ID, finds the shard, and queries the appropriate MySQL instance.

Object and Mapping Model

All data stored as objects (pins, boards, users) or mappings (user‑board, pin‑like).

Objects serialized as JSON blobs, later migrated to Thrift.

Mappings use naming convention noun_verb_noun (e.g., user_likes_pins).

Rendering a User Profile

Steps:

Extract username from URL and fetch user ID from a giant user table.

Decompose the ID to locate the correct shard.

Query the shard for user data, boards, and pins, using cache (Memcache/Redis) to minimize DB hits.

SELECT body FROM users WHERE id = <local_user_id>;
SELECT board_id FROM user_has_boards WHERE user_id=<user_id>;
SELECT body FROM boards WHERE id IN (<boards_ids>);
SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>;
SELECT body FROM pins WHERE id IN (pin_ids);

Script Migration

During sharding transition, scripts acted as bridges between the old monolithic system and the new sharded architecture, moving billions of rows over several months.

Future Directions

Move toward a service‑oriented architecture to isolate functionality and reduce cross‑service connections.

Deploy dedicated services (e.g., follow service) to limit database and cache load.

Lessons Learned

Keep architecture simple to handle rapid growth.

Choose mature, well‑supported tools.

Design IDs that embed shard information to avoid data movement.

When scaling, distribute load evenly across shards.

Service‑oriented design improves stability, team organization, and security.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

architecture Sharding Databases Cloud scaling Pinterest

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.