Databases 15 min read

How Pinterest Scaled with MySQL Sharding: A Deep Dive into Their Storage Architecture

Pinterest rebuilt its storage layer using a MySQL‑centric sharding design that meets strict stability, scalability, and performance requirements, detailing business needs, shard key generation, object and mapping tables, expansion strategies, and practical lessons learned over three years of production use.

dbaplus Community
dbaplus Community
dbaplus Community
How Pinterest Scaled with MySQL Sharding: A Deep Dive into Their Storage Architecture

Business Requirements

The system must be highly stable, easy to operate, and horizontally extensible.

Pins generated by users must be permanently accessible.

Requests for N pins must be returned in a deterministic order (e.g., reverse creation time or custom sorting).

Updates should be fast, while achieving eventual consistency may require additional mechanisms such as distributed transaction logs.

Design Principles and Notes

Because data spans multiple databases, joins and foreign keys cannot be used across shards; only sub‑queries within a single shard are allowed. Load balancing is essential, and moving data piece‑by‑piece is avoided; instead, whole virtual nodes are migrated to physical nodes when necessary.

A simple, reliable solution was needed for rapid prototyping on a distributed data platform where each node remains highly stable.

All data is replicated to slave nodes for high availability and periodically exported to S3 via MapReduce. In production, only the master node is written to; slaves are read‑only and lag behind, making them unsuitable for direct reads.

A globally unique 64‑bit ID (UUID) is generated for every object, encoding shard ID, type ID, and a local auto‑increment ID.

How We Partition Data

We chose mature MySQL as the foundation, avoiding newer NoSQL solutions (MongoDB, Cassandra, Membase) that were deemed insufficiently stable at the time.

Initially we deployed eight EC2 instances, each running a MySQL master‑slave pair. Every MySQL instance hosts multiple databases named db00000, db00001, …, each representing a shard. A configuration table stored the mapping of shards to physical machines, persisted in ZooKeeper.

When a shard needs to move or a machine fails, the configuration is updated and propagated to all MySQL servers.

Object Tables

Object tables store pins, users, boards, comments, etc. Each row contains a local auto‑increment ID and a blob column holding the entire object as JSON.

Creating a new pin involves assembling the JSON blob, determining a shard ID (often the same as the board’s shard), inserting the blob into the object table, retrieving the local ID, and finally constructing the 64‑bit global ID.

Updates are performed in a single MySQL transaction (read‑modify‑write). Deletions can be hard deletes or soft deletes by adding an active flag set to false and filtering on the client side.

Mapping Tables

Mapping tables capture relationships such as board‑to‑pin. Each mapping row contains three columns: from (source ID), to (target ID), and sequence (used for ordering). A composite index on ( from, to, sequence) is created, and rows are sharded by the from object's shard ID.

For reverse lookups, separate tables (e.g., pin_owned_by_board) are created. The sequence field often stores a Unix timestamp to provide monotonic ordering.

Typical query pattern: retrieve up to 50 pin_id s from a board’s mapping table, then fetch the corresponding pin objects.

Scaling Strategies

Three primary scaling methods are used:

Upgrade existing machines (more CPU, RAM, faster disks).

Add new shard ranges: initially only the first 4 K shards (0‑4095) were populated; later additional MySQL servers were introduced to host shards 4096‑8191.

Move existing shards to new machines: for example, split shards 0‑511 from an old server into two new master‑slave pairs, each handling half the range.

Advantages

The UUID generation scheme allows easy creation of globally unique identifiers without needing MySQL ALTER operations. Adding new JSON fields or new mapping tables requires only schema‑level changes in the application code, not costly database migrations.

Mod Shard (Hash‑Based Sharding)

For data that cannot be accessed by ID (e.g., mapping a Facebook ID to a Pinterest ID), a separate hash‑based shard system is used. The shard is computed as shard = md5("1.2.3.4") % 4096, and a configuration file maps each hash bucket to a physical shard.

Final Thoughts

The system has been in production at Pinterest for over three and a half years and continues to serve massive traffic. While it does not guarantee full ACID properties, it provides sufficient reliability and performance for Pinterest’s workloads. Failure recovery relies on ZooKeeper‑stored shard configurations and manual promotion of slaves to masters when a master fails.

Overall, the design demonstrates that a carefully engineered MySQL sharding architecture can meet the demanding scalability and stability needs of a fast‑growing internet service.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

shardingmysql
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.