
Scaling Cache Infrastructure at Pinterest

This article provides an in‑depth technical overview of how Pinterest scales its distributed cache layer using Memcached and Mcrouter on AWS, covering architecture, performance, high availability, load balancing, trade‑offs, and future directions.


As more users turn to Pinterest for inspiration, the demand for its core infrastructure has grown faster than ever. One of the key components is a distributed cache layer that sits in front of many microservices and databases, handling the majority of backend traffic.

Pinterest’s cache cluster runs on thousands of AWS EC2 instances, storing hundreds of terabytes of data and peaking at 150 million requests per second. By reducing latency across the stack and lowering the capacity needed for expensive backend storage, the cache provides significant performance and cost benefits.

This article dives into the architecture that powers Pinterest’s massive cache deployment.

1. Application Data Caching

Every Pinterest API request traverses a complex RPC graph that may involve dozens of services, such as image retrieval, recommendation, and content moderation. When the input can be expressed as a unique key, the result of that discrete query is cached for later reuse.

The cache is primarily used with look‑aside semantics, absorbing a large share of traffic that would otherwise hit compute‑intensive services and databases. Millisecond‑level tail latency and low per‑request cost make the cache an efficient, low‑cost scaling mechanism.
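The look-aside (cache-aside) pattern described above can be sketched in a few lines of Python. The `cache` and `db` objects here are illustrative stand-ins, not Pinterest's actual client libraries; the key scheme is likewise made up:

```python
import hashlib
import json

def lookaside_get(cache, db, query_args, ttl=300):
    """Look-aside read: try the cache first, fall back to the
    backing store on a miss, then populate the cache."""
    # Derive a deterministic cache key from the discrete query input.
    blob = json.dumps(query_args, sort_keys=True).encode()
    key = "svc:" + hashlib.sha1(blob).hexdigest()

    value = cache.get(key)
    if value is not None:
        return value              # hit: the backend never sees this request
    value = db.fetch(query_args)  # miss: do the expensive work once
    cache.set(key, value, ttl)    # populate for subsequent readers
    return value
```

Because the caller, not the cache, is responsible for populating entries, the cache stays a simple key-value store and any backend can sit behind it.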

2. Cache Backbone: Memcached and Mcrouter

Memcached, a high‑performance C‑based in‑memory key‑value store, forms the storage layer, while Mcrouter acts as an application‑level proxy that provides high availability and sophisticated routing.

- Memcached's asynchronous, event-driven architecture and multithreaded model make horizontal scaling straightforward.
- Extstore adds a secondary warm-storage tier on NVMe flash, dramatically increasing storage efficiency.
- Its simple design makes it easy to build abstraction layers and scale horizontally, since no node needs to be aware of any other.
- Decades of production testing and an active open-source community ensure reliability and performance.
- Native TLS termination and SPIFFE-based authorization secure traffic in transit.

Mcrouter, open‑sourced by Facebook in 2014, offers a single endpoint for developers, abstracts the entire cache cluster, and enforces consistent traffic behavior across all services.

- It separates the control and data planes, grouping servers into logical pools.
- Its routing API supports region affinity, replication, multi-layer caching, and shadow traffic.
- Because it proxies the memcached ASCII protocol, it can transparently add features such as TTL manipulation and on-the-fly compression.
- Built-in observability exposes per-client and per-server latency, throughput, key-prefix trends, and error rates.

The diagram shows how mcrouter routes requests to Memcached based on key prefixes.
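A minimal mcrouter configuration conveys the idea. This is an illustrative sketch, not Pinterest's config: the pool names and hosts are made up, and it uses mcrouter's routing-prefix mechanism, where keys written as `/region/cluster/key` are matched against the `aliases` entries:

```json
{
  "pools": {
    "feed": { "servers": ["cache-feed-001:11211", "cache-feed-002:11211"] },
    "ads":  { "servers": ["cache-ads-001:11211", "cache-ads-002:11211"] }
  },
  "routes": [
    { "aliases": ["/default/feed/"], "route": "PoolRoute|feed" },
    { "aliases": ["/default/ads/"],  "route": "PoolRoute|ads" }
  ]
}
```

Clients talk to one local mcrouter endpoint; which pool actually serves a key is decided entirely by this config, so topology changes never touch application code.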

3. Compute and Storage Efficiency

A single r5.2xlarge EC2 instance can handle over 100,000 requests per second and tens of thousands of concurrent TCP connections with minimal added latency, making Memcached Pinterest's highest-throughput production service.

Extstore extends cache capacity beyond DRAM by attaching local NVMe SSDs, increasing per-instance storage from roughly 55 GB to about 1.7 TB at a modest cost increase, while balancing disk I/O, compression, and tail latency.
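As a rough illustration of how such a tier is enabled, Extstore is configured through memcached's `-o ext_path` option, which points at a file on the flash device (the path, sizes, and numbers below are hypothetical, not Pinterest's actual settings):

```shell
# ~54 GB of DRAM for the hot tier, plus a ~1.6 TB file on NVMe flash;
# memcached evicts colder values to flash as DRAM fills, keeping keys
# and hot items in memory.
memcached -m 54000 -o ext_path=/mnt/nvme/extstore:1600G
```

The DRAM tier still holds all keys and the hottest values, so flash reads are only paid on warm-item hits.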

4. High Availability

Using mcrouter's routing features, the cache layer fails over automatically when servers slow down or die, replicates across regions so that losing an availability zone causes no downtime, and supports shadow testing that injects latency or failures without affecting production traffic.
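The failover behavior can be approximated conceptually in a few lines of Python. This is a sketch of the idea behind a failover route, not mcrouter's implementation (which also handles mark-down timers, TKO thresholds, and re-probing):

```python
def failover_get(key, clients):
    """Try each cache client in order; a host that errors out is
    skipped, so reads survive individual server failures."""
    last_err = None
    for client in clients:
        try:
            return client.get(key)           # first healthy host wins
        except (ConnectionError, TimeoutError) as err:
            last_err = err                   # note the failure, move on
    raise last_err or ConnectionError("all cache hosts failed")
```

In production the failover targets are usually replicas or a standby pool, so a failed-over read still has a good chance of being a hit.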

5. Load Balancing and Sharding

Consistent hashing maps each cache key to a specific host, enabling linear scaling of request volume with the number of instances. mcrouter’s hash pools support region‑affinity and multi‑layer routing, ensuring transparent scaling for clients.

The diagram illustrates how consistent hashing keeps most keys on the same server when nodes are added or removed.
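That stability property can be shown with a minimal hash ring in Python. This is a simplified sketch for intuition (mcrouter uses its own hashing schemes, not this one):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: each node owns many points on a
    circle, and a key maps to the next point clockwise."""
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []                      # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def _point(self, s):
        # Hash a string onto the ring's keyspace.
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, node):
        # Insert this node's virtual points; other nodes are untouched,
        # so only keys adjacent to the new points move.
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._point(f"{node}#{i}"), node))

    def node_for(self, key):
        idx = bisect.bisect(self.ring, (self._point(key), ""))
        return self.ring[idx % len(self.ring)][1]
```

Adding a node to an N-node ring moves only about 1/(N+1) of the keys, which is what lets the fleet grow without invalidating most of the cache.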

6. Trade‑offs and Considerations

The proxy layer adds CPU and I/O overhead, but its high‑availability and routing benefits outweigh the performance cost.

Global configuration changes affect the entire fleet, increasing deployment risk but ensuring consistent topology.

Managing ~100 distinct Memcached clusters with varied tenancy, hardware, and routing policies adds operational burden but allows workload‑specific optimization.

Consistent hashing works well for balanced key distributions but does not eliminate hot‑key hotspots.

7. Outlook

Future work includes embedding Memcached directly into host processes to eliminate network overhead for latency‑critical paths and designing robust multi‑region redundancy solutions.

Original article: https://medium.com/pinterest-engineering/scaling-cache-infrastructure-at-pinterest-422d6d294ece

Tags: backend, distributed systems, scalability, caching, AWS, Memcached, mcrouter
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
