Databases 19 min read

How Uber Scaled Docstore with CacheFront: An Integrated Caching Solution

This article details Uber's Docstore distributed database challenges and explains the design, architecture, and implementation of CacheFront—a transparent, high‑performance caching layer that reduces latency, improves scalability, and maintains strong consistency across microservices.

dbaplus Community
dbaplus Community
dbaplus Community
How Uber Scaled Docstore with CacheFront: An Integrated Caching Solution

Introduction

Docstore is Uber's internal distributed database built on MySQL, storing tens of petabytes and handling tens of millions of requests per second. Rapid growth in users, use‑cases, and microservice complexity created demand for lower latency, higher performance, and greater scalability.

Challenges

Disk‑based storage imposes a latency ceiling; scaling reads beyond existing limits is difficult.

Vertical scaling hits hardware limits; horizontal scaling introduces operational complexity and does not fully solve hot‑key or partition issues.

Read request rates far exceed write rates, stressing MySQL nodes.

Cost of scaling vertically and horizontally is high.

Docstore Architecture

Docstore consists of three layers: a stateless query engine, a stateful storage engine, and a control plane. The query engine handles planning, routing, sharding, authentication, and health monitoring. The storage engine provides consensus via Raft, replication, transactions, and concurrency control. Partitions are MySQL nodes backed by NVMe SSDs.

CacheFront Design Goals

Minimize vertical/horizontal scaling needs for low‑latency reads.

Reduce resource pressure on the storage engine by using cheaper cache hosts.

Improve P50/P99 latency and stabilize read‑latency spikes.

Replace custom per‑team cache solutions with a shared, transparent layer.

Leverage existing Docstore client APIs without extra handling.

Boost developer productivity and enable seamless cache‑technology upgrades.

Separate caching from Docstore's sharding to avoid hot‑key issues.

Allow independent horizontal scaling of the cache layer.

Transfer Redis ownership from feature teams to the Docstore team.

CacheFront Architecture

The query engine integrates a Redis interface for cache storage and implements cache‑invalidation mechanisms. High‑level architecture separates the cache from disk‑based storage, enabling independent scaling.

Cache Reads

Query engine receives a read request.

If caching is enabled, it attempts to fetch rows from Redis while streaming the response.

Remaining rows are retrieved from the storage engine.

Missing rows are asynchronously written back to Redis.

All rows are streamed to the client.

Cache Invalidation

Conditional Updates

Docstore supports conditional updates that may affect many rows, making it impossible for the stateless query engine to know which cache entries to invalidate before the write.

CDC‑Based Invalidation

Flux consumes MySQL binlog events from each storage‑engine cluster, publishing them to subscribers. A new consumer watches these events and invalidates or updates corresponding Redis entries, achieving near‑real‑time consistency.

Duplicate‑Write Removal

To avoid stale writes overwriting fresh data, CacheFront parses timestamps encoded in Redis values and uses a Lua script executed via Redis EVAL to atomically discard older writes.

Strong Consistency for Point Writes

For read‑your‑write scenarios, CacheFront provides an explicit API that lets services invalidate a row after the write completes, offering stronger guarantees than CDC alone.

Table Schema and Redis Codec

Docstore tables have a primary key (row key) and a partition key (prefix of the primary key). A custom Redis codec encodes MySQL rows into Redis‑compatible strings, allowing multiple databases to share the cache while preserving isolation.

Key Features

Real‑time consistency verification by comparing cache and DB reads.

Cache pre‑warming across regions to keep hot data available after failover.

Negative caching for missing rows to avoid repeated DB lookups.

Support for mapping a Docstore instance to multiple Redis clusters.

Sharding of Redis clusters by partition key to avoid hot‑partition failures.

Circuit breaker using a sliding‑window algorithm to short‑circuit failing Redis nodes.

Adaptive timeout that dynamically adjusts Redis request timeouts to meet P99 latency targets.

Results

Integrated cache reduced P75 latency by 75% and P99.9 latency by over 67% while flattening latency spikes.

Flux‑based invalidation and compare‑cache mode delivered strong consistency.

CacheFront is transparent to users, managed internally with header‑based options.

Sharding and pre‑warming provided scalability and fault tolerance; a major use‑case achieved 99% cache hit rate at >6 M RPS with successful failover.

CPU usage dropped from ~60 K cores to ~3 K Redis cores for the same throughput.

Today CacheFront serves more than 40 M requests per second across all production Docstore instances and continues to grow.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

ScalabilityrediscachingDatabase ArchitectureConsistency
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.