How Airbnb Rebuilt Its Next‑Generation Key‑Value Store
Airbnb completely rewrote its internal key-value store, Mussel, moving from a legacy V1 system to a cloud-native, NewSQL-backed V2 that reduces operational complexity, improves scalability, offers flexible consistency, and supports massive batch imports, all while migrating over 1 PB of data with zero downtime.
Why Rebuild
Airbnb's original key‑value store, Mussel V1, could not keep up with new requirements such as real‑time fraud detection, instant personalization, dynamic pricing, and massive data volumes. The platform needed to combine real‑time stream processing with bulk ingestion while remaining easy to manage.
Key Challenges of V1
Operational complexity: Scaling or replacing nodes required multi‑step Chef scripts on EC2; V2 uses Kubernetes manifests and automated rollouts, cutting manual effort from hours to minutes.
Capacity and hotspots: Static hash partitions sometimes overloaded nodes, causing latency spikes. V2’s dynamic range‑sharding and pre‑splitting keep p99 latency below 25 ms even on tables larger than 100 TB.
Consistency flexibility: V1 offered limited consistency controls. V2 lets teams choose between strong and eventual consistency per SLA.
Cost and transparency: V1’s resource usage was opaque. V2 adds namespace tenancy, quota control, and observable dashboards for cost visibility.
New Architecture
Mussel V2 is a full redesign that delivers a scalable, cloud‑native, predictable‑performance key‑value store while preserving functional equivalence for over 100 existing use cases.
Dispatcher
The stateless Dispatcher runs as a horizontally scalable Kubernetes service, replacing V1's tightly coupled, protocol-specific design. It translates client API calls into backend reads or writes, supports dual-write and shadow-read modes during migration, handles retries and rate limiting, and integrates with Airbnb's service mesh for security and service discovery.
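A minimal sketch of how such migration-mode routing might look. The `DispatchMode` states and the backend interface here are illustrative assumptions, not Airbnb's actual internal API:

```python
# Hypothetical sketch of the Dispatcher's migration-mode routing.
import logging
from enum import Enum

class DispatchMode(Enum):
    V1_ONLY = "v1_only"    # Blue: all traffic served by V1
    SHADOW = "shadow"      # Green: V1 answers, V2 shadows for validation
    REVERSED = "reversed"  # V2 answers, V1 kept warm for fallback

class Dispatcher:
    def __init__(self, v1, v2, mode=DispatchMode.V1_ONLY):
        self.v1, self.v2, self.mode = v1, v2, mode

    def read(self, dataname, key):
        if self.mode is DispatchMode.SHADOW:
            primary = self.v1.read(dataname, key)
            try:
                if self.v2.read(dataname, key) != primary:
                    logging.warning("shadow mismatch: %s/%s", dataname, key)
            except Exception:
                pass  # shadow failures must never affect client responses
            return primary
        backend = self.v2 if self.mode is DispatchMode.REVERSED else self.v1
        return backend.read(dataname, key)

    def write(self, dataname, key, value):
        # Dual-write: in Mussel both stores consume the same Kafka topic;
        # the fan-out is shown synchronously here for brevity.
        self.v1.write(dataname, key, value)
        if self.mode is not DispatchMode.V1_ONLY:
            self.v2.write(dataname, key, value)
```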
Read path: Each dataname maps to a logical table, enabling optimized point, range, or prefix queries and stale reads from local replicas to reduce latency. Dynamic throttling and priority control maintain performance under traffic fluctuations.
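To illustrate why a logical-table layout supports these query shapes, here is a toy example showing how point and prefix reads reduce to lookups and range scans over a sorted key space; the in-memory index is a stand-in for the real backend:

```python
import bisect

# Toy logical table: a sorted (key, value) index standing in for storage.
table = sorted([("user:1:a", 1), ("user:1:b", 2), ("user:2:a", 3)])
keys = [k for k, _ in table]

def point_read(key):
    i = bisect.bisect_left(keys, key)
    return table[i][1] if i < len(keys) and keys[i] == key else None

def prefix_scan(prefix, limit=100):
    # A prefix query is just a range scan over [prefix, prefix + "\xff").
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_right(keys, prefix + "\xff")
    return table[lo:min(hi, lo + limit)]

print(point_read("user:1:b"))   # 2
print(prefix_scan("user:1:"))   # [("user:1:a", 1), ("user:1:b", 2)]
```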
Write path: Writes are first persisted to Kafka for durability, then replayed by Replayer and Write Dispatcher in order. This event‑driven model absorbs traffic bursts, guarantees consistency, and removes much of V1’s operational burden. Before CDC and snapshot capabilities matured, Kafka also underpinned upgrades, bootstraps, and migrations.
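A simplified sketch of this pattern using the kafka-python client; the topic name, message format, and storage interface are assumptions for illustration:

```python
# Event-driven write path: persist to Kafka first, then apply in order.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
    acks="all",  # durability first: a write is accepted once Kafka has it
)

def write(dataname, key, value):
    # Keying by the record key preserves per-key ordering within a partition.
    producer.send("mussel-writes", key=key.encode(), value={
        "dataname": dataname, "key": key, "value": value,
    })

def replay(storage):
    # The Replayer consumes the same topic and applies mutations in order.
    consumer = KafkaConsumer(
        "mussel-writes",
        bootstrap_servers="localhost:9092",
        group_id="mussel-replayer",
        value_deserializer=lambda b: json.loads(b.decode()),
    )
    for msg in consumer:
        m = msg.value
        storage.put(m["dataname"], m["key"], m["value"])
```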
The architecture is ideal for derived‑data workloads and heavy replay scenarios, with a long‑term goal of moving ingestion and replication entirely to a distributed backend database.
Batch Import
Batch import remains the critical capability for moving large offline data sets into Mussel for low‑latency queries. V2 retains V1 semantics while adding two modes: merge (append to existing tables) and replace (replace an entire data set).
To keep the familiar onboarding flow, V2 preserves the Airflow‑based pipeline: data is transformed to a standard format and uploaded to S3 for import. Airflow, originally created at Airbnb, lets engineers define DAGs in code for rapid iteration and task orchestration.
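A skeletal Airflow 2.x DAG for this flow might look like the following; the task bodies, S3 staging details, and import trigger are placeholders, not Airbnb's actual pipeline code:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def transform_to_standard_format():
    ...  # read the warehouse export, write Mussel-format files

def upload_to_s3():
    ...  # stage the transformed files under an agreed S3 prefix

def trigger_mussel_import():
    ...  # ask the import controller to run a merge or replace job

with DAG(
    dag_id="mussel_batch_import",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform = PythonOperator(task_id="transform",
                               python_callable=transform_to_standard_format)
    upload = PythonOperator(task_id="upload_to_s3",
                            python_callable=upload_to_s3)
    import_job = PythonOperator(task_id="import",
                                python_callable=trigger_mussel_import)

    transform >> upload >> import_job
```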
A stateless controller schedules jobs, while a distributed, stateful worker cluster (Kubernetes StatefulSets) performs parallel imports, writing S3 records into tables. Optimizations such as deduplication for replace jobs, incremental merges, and insert‑on‑duplicate‑key‑ignore ensure high throughput at Airbnb scale.
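A sketch of a worker's inner loop, assuming a hypothetical storage interface: insert-on-duplicate-key-ignore makes replayed batches idempotent, and deduplicating a replace job's input avoids writing the same key twice.

```python
def import_batch(storage, table, records):
    """Write (key, value) records streamed from an S3 file into a table."""
    seen = set()   # in-job dedup for replace mode; in practice this can
    batch = []     # happen in the offline transform to bound memory use
    for key, value in records:
        if key in seen:
            continue
        seen.add(key)
        batch.append((key, value))
        if len(batch) >= 1000:
            storage.insert_ignore(table, batch)  # skip keys that already exist
            batch.clear()
    if batch:
        storage.insert_ignore(table, batch)
```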
TTL (Time‑to‑Live)
Automatic data expiration supports governance and storage efficiency. V1 relied on storage‑engine compaction cycles, which struggled at scale. V2 introduces a topology‑aware expiration service that partitions namespaces into range‑based subtasks processed concurrently by multiple workers. Parallel scanning and deletion shorten cleanup time for large data sets, while scheduling limits impact on online queries. Write‑heavy tables use max‑version enforcement and targeted deletions to maintain performance and data hygiene.
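A minimal sketch of this approach, assuming a storage interface that scans `(key, value, expires_at)` tuples and shard metadata from the cluster topology:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def expire_range(storage, table, start_key, end_key, now=None):
    """Scan one range subtask and delete rows whose TTL has elapsed."""
    now = now or time.time()
    expired = [
        key for key, _, expires_at in storage.scan(table, start_key, end_key)
        if expires_at <= now
    ]
    # Delete in small batches and pace the loop to limit impact on
    # online queries sharing the same shards.
    for i in range(0, len(expired), 500):
        storage.delete_many(table, expired[i : i + 500])
        time.sleep(0.01)

def expire_namespace(storage, table, shard_ranges, workers=8):
    # shard_ranges comes from the cluster topology, so each subtask
    # aligns with a physical shard boundary.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start, end in shard_ranges:
            pool.submit(expire_range, storage, table, start, end)
```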
Migration Process
The migration goal was zero data loss and no impact on customer availability while moving all data and traffic from Mussel V1 to V2.
Blue‑Green Strategy
Blue Zone: All traffic initially goes to V1, providing a stable baseline while data is migrated in the background.
Shadowing (Green): After a table’s bootstrap, V2 begins shadow reads/writes alongside V1, but V1 still returns responses. This validates V2 correctness and performance without risk.
Reverse: Once confidence is built, V2 takes over live traffic while V1 remains on standby. Automatic circuit‑breakers and fallback logic switch traffic back to V1 if V2 error rates rise or lag increases.
Cutover: After all checks pass, traffic is switched per dataname to V2 permanently, with Kafka serving as a resilient write‑path intermediary.
Migration proceeds table by table, allowing fine-grained rollbacks and risk-based tuning; a sketch of the per-table stage logic follows below.
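The following toy state machine illustrates how per-dataname stage control with an automatic circuit breaker could work; the stage names mirror the strategy above, while the health thresholds are made-up examples:

```python
STAGES = ["blue", "shadow", "reverse", "cutover"]

def next_stage(dataname, state):
    """Advance one stage only when health checks pass; fall back otherwise."""
    if state["v2_error_rate"] > 0.001 or state["replication_lag_ms"] > 1000:
        return "blue"  # circuit breaker: route traffic back to V1 immediately
    current = STAGES.index(state["stage"])
    return STAGES[min(current + 1, len(STAGES) - 1)]

# Example: a healthy table in shadow mode is promoted to reverse.
print(next_stage("user_profiles", {
    "stage": "shadow", "v2_error_rate": 0.0, "replication_lag_ms": 40,
}))  # -> "reverse"
```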
Custom Migration Pipeline
Source data sampling: Download V1 backups, extract relevant tables, and sample to understand data distribution.
Pre‑splitting on V2: Create corresponding tables with pre‑defined shard layouts based on the sampling, minimizing data reshuffling (see the split-point sketch below).
Bootstrap: The most time‑consuming step, potentially taking hours or days per table. Kubernetes StatefulSets persist local state and periodically checkpoint.
Checksum verification: Ensure all data imported into V2 matches the V1 backup.
Catch‑up: Drain messages accumulated in Kafka during bootstrap.
Dual‑write: Both V1 and V2 consume the same Kafka topic, maintaining eventual consistency with replication lag typically under a few tens of milliseconds.
After dual‑write, read traffic is gradually shifted. The dispatcher can still route reads to V1 for certain tables while shadowing requests are sent to V2 for consistency checks. If V2 becomes unstable, traffic instantly falls back to V1, ensuring uninterrupted service.
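As a concrete illustration of the pre-splitting step, one way to derive balanced shard boundaries is to take evenly spaced quantiles of the sampled V1 keys; the sample and shard count here are invented:

```python
def split_points(sampled_keys, num_shards):
    """Pick shard boundaries at evenly spaced quantiles of the key sample."""
    keys = sorted(sampled_keys)
    return [
        keys[(i * len(keys)) // num_shards]
        for i in range(1, num_shards)
    ]

sample = ["a1", "b7", "c3", "d9", "f2", "k4", "m8", "q1", "t5", "z0"]
print(split_points(sample, 4))  # -> ['c3', 'k4', 'q1']
```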
Key Learnings
Consistency complexity: Moving from an eventually consistent backend (V1) to a strongly consistent one (V2) introduced write‑conflict challenges, requiring deduplication, hot‑key blocking, and delayed‑write fixes, often trading storage cost against read performance.
Pre‑splitting importance: Switching from hash‑based to range‑based partitioning required precise sampling and topology‑aware pre‑splits to avoid hotspots during migration.
Query model adjustments: V2's range-filter pushdown proved less efficient than expected, necessitating client-side pagination for prefix and range queries (a sketch follows this list).
Freshness vs. cost: Different workloads balance data freshness, cost, and performance by reading from primary or secondary replicas.
Kafka's role: Kafka's proven, stable millisecond-level p99 latency made it indispensable for the migration pipeline.
Built‑in flexibility: Automatic retries, batch jobs, and per‑table stage control provide safety nets for large‑scale risk management.
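A minimal sketch of the client-side pagination mentioned above, assuming a hypothetical `range_read` client method: each page resumes strictly after the last key seen, rather than relying on the backend's filter pushdown.

```python
def scan_all(client, prefix, page_size=500):
    """Iterate over every (key, value) pair under a prefix, page by page."""
    start = prefix
    end = prefix + "\xff"  # upper bound covering every key with this prefix
    while True:
        page = client.range_read(start=start, end=end, limit=page_size)
        yield from page
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        # Resume strictly after the last key in this page.
        start = page[-1][0] + "\x00"
```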
Using blue‑green releases, dual‑write pipelines, and automated rollback, Airbnb migrated over 1 PB of data across thousands of tables with zero downtime and zero data loss, allowing product teams to continue delivering features while the storage engine evolved.
Conclusion and Next Steps
Mussel V2 uniquely combines capabilities that usually reside in separate systems: massive batch ingestion of tens of TB, sustained streaming writes exceeding 100 k ops/s, and p99 read latency under 25 ms. A simple namespace‑level switch enables stale reads when needed.
By coupling a NewSQL backend with Kubernetes‑native control planes, Mussel V2 delivers object‑storage elasticity, low‑latency caching, and service‑mesh operability in a single platform, freeing engineers from hand‑crafting caches, queues, and stores to meet SLAs.
Future posts will dive deeper into Mussel’s QoS management and large‑scale batch‑import optimizations, further exposing performance and reliability gains for complex data pipelines.