How Facebook Migrated Messenger Storage to MyRocks for Massive Cost and Latency Gains
Facebook upgraded Messenger’s storage by redesigning the schema, replacing HBase with the MyRocks MySQL storage engine, and moving from spinning disks to flash. The migration ran without downtime, cut storage use by 90%, reduced latency by a factor of 50, and simplified operations for a service with more than a billion users.
1. Background
Facebook Messenger serves over a billion users, allowing instant sharing of text, images, and video. The system evolved from a monolithic service to an architecture with a dedicated cache, an Iris queueing layer for writes, and a storage service for historical messages.
To improve user experience, Facebook fundamentally optimized the underlying storage, including:
Redesigning and simplifying the data schema.
Replacing HBase with MyRocks (Facebook’s open‑source MySQL storage engine).
Switching from spinning disks to the latest flash storage.
The result was better user experience, increased system flexibility, reduced latency, and lower storage consumption, all achieved with a seamless, zero‑downtime upgrade.
2. Challenges of Large‑Scale Migration
MyRocks offers many advantages, such as:
Leveraging Facebook’s open‑source compute projects.
Utilizing flash storage.
Applying Facebook’s mature MySQL operations expertise.
Reducing the number of physical data nodes while improving availability.
The migration was necessary, but access to the petabytes of data in HBase could not be interrupted while it ran.
Reading from the HBase cluster added extra load; an aggressive migration could degrade HBase performance, cause errors, and hurt user experience.
Because the schema also changed, the migration required careful analysis of existing data, handling of legacy data, and conflict resolution so that users saw their data unchanged.
Thus, achieving a seamless migration for a billion users was a compelling challenge.
3. Migration Plan
Facebook designed two migration flows: a normal flow handling 99.9% of accounts and a special flow for exceptional accounts.
The process includes strict data validation, pre‑planned rollback strategies, and thorough checks to ensure no account is missed before decommissioning the old system.
3.1 Normal Migration Flow
Data before and after migration must be strongly consistent. The migration assumes no writes occur for the account during the transfer.
To guarantee this, a state mechanism and monitoring tools record the account’s last data position in the old system, migrate the data, then verify that the old position has not changed. If unchanged, the account becomes active in the new system; otherwise, the migration fails, the new data is cleared, and the process retries.
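The position check described above can be sketched as follows. This is a minimal illustration, not Facebook's implementation: `OldStore`, `NewStore`, `migrate_account`, and the `on_copy` hook are hypothetical names, and the "last data position" is modeled simply as the length of an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class OldStore:
    """Hypothetical stand-in for the old system: one append-only log per account."""
    logs: dict = field(default_factory=dict)

    def position(self, account):
        # Assumption: the "last data position" is the number of stored messages.
        return len(self.logs.get(account, []))

    def read_until(self, account, position):
        return list(self.logs.get(account, [])[:position])


@dataclass
class NewStore:
    """Hypothetical stand-in for the new system."""
    data: dict = field(default_factory=dict)
    active: set = field(default_factory=set)


def migrate_account(account, old, new, max_attempts=3, on_copy=None):
    """Copy an account, then activate it only if no write arrived meanwhile."""
    for _ in range(max_attempts):
        start = old.position(account)            # record the last position
        new.data[account] = old.read_until(account, start)
        if on_copy is not None:
            on_copy()                            # hook to simulate concurrent writes
        if old.position(account) == start:       # position unchanged: safe to switch
            new.active.add(account)
            return True
        new.data.pop(account, None)              # position moved: clear and retry
    return False
```

The design is optimistic: nothing is locked during the copy, and a concurrent write is detected afterwards, so the only cost of a race is a discarded copy and a retry.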
During the dual‑write phase, when writes go to both HBase and MyRocks, the migrator performs two validations:
(1) Data validation – confirming HBase and MyRocks data match.
(2) API validation – reading from both systems simultaneously and comparing results.
If data validation fails, a rollback occurs: reads continue from the old system and new‑system data is cleared.
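A rough sketch of the two validations and the rollback follows. The function names and the plain dictionaries standing in for the new store and the read-routing table are assumptions made for illustration; the source does not describe the actual interfaces.

```python
def data_matches(old_rows, new_rows):
    """Data validation: both stores hold the same messages in the same order."""
    return list(old_rows) == list(new_rows)


def api_matches(read_old, read_new, requests):
    """API validation: send each read to both systems and compare the results."""
    return all(read_old(r) == read_new(r) for r in requests)


def validate_or_rollback(account, old_rows, new_store, routing):
    """Return True if the data matches; otherwise roll the account back."""
    if data_matches(old_rows, new_store.get(account, [])):
        return True
    routing[account] = "old"        # reads continue from the old system
    new_store.pop(account, None)    # new-system data is cleared
    return False
```

For example, calling `validate_or_rollback("a", ["m1", "m2"], {"a": ["m1"]}, routing)` detects the missing message, routes reads for `"a"` back to the old system, and removes the partial copy.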
3.2 Cache‑Based Migration Flow
Special accounts (e.g., large enterprise chatbot accounts) cannot use the normal flow, so a cache mechanism is employed.
A snapshot of the account data is taken at a point in time and stored in a cache, then migrated to MyRocks.
During migration, new writes are queued by Iris (which can retain writes for weeks). After the cached data is migrated, the new system consumes the queued writes, catching up with the old system, after which the process follows the normal flow.
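The snapshot-then-catch-up sequence might look like the sketch below. A `collections.deque` stands in for the Iris queue and a list for the MyRocks table; both, along with the function name, are assumptions for illustration only.

```python
from collections import deque


def migrate_via_snapshot(snapshot, queued_writes, new_rows):
    """Copy a point-in-time snapshot, then replay the writes queued since then."""
    new_rows.extend(snapshot)                      # step 1: migrate the cached snapshot
    replayed = 0
    while queued_writes:                           # step 2: drain the queue in order
        new_rows.append(queued_writes.popleft())
        replayed += 1
    return replayed                                # number of writes caught up


# Usage: two messages were in the snapshot, two more arrived during migration.
new_rows = []
queue = deque(["m3", "m4"])
caught_up = migrate_via_snapshot(["m1", "m2"], queue, new_rows)
```

Once the queue is empty, the new system has caught up with the old one and the account can continue through the normal flow.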
4. Benefits After Migration
Storage Space
The simplified schema dramatically reduced storage usage, and MyRocks’s architecture cut replication resources by half, resulting in an overall 90% reduction in storage consumption.
Latency
MyRocks optimizations and flash storage lowered data latency by 50×, making actions like retrieving old messages noticeably faster for users.
Maintenance Cost
Compared to HBase, MyRocks is more mature within Facebook, offering smarter disaster‑recovery mechanisms that eliminate manual interventions.
Product Support
The new architecture and performance make it easier to add features such as mobile message content search, which was difficult with HBase.
Content translated and compiled from the official Facebook article.
https://code.fb.com/data-infrastructure/migrating-Messenger-storage-to-optimize-performance