How Quora Optimizes MySQL Reads, Writes, and Storage with Caching and MyRocks
This article explains how Quora reduces database load by optimizing read patterns, improving high‑QPS queries with smarter caching, shrinking table size using MyRocks, and boosting write throughput through parallel replication settings, providing concrete steps, code snippets, and real‑world results.
0 Database Load Main Components
Read
Data volume
Write
1 Optimize Reads
1.1 Different Read Types Need Different Optimizations
Complex queries (joins, aggregations)
When counting becomes a bottleneck, pre‑compute the counts in a separate table so that the application can read the count directly instead of recalculating it on each request.
Large scans
Apply LIMIT or pagination to restrict the number of rows scanned.
Schema‑query mismatches
Typical problems include missing indexes, indexes that do not cover enough columns, or column order in the index that is sub‑optimal for the query. Common fixes are:
Remove unnecessary columns from the SELECT list, especially those not covered by an index.
Eliminate ORDER BY clauses and perform sorting on the client side (MySQL CPU is usually more valuable than client CPU).
Delete queries that no longer provide needed functionality.
1.2 High‑QPS Queries
Even with optimized SQL and a good schema, queries that receive very high QPS can overload the database, often indicating ineffective caching.
Inefficient cache keys
Overly specific or narrow keys cause many cache misses, leading to high QPS.
Overly broad keys pull large data sets from the DB on each request.
User‑language table
Original cache key (uid, language_id) was too granular. Switching to a key based only on uid reduced QPS by over 90% while only slightly increasing the amount of data fetched per query.
A2A (ask‑to‑answer) table
The table links users, questions, and answerers, resulting in very high QPS. Two cache keys were used: (question_id, user_id) – users who have already requested answers. (question_id, answerer_id) – users who have requested the same answerer.
Because the key space is sparse (product of question count and user count), a supplemental cache keyed only by question_id was added, limiting entries to N per question. This cache reduced A2A QPS by 50‑66%.
1.3 Sparse One‑Dimensional Datasets
When caching individual question_id values, most entries are “no redirect,” wasting space. Quora switched to caching ranges (e.g., 123–127) as a single “no redirect” entry, cutting related QPS by about 90%.
2 Optimize Table Space
Table size impacts storage cost, I/O pressure, backup time, and overall performance. Quora migrated older shards to MyRocks (a RocksDB‑based storage engine) using a binlog‑replay tool and shadow‑read validation.
Create an empty table on the MyRocks master with the same schema.
Copy data and replay the binlog from the MySQL master to the MyRocks master.
Run shadow‑read tests to ensure identical results.
Switch traffic to the MyRocks shard.
Space savings: >80% for one shard, 50‑60% for another.
3 Optimize Writes
MySQL replication replays writes serially on replicas, limiting write scalability on multi‑core machines. Parallel replication can be enabled by configuring slave_parallel_workers and one of the following types: slave_parallel_type=LOGICAL_CLOCK (MySQL 8.0.26+; formerly replica_parallel_type) – works even when all tables reside in the same logical database. slave_parallel_type=DATABASE – requires tables to be spread across multiple logical databases. Tables can be moved with:
ALTER TABLE logical_db1.table RENAME logical_db2.mytable;The logical‑database mapping is stored in ZooKeeper, allowing dynamic re‑assignment without code changes.
4 Conclusion
Quora combines read, write, and storage optimizations—caching strategies, MyRocks migration, and parallel replication—to dramatically reduce database load. Similar techniques can be applied to improve performance in other systems.
Reference: https://www.percona.com/blog/scaling-mysql-a-good-problem-to-have/
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
