How Meipai Scaled to 100M Users in 9 Months: Key Lessons from QCon
An in‑depth recap of Hong Xiaojun’s QCon presentation reveals how Meipai’s architecture evolved from a minimalist design to a highly scalable, high‑availability system, tackling MySQL bottlenecks, cache challenges, monitoring, CDN resilience, and technology migrations to support rapid growth to over a hundred million users.
Last night I watched Hong Xiaojun, the architect of Meipai, present his talk “Achieving a Scalable Architecture that Reached Over 100 Million Users in Nine Months” at QCon. I attended the conference and recommend watching the full video rather than just reading the slides, especially if you work at a fast‑growing startup.
The central idea is to choose the solution that best fits the current stage of the product, a conclusion drawn from many painful lessons.
Meipai grew from launch to 100 M users in nine months, a rare case in the industry.
Architecture evolution stages:
Ultra‑simplified design for rapid release.
Maintain simplicity for fast product iteration.
Scalability and high‑availability guarantees as user volume grows.
High scalability and high availability at massive scale.
Key problems encountered:
MySQL slow queries.
MySQL write bottlenecks.
Redis timeouts.
Very low memcached hit rate.
Service inter‑dependencies.
Unstable monitoring alerts.
Various CDN failures.
High cost of adding new fields.
MySQL slowdown persists as scale increases.
MySQL
Initially, a single MySQL instance handled all logic, including the joins used to build feeds. When slow queries appeared, the team added master‑slave replication and read‑write separation across multiple slaves. Later, write performance degraded; because rapid development took priority, the team upgraded hardware rather than change the architecture.
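The read‑write separation described above can be sketched as a small router. This is only an illustration under stated assumptions: the class name `ReadWriteRouter`, the host labels, and the verb‑based classification are hypothetical and are not taken from the talk.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across slaves (hypothetical sketch)."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slave is configured.
        self._read_cycle = itertools.cycle(slaves or [master])

    def pick(self, sql):
        """Return the server that should run this statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        # Round-robin over the slaves for everything else.
        return next(self._read_cycle)


# Placeholder host names; a real router would hold connections instead.
router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
```

Because replication is asynchronous, a read that must see a write made moments earlier would normally be sent to the master; the sketch leaves that case out.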
Eventually sharding was introduced, but write latency and high cost of schema changes remained. Two solutions were applied:
Asynchronous writes – the front end always accepts writes, while the heavy work is queued for background processing.
Index‑data separation – indexed fields moved to a separate table, other data stored as key‑value blobs (protobuf), reducing schema‑change cost.
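The index‑data separation idea can be sketched as follows. To keep the example self‑contained it makes two substitutions, both assumptions: SQLite stands in for MySQL, and JSON stands in for the protobuf blobs mentioned in the talk. The table and function names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Index table: only the columns that queries filter or sort on.
conn.execute(
    "CREATE TABLE video_index ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER)"
)
conn.execute("CREATE INDEX idx_user_time ON video_index (user_id, created_at)")

# Data table: every other attribute, stored as one opaque value per id.
conn.execute("CREATE TABLE video_data (id INTEGER PRIMARY KEY, payload TEXT)")


def save_video(video_id, user_id, created_at, attrs):
    """Write the index row and the serialized attributes in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO video_index VALUES (?, ?, ?)",
            (video_id, user_id, created_at),
        )
        conn.execute(
            "INSERT INTO video_data VALUES (?, ?)",
            (video_id, json.dumps(attrs)),
        )


def videos_by_user(user_id):
    """Find ids through the index table, then decode the matching payloads."""
    rows = conn.execute(
        "SELECT d.payload FROM video_index i "
        "JOIN video_data d ON d.id = i.id "
        "WHERE i.user_id = ? ORDER BY i.created_at DESC",
        (user_id,),
    ).fetchall()
    return [json.loads(payload) for (payload,) in rows]
```

Adding a new attribute, for example a `filter` name, then only means putting another key into `attrs`; neither table needs an `ALTER TABLE`, which is where the reduced schema‑change cost comes from.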
Cache
Both memcached and Redis were used. Early on, Redis timeouts were addressed by adding more slaves and by scheduling dumps for low‑traffic periods or running them on dedicated machines. Memcached suffered from low hit rates and slab “calcification” (memory stuck in slab classes that no longer match the item sizes being stored, a form of memory fragmentation). The team isolated core services to avoid this issue.
When high availability became critical, master‑slave cache setups were refined so that masters also served reads, preventing single‑point failures.
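One way to read this refinement is that the master joins the read pool, so reads keep working as long as any single node is alive. The sketch below shows that reading with in‑memory stand‑ins; `CacheNode` and `ReplicatedCache` are hypothetical names, and the double‑write in `set` is a simplification of real replication.

```python
import random


class CacheNode:
    """In-memory stand-in for one memcached or Redis instance (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.alive = True

    def get(self, key):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.store.get(key)

    def set(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value


class ReplicatedCache:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def set(self, key, value):
        # Write to the master, then copy to slaves on a best-effort basis.
        self.master.set(key, value)
        for slave in self.slaves:
            try:
                slave.set(key, value)
            except ConnectionError:
                pass

    def get(self, key):
        # The master is part of the read pool, so no single node is a
        # single point of failure for reads.
        nodes = [self.master] + self.slaves
        random.shuffle(nodes)
        for node in nodes:
            try:
                return node.get(key)
            except ConnectionError:
                continue
        return None  # every node is down: treat it as a cache miss
```

Returning `None` when every node is unreachable lets the caller fall back to the database instead of failing the request.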
Operations
Initial monitoring was simple and sometimes missed alerts. Over time, a more robust monitoring system on high‑performance servers was built, adding richer metrics and logs to pinpoint issues. Switches were added for third‑party services to keep core paths available during outages.
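A switch for a third‑party service can be as small as the sketch below. The names `Switchboard` and `call_optional` and the `"push"` switch are hypothetical; the talk does not describe how Meipai implemented its switches.

```python
class Switchboard:
    """Holds on/off flags for non-core dependencies (hypothetical sketch)."""

    def __init__(self):
        self._enabled = {}

    def set(self, name, enabled):
        self._enabled[name] = enabled

    def is_on(self, name):
        # Unknown switches default to on.
        return self._enabled.get(name, True)


switches = Switchboard()


def call_optional(name, func, fallback):
    """Call a non-core dependency without letting it break the core path.

    Returns `fallback` when the switch is off or when the call raises.
    """
    if not switches.is_on(name):
        return fallback
    try:
        return func()
    except Exception:
        return fallback
```

During an outage an operator would turn the switch off, for example with `switches.set("push", False)`, and the core request path continues with the fallback value instead of waiting on a failing service.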
Third‑Party Services
CDN problems, such as DNS attacks and hijacking, were mitigated by collaborating with multiple providers, replicating data across clouds, and implementing client‑side fallback and server‑side availability probing.
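The client‑side fallback and server‑side probing can be sketched roughly as below. The function names, the `fetch` and `probe` callables, and the host names in the usage note are assumptions for illustration; a real client would pass in its own HTTP routine.

```python
def fetch_with_fallback(path, cdn_hosts, fetch):
    """Try each CDN host in order and return the first successful response.

    `fetch` is any callable that takes a URL and raises OSError (or a
    subclass such as ConnectionError or TimeoutError) on failure.
    """
    errors = []
    for host in cdn_hosts:
        url = f"https://{host}{path}"
        try:
            return fetch(url)
        except OSError as exc:
            errors.append((host, str(exc)))
    raise ConnectionError(f"all CDNs failed for {path}: {errors}")


def healthy_hosts(cdn_hosts, probe):
    """Server-side probing: keep only the hosts whose probe succeeds."""
    alive = []
    for host in cdn_hosts:
        try:
            if probe(host):
                alive.append(host)
        except OSError:
            continue
    return alive
```

A server could run `healthy_hosts` periodically and hand clients the resulting ordered list, for example `["cdn-b.example.com", "cdn-a.example.com"]`, while the client still falls back on its own if the first host fails.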
Technology Stack
The architecture remained stable, but new technologies were introduced as needed: MongoDB for early experiments, Java for business logic during scalability phases, and C for low‑level services.
This talk provides a practical roadmap of how a system can evolve from zero to hundreds of millions of users, highlighting real‑world pitfalls and solutions.
