How We Revamped QQ Browser Content Architecture: From Microservices to a High‑Performance Monolith
Facing low development efficiency, poor CPU utilization, and fragile fault tolerance, the QQ Browser content ingestion team rebuilt a 93‑service microservice system into a single‑process, plugin‑driven architecture, achieving up to 13‑fold throughput gains, 10‑fold batch‑processing speedups, and dramatically reduced lead times and code complexity.
Background and Pain Points
The original QQ Browser search content ingestion platform consisted of 93 small micro‑services. Adding a new content type required changes to 3‑4 services. The system as a whole also suffered from low CPU utilization (≤40%), high latency, and a persistent risk of data loss. Business teams complained about slow bulk transfers (6 billion documents took 12 days) and long onboarding cycles.
Key Design Goals
Merge the micro‑services into a single monolithic process to eliminate RPC overhead.
Introduce a plugin framework so that each processing step is a reusable component.
Support both incremental updates and large‑scale batch jobs that reprocess the whole content store (a "database refresh").
Provide robust fault‑tolerance by buffering messages in Kafka and using peak‑shaving.
Separate consumption threads from processing threads to raise CPU utilization and enable horizontal scaling.
Architecture Redesign
The new system is organized into three logical layers: Ingestion, Processing, and Distribution. Each layer consists of interchangeable plugins that are composed via configuration. For example, a Kafka ingestion plugin parses JSON from the novel service, while a Protocol Buffers (PB) plugin handles binary streams from the mini‑program service.
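The following minimal C++ sketch makes the plugin model concrete. It assumes a simple name-to-factory registry; the article does not show its actual framework. Context, Plugin, PluginRegistry, Pipeline and the toy plugins KvIngest and RequireTitle are all hypothetical names. The key=value parser only stands in for the real JSON and PB ingestion plugins.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-document context handed from plugin to plugin.
struct Context {
    std::string raw;                            // payload as received
    std::map<std::string, std::string> fields;  // parsed fields
    std::vector<std::string> trace;             // plugins that ran, in order
};

// Common interface for ingestion, processing and distribution steps.
class Plugin {
public:
    virtual ~Plugin() = default;
    virtual std::string Name() const = 0;
    virtual bool Process(Context& ctx) = 0;  // false stops the pipeline
};

// Name -> factory table, so pipelines can be assembled from configuration.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<Plugin>()>;
    static PluginRegistry& Instance() {
        static PluginRegistry registry;
        return registry;
    }
    void Register(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }
    std::unique_ptr<Plugin> Create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) throw std::runtime_error("unknown plugin: " + name);
        return it->second();
    }
private:
    std::map<std::string, Factory> factories_;
};

// A pipeline is an ordered list of plugin instances built from a name list.
class Pipeline {
public:
    explicit Pipeline(const std::vector<std::string>& names) {
        for (const auto& name : names) steps_.push_back(PluginRegistry::Instance().Create(name));
    }
    bool Run(Context& ctx) {
        for (auto& step : steps_) {
            ctx.trace.push_back(step->Name());
            if (!step->Process(ctx)) return false;
        }
        return true;
    }
private:
    std::vector<std::unique_ptr<Plugin>> steps_;
};

// Toy ingestion plugin: parses "key=value;key=value" (stand-in for JSON/PB decoding).
class KvIngest : public Plugin {
public:
    std::string Name() const override { return "kv_ingest"; }
    bool Process(Context& ctx) override {
        std::size_t pos = 0;
        while (pos < ctx.raw.size()) {
            std::size_t end = ctx.raw.find(';', pos);
            if (end == std::string::npos) end = ctx.raw.size();
            const std::string item = ctx.raw.substr(pos, end - pos);
            const std::size_t eq = item.find('=');
            if (eq != std::string::npos) ctx.fields[item.substr(0, eq)] = item.substr(eq + 1);
            pos = end + 1;
        }
        return true;
    }
};

// Toy processing plugin: rejects documents without a title.
class RequireTitle : public Plugin {
public:
    std::string Name() const override { return "require_title"; }
    bool Process(Context& ctx) override { return ctx.fields.count("title") > 0; }
};

void RegisterDemoPlugins() {
    auto& registry = PluginRegistry::Instance();
    registry.Register("kv_ingest", [] { return std::make_unique<KvIngest>(); });
    registry.Register("require_title", [] { return std::make_unique<RequireTitle>(); });
}
```

With this shape, a business line's configuration only needs to list plugin names in order. Onboarding a new content type then becomes mostly a matter of registering one new ingestion plugin and reusing the existing processing and distribution steps.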
Batch jobs are treated as a distinct processing flow, allowing custom configurations that skip unnecessary computation. This reduces per‑record work from dozens of JSON serializations to a single pass.
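The batch flow can be pictured as a different stage list applied to the same parsed document. The sketch below is a hedged illustration, not the team's code. Doc, Stage, RunFlow, the flow names "incremental" and "batch_refresh", and the choice of quality scoring as the skipped stage are all assumptions made for the example.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical parsed document: decoded once at ingestion and then shared by
// every stage, rather than re-serialized to JSON between services.
struct Doc {
    std::string id;
    std::string body;
    int quality = -1;              // filled in by the (expensive) scoring stage
    bool indexed = false;          // set by the distribution stage
    std::vector<std::string> ran;  // stages that executed, in order
};

using Stage = std::function<void(Doc&)>;

// Stage implementations keyed by name (toy logic for illustration only).
std::map<std::string, Stage> BuildStages() {
    return {
        {"normalize", [](Doc& d) { d.ran.push_back("normalize"); }},
        {"quality_score", [](Doc& d) {
             d.ran.push_back("quality_score");
             d.quality = static_cast<int>(d.body.size() % 100);
         }},
        {"index", [](Doc& d) {
             d.ran.push_back("index");
             d.indexed = true;
         }},
    };
}

// Flow configuration as data. The batch-refresh flow leaves out stages whose
// output does not need recomputing during a refresh (assumed here: scoring).
const std::map<std::string, std::vector<std::string>> kFlows = {
    {"incremental", {"normalize", "quality_score", "index"}},
    {"batch_refresh", {"normalize", "index"}},
};

void RunFlow(const std::string& flow, Doc& doc) {
    static const std::map<std::string, Stage> stages = BuildStages();
    for (const auto& name : kFlows.at(flow)) stages.at(name)(doc);
}
```

Because a flow is just data, a new batch job can drop or reorder stages without touching the stage code itself.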
Fault Tolerance and Data Safety
All HTTP/tRPC pushes are first written to a Kafka topic with dedicated partitions. Consumers commit offsets only after a message has been processed successfully. If a node crashes, its uncommitted messages are therefore redelivered rather than lost. Batch jobs are idempotent and can resume from the last successful checkpoint.
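Committing after processing gives at-least-once delivery: a message that was processed but not yet committed when a node died is delivered again. This is only harmless if the downstream write is idempotent. The sketch below simulates both properties without a Kafka client. Partition, Store and ConsumeOnce are hypothetical, and the in-memory log stands in for a real topic partition.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one Kafka partition: an append-only log plus the
// consumer group's committed offset (the next offset to read after a restart).
struct Partition {
    std::vector<std::pair<std::string, std::string>> log;  // (doc id, payload)
    std::size_t committed = 0;
};

// Idempotent sink: writing the same doc id again overwrites, never duplicates.
struct Store {
    std::map<std::string, std::string> docs;
    int writes = 0;
    void Upsert(const std::string& id, const std::string& payload) {
        docs[id] = payload;
        ++writes;
    }
};

constexpr std::size_t kNoCrash = std::numeric_limits<std::size_t>::max();

// Reads from the committed offset and commits only after a message has been
// fully processed. A "crash" between processing and commit means the message
// is delivered again on the next run (at-least-once), which is safe because
// the sink is idempotent.
void ConsumeOnce(Partition& p, Store& store, std::size_t crash_at = kNoCrash) {
    for (std::size_t off = p.committed; off < p.log.size(); ++off) {
        store.Upsert(p.log[off].first, p.log[off].second);
        if (off == crash_at) throw std::runtime_error("simulated crash before commit");
        p.committed = off + 1;  // commit after successful processing
    }
}
```

In the scenario exercised here, a crash after writing the second document leaves its offset uncommitted. On restart that document is written a second time, and the store still ends up with exactly one copy of each document.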
Performance Optimizations
Implemented a lock‑free work queue: each Kafka partition is consumed by a single thread, while multiple worker threads process messages, raising CPU utilization from ≤40% to near 100%.
Replaced repetitive if‑else routing with a data‑driven plugin dispatcher, reducing code size and improving maintainability.
Switched from RapidJSON to Sonic‑JSON, gaining a 40% speedup and a 15% throughput increase in benchmarks.
Adopted std::atomic<std::shared_ptr<T>> (C++20) to replace a double‑buffer design, simplifying dynamic data loading.
Integrated jemalloc to address memory‑pool fragmentation that caused OOM after several load cycles.
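The article does not describe its lock-free queue in detail. One common way to get a lock-free hand-off from a single consumer thread to several workers is a per-worker single-producer/single-consumer ring buffer, sketched below. SpscRing and DispatchPartition are hypothetical names, and the integer messages stand in for decoded Kafka records.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Bounded single-producer/single-consumer ring buffer. Only atomic loads and
// stores are used, no mutex; one slot is kept empty to tell "full" from "empty".
template <typename T, std::size_t N>
class SpscRing {
public:
    bool TryPush(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = std::move(value);
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    bool TryPop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = std::move(buf_[tail]);
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};

// One consumer thread per partition (here: the calling thread) hands messages
// round-robin to worker threads, each owning its own SPSC ring.
// Preconditions: num_workers >= 1, and `process` must be thread-safe.
void DispatchPartition(const std::vector<int>& partition, std::size_t num_workers,
                       const std::function<void(int)>& process) {
    using Ring = SpscRing<int, 1024>;
    std::vector<std::unique_ptr<Ring>> rings;
    for (std::size_t i = 0; i < num_workers; ++i) rings.push_back(std::make_unique<Ring>());

    std::atomic<bool> done{false};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_workers; ++i) {
        workers.emplace_back([&, i] {
            Ring& ring = *rings[i];
            int msg = 0;
            for (;;) {
                if (ring.TryPop(msg)) { process(msg); continue; }
                if (done.load(std::memory_order_acquire)) {
                    while (ring.TryPop(msg)) process(msg);  // drain leftovers
                    return;
                }
                std::this_thread::yield();  // nothing to do yet
            }
        });
    }

    std::size_t next = 0;
    for (int msg : partition) {
        while (!rings[next]->TryPush(msg)) std::this_thread::yield();  // ring full
        next = (next + 1) % num_workers;
    }
    done.store(true, std::memory_order_release);
    for (auto& t : workers) t.join();
}
```

One trade-off: once a partition's messages fan out across workers, they can finish out of order. Offset commits then have to track which messages have actually completed, rather than simply committing the latest offset handed out.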
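The std::atomic<std::shared_ptr<T>> item replaces a double buffer with snapshot publishing. Readers take a reference-counted snapshot, and a reload atomically swaps in a new one. The sketch below assumes the reloaded data is a string-to-int dictionary; Dict and HotData are hypothetical names. The preprocessor branch falls back to the pre-C++20 std::atomic_load_explicit/std::atomic_store_explicit overloads for std::shared_ptr when the C++20 specialization is unavailable.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical dictionary that is reloaded periodically.
using Dict = std::unordered_map<std::string, int>;

// Readers grab a shared_ptr snapshot; a reload publishes a new snapshot
// atomically. The old snapshot is freed when its last reader drops it.
class HotData {
public:
    explicit HotData(Dict initial) { Publish(std::move(initial)); }

    std::shared_ptr<const Dict> Snapshot() const {
#if defined(__cpp_lib_atomic_shared_ptr)
        return current_.load(std::memory_order_acquire);
#else
        return std::atomic_load_explicit(&current_, std::memory_order_acquire);
#endif
    }

    void Publish(Dict fresh) {
        std::shared_ptr<const Dict> p = std::make_shared<Dict>(std::move(fresh));
#if defined(__cpp_lib_atomic_shared_ptr)
        current_.store(std::move(p), std::memory_order_release);
#else
        std::atomic_store_explicit(&current_, std::move(p), std::memory_order_release);
#endif
    }

private:
#if defined(__cpp_lib_atomic_shared_ptr)
    std::atomic<std::shared_ptr<const Dict>> current_;  // C++20 specialization
#else
    std::shared_ptr<const Dict> current_;  // accessed only via atomic free functions
#endif
};
```

Compared with a double buffer, there is no need to decide when the standby copy is safe to overwrite. A reader that started before a reload keeps using its own snapshot until it releases it.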
Quantitative Results
Metric                        Before      After     Improvement
----------------------------------------------------------------
Avg single‑core QPS           13          172       13×
Avg single‑core batch QPS     13          230       17×
Cluster batch QPS             500‑1000    10000     10× (storage‑limited)
Avg processing latency        2.7 s       0.8 s     –71%
P99 latency                   17 s        1.9 s     –88%
P999 latency                  19 s        3.7 s     –80%
CPU utilization               ≤40%        ≈100%     2.5×
For the video business, total peak throughput stayed roughly the same but needed far fewer cores: it went from 33 465/min on 40 cores (≈14 QPS per core) to 32 119/min on just 6 cores (≈90 QPS per core), a more than sixfold per‑core gain. Increasing the processing thread count further pushed the peak to 162 QPS per core once CPU utilization reached 100%.
Code Quality and Development Efficiency
Metric                            Before      After     Improvement
-------------------------------------------------------------------
Business‑request P80 lead time    5.72 days   ≤1 day    –82%
CodeCC issues                     568         0         –100%
Unit‑test coverage                0%          77%       +77 pp
Avg cyclomatic complexity         24          2.31      –90%
Total lines of code               113k        28k       –75%
Critical service count            15          3         –80%
The reductions stem from merging the micro‑services into a monolith, adopting a clean plugin architecture, and using modern C++ features (auto, range‑based for, emplace). The resulting codebase is smaller, easier to understand, and more extensible.
CI/CD and Operational Practices
Requirement confirmation via TAPD with mandatory fields.
Developer credential verification before committing to production.
Unified coding and Doxygen standards.
Structured code review workflow with learning cases.
Standardized third‑party library usage to avoid dependency chaos.
Finer‑grained pipeline locks, allowing stages to run in parallel and cutting MR pipeline time by more than 25%.
A GitHub mirror to speed up fetching of external dependencies.
Conclusion
By rethinking the architecture from a fragmented micro‑service mesh to a single, plugin‑driven monolith, the QQ Browser content platform achieved massive performance gains, stronger fault tolerance, and a dramatically shorter development cycle, while also reducing code size and technical debt.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.