
How We Revamped QQ Browser Content Architecture: From Microservices to a High‑Performance Monolith

Facing low development efficiency, poor CPU utilization, and fragile fault tolerance, the QQ Browser content ingestion team rebuilt a 93‑service microservice system into a single‑process, plugin‑driven architecture, achieving up to 13‑fold throughput gains, 10‑fold batch‑processing speedups, and dramatically reduced lead times and code complexity.

Architect

Feb 4, 2024

Background and Pain Points

The original QQ Browser search content ingestion platform consisted of 93 small micro‑services. Adding a new content type required changes in 3‑4 services. CPU utilization was low (≤40%), latency was high, and data loss was a recurring risk. Business teams complained about slow batch jobs (transferring 6 billion documents took 12 days) and long onboarding cycles.

Key Design Goals

Merge the micro‑services into a single monolithic process to eliminate RPC overhead.

Introduce a plugin framework so that each processing step is a reusable component.

Support both incremental updates and large‑scale batch jobs that reprocess the stored corpus (database refresh).

Provide robust fault‑tolerance by buffering messages in Kafka and using peak‑shaving.

Separate consumption threads from processing threads to raise CPU utilization and enable horizontal scaling.
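The last goal, separating consumption from processing, can be sketched as follows. This is a minimal illustration, not the team's implementation: the article calls its queue lock‑free but does not describe its internals, so the per‑worker single‑producer/single‑consumer ring buffer, the names SpscQueue and run_pipeline, and the round‑robin fan‑out are all assumptions. One thread stands in for the Kafka partition consumer and several workers stand in for the processing threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Single-producer/single-consumer ring buffer (assumed design, see lead-in).
// Capacity must be a power of two so that "index & mask" wraps correctly.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity) : buf_(capacity), mask_(capacity - 1) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }
    bool try_push(const T& value) {  // called only by the consumer thread
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == buf_.size()) return false;  // full
        buf_[tail & mask_] = value;
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }
    std::optional<T> try_pop() {  // called only by the owning worker thread
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = std::move(buf_[head & mask_]);
        head_.store(head + 1, std::memory_order_release);
        return value;
    }
private:
    std::vector<T> buf_;
    std::size_t mask_;
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Hypothetical driver: one consumer thread fans messages out to `workers`
// processing threads and returns the sum of all processed message ids.
std::uint64_t run_pipeline(std::size_t workers, std::uint64_t messages) {
    assert(workers > 0);
    std::vector<std::unique_ptr<SpscQueue<std::uint64_t>>> queues;
    for (std::size_t w = 0; w < workers; ++w)
        queues.push_back(std::make_unique<SpscQueue<std::uint64_t>>(1024));
    std::atomic<bool> done{false};
    std::atomic<std::uint64_t> checksum{0};

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            SpscQueue<std::uint64_t>& q = *queues[w];
            for (;;) {
                if (auto msg = q.try_pop()) {
                    checksum.fetch_add(*msg, std::memory_order_relaxed);  // stand-in for real work
                } else if (done.load(std::memory_order_acquire)) {
                    // The producer has finished: drain once more, then exit.
                    if (auto last = q.try_pop()) {
                        checksum.fetch_add(*last, std::memory_order_relaxed);
                        continue;
                    }
                    break;
                } else {
                    std::this_thread::yield();
                }
            }
        });
    }

    std::thread consumer([&] {  // stands in for the single thread reading one partition
        for (std::uint64_t id = 1; id <= messages; ++id) {
            SpscQueue<std::uint64_t>& q = *queues[id % workers];
            while (!q.try_push(id)) std::this_thread::yield();  // back-pressure when full
        }
        done.store(true, std::memory_order_release);
    });

    consumer.join();
    for (auto& t : pool) t.join();
    return checksum.load();
}
```

Calling run_pipeline(4, 10000) pushes ids 1 to 10000 through four workers. Because every queue has exactly one writer and one reader, no mutex is needed, and the worker count can be tuned independently of the partition count.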

Architecture Redesign

The new system is organized into three logical layers: Ingestion, Processing, and Distribution. Each layer consists of interchangeable plugins that are composed via configuration. For example, a Kafka ingestion plugin parses JSON from the novel service, while a PB plugin handles binary streams from the mini‑program service.

Batch jobs are treated as a distinct processing flow, allowing custom configurations that skip unnecessary computation. This reduces per‑record work from dozens of JSON serializations to a single pass.
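A rough sketch of how configuration‑composed plugins and a separate batch flow might fit together is shown below. The article does not publish its plugin interface, so Plugin, PluginRegistry, Step, run_flow, and the plugin names (parse_json, dedup, enrich, publish) are hypothetical, and a list of strings stands in for the real configuration format.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Document {
    std::string payload;
    std::vector<std::string> trace;  // names of the plugins that touched this record
};

// Hypothetical plugin interface: one processing step per plugin.
struct Plugin {
    virtual ~Plugin() = default;
    virtual void process(Document& doc) = 0;
};

// Maps a plugin name from the configuration to a factory, replacing
// hand-written if-else routing with a table lookup.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<Plugin>()>;
    void add(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }
    std::unique_ptr<Plugin> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end()) throw std::invalid_argument("unknown plugin: " + name);
        return it->second();
    }
private:
    std::map<std::string, Factory> factories_;
};

// Placeholder plugin that only records its own name.
class Step : public Plugin {
public:
    explicit Step(std::string name) : name_(std::move(name)) {}
    void process(Document& doc) override { doc.trace.push_back(name_); }
private:
    std::string name_;
};

PluginRegistry make_registry() {
    PluginRegistry reg;
    for (const char* name : {"parse_json", "dedup", "enrich", "publish"}) {
        reg.add(name, [name] { return std::make_unique<Step>(name); });
    }
    return reg;
}

// Builds a flow from a configured list of plugin names and runs one
// document through it, returning the order in which plugins ran.
std::vector<std::string> run_flow(const std::vector<std::string>& flow_config) {
    const PluginRegistry reg = make_registry();
    std::vector<std::unique_ptr<Plugin>> flow;
    for (const auto& name : flow_config) flow.push_back(reg.create(name));

    Document doc;
    doc.payload = "{}";
    for (auto& plugin : flow) plugin->process(doc);
    return doc.trace;
}
```

With this shape, an incremental flow could be configured as {"parse_json", "dedup", "enrich", "publish"} while a batch flow lists only {"parse_json", "publish"}, so skipped steps cost nothing at runtime.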

Fault Tolerance and Data Safety

All HTTP/tRPC pushes are first written to a Kafka topic with dedicated partitions. Consumers commit offsets only after a message has been processed successfully, so a node crash causes the message to be redelivered rather than lost. Batch jobs are idempotent and can resume from the last successful checkpoint.
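The commit‑after‑processing rule can be illustrated with a small in‑memory model. This is a sketch under stated assumptions: FakePartition is a stand‑in for one Kafka partition and is not a real Kafka client API, and consume and demo_crash_and_resume are hypothetical names. It shows why an idempotent sink matters: the message that was in flight during the simulated crash is delivered twice but stored once.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// In-memory stand-in for one partition (NOT a real Kafka client API).
struct FakePartition {
    std::vector<std::string> log;
    std::size_t committed = 0;  // offset a restarted consumer resumes from
};

// Processes messages starting at the committed offset. The offset advances
// only after the handler succeeds; on failure the function returns false
// and the failed message stays uncommitted, so it will be redelivered.
bool consume(FakePartition& p, const std::function<bool(const std::string&)>& handler) {
    for (std::size_t off = p.committed; off < p.log.size(); ++off) {
        if (!handler(p.log[off])) return false;  // simulated crash: no commit
        p.committed = off + 1;                   // commit after successful processing
    }
    return true;
}

struct DemoResult {
    std::size_t deliveries;  // how many times the handler was invoked
    std::size_t stored;      // distinct documents in the sink
    std::size_t committed;   // final committed offset
};

DemoResult demo_crash_and_resume() {
    FakePartition p;
    p.log = {"doc-a", "doc-b", "doc-c"};

    std::set<std::string> store;  // idempotent sink: re-inserting a key is harmless
    std::size_t deliveries = 0;
    bool crash_once = true;

    auto handler = [&](const std::string& msg) -> bool {
        ++deliveries;
        store.insert(msg);  // the side effect lands...
        if (msg == "doc-b" && crash_once) {
            crash_once = false;
            return false;   // ...but the node "crashes" before the commit
        }
        return true;
    };

    consume(p, handler);  // stops at doc-b with committed == 1
    consume(p, handler);  // restart: doc-b is redelivered, then doc-c
    return DemoResult{deliveries, store.size(), p.committed};
}
```

In this run the handler is invoked four times for three documents, the store ends with three entries, and the committed offset reaches 3. Nothing is lost and the duplicate delivery is absorbed by the sink.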

Performance Optimizations

Implemented a lock‑free work queue: each Kafka partition is consumed by a single thread, while multiple worker threads process messages, raising CPU utilization from ≤40% to near 100%.

Replaced repetitive if‑else routing with a data‑driven plugin dispatcher, reducing code size and improving maintainability.

Switched from RapidJSON to Sonic‑JSON, gaining a 40% speedup and a 15% throughput increase in benchmarks.

Adopted std::atomic<std::shared_ptr<T>> (C++20) to replace a double‑buffer design, simplifying dynamic data loading.

Integrated jemalloc to address memory‑pool fragmentation that caused OOM after several load cycles.
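The std::atomic<std::shared_ptr<T>> item above can be sketched as a hot‑reloadable dictionary. This is an illustration only: HotConfig, Dict, and demo_reload are hypothetical names, and the article does not show how its dynamic data is structured. Because standard library support for the C++20 specialization varies, the sketch falls back to the older std::atomic_load_explicit and std::atomic_store_explicit free functions when the feature‑test macro is absent.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Dict = std::map<std::string, std::string>;

// Readers take an immutable snapshot; the loader publishes a whole new
// dictionary in one atomic step, so no double buffer or reader lock is needed.
class HotConfig {
public:
    explicit HotConfig(Dict initial) { publish(std::move(initial)); }

    std::shared_ptr<const Dict> snapshot() const {
#if defined(__cpp_lib_atomic_shared_ptr)
        return current_.load(std::memory_order_acquire);
#else
        // Fallback for standard libraries without the C++20 specialization.
        return std::atomic_load_explicit(&current_, std::memory_order_acquire);
#endif
    }

    void publish(Dict next) {
        std::shared_ptr<const Dict> fresh = std::make_shared<Dict>(std::move(next));
#if defined(__cpp_lib_atomic_shared_ptr)
        current_.store(std::move(fresh), std::memory_order_release);
#else
        std::atomic_store_explicit(&current_, std::move(fresh), std::memory_order_release);
#endif
    }

private:
#if defined(__cpp_lib_atomic_shared_ptr)
    std::atomic<std::shared_ptr<const Dict>> current_;
#else
    std::shared_ptr<const Dict> current_;
#endif
};

// Hypothetical demo: a reader holding an old snapshot is unaffected by a reload.
std::pair<std::string, std::string> demo_reload() {
    Dict v1;
    v1["ranker"] = "v1";
    HotConfig cfg(std::move(v1));

    std::shared_ptr<const Dict> before = cfg.snapshot();  // keeps the old data alive

    Dict v2;
    v2["ranker"] = "v2";
    cfg.publish(std::move(v2));

    std::shared_ptr<const Dict> after = cfg.snapshot();
    return std::make_pair(before->at("ranker"), after->at("ranker"));
}
```

The old dictionary is freed automatically once the last reader drops its snapshot, which is the bookkeeping a hand‑rolled double buffer would otherwise have to do.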

Quantitative Results

Metric                     Before   After   Improvement
-------------------------------------------------------
Avg single‑core QPS        13       172     +13×
Avg single‑core batch QPS  13       230     +17×
Cluster batch QPS          500‑1000 10000   +10× (storage‑limited)
Avg processing latency     2.7 s    0.8 s   –71%
P99 latency                17 s     1.9 s   –88%
P999 latency               19 s     3.7 s   –80%
CPU utilization            ≤40%    ≈100%   +2.5×

For the video business, the old system handled a peak of 33 465/min on 40 cores (≈13 QPS per core), while the new system handles a comparable peak of 32 119/min on just 6 cores (≈90 QPS per core). Increasing the processing thread count pushed the peak further, to 162 QPS per core with the CPU at 100%.
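The per‑core figures for the video business follow from converting the per‑minute peak to a per‑second rate and dividing by the core count. As a worked check, using only the numbers quoted above:

```latex
\frac{33\,465\ \text{per min}}{60\ \text{s/min} \times 40\ \text{cores}} \approx 13.9\ \text{QPS per core},
\qquad
\frac{32\,119\ \text{per min}}{60\ \text{s/min} \times 6\ \text{cores}} \approx 89.2\ \text{QPS per core}
```

At this load, per‑core throughput therefore improved by roughly 6.4×, before the additional thread tuning.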

Code Quality and Development Efficiency

Metric                          Before     After    Improvement
---------------------------------------------------------------
Business‑request P80 lead time  5.72 days  ≤1 day   –82%
CodeCC issues                   568        0        –100%
Unit‑test coverage              0%         77%      +77 pp
Avg cyclomatic complexity       24         2.31     –90%
Total lines of code             113 k      28 k     –75%
Critical service count          15         3        –80%

The reductions stem from merging micro‑services into a monolith, adopting a clean plugin architecture, and leveraging modern C++ features (auto, range‑for, emplace). The resulting codebase is smaller, easier to understand, and more extensible.

CI/CD and Operational Practices

Requirement confirmation via TAPD with mandatory fields.

Developer credential verification before committing to production.

Unified coding and Doxygen standards.

Structured code review workflow with learning cases.

Standardized third‑party library usage to avoid dependency chaos.

Pipeline lock‑granularity reduction, allowing parallel stages and cutting MR pipeline time by >25%.

GitHub mirror usage to accelerate external dependency fetching.

Conclusion

By rethinking the architecture from a fragmented micro‑service mesh to a single, plugin‑driven monolith, the QQ Browser content platform achieved massive performance gains, stronger fault tolerance, and a dramatically shorter development cycle, while also reducing code size and technical debt.
