How We Boosted Content Ingestion Performance 13× by Redesigning a Microservice System
This article details the complete redesign of a large‑scale content ingestion platform, explaining why the original microservice‑heavy architecture suffered from low development efficiency and poor performance, how a monolithic‑plus‑plugin approach solved these issues, and the resulting 13‑fold speedup, 10‑fold batch improvement, and 70% latency reduction.
Project Background
The QQ Browser search content ingestion system handled thousands of content types across many partners, but it suffered from low development efficiency (adding a new data type required changes in 3‑4 services) and poor performance (high CPU usage, excessive JSON parsing, and string copying). Business teams complained about throughput in particular: one job over 600 million documents took twelve days, which works out to roughly 580 documents per second.
Overall Design Goals
The redesign focused on five key areas:
Replace the fragmented microservice architecture with a single monolithic service to eliminate RPC overhead.
Introduce a plugin framework to make processing logic extensible and avoid hard‑coded if‑else branches.
Support both incremental updates and bulk re‑processing of stored data (batch processing) with separate data‑flow configurations.
Improve fault tolerance by adding Kafka‑based buffering and message replay.
Enable horizontal scaling by separating consumption threads from processing threads.
Detailed Design
1. From Microservices to a Monolith
The business workload is characterized by massive volume, lightweight computation, low failure tolerance, many content categories, and simple request patterns. The old system’s dozens of tiny services caused heavy RPC traffic; the new monolith keeps data in memory, dramatically reducing latency.
2. Plugin‑Based Processing Pipeline
The pipeline is split into three layers: ingestion, processing, and distribution. Ingestion supports DB pull, Kafka streams, HTTP/COS, and RPC sources, each with different data formats (JSON, protobuf, etc.). Plugins encapsulate each functional step, allowing flexible composition without code changes.
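The plugin idea can be illustrated with a minimal sketch. This is not the team's framework: the `Doc` type, the `PluginRegistry` class, and the plugin names `normalize_title` and `tag_source` are all hypothetical, chosen only to show how named steps can be composed from a configured list instead of hard-coded branches.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical document type: a flat bag of string fields.
using Doc = std::map<std::string, std::string>;
// A plugin is one processing step that mutates the document in place.
using Plugin = std::function<void(Doc&)>;

// Registry mapping a plugin name to its implementation.
class PluginRegistry {
public:
    void add(const std::string& name, Plugin plugin) {
        assert(plugin);  // reject empty callables
        plugins_[name] = std::move(plugin);
    }
    // Runs the named plugins in order; returns false if a name is unknown.
    bool run(const std::vector<std::string>& pipeline, Doc& doc) const {
        for (const std::string& name : pipeline) {
            auto it = plugins_.find(name);
            if (it == plugins_.end()) return false;
            it->second(doc);
        }
        return true;
    }
private:
    std::map<std::string, Plugin> plugins_;
};

// Builds a registry with two illustrative (made-up) plugins.
PluginRegistry make_registry() {
    PluginRegistry reg;
    reg.add("normalize_title", [](Doc& d) {
        // Lowercase ASCII letters in the title field.
        for (char& c : d["title"])
            if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
    });
    reg.add("tag_source", [](Doc& d) { d["source"] = "partner_feed"; });
    return reg;
}
```

Calling `make_registry().run({"normalize_title", "tag_source"}, doc)` applies both steps in order. Changing the list changes the pipeline without touching the plugin code, which is the property the article attributes to its design.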
3. Incremental vs. Batch Handling
Four processing flows are defined: data‑source update, feature update, data‑source batch, and feature batch. Incremental flows run full pipelines; batch flows skip unnecessary computation, achieving a ten‑fold increase in batch QPS (from ~1 000 to ~10 000 QPS).
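One way to picture the four flows is as rows in a configuration table, each listing the steps that flow runs. The sketch below assumes this shape; the step names (`parse`, `validate`, `compute_features`, `distribute`) are invented for illustration, since the article does not list the real ones.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Flow { SourceUpdate, FeatureUpdate, SourceBatch, FeatureBatch };

// Hypothetical step lists: incremental flows run every step,
// batch flows leave out the steps they do not need.
const std::map<Flow, std::vector<std::string>>& flow_table() {
    static const std::map<Flow, std::vector<std::string>> table = {
        {Flow::SourceUpdate,  {"parse", "validate", "compute_features", "distribute"}},
        {Flow::FeatureUpdate, {"parse", "validate", "compute_features", "distribute"}},
        {Flow::SourceBatch,   {"parse", "validate", "distribute"}},
        {Flow::FeatureBatch,  {"compute_features", "distribute"}},
    };
    return table;
}

// Looks up the configured steps for a flow.
const std::vector<std::string>& steps_for(Flow flow) {
    auto it = flow_table().find(flow);
    assert(it != flow_table().end());  // every Flow value has a row
    return it->second;
}
```

With this layout, a batch flow does less work per document simply because its row is shorter, which is consistent with the article's explanation of where the batch QPS gain comes from.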
4. Fault‑Tolerant Data Ingestion
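All incoming data is first written to Kafka, and a message's offset is committed only after the message has been processed successfully. If a consumer crashes, uncommitted messages are replayed on restart, so no data is lost (although a message may be processed more than once). Kafka also acts as a buffer that smooths traffic spikes.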
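The commit-after-processing rule can be shown without a real Kafka client. The sketch below uses an in-memory `Partition` struct as a stand-in for one partition and its committed offset; both the struct and the `consume` function are hypothetical and only model the ordering of "process, then commit".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// In-memory stand-in for one Kafka partition plus its committed offset.
struct Partition {
    std::vector<std::string> log;  // messages in arrival order
    std::size_t committed = 0;     // index of the next message to process
};

// Processes messages from the committed offset onward. The offset advances
// only after the handler reports success, so a failure (or crash) leaves the
// message uncommitted and it is delivered again on the next call.
// Returns the number of messages committed during this call.
std::size_t consume(Partition& p,
                    const std::function<bool(const std::string&)>& handler) {
    assert(p.committed <= p.log.size());
    std::size_t done = 0;
    while (p.committed < p.log.size()) {
        if (!handler(p.log[p.committed])) break;  // simulated crash mid-batch
        ++p.committed;                            // commit after success only
        ++done;
    }
    return done;
}
```

If the handler fails on a message, a second call to `consume` starts again from that same message. This is at-least-once delivery: nothing is skipped, but a handler may see the same message twice and should tolerate that.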
5. Separate Consumption and Processing Threads
A lock‑free queue feeds a pool of worker threads. Each Kafka partition is consumed by a single thread, while multiple workers process the data in parallel, which raises CPU utilization and lets processing capacity grow by adding workers instead of being tied to one thread per partition.
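The split between consumption and processing can be sketched with standard threads. Note the substitution: the article uses a lock-free queue, but the C++ standard library does not provide one, so this example uses a mutex-and-condition-variable queue in its place. `WorkQueue` and `run_pipeline` are hypothetical names, and summing integers stands in for real document processing.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Blocking queue standing in for the article's lock-free queue.
class WorkQueue {
public:
    void push(int item) {
        { std::lock_guard<std::mutex> lock(mu_); items_.push(item); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(mu_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<int> items_;
    bool closed_ = false;
};

// One consumer thread (standing in for a partition reader) feeds `workers`
// processing threads. Returns the sum of all processed items.
long long run_pipeline(const std::vector<int>& messages, unsigned workers) {
    assert(workers > 0);
    WorkQueue queue;
    std::atomic<long long> total(0);
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&queue, &total] {
            int item;
            while (queue.pop(item)) total += item;  // the "processing" step
        });
    }
    std::thread consumer([&queue, &messages] {
        for (int m : messages) queue.push(m);
        queue.close();  // signal workers that no more input will arrive
    });
    consumer.join();
    for (std::thread& t : pool) t.join();
    return total.load();
}
```

The consumer does nothing but move messages into the queue, so the number of workers can be tuned independently of the number of partition readers.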
New vs. Old System Diff Verification
Because the system has 15 distribution endpoints, a dedicated diff‑verification service collects logs from all endpoints, aggregates them, and runs a recursive JSON diff tool to ensure output consistency.
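A recursive diff of this kind can be sketched on a simplified tree. The article's tool works on real JSON; the `Node` type below is a hypothetical reduction that supports only string leaves and objects (no arrays or numbers), and it treats a node with no keys as a leaf. It is meant to show the recursion and path reporting, not to reproduce the team's tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal JSON-like tree: a leaf holding text, or an object of named children.
struct Node {
    std::string scalar;             // leaf value, used when keys is empty
    std::vector<std::string> keys;  // field names when the node is an object
    std::vector<Node> values;       // field values, parallel to keys
    bool is_leaf() const { return keys.empty(); }
    const Node* find(const std::string& key) const {
        assert(keys.size() == values.size());
        for (std::size_t i = 0; i < keys.size(); ++i)
            if (keys[i] == key) return &values[i];
        return nullptr;
    }
};

Node leaf(const std::string& text) {
    Node n;
    n.scalar = text;
    return n;
}

Node object(const std::vector<std::pair<std::string, Node>>& fields) {
    Node n;
    for (const auto& f : fields) {
        n.keys.push_back(f.first);
        n.values.push_back(f.second);
    }
    return n;
}

// Recursively compares two trees and appends one line per difference,
// prefixed with the dotted path of the field that differs.
void diff(const Node& oldn, const Node& newn, const std::string& path,
          std::vector<std::string>& out) {
    if (oldn.is_leaf() != newn.is_leaf()) {
        out.push_back(path + ": type mismatch");
        return;
    }
    if (oldn.is_leaf()) {
        if (oldn.scalar != newn.scalar)
            out.push_back(path + ": " + oldn.scalar + " != " + newn.scalar);
        return;
    }
    for (std::size_t i = 0; i < oldn.keys.size(); ++i) {
        const std::string child =
            path.empty() ? oldn.keys[i] : path + "." + oldn.keys[i];
        const Node* other = newn.find(oldn.keys[i]);
        if (other == nullptr) out.push_back(child + ": missing in new");
        else diff(oldn.values[i], *other, child, out);
    }
    for (const std::string& key : newn.keys) {
        if (oldn.find(key) == nullptr)
            out.push_back((path.empty() ? key : path + "." + key) + ": missing in old");
    }
}
```

Running `diff` on the old and new system's output for the same document yields an empty list when they agree, and otherwise a list of paths such as `meta.lang` that pinpoint where the outputs diverge.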
Coding Details and Optimizations
Adopted table‑driven programming to replace long if‑else chains.
Used C++20 std::atomic<std::shared_ptr<T>> instead of a double‑buffer design.
Replaced repetitive JSON parsing with RapidJSON Document objects stored in context.
Switched from RapidJSON to Sonic‑JSON, gaining ~40% faster parsing and 15% higher throughput.
Managed memory growth by lowering malloc_trim thresholds and linking jemalloc.
Refactored switch‑case logic into factories and plugin registries for better extensibility.
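As an illustration of the table-driven style mentioned in the list above, the following sketch replaces an if-else chain over content types with a lookup table. The content types and handler functions are hypothetical; the article does not show the actual tables.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical handlers, one per content type.
std::string handle_video(const std::string& id)   { return "video:" + id; }
std::string handle_article(const std::string& id) { return "article:" + id; }
std::string handle_unknown(const std::string& id) { return "unknown:" + id; }

using Handler = std::string (*)(const std::string&);

// The table replaces an if/else chain: supporting a new content type means
// adding one row here rather than another branch in dispatch().
const std::unordered_map<std::string, Handler>& handler_table() {
    static const std::unordered_map<std::string, Handler> table = {
        {"video", &handle_video},
        {"article", &handle_article},
    };
    return table;
}

// Looks up the handler for a content type, falling back to handle_unknown.
std::string dispatch(const std::string& type, const std::string& id) {
    const auto& table = handler_table();
    auto it = table.find(type);
    Handler h = (it != table.end()) ? it->second : &handle_unknown;
    assert(h != nullptr);
    return h(id);
}
```

Here `dispatch("video", "42")` goes through the table row for `video`, while an unregistered type falls through to the default handler. The dispatch function itself never changes as types are added.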
Development Process Enhancements
The team applied CI/CD best practices from the search platform, including requirement tracking in TAPD, developer credential checks, unified coding standards, mandatory code reviews, and strict pipeline quality gates. Documentation was expanded, and the pipeline was accelerated by reducing lock granularity and mirroring GitHub dependencies.
Business Impact
Performance Gains: Single‑core throughput rose from 13 QPS to 172 QPS (13×). Batch processing QPS increased ten‑fold. Average latency dropped from 2.7 s to 0.8 s (‑70%).
R&D Efficiency: Lead time for new features fell from 5.72 days to 1 day (‑82%). Total code lines shrank from 113 k to 28 k (‑75%) thanks to monolith consolidation, plugin design, and modern C++ syntax.
Overall, the redesign delivered a more reliable, faster, and easier‑to‑maintain content ingestion platform.
dbaplus Community
