How We Boosted Content Ingestion Performance 13× by Redesigning a Microservice System

This article details the complete redesign of a large‑scale content ingestion platform, explaining why the original microservice‑heavy architecture suffered from low development efficiency and poor performance, how a monolithic‑plus‑plugin approach solved these issues, and the resulting 13‑fold speedup, 10‑fold batch improvement, and 70% latency reduction.

dbaplus Community

Project Background

The QQ Browser search content ingestion system handled thousands of content types from many partners, but it suffered from low development efficiency (adding a new data type required changes in 3‑4 services) and poor performance (high CPU usage, excessive JSON parsing, and string copying). Business teams complained about extremely low throughput: in one case, processing 600 million documents took twelve days, which works out to fewer than 600 documents per second.

Overall Design Goals

The redesign focused on five key areas:

Replace the fragmented microservice architecture with a single monolithic service to eliminate RPC overhead.

Introduce a plugin framework to make processing logic extensible and avoid hard‑coded if‑else branches.

Support both incremental updates and bulk "database refresh" (batch) processing, with separate data‑flow configurations.

Improve fault tolerance by adding Kafka‑based buffering and message replay.

Enable horizontal scaling by separating consumption threads from processing threads.

Detailed Design

1. From Microservices to a Monolith

The business workload is characterized by massive volume, lightweight computation, little tolerance for failure, many content categories, and simple request patterns. The old system's dozens of tiny services generated heavy RPC traffic for what was mostly lightweight work; the new monolith keeps data in process memory between steps, which dramatically reduces latency.

2. Plugin‑Based Processing Pipeline

The pipeline is split into three layers: ingestion, processing, and distribution. Ingestion supports DB pull, Kafka streams, HTTP/COS, and RPC sources, each with different data formats (JSON, protobuf, etc.). Plugins encapsulate each functional step, allowing flexible composition without code changes.
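As a rough illustration of how plugins can be composed by name, here is a minimal sketch. The `Context`, `Plugin`, and `PluginRegistry` types, the two example plugins, and the step names are all hypothetical and are not taken from the original system.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-document state handed from one plugin to the next.
struct Context {
    std::string doc;                 // payload being processed
    std::vector<std::string> trace;  // names of the plugins that ran
};

// Hypothetical plugin interface: one functional step of the pipeline.
class Plugin {
public:
    virtual ~Plugin() = default;
    virtual void run(Context& ctx) = 0;
};

// Maps a plugin name to a factory, so pipelines can be assembled from names.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<Plugin>()>;

    void add(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }
    std::unique_ptr<Plugin> create(const std::string& name) const {
        const auto it = factories_.find(name);
        if (it == factories_.end()) throw std::runtime_error("unknown plugin: " + name);
        std::unique_ptr<Plugin> plugin = it->second();
        assert(plugin && "factory returned null");
        return plugin;
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example step: strips leading and trailing spaces.
class TrimPlugin : public Plugin {
public:
    void run(Context& ctx) override {
        const auto first = ctx.doc.find_first_not_of(' ');
        const auto last = ctx.doc.find_last_not_of(' ');
        ctx.doc = (first == std::string::npos) ? std::string()
                                               : ctx.doc.substr(first, last - first + 1);
        ctx.trace.push_back("trim");
    }
};

// Example step: lowercases the payload.
class LowercasePlugin : public Plugin {
public:
    void run(Context& ctx) override {
        for (char& c : ctx.doc)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        ctx.trace.push_back("lowercase");
    }
};

PluginRegistry make_registry() {
    PluginRegistry registry;
    registry.add("trim", [] { return std::make_unique<TrimPlugin>(); });
    registry.add("lowercase", [] { return std::make_unique<LowercasePlugin>(); });
    return registry;
}

// Runs the named steps in order; in a real system the list would come from configuration.
void run_pipeline(const PluginRegistry& registry,
                  const std::vector<std::string>& steps, Context& ctx) {
    for (const auto& name : steps) registry.create(name)->run(ctx);
}
```

Calling `run_pipeline(registry, {"trim", "lowercase"}, ctx)` applies both steps in order. Adding a new step means registering one more factory rather than editing a branch in the pipeline code.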

3. Incremental vs. Batch (Database Refresh) Handling

Four processing flows are defined: data‑source update, feature update, data‑source batch, and feature batch. Incremental flows run full pipelines; batch flows skip unnecessary computation, achieving a ten‑fold increase in batch QPS (from ~1 000 to ~10 000 QPS).
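One way to picture separate data-flow configurations is a table that maps each flow to the steps it runs. The sketch below is an assumption for illustration only: the four flow names come from the article, but the step names and which steps the batch flows skip are invented.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// The four flows named in the article.
enum class Flow { SourceUpdate, FeatureUpdate, SourceBatch, FeatureBatch };

// Hypothetical step names: the article does not list which steps each flow runs.
const std::vector<std::string>& steps_for(Flow flow) {
    static const std::map<Flow, std::vector<std::string>> table = {
        {Flow::SourceUpdate,  {"parse", "validate", "compute_features", "notify", "distribute"}},
        {Flow::FeatureUpdate, {"load_stored", "compute_features", "notify", "distribute"}},
        // Batch flows reuse the same steps but omit the ones they do not need.
        {Flow::SourceBatch,   {"parse", "validate", "distribute"}},
        {Flow::FeatureBatch,  {"load_stored", "distribute"}},
    };
    const auto it = table.find(flow);
    assert(it != table.end() && "every Flow value needs a table entry");
    return it->second;
}

bool is_batch(Flow flow) {
    return flow == Flow::SourceBatch || flow == Flow::FeatureBatch;
}
```

Because the per-flow step list is data rather than code, trimming a batch flow is a configuration change and leaves the incremental flows untouched.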

4. Fault‑Tolerant Data Ingestion

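All incoming data is first written to Kafka, and a consumer commits an offset only after the corresponding message has been processed successfully. If a consumer crashes, uncommitted messages are redelivered after restart, so data is not lost; the trade-off is that a message may be processed more than once. Kafka also acts as a buffer that smooths traffic spikes.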
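The commit-after-processing rule can be sketched without a real Kafka client. The code below uses an in-memory stand-in for a single partition; `PartitionLog` and `consume` are hypothetical names, and the handler's `bool` return value is an assumed convention for signalling success.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// In-memory stand-in for one Kafka partition (not a real Kafka client).
struct PartitionLog {
    std::vector<std::string> messages;
    std::size_t committed = 0;  // offset of the next message to read after a restart
};

// Processes messages starting at the committed offset. The offset advances only
// after the handler reports success, so a failed message is seen again next run.
// Returns the number of messages processed successfully in this run.
std::size_t consume(PartitionLog& log,
                    const std::function<bool(const std::string&)>& handler) {
    assert(log.committed <= log.messages.size());
    std::size_t processed = 0;
    for (std::size_t offset = log.committed; offset < log.messages.size(); ++offset) {
        if (!handler(log.messages[offset])) break;  // failure: offset stays uncommitted
        log.committed = offset + 1;                 // commit only after success
        ++processed;
    }
    return processed;
}
```

If the handler fails on a message, the next call to `consume` starts from that same message. The handler therefore sees it twice, which is why processing steps in such a design should be safe to repeat.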

5. Separate Consumption and Processing Threads

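A lock‑free queue feeds a pool of worker threads. Each Kafka partition is consumed by a single thread, while multiple workers process the data in parallel. Processing parallelism is therefore no longer capped by the partition count, which raises CPU utilization and lets the service scale horizontally by adding workers or instances.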
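The split between one consuming thread and many processing threads might look roughly like the sketch below. Note the substitution: it uses a simple mutex-and-condition-variable queue rather than the lock-free queue the article describes, and `BlockingQueue`, `consume_and_process`, and the "multiply by two" work are hypothetical placeholders.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Simple blocking queue; a stand-in for the lock-free queue in the article.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }
    // Signals that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item is available; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// One consumer thread reads a "partition" in order; `workers` threads process items.
long consume_and_process(const std::vector<int>& partition, unsigned workers) {
    assert(workers > 0);
    BlockingQueue<int> queue;
    std::atomic<long> total{0};

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            while (auto item = queue.pop())
                total += static_cast<long>(*item) * 2;  // placeholder for real processing
        });
    }
    std::thread consumer([&] {
        for (int message : partition) queue.push(message);
        queue.close();
    });

    consumer.join();
    for (auto& worker : pool) worker.join();
    return total.load();
}
```

The consumer count stays tied to the partition count, while the worker count can be tuned independently to match the available cores.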

New vs. Old System Diff Verification

Because the system has 15 distribution endpoints, a dedicated diff‑verification service collects logs from all endpoints, aggregates them, and runs a recursive JSON diff tool to ensure output consistency.
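A recursive JSON diff of the kind mentioned here can be sketched as follows. The article does not describe the actual tool, so the tiny `Json` value type, the builder helpers, and the path format (`$.field[index]`) are all assumptions made to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Minimal JSON value for illustration (a real tool would use a JSON library).
struct Json {
    enum class Kind { Null, Bool, Number, String, Array, Object };
    Kind kind = Kind::Null;
    bool boolean = false;
    double number = 0.0;
    std::string text;
    std::vector<std::string> keys;  // object member names (empty for arrays)
    std::vector<Json> items;        // array elements or object member values
};

Json num(double d) { Json j; j.kind = Json::Kind::Number; j.number = d; return j; }
Json str(std::string s) { Json j; j.kind = Json::Kind::String; j.text = std::move(s); return j; }
Json arr(std::vector<Json> v) { Json j; j.kind = Json::Kind::Array; j.items = std::move(v); return j; }
Json obj(std::vector<std::pair<std::string, Json>> members) {
    Json j;
    j.kind = Json::Kind::Object;
    for (auto& m : members) {
        j.keys.push_back(m.first);
        j.items.push_back(std::move(m.second));
    }
    return j;
}

const Json* find_member(const Json& o, const std::string& key) {
    assert(o.keys.size() == o.items.size());
    for (std::size_t i = 0; i < o.keys.size(); ++i)
        if (o.keys[i] == key) return &o.items[i];
    return nullptr;
}

// Appends to `out` the path of every value that differs between `a` and `b`.
void diff(const Json& a, const Json& b, const std::string& path,
          std::vector<std::string>& out) {
    if (a.kind != b.kind) { out.push_back(path); return; }
    switch (a.kind) {
    case Json::Kind::Null: break;
    case Json::Kind::Bool: if (a.boolean != b.boolean) out.push_back(path); break;
    case Json::Kind::Number: if (a.number != b.number) out.push_back(path); break;
    case Json::Kind::String: if (a.text != b.text) out.push_back(path); break;
    case Json::Kind::Array: {
        const std::size_t n = std::max(a.items.size(), b.items.size());
        for (std::size_t i = 0; i < n; ++i) {
            const std::string p = path + "[" + std::to_string(i) + "]";
            if (i >= a.items.size() || i >= b.items.size()) out.push_back(p);
            else diff(a.items[i], b.items[i], p, out);
        }
        break;
    }
    case Json::Kind::Object:
        for (std::size_t i = 0; i < a.keys.size(); ++i) {
            const std::string p = path + "." + a.keys[i];
            const Json* other = find_member(b, a.keys[i]);
            if (other == nullptr) out.push_back(p);   // member missing in b
            else diff(a.items[i], *other, p, out);
        }
        for (const auto& key : b.keys)                // members only present in b
            if (find_member(a, key) == nullptr) out.push_back(path + "." + key);
        break;
    }
}

std::vector<std::string> json_diff(const Json& a, const Json& b) {
    std::vector<std::string> out;
    diff(a, b, "$", out);
    return out;
}
```

Reporting paths rather than a plain yes/no result makes it easier to see which field of which endpoint's output diverged between the old and new systems.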

Coding Details and Optimizations

Adopted table‑driven programming to replace long if‑else chains.

Used C++20 std::atomic<std::shared_ptr<T>> instead of a double‑buffer design.

Eliminated repeated JSON parsing by parsing each payload once into a RapidJSON Document and storing it in the processing context.

Switched from RapidJSON to Sonic‑JSON, gaining ~40% faster parsing and 15% higher throughput.

Managed memory growth by lowering malloc_trim thresholds and linking jemalloc.

Refactored switch‑case logic into factories and plugin registries for better extensibility.
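Of the items above, the `std::atomic<std::shared_ptr<T>>` swap is perhaps the easiest to show in isolation. The sketch below assumes the shared object is a read-mostly configuration; `Config` and `ConfigHolder` are hypothetical names. Since standard library support for the C++20 specialization varies, the code falls back to the older `std::atomic_load` / `std::atomic_store` free functions when the feature-test macro is absent.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical read-mostly data that is occasionally replaced as a whole.
struct Config {
    int version;
    std::string rules;
};

// Readers take a snapshot with get(); a writer swaps in a new object with publish().
// Old snapshots stay valid until the last reader releases them, so no second
// buffer and no explicit "which buffer is active" flag is needed.
class ConfigHolder {
public:
    explicit ConfigHolder(std::shared_ptr<const Config> initial)
        : current_(std::move(initial)) {
        assert(get() != nullptr);
    }

#if defined(__cpp_lib_atomic_shared_ptr)
    // C++20 path: std::atomic<std::shared_ptr<T>>.
    std::shared_ptr<const Config> get() const { return current_.load(); }
    void publish(std::shared_ptr<const Config> next) { current_.store(std::move(next)); }

private:
    std::atomic<std::shared_ptr<const Config>> current_;
#else
    // Fallback for older standard libraries: atomic free functions on shared_ptr.
    std::shared_ptr<const Config> get() const { return std::atomic_load(&current_); }
    void publish(std::shared_ptr<const Config> next) {
        std::atomic_store(&current_, std::move(next));
    }

private:
    std::shared_ptr<const Config> current_;
#endif
};
```

A reader that called `get()` before a `publish()` keeps seeing the old `Config` for as long as it holds the pointer, while later calls to `get()` return the new one.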

Development Process Enhancements

The team applied CI/CD best practices from the search platform, including requirement tracking in TAPD, developer credential checks, unified coding standards, mandatory code reviews, and strict pipeline quality gates. Documentation was expanded, and the pipeline was accelerated by reducing lock granularity and mirroring GitHub dependencies.

Business Impact

Performance Gains: Single‑core throughput rose from 13 QPS to 172 QPS (13×). Batch processing QPS increased ten‑fold. Average latency dropped from 2.7 s to 0.8 s (‑70%).

R&D Efficiency: Lead time for new features fell from 5.72 days to 1 day (‑82%). Total code lines shrank from 113 k to 28 k (‑75%) thanks to monolith consolidation, plugin design, and modern C++ syntax.

Overall, the redesign delivered a more reliable, faster, and easier‑to‑maintain content ingestion platform.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
