Design and Implementation of a Configurable, Extensible Content Processing System (Apollo)

Apollo is a configurable, extensible content‑processing platform. It models each processing step as a node defined in a configuration file, supports multiple implementations per node for A/B testing, decouples producers and consumers via Kafka, provides fault‑tolerant retries and replay, and captures fine‑grained metrics through a Canal‑to‑TiDB pipeline. It cuts the development effort for a new content type to roughly ten percent of the original cost while delivering high‑quality data to downstream teams.

NetEase Media Technology Team

Early media content production relied on an editorial team that could fetch data centrally from databases. As the content strategy evolved, new sources such as short news, micro‑videos, and open courses were added, dramatically increasing the volume and dispersion of data. This created a need for a unified system that could ingest multiple content sources, apply configurable processing steps, and remain highly extensible.

The proposed solution, named Apollo, introduces a generic framework that abstracts each processing step as a node. A node may be a simple HTTP call or a complex interaction with an external system. The workflow is defined by a configuration file, allowing arbitrary ordering of nodes, insertion of pre‑ or post‑interceptors, and hot‑deployment without service restarts.
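
The node-and-configuration idea can be illustrated with a minimal sketch. Everything named here (the `NODES` registry, the two example nodes, the `steps` key, and the hook signature) is hypothetical and chosen for illustration; the article does not describe Apollo's actual configuration format or interfaces.

```python
from typing import Any, Callable, Dict, Iterable

Item = Dict[str, Any]
Node = Callable[[Item], Item]
Hook = Callable[[str, Item], None]


# Hypothetical nodes: plain functions here, though a real node could wrap
# an HTTP call or an interaction with an external system.
def extract_title(item: Item) -> Item:
    return {**item, "title": item["raw"].strip().title()}


def count_words(item: Item) -> Item:
    return {**item, "words": len(item["raw"].split())}


# Hypothetical registry mapping the names used in the configuration to nodes.
NODES: Dict[str, Node] = {
    "extract_title": extract_title,
    "count_words": count_words,
}


def run_pipeline(
    config: Dict[str, Any],
    item: Item,
    before: Iterable[Hook] = (),
    after: Iterable[Hook] = (),
) -> Item:
    """Run the nodes listed in config["steps"] in order, calling the
    pre-interceptors before each node and the post-interceptors after it."""
    before, after = list(before), list(after)
    for step in config["steps"]:
        for hook in before:
            hook(step, item)
        item = NODES[step](item)
        for hook in after:
            hook(step, item)
    return item


# The workflow is data: reordering or adding steps means editing this
# mapping (in practice, a configuration file), not the pipeline code.
CONFIG = {"steps": ["extract_title", "count_words"]}
```

Because `run_pipeline` reads the step list on every call, a reloaded configuration would take effect for the next item without a restart. That is one simple way to approximate the hot-deployment behaviour described above, not necessarily how Apollo does it.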

To enable A/B testing and gradual feature rollout, the framework allows multiple implementations per node. Two strategies are provided: (1) conditional branching, where each data item passes through exactly one implementation, and (2) sequential execution of all implementations, where the subscriber chooses which result to use.
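
The two strategies can be sketched as follows. The implementation names, the `rollout_percent` parameter, and the hash-based bucketing rule are assumptions made for illustration; the article does not say how Apollo decides which branch an item takes.

```python
import zlib
from typing import Any, Callable, Dict

Item = Dict[str, Any]
Impl = Callable[[Item], Any]


# Two hypothetical implementations of the same node.
def tagger_v1(item: Item) -> str:
    return "v1:" + item["text"].lower()


def tagger_v2(item: Item) -> str:
    return "v2:" + item["text"].upper()


IMPLS: Dict[str, Impl] = {"v1": tagger_v1, "v2": tagger_v2}


def run_branching(item: Item, rollout_percent: int) -> Dict[str, Any]:
    """Strategy 1: each item passes through exactly one implementation.

    Assumed rule: hash the item id into a bucket 0..99 and send buckets
    below rollout_percent to the new implementation. The hash is stable,
    so the same item always takes the same branch.
    """
    bucket = zlib.crc32(item["id"].encode("utf-8")) % 100
    name = "v2" if bucket < rollout_percent else "v1"
    return {name: IMPLS[name](item)}


def run_all(item: Item) -> Dict[str, Any]:
    """Strategy 2: every implementation runs; results are keyed by
    implementation name so that a subscriber can pick one."""
    return {name: impl(item) for name, impl in IMPLS.items()}
```

With the first strategy each item pays for only one implementation, which suits a gradual rollout. The second costs more per item but lets downstream subscribers compare results side by side.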

Given the bursty nature of content production, a message system was introduced to decouple the processing pipeline from upstream producers and downstream consumers. After evaluating RabbitMQ and Kafka, Kafka was selected for its partitioned topics, support for multiple consumer groups, and message replay capabilities.
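
The two properties that motivated the choice, independent consumer groups and replay, can be illustrated with a toy in-memory log. This is a conceptual model only: `ToyLog` and its methods are hypothetical and are not the Kafka client API.

```python
from typing import Dict, List


class ToyLog:
    """In-memory stand-in for a single topic partition: an append-only
    list of messages plus one read position per consumer group."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.offsets: Dict[str, int] = {}  # group name -> next offset to read

    def publish(self, message: str) -> None:
        # Producers only append; they never wait for consumers.
        self.messages.append(message)

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        # Each group advances its own offset, so a slow group does not
        # affect a fast one.
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        # Replay: move a group's position back and read the messages again.
        self.offsets[group] = offset
```

Because messages stay in the log after being read, a burst from producers simply accumulates until each group catches up, and any group can rewind to re-read earlier messages.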

Fault tolerance is achieved through automatic state tracking and retry mechanisms. When a node fails, the system records the current step and status, then retries via scheduled tasks until success. A tool also allows re‑processing historic data (e.g., after a tag‑service algorithm change) by replaying specific pipeline stages.
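
A minimal sketch of this record-and-retry behaviour is shown below. The state fields (`next_step`, `status`, `error`), the function names, and the simple loop that stands in for the scheduled retry task are all assumptions for illustration, not Apollo's actual schema or scheduler.

```python
from typing import Any, Callable, Dict, List, Tuple

Item = Dict[str, Any]
State = Dict[str, Any]
# A step is a (name, function) pair; the function returns new fields for the item.
Step = Tuple[str, Callable[[Item], Dict[str, Any]]]


def process(item: Item, steps: List[Step], state: State) -> bool:
    """Run steps starting at the recorded position. On failure, record
    which step failed so that a later attempt resumes there."""
    start = state.get("next_step", 0)
    for index in range(start, len(steps)):
        name, func = steps[index]
        try:
            item.update(func(item))
        except Exception as exc:
            state.update(next_step=index, status="failed", error=f"{name}: {exc}")
            return False
    state.update(next_step=len(steps), status="done", error=None)
    return True


def retry_until_done(item: Item, steps: List[Step], state: State, max_attempts: int) -> int:
    """Stand-in for the scheduled retry task. Returns the number of
    attempts made; state["status"] tells whether the item finished."""
    for attempt in range(1, max_attempts + 1):
        if process(item, steps, state):
            return attempt
    return max_attempts


def replay_from(item: Item, steps: List[Step], state: State, step_name: str) -> bool:
    """Re-run the pipeline from a named stage, for example after the
    algorithm behind that stage has changed."""
    index = [name for name, _ in steps].index(step_name)
    state.update(next_step=index, status="pending", error=None)
    return process(item, steps, state)
```

Since the position is stored per item, a retry does not repeat steps that already succeeded, and replaying historic data is the same operation with the position set back by hand.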

For fine‑grained statistics, the system captures detailed metrics (start/end times, status, intermediate results) and stores them centrally. Instead of logging at every database write, Canal monitors MySQL binlogs, extracts row‑level changes, and publishes them to Kafka. A temporary table aggregates these changes, which are later consolidated into TiDB—chosen for its high‑concurrency real‑time write, query, and analytical capabilities.
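
The consolidation step can be sketched as a fold over row-level change events. The event fields used here (`item_id`, `step`, `status`, `ts`) are a simplified, hypothetical shape and are not Canal's message format; the function only illustrates how several change events might collapse into one metrics row per item and step before being written to the analytical store.

```python
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]
Key = Tuple[str, str]


def consolidate(events: List[Event]) -> Dict[Key, Dict[str, Any]]:
    """Fold row-level change events into one record per (item, step),
    keeping the first timestamp, the last timestamp, and the last status."""
    records: Dict[Key, Dict[str, Any]] = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        key = (event["item_id"], event["step"])
        record = records.setdefault(
            key, {"start": event["ts"], "end": event["ts"], "status": None}
        )
        record["end"] = event["ts"]
        record["status"] = event["status"]
    for record in records.values():
        record["duration_ms"] = record["end"] - record["start"]
    return records
```

Deriving metrics from change events rather than from extra logging calls keeps the processing code free of instrumentation, which is the benefit the binlog-based approach is aiming for.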

With comprehensive metrics available, custom monitoring rules can trigger alerts—for example, when daily article production drops more than 20% compared to the previous period—enabling rapid response from both technical and product teams.
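
The example rule can be written as a small check. The function name and the handling of a missing baseline are assumptions; the 20% threshold comes from the example above.

```python
def production_drop_alert(current: int, previous: int, threshold: float = 0.20) -> bool:
    """Return True when output fell by more than `threshold` (as a
    fraction) compared to the previous period."""
    if previous <= 0:
        # Assumption: with no baseline there is nothing to compare against.
        return False
    drop = (previous - current) / previous
    return drop > threshold
```

For instance, 790 articles today against 1,000 in the previous period is a 21% drop and would trigger the alert, while 800 against 1,000 is exactly 20% and would not.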

In summary, Apollo integrates all media‑related content services, reduces the development effort for new content types to about 10% of the original cost, and delivers ready‑to‑use, high‑quality data to downstream teams. Future work will focus on balancing the stringent requirements of safety‑review pipelines with the need for rapid content delivery.
