
Design and Implementation of a Content Revenue Settlement System

The article details the design and implementation of a content revenue settlement platform. The platform aggregates traffic and ad data, uses a Spark‑plus‑PALO architecture to process tens of millions of records per day, and relies on a master‑worker model with idempotent tasks, temporary tables, and verification steps to produce reliable monthly profit‑share calculations for authors, media, mini‑program owners, and users.

Baidu Geek Talk

This article introduces the architecture and implementation details of a content revenue settlement platform, which converts article traffic and advertising contributions into profit shares for authors, media, mini‑program owners, and users.

Business Overview: The platform aggregates data from content producers (e.g., Baijiahao), traffic distributors (e.g., Handheld Baidu, the Baidu mobile app), and advertising systems (e.g., Fengchao). It calculates profit shares with a core strategy model and generates monthly settlement bills for downstream payment systems.

Three Core Functions:

Settlement Model – processes daily profit and subsidy details from the data middle‑platform, performs multi‑dimensional aggregation (daily calculation, monthly summarization, accrual, provision, and bill generation) and outputs monthly settlement statements.

C‑End Content Trading Platform – provides authors with a UI to view estimated daily earnings, distribution metrics, and final monthly invoices.

O‑End Management Platform – supports compliance, fund control, invoice auditing, blacklist management, and anti‑fraud operations.
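A minimal Python sketch of the settlement model's first stages (daily calculation, monthly summarization, bill generation), assuming each daily detail row carries a UID, a date, a profit amount, and a subsidy amount. The `DailyDetail` type, its field names, and the three function names are hypothetical. Accrual and provision are left out because the article does not say how they are computed; `Decimal` keeps money out of floating point.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class DailyDetail:
    """One row of a (hypothetical) daily profit/subsidy detail feed."""
    uid: int          # account that receives the profit share
    day: date         # business date of the row
    profit: Decimal   # traffic/ad profit share for that day
    subsidy: Decimal  # subsidy amount for that day


def daily_amounts(details):
    """Daily calculation: combine profit and subsidy into one amount per (uid, day)."""
    out = defaultdict(lambda: Decimal("0"))
    for d in details:
        out[(d.uid, d.day)] += d.profit + d.subsidy
    return out


def monthly_totals(daily):
    """Monthly summarization: roll daily amounts up to (uid, "YYYY-MM")."""
    out = defaultdict(lambda: Decimal("0"))
    for (uid, day), amount in daily.items():
        out[(uid, day.strftime("%Y-%m"))] += amount
    return out


def build_bills(monthly):
    """Bill generation: one (uid, month, amount) line per account with a positive total."""
    return sorted(
        (uid, month, amount)
        for (uid, month), amount in monthly.items()
        if amount > 0
    )
```

For example, two May days worth 2.00 and 3.00 for UID 1 collapse into a single May bill line of 5.00, while an account whose month totals zero produces no line at all.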

Key Terminology:

PALO – Baidu’s MPP cloud data warehouse built on Apache Doris, used for online analytical queries.

BNS (Baidu Naming Service) – service‑name to instance mapping system that supports service discovery, IP whitelisting, load balancing, etc.
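BNS is internal to Baidu and its API is not described in the article, so the following is only a toy Python stand-in for the two ideas the definition names: resolving a service name to its instances (discovery) and spreading calls across them (load balancing, here plain round-robin). `NamingRegistry`, its methods, and the service name `settle-worker` are hypothetical; IP whitelisting is omitted.

```python
import itertools


class NamingRegistry:
    """Toy, in-memory stand-in for a naming service such as BNS (not its real API)."""

    def __init__(self):
        self._instances = {}  # service name -> list of "ip:port" strings
        self._cursors = {}    # service name -> endless round-robin iterator

    def register(self, service, instances):
        if not instances:
            raise ValueError("a service needs at least one instance")
        self._instances[service] = list(instances)
        self._cursors[service] = itertools.cycle(self._instances[service])

    def resolve_all(self, service):
        """Discovery: every instance registered under a service name."""
        return list(self._instances.get(service, []))

    def pick(self, service):
        """Load balancing: rotate through the instances round-robin."""
        try:
            return next(self._cursors[service])
        except KeyError:
            raise LookupError(f"unknown service: {service}") from None
```

A master node could call `pick("settle-worker")` once per task so that work lands on instances in turn, without hard-coding any worker address.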

System Architecture: The platform sits between the data middle‑platform and downstream services. It consumes three offline detail files generated by upstream business lines: the daily profit detail, the daily subsidy detail, and the daily distribution detail.
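The three daily feeds have to be joined per account and day before any settlement math can run. A minimal sketch of that join, assuming each file is tab-separated with a header row and `uid`/`day` key columns plus an `amount` (profit, subsidy) or `pv` (distribution) value column; the real file layout is not given in the article, and the function names are hypothetical. In production this join would run in Spark; here the files are passed as strings to keep the example self-contained.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def load_detail(text, field, cast):
    """Parse one tab-separated detail file into {(uid, day): summed value}."""
    totals = defaultdict(lambda: cast("0"))
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        totals[(row["uid"], row["day"])] += cast(row[field])
    return dict(totals)


def merge_daily(profit_tsv, subsidy_tsv, distribution_tsv):
    """Join the three daily feeds on (uid, day); a missing side counts as zero."""
    profit = load_detail(profit_tsv, "amount", Decimal)
    subsidy = load_detail(subsidy_tsv, "amount", Decimal)
    views = load_detail(distribution_tsv, "pv", int)
    keys = profit.keys() | subsidy.keys() | views.keys()
    return {
        key: {
            "profit": profit.get(key, Decimal("0")),
            "subsidy": subsidy.get(key, Decimal("0")),
            "pv": views.get(key, 0),
        }
        for key in sorted(keys)
    }
```

Using a full outer join (the union of keys) means an account that only received a subsidy on a given day is still carried forward rather than silently dropped.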

Technical Challenges and Solutions:

Processing Tens of Millions of Daily Records – Two solutions were evaluated: (1) DB‑based batch processing on a distributed relational database (DDBS), sharded by user UID; and (2) Spark‑based offline computation combined with PALO for storage. The Spark + PALO solution was chosen for its native distributed processing and faster queries on large data volumes.
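Both candidates rest on the same idea: partition records by UID so that each partition can be aggregated independently and the partial results merged without conflicts. A minimal Python sketch of that pattern, with hypothetical function names; threads only show the structure (CPython threads do not speed up CPU-bound work), whereas Spark performs the equivalent step with a shuffle such as `reduceByKey` across executors.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from decimal import Decimal


def partition_by_uid(records, num_partitions):
    """Route each (uid, amount) record to a partition chosen by uid, like a UID shard key."""
    parts = [[] for _ in range(num_partitions)]
    for uid, amount in records:
        parts[uid % num_partitions].append((uid, amount))
    return parts


def aggregate_partition(part):
    """Sum amounts per uid inside one partition; no other partition is needed."""
    totals = defaultdict(lambda: Decimal("0"))
    for uid, amount in part:
        totals[uid] += amount
    return dict(totals)


def distributed_sum(records, num_partitions=4):
    parts = partition_by_uid(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(aggregate_partition, parts))
    merged = {}
    for partial in results:
        merged.update(partial)  # keys are disjoint: each uid lives in exactly one partition
    return merged
```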

Million‑Scale Monthly Tasks – Settlement runs on a master‑worker model. The master node backs up the balance tables, determines the UID ranges, and dispatches tasks to worker nodes via BNS. Each worker fetches a UID range from Redis, generates a job record, applies filtering rules, computes settlement amounts, and writes the results to temporary tables.
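The master/worker split can be sketched in a single Python process, assuming a `queue.Queue` in place of Redis and threads in place of worker nodes reached through BNS. The filtering rule (skip UIDs with no balance or on a blacklist) and the settlement formula (copy the balance) are hypothetical placeholders, as are all function names.

```python
import queue
import threading


def split_uid_ranges(min_uid, max_uid, chunk):
    """Master: cut [min_uid, max_uid] into half-open [lo, hi) ranges of at most `chunk` UIDs."""
    return [(lo, min(lo + chunk, max_uid + 1)) for lo in range(min_uid, max_uid + 1, chunk)]


def worker(tasks, balances, blacklist, temp_table, lock):
    """Worker: pull ranges until the queue is drained, filter, compute, write to the temp table."""
    while True:
        try:
            lo, hi = tasks.get_nowait()
        except queue.Empty:
            return
        rows = {}
        for uid in range(lo, hi):
            if uid not in balances or uid in blacklist:
                continue  # hypothetical filtering rule
            rows[uid] = balances[uid]  # placeholder for the real settlement formula
        with lock:
            temp_table.update(rows)


def run_master(balances, blacklist, min_uid, max_uid, chunk=100, num_workers=4):
    """Master: back up balances, enqueue UID ranges, start workers, wait for all of them."""
    backup = dict(balances)  # snapshot taken before any worker runs
    tasks = queue.Queue()
    for uid_range in split_uid_ranges(min_uid, max_uid, chunk):
        tasks.put(uid_range)
    temp_table, lock = {}, threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, balances, blacklist, temp_table, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return temp_table, backup
```

Because workers pull ranges instead of being assigned a fixed share, a fast worker simply takes more ranges, which keeps the load even when UID ranges differ in density.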

Task Idempotency and Reliability – Jobs are persisted with status flags; failed inserts trigger alerts. After all workers finish, a confirmation step scans the job table for unfinished tasks, re‑executes missing work, and validates that processed UID counts match expected totals.
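A minimal sketch of the idempotency and confirmation ideas, assuming an in-memory dict in place of the persisted job table and a per-UID result map in place of the temporary table. Two properties make reruns safe: creating a job that already exists is a no-op, and results are keyed by UID so reprocessing overwrites instead of duplicating. `JobTable`, `process`, and `confirm` are hypothetical names, and the per-UID work is a placeholder.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"


class JobTable:
    """In-memory stand-in for the persisted job table, keyed by (lo, hi) UID range."""

    def __init__(self):
        self.jobs = {}

    def create(self, uid_range):
        # Idempotent: creating a job that already exists keeps its current status.
        self.jobs.setdefault(uid_range, Status.PENDING)

    def mark_done(self, uid_range):
        self.jobs[uid_range] = Status.DONE


def process(uid_range, results):
    """Idempotent unit of work: results are keyed by UID, so a rerun overwrites."""
    lo, hi = uid_range
    for uid in range(lo, hi):
        results[uid] = "settled"  # placeholder for the real per-UID settlement


def confirm(job_table, results, expected_uid_count):
    """Confirmation step: rerun unfinished jobs, then check the processed UID count."""
    for uid_range, status in list(job_table.jobs.items()):
        if status is not Status.DONE:
            process(uid_range, results)
            job_table.mark_done(uid_range)
    if len(results) != expected_uid_count:
        raise RuntimeError(
            f"UID count mismatch: expected {expected_uid_count}, got {len(results)}"
        )
```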

Data Consistency and Write‑Read Separation – Monthly results are first written to temporary tables for verification. Once totals are validated, data is moved to the production database, minimizing write pressure on the live system.
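The stage, verify, promote sequence can be sketched with the standard `sqlite3` module, assuming an in-memory database in place of the production store. The table names `settlement` and `settlement_tmp`, the schema, and integer-cent amounts are hypothetical; the article does not say where the expected total comes from, so here it is simply a parameter.

```python
import sqlite3


def init_schema(conn):
    """Live settlement table (hypothetical schema); amounts are integer cents."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settlement ("
        "uid INTEGER, month TEXT, amount_cents INTEGER, PRIMARY KEY (uid, month))"
    )


def publish_month(conn, month, rows, expected_total_cents):
    """Stage rows, verify the staged total, then promote them to the live table."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS settlement_tmp")
    cur.execute(
        "CREATE TABLE settlement_tmp (uid INTEGER PRIMARY KEY, month TEXT, amount_cents INTEGER)"
    )
    cur.executemany(
        "INSERT INTO settlement_tmp VALUES (?, ?, ?)",
        [(uid, month, cents) for uid, cents in rows],
    )
    (total,) = cur.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM settlement_tmp"
    ).fetchone()
    if total != expected_total_cents:
        conn.rollback()  # live table is never touched when verification fails
        raise ValueError(f"staged total {total} != expected {expected_total_cents}")
    # Replace the month in the live table within the same transaction as the staged inserts.
    cur.execute("DELETE FROM settlement WHERE month = ?", (month,))
    cur.execute("INSERT INTO settlement SELECT uid, month, amount_cents FROM settlement_tmp")
    conn.commit()
```

Deleting the month before inserting makes a rerun of the same month replace its rows rather than append to them, and the live table only sees one short burst of writes per month.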

Conclusion: The article summarizes the settlement system's main technical points, stresses the importance of operational reliability, and outlines future directions: automated, intelligent operations and a scalable, evolving architecture.
