Business Observability and Real-Time Event Streaming Architecture for Content Production
The paper proposes a business‑observability framework for a content‑production pipeline—illustrated by Bilibili’s workflow—by modeling archives as entities, assigning global AIDs for end‑to‑end tracing, and leveraging a Kafka‑Flink‑ClickHouse event‑streaming platform to monitor real‑time latency, bottlenecks, and safety audits across the entire production line.
This document introduces the need for business observability in a content production platform, using Bilibili's workflow as a concrete example. It describes the end‑to‑end process from creator inspiration, video editing, upload, transcoding, content safety review, to final CDN distribution, and draws an analogy to a Manufacturing Execution System (MES) to highlight the importance of real‑time tracking and scheduling.
The top‑level design defines "business observability" as the monitoring of business entities (archives) and events throughout their lifecycle, rather than merely technical metrics such as QPS or latency. It emphasizes the need to observe the business layer (the "goods" on the production line) to detect bottlenecks, resource contention, and abnormal user behavior.
Key business entities are modeled as an Archive aggregate consisting of Party/Place/Thing, Role, Description, and Moment‑Interval. The document references Peter Coad’s four‑color modeling method to illustrate how static and temporal aspects of an entity are captured.
To achieve end‑to‑end traceability, a custom business trace mechanism is proposed. A global identifier (AID) serves as the trace ID linking client‑side events (captured before the archive is created) with server‑side events. The trace flow includes mapping clientTraceID → serverTraceID → AID.
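The two-hop mapping can be sketched as a small in-process store. The store and method names below are hypothetical; in practice this state would live in a shared service or cache rather than in memory.

```go
package main

import "fmt"

// TraceMapper stitches client-side spans (emitted before the archive
// exists) to the server-side trace and, finally, to the global AID.
type TraceMapper struct {
	clientToServer map[string]string // clientTraceID -> serverTraceID
	serverToAID    map[string]int64  // serverTraceID -> AID
}

func NewTraceMapper() *TraceMapper {
	return &TraceMapper{
		clientToServer: make(map[string]string),
		serverToAID:    make(map[string]int64),
	}
}

// BindServer is called when the upload request first reaches the server.
func (t *TraceMapper) BindServer(clientTraceID, serverTraceID string) {
	t.clientToServer[clientTraceID] = serverTraceID
}

// BindAID is called once the archive row exists and the AID is known.
func (t *TraceMapper) BindAID(serverTraceID string, aid int64) {
	t.serverToAID[serverTraceID] = aid
}

// Resolve walks clientTraceID -> serverTraceID -> AID.
func (t *TraceMapper) Resolve(clientTraceID string) (int64, bool) {
	sid, ok := t.clientToServer[clientTraceID]
	if !ok {
		return 0, false
	}
	aid, ok := t.serverToAID[sid]
	return aid, ok
}

func main() {
	m := NewTraceMapper()
	m.BindServer("c-123", "s-456")
	m.BindAID("s-456", 10001)
	aid, ok := m.Resolve("c-123")
	fmt.Println(aid, ok) // prints "10001 true"
}
```

Once the AID is bound, all earlier client-side spans can be retroactively attributed to the archive, which is what makes the trace truly end to end.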
Event collection is performed via an event‑streaming pipeline. All sub‑systems emit events that are aggregated into a real‑time event platform built on Kafka, Flink, and ClickHouse. This enables snapshot reconstruction, duration calculation, and statistical analysis (zoom‑out for aggregate metrics, zoom‑in for single‑archive trace).
Sample tracing code (illustrating OpenTracing‑style span creation) is shown below:
```go
// Client-side spans: each production step opens as a child of the
// previous step, forming a chain rooted at the archive-creation span.
parentSpan := tracer.StartSpan("稿件创作") // step 1: archive creation (root)
parentSpan = tracer.StartSpan("获取素材", opentracing.ChildOf(parentSpan.Context())) // step 2: fetch material
parentSpan = tracer.StartSpan("上传视频", opentracing.ChildOf(parentSpan.Context())) // step 3: upload video
parentSpan = tracer.StartSpan("稿件预检", opentracing.ChildOf(parentSpan.Context())) // step 4: archive pre-check
parentSpan = tracer.StartSpan("提交稿件", opentracing.ChildOf(parentSpan.Context())) // step 5: submit archive
```

On the server side, the trace IDs are mapped and the event is reported:
```go
ts.Trace(c,
	ts.EventTraceName("审核通过"), // "audit passed"
	ts.EventKey(te.CreativeArchiveAuditEditEvent),
	ts.EventNodeType(ts.Start),
	ts.EventLevel(ts.Info),
	ts.AID(aid),
)
```

The platform provides real-time dashboards for event-latency percentiles, event-loss detection, and safety-audit reconciliation. Alerting integrates with a monitoring system (Prometheus) and an alarm platform to notify the responsible teams of anomalies such as missing safety reviews or delayed releases.
Statistical queries (e.g., using window functions) are used to compute per‑archive durations and percentile distributions. Example SQL snippet:
```sql
SELECT
    aid,
    lagInFrame(event_time) OVER win AS previous_time,
    event_time
FROM xxx
WHERE xxx
WINDOW win AS (PARTITION BY aid ORDER BY event_time, event_key)
```

Finally, the document concludes that a global event-sourcing platform gives precise insight into production efficiency, supports cross-domain observability, and can be extended to business scenarios beyond content production.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.