
How Baidu Scales Real‑Time Log Monitoring for Billions of Events

This article explains Baidu's log‑center architecture for handling billions of UBC events per day, detailing UBC concepts, monitoring requirements, a low‑cost scalable design with dimension mapping, watermarking, data trimming and time‑window aggregation, and the resulting performance and cost benefits.

Architect

Introduction

Baidu's log center processes petabyte‑scale user‑behavior collection (UBC) traffic from most of the company's mobile products. Real‑time monitoring of that traffic is essential for handling traffic spikes, business iteration, and billing.

UBC Concept & Types

UBC (User Behavior Collection) is Baidu's main data‑collection protocol. Logs are classified into three categories: UBC client logs, UBC server logs, and UBC H5 logs. Each log carries a UBC ID that distinguishes different user actions.

Two logging styles exist:

Event logging records a single action (e.g., a click) and is usually counted by PV.

Stream logging records a continuous action (e.g., video watch duration) and includes a duration field; both PV and duration are used for statistics.
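The two logging styles can be modeled with one record type, sketched below. This is only an illustration: the `UbcLog` class, its field names, and the millisecond unit are assumptions, not Baidu's actual schema. An event log leaves the duration empty, while a stream log fills it in.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class UbcLog:
    """Hypothetical UBC record; field names are illustrative only."""
    ubc_id: str                         # distinguishes the user action
    report_ts: int                      # log-reporting time, epoch seconds
    duration_ms: Optional[int] = None   # set only for stream logs

    @property
    def is_stream(self) -> bool:
        return self.duration_ms is not None


def summarize(logs: Iterable[UbcLog]) -> Dict[str, Dict[str, int]]:
    """Per UBC ID, count PV and sum the duration of stream logs."""
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"pv": 0, "total_duration_ms": 0}
    )
    for log in logs:
        entry = stats[log.ubc_id]
        entry["pv"] += 1                # every log counts as one PV
        if log.is_stream:
            entry["total_duration_ms"] += log.duration_ms
    return dict(stats)


sample = [
    UbcLog("click_button", 1_700_000_000),           # event log
    UbcLog("video_watch", 1_700_000_010, 3_000),     # stream log
    UbcLog("video_watch", 1_700_000_020, 1_500),     # stream log
]
summary = summarize(sample)
```

For event logs the total duration simply stays at zero, so a single summary shape serves both styles.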

Monitoring Requirements

Effective monitoring must satisfy:

Minute‑level latency to view trend data.

PV as the metric for event logs; PV + total duration for stream logs.

Support for arbitrary combinations of public and business parameters as filter dimensions.

Overall Architecture

Client SDKs send logs to dedicated ingestion servers, which perform lightweight parsing and write raw logs to storage. Afterward, a log‑transport component moves the raw logs to a message queue, where streaming jobs consume them for downstream use.

Design of Monitoring Architecture

The design avoids tight coupling with online services and keeps business logic out of the ingestion layer. Instead, a streaming task extracts dimensions, applies mapping rules, and produces a lightweight monitoring dataset.

4.1 Avoiding Dimension Explosion

Only a limited set of public parameters is reported automatically. Business‑specific parameters are defined per UBC ID and mapped to at most six custom filter dimensions, using either 1‑to‑1 or many‑to‑1 mapping, so the number of columns stays bounded.
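A minimal sketch of such a mapping follows. The rule table, the slot names (`dim1`, `dim2`), and the parameter names are hypothetical. The article does not say how a many‑to‑1 mapping combines its inputs, so this sketch assumes the first non‑empty source parameter wins.

```python
from typing import Dict, List, Optional

MAX_CUSTOM_DIMS = 6  # upper bound on custom filter dimensions per UBC ID

# Hypothetical rules: UBC ID -> {dimension slot -> source parameter names}.
# One source name is a 1-to-1 mapping; several names are many-to-1.
MAPPING_RULES: Dict[str, Dict[str, List[str]]] = {
    "1001": {
        "dim1": ["page"],             # 1-to-1
        "dim2": ["source", "from"],   # many-to-1 (assumed: first non-empty wins)
    },
}


def validate_rules(rules: Dict[str, Dict[str, List[str]]]) -> None:
    """Reject any UBC ID that defines more than MAX_CUSTOM_DIMS slots."""
    for ubc_id, slots in rules.items():
        if len(slots) > MAX_CUSTOM_DIMS:
            raise ValueError(
                f"UBC ID {ubc_id} defines {len(slots)} dimensions; "
                f"at most {MAX_CUSTOM_DIMS} are allowed"
            )


def map_dimensions(ubc_id: str, params: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Project business parameters onto the fixed dimension slots."""
    mapped: Dict[str, Optional[str]] = {}
    for slot, sources in MAPPING_RULES.get(ubc_id, {}).items():
        # Parameters not named in a rule are dropped, keeping columns bounded.
        mapped[slot] = next((params[s] for s in sources if params.get(s)), None)
    return mapped


validate_rules(MAPPING_RULES)
dims = map_dimensions("1001", {"page": "feed", "from": "push", "debug": "1"})
```

Because every UBC ID reuses the same small set of slots, adding a new business parameter changes a rule rather than the shape of the monitoring table.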

4.2 Watermark‑Based Monitoring

Traditional online‑service monitoring plots metrics against wall‑clock time, so when an upstream service stalls, delayed data distorts the trend. Baidu adopts the watermark concept from stream processing: a watermark is a timestamp indicating that all data earlier than it has been fully processed. Monitoring uses the log‑reporting time as the horizontal axis and the watermark as the baseline, so statistics before the watermark are stable.
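One common way to derive such a watermark is to track progress per input partition and take the minimum. The sketch below assumes that approach; the article does not describe how Baidu computes its watermark. `WatermarkTracker`, the partition names, and the assumption that each partition delivers logs in reporting‑time order are all illustrative.

```python
from typing import Dict, Iterable, Optional


class WatermarkTracker:
    """Tracks the highest reporting time seen on each partition.

    Assumes logs within a partition arrive in reporting-time order,
    so the minimum across partitions is a safe lower bound.
    """

    def __init__(self, partitions: Iterable[str]) -> None:
        self._progress: Dict[str, Optional[int]] = {p: None for p in partitions}

    def advance(self, partition: str, report_ts: int) -> None:
        current = self._progress[partition]
        if current is None or report_ts > current:
            self._progress[partition] = report_ts

    def watermark(self) -> Optional[int]:
        """Return the watermark, or None until every partition has reported."""
        values = list(self._progress.values())
        if any(v is None for v in values):
            return None
        return min(values)


def is_stable(window_start: int, window_seconds: int, watermark: Optional[int]) -> bool:
    """A window [start, start + size) is stable once the watermark passes its end."""
    return watermark is not None and window_start + window_seconds <= watermark


tracker = WatermarkTracker(["p0", "p1"])
tracker.advance("p0", 600)
before = tracker.watermark()   # p1 has not reported yet
tracker.advance("p1", 300)
after = tracker.watermark()    # held back by the slowest partition
```

A dashboard built this way can draw points at or before the watermark as final and mark later points as provisional, so a stalled upstream shows up as a watermark that stops advancing rather than as a misleading drop in traffic.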

4.3 Reducing Monitoring Cost

Raw logs average 10 KB per entry; after dimension mapping and trimming, each entry shrinks to about 0.2 KB, a 98% reduction. The trimmed data is then aggregated in 5‑minute windows, keeping only the count (PV) and sum(duration). This reduces the daily monitoring record count from hundreds of millions to under 100,000, a reduction of about 99.98%.
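The windowed aggregation can be sketched as follows, assuming trimmed records are plain dictionaries with hypothetical keys `ubc_id`, `report_ts` (epoch seconds), `dims` (a tuple of mapped dimension values), and an optional `duration_ms`. Each record is assigned to a 5‑minute window by its reporting time, and only the count and the summed duration are kept per key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 300  # 5-minute windows

Key = Tuple[int, str, Tuple[str, ...]]  # (window start, UBC ID, dimension values)


def window_start(report_ts: int) -> int:
    """Align a reporting timestamp to the start of its 5-minute window."""
    return report_ts - report_ts % WINDOW_SECONDS


def aggregate(records: Iterable[dict]) -> Dict[Key, Tuple[int, int]]:
    """Collapse trimmed records into (pv, total_duration_ms) per key."""
    acc: Dict[Key, list] = defaultdict(lambda: [0, 0])
    for rec in records:
        key = (window_start(rec["report_ts"]), rec["ubc_id"], rec["dims"])
        acc[key][0] += 1                            # count (PV)
        acc[key][1] += rec.get("duration_ms") or 0  # sum(duration)
    return {key: (pv, total) for key, (pv, total) in acc.items()}


trimmed = [
    {"ubc_id": "1001", "report_ts": 10, "dims": ("feed",), "duration_ms": 2000},
    {"ubc_id": "1001", "report_ts": 299, "dims": ("feed",), "duration_ms": 500},
    {"ubc_id": "1001", "report_ts": 300, "dims": ("feed",)},
]
rows = aggregate(trimmed)
```

The output size depends on the number of distinct window, UBC ID, and dimension combinations rather than on raw traffic volume, which is why bounding the dimensions matters for the overall record count.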

Conclusion & Outlook

The log‑center provides a one‑stop solution for observing user behavior, delivering minute‑level, highly customizable monitoring while keeping storage and compute costs low through dimension mapping, watermarking, data trimming, and time‑window aggregation. Future work will continue to simplify the architecture and improve reliability.
