
How Baidu’s Log Platform Cuts Billions in Cost with Full‑Lifecycle Event Governance

This article describes how Baidu's log platform governs its event‑tracking points. It explains why uncontrolled event logging inflates storage and compute costs, then walks through a three‑stage solution (manual governance, semi‑automatic platform governance, and full‑lifecycle standardized governance) that uses anomaly detection, automated workflows, and IM bots to achieve massive PV reduction and annual cost savings.

Baidu Geek Talk

Apr 28, 2025


In Baidu’s ecosystem, "point" (打点) refers to embedded statistical code that records user actions such as clicks and swipes, generating massive logs used for reporting, A/B testing, and personalization. Daily, billions of point logs are produced, consuming hundreds of petabytes of storage and incurring high compute costs.

Problem Analysis

As the business iterates, point logs keep growing in both volume and length, which destabilizes point services and drives up storage and compute demand. Key challenges include locating useless points, detecting abnormal points, trimming fields, and keeping services stable during feature rollouts or high‑traffic events.

Solution Overview

The governance approach is divided into three phases:

Manual Governance: The log platform team communicates directly with product owners to understand point usage, analyze PV spikes, and apply customized mitigation strategies.

Semi‑Automatic Platform Governance: A platform automates the workflow. It provides a DAG‑based process and four governance modes (demand surge, anomaly fix, activity traffic, point optimization), and it integrates IM bots for one‑click group creation and templated notifications.

Full‑Lifecycle Standardized Governance: A standardized architecture continuously handles point retirement, anomaly repair, redundancy reduction, and feature‑based classification, enabling repeatable, efficient interventions.
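To make the DAG‑based process of the second phase concrete, here is a minimal sketch in Python using the standard library's `graphlib`. The step names (`detect`, `create_im_group`, and so on) and the rule that activity traffic skips the fix step are assumptions made for illustration; the article only states that the workflow is DAG‑based and names the four modes.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each key maps to the steps that must finish first.
GOVERNANCE_DAG = {
    "detect": set(),
    "create_im_group": {"detect"},
    "collect_pv_history": {"detect"},
    "notify_owner": {"create_im_group", "collect_pv_history"},
    "owner_confirms": {"notify_owner"},
    "apply_fix": {"owner_confirms"},
    "verify_pv": {"apply_fix"},
    "close_ticket": {"verify_pv"},
}

# The four governance modes named in the article.
MODES = ("demand_surge", "anomaly_fix", "activity_traffic", "point_optimization")


def plan(mode: str) -> list[str]:
    """Return the steps of one governance ticket in a valid execution order."""
    if mode not in MODES:
        raise ValueError(f"unknown governance mode: {mode}")
    dag = {step: set(deps) for step, deps in GOVERNANCE_DAG.items()}
    if mode == "activity_traffic":
        # Assumption: planned activity traffic is expected, so no fix is applied
        # and verification follows directly after the owner's confirmation.
        del dag["apply_fix"]
        dag["verify_pv"] = {"owner_confirms"}
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for mode in MODES:
        print(mode, "->", plan(mode))
```

Expressing the workflow as a dependency map keeps each mode a small variation on one shared graph, and a topological sort guarantees that no step runs before its prerequisites.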

Phase Details

1.1 Manual Governance

Focuses on understanding diverse point purposes (e.g., activity, experiment, demand) and delivering flexible, user‑centric governance measures. While effective initially, scaling becomes difficult as more points and business lines are added.

1.2 Semi‑Automatic Platform Governance

The platform runs scheduled tasks that collect daily PV per point, apply anomaly detection algorithms, and flag abnormal points. It provides three status pages (to be governed, in governance, completed) and records state transitions in a MySQL database for real‑time analytics and visualization. An IM bot automates group creation, sends templated alerts, and @‑mentions the relevant owners to speed up the workflow.
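The article does not say which anomaly detection algorithms are used. As one plausible sketch, the daily check could compare today's PV for each point against its recent history with a robust z‑score based on the median and the median absolute deviation (MAD). The function names, the 7‑day minimum history, the 3.5 threshold, and the 50% fallback band are all assumptions for illustration.

```python
from statistics import median


def is_pv_anomalous(history: list[int], today: int, threshold: float = 3.5) -> bool:
    """Flag today's PV if it deviates strongly from the trailing history.

    Uses a robust z-score (median and MAD). This is an assumed detector,
    not the algorithm used by Baidu's platform.
    """
    if len(history) < 7:
        return False  # too little history to judge (assumed minimum)
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Flat history: fall back to a relative-change check (assumed 50% band).
        return abs(today - med) > 0.5 * max(med, 1)
    robust_z = 0.6745 * (today - med) / mad
    return abs(robust_z) > threshold


def flag_abnormal_points(daily_pv: dict[str, list[int]]) -> list[str]:
    """daily_pv maps a point ID to its daily PV series, oldest first."""
    flagged = []
    for point_id, series in daily_pv.items():
        if not series:
            continue
        *history, today = series
        if is_pv_anomalous(history, today):
            flagged.append(point_id)
    return flagged


if __name__ == "__main__":
    sample = {
        "feed_click": [100, 102, 98, 101, 99, 103, 100, 101],
        "share_tap": [100, 102, 98, 101, 99, 103, 100, 500],
    }
    print(flag_abnormal_points(sample))
```

A median‑based score is used here because a single earlier spike in the history does not distort the baseline the way a mean and standard deviation would.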

1.3 Full‑Lifecycle Standardized Governance

Classifies points by characteristics (single‑demand, composite, activity, cascade, experiment, framework) and applies tailored strategies. It continuously identifies useless points (no traffic or traffic without downstream usage) and removes them after a review process. It also merges redundant points, samples high‑volume logs, and fixes abnormal points through a defined detection‑confirmation‑repair pipeline.
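The two rules for useless points and the idea of sampling high‑volume logs can be sketched as follows. The `PointStats` fields, the 30‑day window, and hash‑based sampling by user are assumptions chosen for illustration; the article does not describe how the platform measures downstream usage or how it samples.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PointStats:
    point_id: str
    pv_last_30d: int           # total PV over an assumed 30-day window
    downstream_consumers: int  # reports or jobs reading this point (assumed metric)


def retirement_candidates(stats: list[PointStats]) -> dict[str, str]:
    """Return point_id -> reason for points that look useless.

    Candidates are only proposals: per the article, removal follows a review.
    """
    candidates = {}
    for s in stats:
        if s.pv_last_30d == 0:
            candidates[s.point_id] = "no traffic"
        elif s.downstream_consumers == 0:
            candidates[s.point_id] = "traffic without downstream usage"
    return candidates


def keep_log(user_id: str, point_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly sample_rate of users for a point.

    Hashing (point_id, user_id) keeps a given user consistently in or out
    of the sample, so per-user sequences are not broken up.
    """
    digest = hashlib.sha256(f"{point_id}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate


if __name__ == "__main__":
    stats = [
        PointStats("old_banner_show", 0, 2),
        PointStats("debug_scroll", 9_000_000, 0),
        PointStats("feed_click", 50_000_000, 7),
    ]
    print(retirement_candidates(stats))
    kept = sum(keep_log(f"user{i}", "feed_click", 0.1) for i in range(10_000))
    print(f"kept {kept} of 10000 users")
```

Deterministic sampling trades a small amount of statistical independence for reproducibility: the same user and point always produce the same decision, which makes downstream scaling by `1 / sample_rate` straightforward.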

Key Outcomes

The governance project has identified more than ten potential risk points, cut the logs reported per user per day by hundreds of lines, and saved millions of dollars in annual compute and storage costs by governing billions of PVs each year. It also improves point quality, supports business growth, and lays the groundwork for future event‑based PV governance.
