Baidu Log Platform: Ensuring Data Accuracy with No-Duplication and No-Loss Architecture
Baidu's logging platform centralizes the collection, transmission, management, and analysis of billions of log entries per day. It uses a layered architecture with log priority persistence, service decomposition, stream computing, and client-side optimizations to guarantee no duplication, no loss, and stability above 99.99%.
This article introduces Baidu's logging platform (日志中台, literally "log middle platform"), a one-stop service for event-tracking data that manages the complete lifecycle of log data. It lets teams quickly collect, transmit, manage, and query logs for product operations analysis, R&D performance monitoring, and operations management.
Platform Overview: The logging platform covers most of Baidu's key products, including Baidu App, mini-programs, and matrix apps. It handles billions of log entries per day, with peak traffic reaching millions of queries per second (QPS), and maintains 99.99% service stability.
Core Challenge - Data Accuracy: The platform's most critical challenge is ensuring data accuracy, which has two parts: (1) No duplication: preventing duplicate data caused by system-level retries and by recovery from architectural failures; (2) No loss: preventing data loss caused by system failures and code bugs.
Architecture Solutions:
Log Priority Persistence: The access layer persists each log before doing any business processing, so a log that has been accepted is not lost if a server fails.
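A minimal sketch of "persist first, process second", assuming a local append-only file flushed with `os.fsync` stands in for the platform's real storage. The class `AccessLayer`, its methods, and the file name are hypothetical and are not Baidu's implementation.

```python
import os
import tempfile


class AccessLayer:
    """Hypothetical access layer that persists each log before processing it."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path  # local append-only file standing in for durable storage
        self.processed = []       # stand-in for downstream business processing

    def handle(self, raw_log: bytes) -> bool:
        # Step 1: append the raw record and force it to disk (assumes records contain no newlines).
        with open(self.wal_path, "ab") as wal:
            wal.write(raw_log + b"\n")
            wal.flush()
            os.fsync(wal.fileno())
        # Step 2: only once the record is durable, hand it to business processing.
        self.processed.append(raw_log.decode("utf-8"))
        return True  # acknowledge to the client only after persistence succeeded

    def replay(self) -> list:
        # After a crash, re-read the persisted records to rebuild downstream state.
        with open(self.wal_path, "rb") as wal:
            return [line.rstrip(b"\n").decode("utf-8") for line in wal]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "access.wal")
    layer = AccessLayer(path)
    layer.handle(b'{"event": "click"}')
    print(layer.replay())  # ['{"event": "click"}']
```

Replaying persisted records after a crash can re-deliver logs that were already processed, which is one reason the platform also needs the global deduplication described under Stream Computing.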
Service Decomposition: The monolithic logging server is split into specialized layers: an access layer (data persistence), a fan-out layer (flexible data distribution), and a business layer (custom processing).
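The three-layer split can be sketched as a pipeline in which the access layer only persists and forwards, the fan-out layer copies each log to every business that subscribed to it, and each business layer applies its own processing. This assumes routing by a `product` field; `FanOutLayer`, `access_layer`, and the product names are hypothetical, and a plain list stands in for storage.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Log = Dict[str, str]
Handler = Callable[[Log], None]


class FanOutLayer:
    """Hypothetical fan-out layer: copies each log to every business subscribed to its product."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, product: str, handler: Handler) -> None:
        self.subscriptions[product].append(handler)

    def dispatch(self, log: Log) -> int:
        # .get() avoids creating empty entries for products nobody subscribes to.
        handlers = self.subscriptions.get(log["product"], [])
        for handler in handlers:
            handler(log)
        return len(handlers)


def access_layer(log: Log, store: List[Log], fan_out: FanOutLayer) -> int:
    """Hypothetical access layer: persist first (a list stands in for storage), then fan out."""
    store.append(log)
    return fan_out.dispatch(log)


if __name__ == "__main__":
    store: List[Log] = []
    analytics: List[Log] = []   # business layer 1: product operations analysis
    monitoring: List[Log] = []  # business layer 2: performance monitoring
    fan_out = FanOutLayer()
    fan_out.subscribe("baidu_app", analytics.append)
    fan_out.subscribe("baidu_app", monitoring.append)
    print(access_layer({"product": "baidu_app", "event": "click"}, store, fan_out))  # 2
    print(access_layer({"product": "mini_program", "event": "view"}, store, fan_out))  # 0
```

Keeping the business-specific logic out of the access layer means a new consumer only adds a subscription. The persistence path stays unchanged.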
Stream Computing: A stream computing architecture ensures end-to-end no duplication and no loss. Each log entry is assigned a unique identifier (an MD5 hash), and filter operators in the business flow use that identifier to perform global deduplication.
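A minimal sketch of an MD5-keyed deduplication operator, assuming the ID is the MD5 of a canonical JSON encoding of the log. It also assumes each log carries client-side fields (here a hypothetical `ts`) so that a retried copy is identical while a genuine repeat event is not. `log_id` and `DedupFilter` are illustrative names. In a distributed job, "global" deduplication would also require partitioning the stream by ID so every copy of a given log reaches the same filter instance.

```python
import hashlib
import json
from typing import Optional


def log_id(log: dict) -> str:
    """MD5 of a canonical JSON encoding, so identical logs always get the same ID."""
    canonical = json.dumps(log, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


class DedupFilter:
    """Hypothetical filter operator that drops any log whose ID it has already emitted."""

    def __init__(self) -> None:
        # In-memory set for illustration; a real operator needs durable, bounded state.
        self.seen = set()

    def process(self, log: dict) -> Optional[dict]:
        key = log_id(log)
        if key in self.seen:
            return None  # duplicate from a retry or replay: drop it
        self.seen.add(key)
        return log


if __name__ == "__main__":
    events = [
        {"uid": "u1", "event": "click", "ts": 1},
        {"uid": "u1", "event": "click", "ts": 1},  # retried copy of the first event
        {"uid": "u1", "event": "click", "ts": 2},  # a genuine second click
    ]
    f = DedupFilter()
    kept = [e for e in events if f.process(e) is not None]
    print(len(kept))  # 2
```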
Client-side Optimization: The client reports logs at better moments, using scheduled tasks, piggybacking logs on business requests, and threshold-based triggers, which minimizes how long logs sit in the local cache.
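The three reporting triggers can be sketched as one client-side buffer. This assumes the threshold is a log count and the scheduled task calls `tick()` periodically. `LogReporter`, `max_batch`, `interval_s`, and the `logs` request field are hypothetical, and the clock is injectable only so the example is deterministic.

```python
import time
from typing import Callable, List


class LogReporter:
    """Hypothetical client-side log buffer that flushes on three triggers."""

    def __init__(self, send: Callable[[List[str]], None], max_batch: int = 50,
                 interval_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.send = send
        self.max_batch = max_batch
        self.interval_s = interval_s
        self.clock = clock
        self.buffer: List[str] = []
        self.last_flush = clock()

    def log(self, entry: str) -> None:
        self.buffer.append(entry)
        # Threshold trigger: flush as soon as the batch is full.
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def tick(self) -> None:
        # Scheduled trigger: called by a periodic timer; flush once the interval has elapsed.
        if self.buffer and self.clock() - self.last_flush >= self.interval_s:
            self.flush()

    def attach_to_request(self, payload: dict) -> dict:
        # Piggyback trigger: buffered logs ride along on an outgoing business request.
        if self.buffer:
            payload = dict(payload, logs=self.buffer)
            self.buffer = []
            self.last_flush = self.clock()
        return payload

    def flush(self) -> None:
        if not self.buffer:
            return
        self.send(list(self.buffer))  # if send raises, the buffer is kept for a retry
        self.buffer = []
        self.last_flush = self.clock()


if __name__ == "__main__":
    now = [0.0]
    sent: List[List[str]] = []
    r = LogReporter(sent.append, max_batch=3, interval_s=10, clock=lambda: now[0])
    r.log("a"); r.log("b"); r.log("c")   # threshold trigger fires
    r.log("d"); now[0] = 11; r.tick()    # scheduled trigger fires
    r.log("e")
    print(r.attach_to_request({"q": "search"}))  # {'q': 'search', 'logs': ['e']}
    print(sent)  # [['a', 'b', 'c'], ['d']]
```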
Technical Stack: The platform is built on a stream computing architecture and supports multiple data output methods: real-time streaming (RPC), quasi-real-time streaming (message queues), and offline batch processing. It achieves 99.995% service stability.
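To put the stability figures in perspective, the arithmetic below converts them into a yearly downtime budget. It assumes "service stability" means availability measured over a 365-day year, which the article does not state. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of allowed unavailability per year at the given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.99, 99.995):
        print(f"{target}% -> {downtime_budget_minutes(target):.1f} min/year")
    # 99.99% -> 52.6 min/year
    # 99.995% -> 26.3 min/year
```

Under that assumption, moving from 99.99% to 99.995% halves the yearly budget, from about 52.6 to about 26.3 minutes.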
Baidu Geek Talk