How Baidu’s Noah Platform Unifies Ops Data with Pull, Push, and Lazy ETL
This article explains how Noah, Baidu Cloud's intelligent operations product, builds a unified operations knowledge base. Noah classifies operational data as metadata, status, or event data and applies three ETL approaches (Pull, Push, and Lazy) to integrate offline, near-line, and real-time data.
Overview
As Baidu's intelligent operations evolve, a robot-centric capability is being built for fault self-healing, root-cause analysis, and smart changes. Its foundation is a unified operations worldview (an environment model) that lets the robot perceive system state and changes in its environment.
Traditional operations keep data in disparate systems. Access methods, terminology, and concepts differ from system to system, and nothing links the data across them, which raises operational costs and hampers efficiency. A unified operations knowledge base is proposed to standardize the operations language, model operational objects, and collect the resources used in daily operations.
Data in the Operations Knowledge Base
The knowledge base contains three data types:
Meta: models the world of operational entities, including their attributes, composition, and relationships.
Status: reflects system state, such as service liveness, resource consumption, or capability.
Event: describes changes to the system and abnormal service states.
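To make the three data types concrete, here is a minimal Python sketch of how they might be modeled. The class fields, the entity names ("svc-search", "host-01"), and the helper records_for are assumptions made for illustration; the article does not describe Noah's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Meta:
    """An operational entity: its attributes and its links to other entities."""
    entity_id: str
    entity_type: str  # hypothetical types, e.g. "service" or "host"
    attributes: Dict[str, Any] = field(default_factory=dict)
    relations: Dict[str, List[str]] = field(default_factory=dict)  # relation name -> entity ids


@dataclass
class Status:
    """A point-in-time reading of system state (liveness, resource usage, capability)."""
    entity_id: str
    metric: str
    value: float
    timestamp: float


@dataclass
class Event:
    """A change to the system or an abnormal service state."""
    entity_id: str
    kind: str  # "change" or "anomaly" (labels assumed for this sketch)
    description: str
    timestamp: float


def records_for(entity_id, records):
    """Return every record, of any type, that refers to the given entity."""
    return [r for r in records if r.entity_id == entity_id]


service = Meta("svc-search", "service", {"owner": "team-a"}, {"runs_on": ["host-01"]})
cpu = Status("host-01", "cpu_usage", 0.72, 1700000000.0)
rollout = Event("svc-search", "change", "version rollout", 1700000100.0)
```

Because every record carries an entity_id, status readings and events can be joined back to the Meta entity they describe, which is the kind of cross-type correlation a unified model is meant to enable.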
ETL System Architecture
Operational data is scattered across dozens of systems, causing three main problems:
Data is dispersed with inconsistent access methods.
Terminology, concepts, and models differ across systems.
No data relationships exist between systems, making correlation difficult.
To address these, a unified knowledge base is built using an ETL pipeline that extracts data, transforms it into a common schema, and loads it into the repository.
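The extract, transform, and load steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Noah's implementation: the two source systems, their field names, and the field_map structure are all hypothetical.

```python
def extract(source):
    """Extract: read raw rows from one source system."""
    return list(source["rows"])


def transform(row, field_map):
    """Transform: rename source-specific fields to the common schema."""
    return {common: row[native] for native, common in field_map.items()}


def load(repo, record):
    """Load: merge the record into the repository, keyed by entity_id."""
    repo.setdefault(record["entity_id"], {}).update(record)


def run_etl(sources, repo):
    for source in sources:
        for row in extract(source):
            load(repo, transform(row, source["field_map"]))
    return repo


# Two hypothetical systems describing the same host with different vocabularies.
monitoring = {
    "rows": [{"hostname": "host-01", "cpu": 0.72}],
    "field_map": {"hostname": "entity_id", "cpu": "cpu_usage"},
}
deployment = {
    "rows": [{"machine": "host-01", "ver": "1.4.2"}],
    "field_map": {"machine": "entity_id", "ver": "deployed_version"},
}

knowledge_base = run_etl([monitoring, deployment], {})
```

Once both systems' rows are expressed in the same schema and keyed by the same identifier, the record for host-01 holds data from both, which addresses the terminology and correlation problems listed above.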
Based on data timeliness requirements, three ETL modes are used:
Pull ETL: periodic extraction for offline data.
Push ETL: the source pushes high-frequency changes for near-line data.
Lazy ETL (Federation): on-demand, query-time fetching for real-time data.
Pull ETL
Two ingestion methods are provided: adaptive ETL and SDK‑based custom ETL. Adaptive ETL automatically parses user‑defined rules for common sources (e.g., Baidu Name Service, Noah monitoring, Noah deployment). SDK‑based custom ETL allows developers to write scripts for other sources.
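The contrast between the two ingestion methods might look like the following Python sketch. The class names (PullJob, AdaptiveJob, RackInventoryJob), the rule format, and the sample data are hypothetical; the article does not show Noah's SDK or rule syntax.

```python
from abc import ABC, abstractmethod


class PullJob(ABC):
    """One periodic extraction job; a scheduler would call run_once on a timer."""

    @abstractmethod
    def extract(self):
        """Return an iterable of raw rows from the source."""

    @abstractmethod
    def transform(self, row):
        """Convert one raw row into a record in the common schema."""

    def run_once(self, repo):
        count = 0
        for row in self.extract():
            record = self.transform(row)
            repo[record["entity_id"]] = record
            count += 1
        return count


class AdaptiveJob(PullJob):
    """Adaptive style: behavior comes from a declarative rule, not source-specific code."""

    def __init__(self, fetch, rule):
        self.fetch = fetch  # callable returning a list of dict rows
        self.rule = rule    # {"fields": {common_name: source_name}}

    def extract(self):
        return self.fetch()

    def transform(self, row):
        return {common: row[native] for common, native in self.rule["fields"].items()}


class RackInventoryJob(PullJob):
    """Custom style: hand-written parsing for a source with no built-in adapter."""

    def extract(self):
        return ["rack7/host-09/32"]  # stand-in for reading a real inventory export

    def transform(self, row):
        rack, host, cores = row.split("/")
        return {"entity_id": host, "rack": rack, "cpu_cores": int(cores)}


repo = {}
naming_rule = {"fields": {"entity_id": "name", "ip": "addr"}}
adaptive = AdaptiveJob(lambda: [{"name": "host-01", "addr": "10.0.0.1"}], naming_rule)
loaded = adaptive.run_once(repo) + RackInventoryJob().run_once(repo)
```

In this sketch the adaptive job needs only a rule, while the custom job supplies its own extract and transform code; both share the same periodic run_once entry point.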
Push ETL
Push ETL uses a message queue (MQ) for high‑timeliness data. Sources push change messages to MQ; the knowledge base subscribes, consumes, transforms, and stores the data.
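The publish and subscribe flow can be sketched with Python's standard queue.Queue standing in for the message queue. The message shape (op, key, payload) and the function names are assumptions for illustration; the article does not specify Noah's MQ or message format.

```python
import queue


def publish_change(mq, op, key, payload=None):
    """Source side: push one change message as soon as the change happens."""
    mq.put({"op": op, "key": key, "payload": payload or {}})


def consume_pending(mq, repo):
    """Knowledge-base side: drain queued messages, transform, and store them."""
    applied = 0
    while True:
        try:
            msg = mq.get_nowait()
        except queue.Empty:
            return applied
        if msg["op"] == "delete":
            repo.pop(msg["key"], None)
        else:  # treat anything else as an upsert
            record = repo.setdefault(msg["key"], {"entity_id": msg["key"]})
            record.update(msg["payload"])
        applied += 1


mq = queue.Queue()  # in-process stand-in for a real message queue
repo = {}
publish_change(mq, "upsert", "inst-1", {"state": "running"})
publish_change(mq, "upsert", "inst-2", {"state": "running"})
publish_change(mq, "upsert", "inst-1", {"state": "draining"})
publish_change(mq, "delete", "inst-2")
applied = consume_pending(mq, repo)
```

Messages are applied in the order they were published, so the stored record reflects the most recent change without waiting for the next periodic pull.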
Lazy ETL
Lazy ETL serves real-time queries through federation: at query time it calls the original data sources and converts their results to the unified schema on the fly. Nothing is copied ahead of time, which avoids both the staleness of periodic Pull and the overhead of Push.
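A federated query layer of this kind could be sketched as follows. The Federation class, the two live sources, and their raw formats are hypothetical and exist only to show the query-time conversion; they are not Noah's API.

```python
class Federation:
    """Answers queries by calling the original sources at query time; stores nothing."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch, to_unified):
        # fetch(entity_id) -> raw result or None; to_unified(raw) -> common-schema dict
        self._sources[name] = (fetch, to_unified)

    def query(self, entity_id):
        merged = {"entity_id": entity_id}
        for fetch, to_unified in self._sources.values():
            raw = fetch(entity_id)
            if raw is not None:
                merged.update(to_unified(raw))
        return merged


# Hypothetical live systems, modeled as dicts that change underneath the federation.
live_monitoring = {"host-01": {"cpu_pct": 72}}
live_scheduler = {"host-01": ("host-01", 14)}

fed = Federation()
fed.register("monitoring", live_monitoring.get, lambda raw: {"cpu_usage": raw["cpu_pct"] / 100})
fed.register("scheduler", live_scheduler.get, lambda raw: {"running_tasks": raw[1]})

before = fed.query("host-01")
live_monitoring["host-01"] = {"cpu_pct": 91}  # the source changes; nothing is re-imported
after = fed.query("host-01")
```

The second query sees the new CPU value immediately because the federation holds no copy of the data. The trade-off is that every query pays the cost of calling the underlying sources.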
Conclusion
The article presented Baidu Cloud Noah's operations knowledge base and its ETL strategies. Pull ETL handles offline data, Push ETL addresses high‑timeliness data, and Lazy ETL supports real‑time queries. Different ETL methods are chosen based on business scenarios and data freshness requirements.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
