Building Baidu Waimai’s Big Data Platform: Governance & Team Insights
This article examines how Baidu Waimai designed and evolved its big data platform, comparing traditional BI to modern 4V‑driven architectures, detailing database choices, OLTP/OLAP trade‑offs, data‑analysis team structures, and the essential steps for data governance and platformization.
1. Product Features of a Big Data Platform
Big data platforms are now common in internet companies and are often delivered as cloud services. The author shares practical experience building one, starting with a comparison between traditional BI platforms, which focus on structured statistical analysis, and modern big data platforms, which are characterized by the 4V attributes (Volume, Variety, Value, Velocity).
Database rankings from DB‑Engines show relational databases still dominate the top‑10, but document stores, key‑value stores, graph databases, search engines, and wide‑column stores also hold significant positions, reflecting the shift from pure relational systems to diverse NoSQL and MPP engines.
2. OLTP vs. OLAP
Transactional (OLTP) workloads emphasize high concurrency, ACID compliance, and short‑lived operations, while analytical (OLAP) workloads focus on complex queries, coordination across distributed nodes, and CAP trade‑offs (Consistency, Availability, Partition tolerance). The table below summarizes key differences:
| Aspect | OLTP | OLAP |
|---|---|---|
| Users | Thousands | Hundreds |
| DB size | TB | TB to PB |
| Access | Latest column data | Multi‑dimensional aggregation |
| Design | Application‑oriented, normalized | Topic‑oriented, denormalized |
| Function | Daily operations | Decision support |
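As a minimal sketch of the contrast above, the following uses Python's built-in `sqlite3` module to run both access patterns against one small table. The `orders` schema, the sample rows, and the function names are hypothetical and chosen only for illustration; a real platform would serve the two workloads from separate, differently designed stores.

```python
import sqlite3

# Hypothetical orders table; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, city TEXT, "
    "order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Beijing", "2016-05-01", 32.0),
        (2, "Beijing", "2016-05-01", 18.5),
        (3, "Shanghai", "2016-05-01", 40.0),
        (4, "Shanghai", "2016-05-02", 25.0),
    ],
)
conn.commit()


def oltp_update_and_read(order_id, new_amount):
    """OLTP style: a short transaction that touches one row by primary key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE orders SET amount = ? WHERE order_id = ?",
            (new_amount, order_id),
        )
    row = conn.execute(
        "SELECT amount FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0]


def olap_revenue_by_city_and_day():
    """OLAP style: scan many rows and aggregate along two dimensions."""
    return conn.execute(
        "SELECT city, order_date, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY city, order_date ORDER BY city, order_date"
    ).fetchall()
```

The OLTP function reads and writes a single current row inside a short transaction, while the OLAP function is read-only and aggregates over the whole table, which is why the two workloads favor normalized and denormalized designs respectively.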
3. Data‑Analysis Team Forms
The author outlines five typical organizational patterns for data‑analysis teams:
A. Embedded in Business Team: Close to business needs but limited technical depth; suitable for small companies.
B. Embedded in Technical Team: Strong on data pipelines and metrics, but weaker on business context; requires tight collaboration with developers.
C. Independent Team: Acts as a bridge, aligning with corporate strategy and providing specialized analytics expertise.
D. Distributed Across Technical and Business Teams: Balances technical and business demands but may suffer from fragmented ownership.
E. Hybrid Independent + Distributed: Large‑scale structure that enables knowledge sharing and comprehensive support, common in big internet firms.
These structures influence how data is collected, transformed, and delivered to stakeholders.
4. Data Governance Practices
Effective governance starts with data standardization and a metadata dictionary to unify definitions across systems. The author stresses the need for:
Standardized data models and calculation rules (e.g., consolidating “order count” and “order quantity” into a single metric with one definition).
Integrated data architecture that supports ETL, ODS, and data‑warehouse layers, with attention to partitioning, column/row storage, hot‑cold data handling, and indexing.
Data‑mart platforms that manage metadata, quality, lineage, access control, and release signals.
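The metric consolidation mentioned in the list above could look roughly like the sketch below: a small metadata dictionary that maps every name in circulation to one canonical metric and one calculation rule. The dictionary contents, the rule text, and the `resolve_metric` helper are assumptions made for illustration, not the author's actual tooling.

```python
# Hypothetical metadata dictionary; every name and rule here is illustrative.
METRIC_DICTIONARY = {
    "order_count": {
        "aliases": {"order count", "order quantity", "orders", "order_cnt"},
        "rule": "COUNT(DISTINCT order_id) over completed orders",
    },
}


def _normalize(name: str) -> str:
    """Lower-case, treat underscores as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("_", " ").split())


# Reverse index: normalized alias -> canonical metric name.
ALIAS_INDEX = {}
for canonical, entry in METRIC_DICTIONARY.items():
    for alias in entry["aliases"] | {canonical}:
        ALIAS_INDEX[_normalize(alias)] = canonical


def resolve_metric(name: str) -> str:
    """Return the canonical metric for any known alias, or raise KeyError."""
    key = _normalize(name)
    if key not in ALIAS_INDEX:
        raise KeyError(f"metric {name!r} is not in the metadata dictionary")
    return ALIAS_INDEX[key]
```

Raising on unknown names, rather than passing them through, is a deliberate choice in this sketch: it forces a new metric to be registered with a definition before any report can use it.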
Governance also requires lineage‑tracing tools, so that a data problem can be followed back through its dependencies to the root cause, for both operational and analytical workloads.
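A minimal sketch of lineage tracing, assuming lineage is stored as a mapping from each table to the tables it is built from. The table names (`ods.*`, `dw.*`, `mart.*`) and both helper functions are hypothetical; production lineage is usually extracted from ETL job definitions or SQL parsing rather than written by hand.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables it is built from.
UPSTREAM = {
    "mart.daily_city_report": ["dw.fact_orders", "dw.dim_city"],
    "dw.fact_orders": ["ods.orders"],
    "dw.dim_city": ["ods.city"],
    "ods.orders": [],
    "ods.city": [],
}


def upstream_of(table):
    """Return all direct and indirect upstream tables, breadth-first."""
    seen, order, queue = set(), [], deque([table])
    while queue:
        current = queue.popleft()
        for parent in UPSTREAM.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def downstream_of(table):
    """Return every table that depends on `table` (impact analysis)."""
    return sorted(t for t in UPSTREAM if table in upstream_of(t))
```

Walking upstream answers "where could this bad number have come from?", while walking downstream answers "which reports are affected if this source table is wrong?".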
5. Summary – Technical Guidance for Platformization
Building a big data platform should be a deliberate, technology‑driven effort rather than a series of patches. A well‑designed platform boosts productivity, enables efficient storage, scheduling, and consumption of massive data, and ultimately delivers Value, the V among the four that justifies the other three. Platformization and openness turn data into a strategic asset that drives business growth.
Baidu Waimai Technology Team