
Building Baidu Waimai’s Big Data Platform: Governance & Team Insights

This article examines how Baidu Waimai designed and evolved its big data platform, comparing traditional BI to modern 4V‑driven architectures, detailing database choices, OLTP/OLAP trade‑offs, data‑analysis team structures, and the essential steps for data governance and platformization.

Baidu Waimai Technology Team

1. Product Features of a Big Data Platform

Big data platforms are increasingly common in internet companies and are often delivered as cloud services. The author shares practical experience from building one, starting with a comparison: traditional BI platforms focus on statistical analysis of structured data, while modern big data platforms are characterized by the 4V attributes (Volume, Variety, Value, Velocity).

Database rankings from DB‑Engines show relational databases still dominate the top‑10, but document stores, key‑value stores, graph databases, search engines, and wide‑column stores also hold significant positions, reflecting the shift from pure relational systems to diverse NoSQL and MPP engines.

2. OLTP vs. OLAP

Transactional (OLTP) workloads emphasize high concurrency, ACID compliance, and short‑lived operations, while analytical (OLAP) workloads emphasize complex queries, distributed processing, and the CAP trade‑offs (Consistency, Availability, Partition tolerance) that distribution brings. The table below summarizes key differences:

Aspect   | OLTP                             | OLAP
Users    | Thousands                        | Hundreds
DB size  | TB                               | TB to PB
Access   | Current record-level data        | Multi-dimensional aggregation
Design   | Application-oriented, normalized | Topic-oriented, denormalized
Function | Daily operations                 | Decision support
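
The contrast in access patterns can be sketched with a small example. This is a minimal illustration using Python's built-in sqlite3 module, not the platform's actual stack; the `orders` table, its columns, and the sample rows are all hypothetical. The OLTP-style function touches one row by primary key inside a short transaction, while the OLAP-style function scans the table and aggregates along a dimension.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a tiny, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " city TEXT NOT NULL,"
        " order_date TEXT NOT NULL,"
        " amount REAL NOT NULL)"
    )
    rows = [
        (1, "Beijing", "day-1", 30.0),
        (2, "Beijing", "day-1", 45.0),
        (3, "Shanghai", "day-1", 25.0),
        (4, "Shanghai", "day-2", 60.0),
    ]
    with conn:  # commits on success, rolls back on an exception
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return conn


def oltp_update_amount(conn: sqlite3.Connection, order_id: int, new_amount: float) -> float:
    """OLTP style: a short transaction that touches one row by primary key."""
    with conn:
        conn.execute(
            "UPDATE orders SET amount = ? WHERE order_id = ?",
            (new_amount, order_id),
        )
    row = conn.execute(
        "SELECT amount FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0]


def olap_revenue_by_city(conn: sqlite3.Connection) -> list:
    """OLAP style: scan many rows and aggregate along the city dimension."""
    cur = conn.execute(
        "SELECT city, COUNT(*), SUM(amount) FROM orders GROUP BY city ORDER BY city"
    )
    return cur.fetchall()
```

In a real deployment the two workloads would usually run on different engines (a transactional database versus a warehouse or MPP system); a single SQLite file is used here only to keep the sketch self-contained.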

3. Data‑Analysis Team Forms

The author outlines five typical organizational patterns for data‑analysis teams:

A. Embedded in Business Team: Close to business needs but limited technical depth; suitable for small companies.

B. Embedded in Technical Team: Strong on data pipelines and metrics, but weaker on business context; requires tight collaboration with developers.

C. Independent Team: Acts as a bridge, aligning with corporate strategy and providing specialized analytics expertise.

D. Distributed Across Technical and Business Teams: Balances technical and business demands but may suffer from fragmented ownership.

E. Hybrid Independent + Distributed: Large‑scale structure that enables knowledge sharing and comprehensive support, common in big internet firms.

These structures influence how data is collected, transformed, and delivered to stakeholders.

4. Data Governance Practices

Effective governance starts with data standardization and a metadata dictionary to unify definitions across systems. The author stresses the need for:

Standardized data models and calculation rules (e.g., consolidating “order count” and “order quantity” into a single metric).

Integrated data architecture that supports ETL, ODS, and data‑warehouse layers, with attention to partitioning, column/row storage, hot‑cold data handling, and indexing.

Data‑mart platforms that manage metadata, quality, lineage, access control, and release signals.
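
As one way to picture the metadata dictionary, the sketch below maps differently named labels onto a single canonical metric with one calculation rule. It is an illustrative assumption, not the team's implementation: `MetricDefinition`, `METRIC_DICTIONARY`, `ALIASES`, `resolve_metric`, and the `COUNT(DISTINCT order_id)` rule are all hypothetical names and choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One canonical entry in a (hypothetical) metadata dictionary."""
    name: str
    description: str
    rule: str  # the single agreed calculation rule, written here as a SQL fragment


# Canonical metrics, keyed by their standardized name.
METRIC_DICTIONARY = {
    "order_count": MetricDefinition(
        name="order_count",
        description="Number of distinct orders placed",
        rule="COUNT(DISTINCT order_id)",  # assumed rule, for illustration only
    ),
}

# Labels used by different teams that should resolve to the same metric.
ALIASES = {
    "order count": "order_count",
    "order quantity": "order_count",
}


def resolve_metric(label: str) -> MetricDefinition:
    """Normalize a free-form label and return its canonical definition."""
    key = " ".join(label.strip().lower().split())
    canonical = ALIASES.get(key, key.replace(" ", "_"))
    if canonical not in METRIC_DICTIONARY:
        raise KeyError(f"unknown metric: {label!r}")
    return METRIC_DICTIONARY[canonical]
```

With this shape, `resolve_metric("Order Count")` and `resolve_metric("order quantity")` return the same definition, so reports built by different teams share one calculation rule. Unknown labels fail loudly instead of silently creating a new metric.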

Governance also requires tools for tracing data lineage, so that a data problem can be followed through its upstream dependencies to the root cause, for both operational and analytical workloads.
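
Lineage tracing can be pictured as a walk over a dependency graph. The following is a minimal sketch under assumed names: the `UPSTREAM` mapping and its dataset names (`report.daily_orders`, `dw.fact_orders`, and so on) are hypothetical, and a production lineage tool would typically derive this graph from job metadata rather than hard-code it.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is built from.
UPSTREAM = {
    "report.daily_orders": ["dw.fact_orders"],
    "dw.fact_orders": ["ods.orders", "ods.merchants"],
    "ods.orders": ["src.order_db"],
    "ods.merchants": ["src.merchant_db"],
}


def trace_upstream(dataset: str, upstream: dict) -> list:
    """Return every transitive upstream dependency, nearest first (breadth-first)."""
    seen = {dataset}  # guards against revisiting nodes, including cycles
    order = []
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order
```

When a report shows a wrong number, calling `trace_upstream` on it lists the tables to inspect, starting with the closest dependencies. Reversing the mapping gives the opposite view: which downstream datasets are affected when a source table breaks.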

5. Summary – Technical Guidance for Platformization

Building a big data platform should be a deliberate, technology‑driven effort rather than a series of patches. A well‑designed platform boosts productivity, enables efficient storage, scheduling, and consumption of massive data, and ultimately delivers Value, one of the four Vs. Platformization and openness turn data into a strategic asset that drives business growth.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
