
How Data Empowers the Fast‑Moving Consumer Goods Industry: Baicaowei’s End‑to‑End Data Platform Evolution

This article details Baicaowei’s journey from a Hadoop‑based data platform to a modern StarRocks‑driven architecture, illustrating how digitalization, evolving business needs, and streamlined data pipelines empower the fast‑moving consumer goods sector through efficient data collection, modeling, and analytics.

DataFunTalk

In this talk, Zhu Qitian, head of Baicaowei’s Data Department, discusses how to use data to empower the fast‑moving consumer goods (FMCG) industry and enhance business capability. FMCG is characterized by rapid turnover, a young customer base, high purchase frequency, broad reach, and low per‑order value. Baicaowei, a snack brand now under PepsiCo, operates an omnichannel model and increasingly relies on data to make its operations rational, standardized, and efficient.

The presentation is divided into four parts: (1) Insights from data platform evolution, (2) Technology growth driven by business changes, (3) Concept‑first vs. problem‑driven approaches, and (4) Future data architecture vision.

1. Data Platform Evolution Insights

2017: Built a CDH‑based big‑data platform (CDH 5.15.0) with basic functions, hundreds of reports, real‑time reporting via Stream Computer and Quick BI, and offline reporting using Hive.

2019: Continued with CDH, upgraded to 6.3.2, switched real‑time processing to Spark and Kafka, and created internal tools for developers.

2020: After acquisition by PepsiCo, on‑premise data centers were decommissioned; migrated to the cloud using Databricks, Data Lake Formation, and OSS object storage.

2022: To meet higher compute demands, adopted StarRocks and CloudCanal.

Two platform models were compared:

Hadoop‑based: long data pipelines, many components, repeated data copies, high maintenance.

StarRocks‑based: short pipelines, high efficiency, developers can focus on business logic.

2. Digital Process

The digitalization journey moves from informatization to digitalization and finally to a unified data platform with visual analytics. Along the way, the guiding concepts evolved from the data warehouse to the data lake and the data middle platform, with the data warehouse remaining the foundation.

3. Early Data Architecture

The main challenge was frequent schema changes on the ODS (operational data store) layer: the downstream DWD (detail) and DWS (summary) layers had to absorb them through flexible logical changes rather than physical data updates. Delta Lake was introduced to handle frequent updates and small‑file merging, though it increased ODS storage requirements.
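To make the trade-off concrete, here is a minimal conceptual sketch in plain Python of why an append-style table copes well with frequent updates but needs periodic small-file merging. It does not use the Delta Lake API; `TinyLogTable`, its methods, and the `order_id`/`status` fields are hypothetical names invented for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


class TinyLogTable:
    """Toy append-only table: every upsert batch becomes one new 'file'."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.files: List[List[Row]] = []

    def upsert(self, batch: List[Row]) -> None:
        # Updates are never applied in place; they are appended as a new file.
        self.files.append(list(batch))

    def snapshot(self) -> Dict[object, Row]:
        # Later files win, so readers see the newest version of each key.
        latest: Dict[object, Row] = {}
        for file in self.files:
            for row in file:
                latest[row[self.key]] = row
        return latest

    def stored_rows(self) -> int:
        # Counts every stored row version, including superseded ones.
        return sum(len(file) for file in self.files)

    def compact(self) -> None:
        # Merge all small files into one file holding only the latest rows.
        self.files = [list(self.snapshot().values())]


ods = TinyLogTable(key="order_id")
ods.upsert([{"order_id": 1, "status": "created"},
            {"order_id": 2, "status": "created"}])
ods.upsert([{"order_id": 1, "status": "paid"}])
ods.upsert([{"order_id": 1, "status": "shipped"}])

files_before = len(ods.files)    # three small files
rows_before = ods.stored_rows()  # four row versions for two live orders
ods.compact()
```

Before compaction the toy table stores more row versions than there are live orders, which mirrors the extra ODS storage mentioned above; after `compact()` a single file with one row per key remains.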

4. New Data Architecture

Traditional Hadoop components suffered from redundancy, strong coupling, immature data‑lake products, high maintenance cost, and low compute efficiency. By simplifying the stack to StarRocks, Baicaowei achieved a three‑component pipeline (collection, storage/computation, visualization). CloudCanal synchronizes Kafka, MongoDB, and Redis data into StarRocks, providing:

Convenient data collection.

Freedom for developers to focus on business logic rather than infrastructure.

Reduced storage cost by using views for many tables.
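The view-based layering in the last point can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in `sqlite3` stands in for StarRocks, and the table, view, and column names (`ods_orders`, `dwd_orders`, `dws_sales_by_sku`, `channel`) are hypothetical. Only the ODS table is stored physically; the upper layers are views, so a schema change is absorbed by redefining a view rather than rewriting data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ODS: the only physically stored table in this sketch.
    CREATE TABLE ods_orders (
        order_id INTEGER PRIMARY KEY,
        sku TEXT,
        amount_cents INTEGER
    );
    INSERT INTO ods_orders VALUES
        (1, 'nuts-01', 1990), (2, 'nuts-01', 2490), (3, 'jerky-02', 3500);

    -- DWD: cleaned detail layer, defined as a view (no data copy).
    CREATE VIEW dwd_orders AS
        SELECT order_id, UPPER(sku) AS sku, amount_cents FROM ods_orders;

    -- DWS: summary layer, defined as a view on top of DWD.
    CREATE VIEW dws_sales_by_sku AS
        SELECT sku, COUNT(*) AS orders, SUM(amount_cents) AS revenue_cents
        FROM dwd_orders GROUP BY sku;
""")

# A schema change arrives on the ODS layer: a new column with a default.
conn.execute("ALTER TABLE ods_orders ADD COLUMN channel TEXT DEFAULT 'online'")

# Only the DWD view definition changes; no stored data is rewritten.
conn.executescript("""
    DROP VIEW dwd_orders;
    CREATE VIEW dwd_orders AS
        SELECT order_id, UPPER(sku) AS sku, amount_cents, channel
        FROM ods_orders;
""")

summary = conn.execute(
    "SELECT sku, orders, revenue_cents FROM dws_sales_by_sku ORDER BY sku"
).fetchall()
physical_tables = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
channels = conn.execute("SELECT DISTINCT channel FROM dwd_orders").fetchall()
```

The summary view keeps working after the ODS change, and only one physical table exists throughout. The cost of this design is that each query recomputes the view logic, which is why it depends on a fast query engine.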

5. Common Big‑Data Components

As business complexity grows, multiple components (e.g., ingestion, processing, storage, analytics) become necessary.

6. Technology Growth from Business Changes

Rapidly changing business requirements demand flexible data models and scalable components. Baicaowei built a full‑chain digital project to create a panoramic view of the business, leveraging StarRocks to make processing 10 to 100 times faster and enable business‑centric analytics.

7. Concept‑First vs. Problem‑Driven

Initially, concepts drove product development, leading to many overlapping solutions and high coupling. The team now emphasizes problem‑driven development to reduce trial‑and‑error costs and focus on tangible business improvements such as automated financial reconciliation and efficient inventory control.

8. Future Data Architecture Vision

Five improvement directions for StarRocks were identified:

Unified management UI for multi‑source incremental ingestion, data governance, job scheduling, and monitoring.

Flexible, stable multi‑table materialized views.

Unified logical and physical data models to reduce the number of warehouse layers.

Primary‑Key model with persistent indexes, enhancing catalog capabilities.

Better support for high‑frequency writes via a buffer layer, boosting ingestion, computation, and real‑time analytics.
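The buffer-layer idea in the last direction can be illustrated with a small micro-batching sketch. It is a conceptual example in plain Python, not a StarRocks feature or API; `WriteBuffer`, `sink`, and `max_rows` are hypothetical names, and the sink here is just an in-memory list standing in for a bulk load.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class WriteBuffer:
    """Collects single-row writes and hands them to a sink in batches."""

    def __init__(self, sink: Callable[[List[Row]], None], max_rows: int) -> None:
        self.sink = sink
        self.max_rows = max_rows
        self.pending: List[Row] = []

    def write(self, row: Row) -> None:
        # Buffer the row and flush once the batch is large enough.
        self.pending.append(row)
        if len(self.pending) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        # Send whatever is pending as one batch, then start a fresh list.
        if self.pending:
            self.sink(self.pending)
            self.pending = []


batches: List[List[Row]] = []
buffer = WriteBuffer(sink=batches.append, max_rows=100)

for i in range(250):
    buffer.write({"event_id": i})
buffer.flush()  # push the final partial batch

batch_sizes = [len(batch) for batch in batches]
```

In this run, 250 individual writes reach the sink as three batches instead of 250 separate loads. A production buffer would normally also flush on a timer so that rows do not wait indefinitely during quiet periods.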

The presentation concludes with thanks and a call for audience engagement.
