
How a Young B2B Startup Built Its Big Data Platform from Scratch

This article describes how Fenbeitong, a young B2B company, built its big‑data platform from scratch. It covers the company background, data‑team formation, technology selection, architecture design, governance processes, modeling tools, batch and real‑time modeling, and lessons on how technical choices differ between ToB and ToC businesses.

Alibaba Cloud Big Data AI Platform

Company Introduction

Fenbeitong, a six‑year‑old B2B company focused on enterprise expense management, built a unified platform to handle travel, accommodation, and other corporate spending. By integrating suppliers and automating budgeting, approval, transaction, and reimbursement, the platform reduces manual work for both employees and finance teams.

The company now serves thousands of customers and is considered a unicorn in the enterprise services market.

Big Data Construction Background

The big‑data department was created only a year ago; previously, product teams handled data, leading to long delivery cycles (1‑2 months per feature). Rapid growth and increasing data demands made a dedicated data team essential.

Three internal groups—business (sales, marketing, customer success), operations‑product (mall, travel, expense control, payment), and functional (R&D, HR, finance)—have distinct data needs across the customer journey stages of awareness, education, selection, usage, and upsell.

Technology Selection

Given a small team, a limited budget, and a cloud‑first strategy, the company evaluated three options: (1) the Alibaba Cloud stack (MaxCompute + Hologres + Flink + DataWorks), (2) EMR on the cloud, and (3) a self‑built Hadoop cluster. The Alibaba Cloud stack was chosen because its fully managed services suited a small team, despite its higher cost.

Big Data Construction Plan

The design follows the classic Lambda architecture: multi‑source ingestion, data cleaning, separate offline (batch) and real‑time storage, a data warehouse, and an application layer, all supported by data‑governance processes.
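The Lambda pattern can be illustrated with a minimal Python sketch. It assumes a toy metric (order count per customer) and in-memory lists in place of the offline and real-time stores; the function names are hypothetical and do not describe Fenbeitong's actual pipeline. The batch layer recomputes a view from the full history, the speed layer counts only events since the last batch run, and the serving layer merges the two at query time.

```python
from collections import Counter


def batch_view(historical_events):
    """Batch layer: recompute order counts per customer from the full history."""
    return Counter(e["customer"] for e in historical_events)


def realtime_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    view = Counter()
    for e in recent_events:
        view[e["customer"]] += 1
    return view


def serve(customer, batch, realtime):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch.get(customer, 0) + realtime.get(customer, 0)


# Hypothetical data: three historical orders plus one that arrived after the batch run.
history = [{"customer": "acme"}, {"customer": "acme"}, {"customer": "globex"}]
recent = [{"customer": "acme"}]

batch, realtime = batch_view(history), realtime_view(recent)
print(serve("acme", batch, realtime))  # 3
```

The trade-off is that the same metric logic exists twice (in the batch and speed layers), which is the kind of operational cost that unified batch-stream processing aims to remove.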

Two specialized teams were formed: a data‑warehouse team that owns the ODS and DWD layers, and a data‑analysis team responsible for the DWS and ADS layers. The split lets the first team concentrate on data quality and the second on business analytics.
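The four layers can be sketched as a chain of small transformations. This is a minimal Python illustration assuming commonly used layer roles (ODS keeps raw records, DWD holds cleaned detail rows, DWS holds aggregates, ADS shapes data for a specific application); the field names and records are hypothetical.

```python
from collections import defaultdict

# ODS: raw records kept exactly as ingested (hypothetical fields).
ods_orders = [
    {"order_id": "1", "company": " Acme ", "amount": "120.0", "type": "travel"},
    {"order_id": "1", "company": " Acme ", "amount": "120.0", "type": "travel"},  # duplicate
    {"order_id": "2", "company": "Globex", "amount": "80.5", "type": "hotel"},
    {"order_id": "3", "company": "Acme", "amount": "n/a", "type": "hotel"},  # bad amount
]


def to_dwd(rows):
    """DWD: deduplicate by order_id, trim strings, cast types, drop invalid rows."""
    seen, out = set(), []
    for r in rows:
        if r["order_id"] in seen:
            continue
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline might log this to a data-quality report
        seen.add(r["order_id"])
        out.append({"order_id": r["order_id"], "company": r["company"].strip(),
                    "amount": amount, "type": r["type"]})
    return out


def to_dws(rows):
    """DWS: aggregate spend per (company, expense type)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["company"], r["type"])] += r["amount"]
    return dict(totals)


def to_ads(summary, company):
    """ADS: shape one company's spend by expense type for a report."""
    return {etype: amt for (c, etype), amt in summary.items() if c == company}


print(to_ads(to_dws(to_dwd(ods_orders)), "Acme"))  # {'travel': 120.0}
```

Under the team split described above, the data-warehouse team would own something like `to_dwd`, and the data-analysis team the `to_dws` and `to_ads` steps.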

DataWorks intelligent modeling is used to generate physical tables and ETL code, enforcing standards and reducing ad‑hoc SQL modeling.
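The idea of generating physical tables from a model while enforcing standards can be sketched as follows. This is not the DataWorks API: it is a hypothetical Python illustration in which a table model is checked against an assumed naming standard (a layer prefix, lowercase snake_case names, and a mandatory comment on every column) before MaxCompute-style DDL is emitted. The `ds` date partition is also an assumption.

```python
import re
from dataclasses import dataclass, field

# Assumed team standard: layer prefix plus lowercase snake_case names.
LAYERS = {"ods", "dwd", "dws", "ads"}
NAME_RULE = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass
class Column:
    name: str
    dtype: str
    comment: str


@dataclass
class TableModel:
    layer: str    # warehouse layer, e.g. "dwd"
    subject: str  # business domain, e.g. "travel"
    entity: str   # what one row describes, e.g. "order_detail"
    columns: list = field(default_factory=list)

    @property
    def table_name(self) -> str:
        return f"{self.layer}_{self.subject}_{self.entity}"


def validate(model: TableModel) -> list:
    """Return every standards violation instead of stopping at the first one."""
    errors = []
    if model.layer not in LAYERS:
        errors.append(f"unknown layer: {model.layer}")
    for part in (model.subject, model.entity):
        if not NAME_RULE.match(part):
            errors.append(f"bad name part: {part}")
    for col in model.columns:
        if not NAME_RULE.match(col.name):
            errors.append(f"bad column name: {col.name}")
        if not col.comment or "'" in col.comment:
            errors.append(f"missing or unquotable comment: {col.name}")
    return errors


def generate_ddl(model: TableModel) -> str:
    """Emit MaxCompute-style DDL only for models that pass validation."""
    errors = validate(model)
    if errors:
        raise ValueError("; ".join(errors))
    cols = ",\n".join(f"  {c.name} {c.dtype} COMMENT '{c.comment}'" for c in model.columns)
    return (f"CREATE TABLE IF NOT EXISTS {model.table_name} (\n{cols}\n)\n"
            f"PARTITIONED BY (ds STRING);")


model = TableModel("dwd", "travel", "order_detail", [
    Column("order_id", "STRING", "order id"),
    Column("amount", "DOUBLE", "order amount"),
])
print(generate_ddl(model))
```

Because the DDL is derived from the model rather than written by hand, every table that reaches the warehouse has passed the same checks.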

Key components include MaxCompute for batch processing, Hologres for real‑time analytics, and MySQL and Elasticsearch for auxiliary data. To update Hologres without service disruption, new data is first loaded into a temporary table, which is then swapped in for the live table.
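The table-swap strategy can be sketched with Python's built-in `sqlite3` module standing in for Hologres; the source does not describe the exact statements Fenbeitong uses. The new data is bulk-loaded into a staging table while queries keep reading the live table, and the two are then exchanged by renaming inside a single transaction, so the swap either completes fully or not at all. Table and column names are hypothetical.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Explicit BEGIN/COMMIT, rolling back on any error (connection is in autocommit mode)."""
    conn.execute("BEGIN")
    try:
        yield
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    else:
        conn.execute("COMMIT")


def refresh_by_swap(conn, table, rows):
    """Load fresh rows into a staging table, then swap it in for the live table."""
    staging, old = f"{table}_tmp", f"{table}_old"
    # Slow part: bulk-load the staging table while queries keep reading `table`.
    with transaction(conn):
        conn.execute(f"DROP TABLE IF EXISTS {staging}")
        conn.execute(f"CREATE TABLE {staging} (company TEXT, spend REAL)")
        conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
    # Fast part: swap names inside one transaction, so it happens all or nothing.
    with transaction(conn):
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute(f"DROP TABLE {old}")


# isolation_level=None puts sqlite3 in autocommit mode so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE spend_summary (company TEXT, spend REAL)")
conn.execute("INSERT INTO spend_summary VALUES ('Acme', 100.0)")

refresh_by_swap(conn, "spend_summary", [("Acme", 120.0), ("Globex", 80.5)])
print(conn.execute("SELECT * FROM spend_summary ORDER BY company").fetchall())
# [('Acme', 120.0), ('Globex', 80.5)]
```

The design keeps the expensive load outside the critical section; only the cheap rename runs while the live table is being replaced.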

Modeling and Algorithms

Batch (offline) models are built with Alibaba Cloud PAI. Data is cleaned in MaxCompute, and the trained models are either used for large‑scale offline scoring or stored in OSS and deployed as online services on EAS.
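The train-once, serve-two-ways flow can be sketched with a toy model. In this hypothetical Python illustration, a one-feature linear regression stands in for a PAI model, a JSON string stands in for the artifact stored in OSS, and a small class stands in for an EAS online service; none of these are the real PAI, OSS, or EAS APIs.

```python
import json


def train_linear(xs, ys):
    """Fit y = w * x + b by ordinary least squares (closed form, one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    return {"w": w, "b": mean_y - w * mean_x}


def batch_score(model, feature_column):
    """Offline path: score every row of a cleaned table in one pass."""
    return [model["w"] * x + model["b"] for x in feature_column]


def export_model(model):
    """Stand-in for saving the trained artifact to object storage."""
    return json.dumps(model)


class OnlineScorer:
    """Stand-in for an online inference service that loads the exported artifact."""

    def __init__(self, artifact):
        self.model = json.loads(artifact)

    def predict(self, x):
        return self.model["w"] * x + self.model["b"]


model = train_linear([1, 2, 3, 4], [3, 5, 7, 9])      # learns w = 2.0, b = 1.0
print(batch_score(model, [5, 6]))                     # [11.0, 13.0]
print(OnlineScorer(export_model(model)).predict(10))  # 21.0
```

Both paths read the same trained parameters, so offline scores and online predictions stay consistent.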

Real‑time models (e.g., for recommendation) are built on Flink and Kafka: an offline model is converted into a real‑time version, which is then trained, evaluated, and served online.
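Real-time features for such models are often computed over time windows. The sketch below shows a tumbling-window count in plain Python, assuming a list of `(timestamp_seconds, user_id)` tuples in place of a Kafka topic and assuming events arrive in timestamp order. Flink itself handles late and out-of-order data with watermarks, which this sketch omits; it illustrates the windowing concept, not Flink's API.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, {user: count}) for consecutive fixed-size windows.

    Assumes events arrive in timestamp order; windows with no events are skipped.
    """
    current, counts = None, defaultdict(int)
    for ts, user in events:
        window = ts - ts % window_seconds  # start of the window containing ts
        if current is not None and window != current:
            yield current, dict(counts)  # previous window is complete: emit it
            counts = defaultdict(int)
        current = window
        counts[user] += 1
    if current is not None:
        yield current, dict(counts)  # flush the last open window


# Hypothetical click events: (timestamp in seconds, user id).
clicks = [(0, "u1"), (10, "u2"), (30, "u1"), (65, "u1"), (70, "u3"), (130, "u2")]
for start, per_user in tumbling_window_counts(clicks, 60):
    print(start, per_user)
# 0 {'u1': 2, 'u2': 1}
# 60 {'u1': 1, 'u3': 1}
# 120 {'u2': 1}
```

Each emitted window could feed a per-user "clicks in the last minute" feature to an online model.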

ToB vs. ToC Technical Differences

ToB customers have longer decision cycles and higher usage compliance, and they require highly customized solutions. As a result, ToB data volumes are relatively small (TB‑scale) while modeling needs are complex. ToC products, by contrast, serve massive user bases with simpler workflows and much larger data volumes, which makes infrastructure choices riskier.

Consequently, ToB projects can iterate quickly with lower‑cost stacks, while ToC projects demand more robust, scalable architectures.

Future Outlook

Fenbeitong aims to expand its digital‑transformation scenarios, deepen integration with its data middle platform, and shift from reporting toward intelligent product support. Technical goals include meeting growing demand for real‑time data, exploring HTAP, adopting a lakehouse architecture to handle unstructured data, and unifying batch and stream processing to reduce operational costs.
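Unifying batch and stream processing means writing the business logic once and running it over both bounded and unbounded inputs. Here is a minimal Python sketch of that idea, assuming a hypothetical cumulative-spend metric: the same generator serves a batch job, which keeps only the final totals, and a streaming consumer, which reacts to every update. It illustrates the concept only and is not tied to any particular engine.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_spend(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """The business logic, written once: cumulative spend per company after each event."""
    totals: Dict[str, float] = {}
    for company, amount in events:
        totals[company] = totals.get(company, 0.0) + amount
        yield company, totals[company]


def run_batch(history: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Batch mode: consume a bounded dataset and keep only the final totals."""
    final: Dict[str, float] = {}
    for company, total in running_spend(history):
        final[company] = total
    return final


history = [("Acme", 100.0), ("Globex", 50.0), ("Acme", 20.0)]
print(run_batch(history))  # {'Acme': 120.0, 'Globex': 50.0}

# Stream mode: the same function consumes an iterator that could be unbounded,
# reacting to every update as it arrives (here the "stream" is just iter(history)).
for company, total in running_spend(iter(history)):
    print("update:", company, total)
```

Because both modes share `running_spend`, there is only one implementation to test and maintain, which is where the reduction in operational cost comes from.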

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
