Building a Big Data Platform at FenbeiTong: Architecture, Practices, and Lessons Learned
This article shares FenbeiTong's experience in building a big data platform, covering company background, data construction challenges, technology selection, architecture design, implementation details, data modeling tools, and real-world application scenarios such as CDP and CEM, offering practical insights for similar enterprises.
01 Company Introduction
FenbeiTong is a young B2B enterprise focused on solving corporate travel and expense management pain points. It provides an end‑to‑end platform that streamlines budgeting, approval, transaction, and reimbursement processes, reducing manual work for both employees and finance teams. After six years of growth, the company serves thousands of customers and is considered a unicorn in the enterprise services space.
02 Big Data Construction Background
The company's data team was established only a year ago; previously, data support was scattered across product development teams, leading to long delivery cycles for new features. As the business scaled, the demand for reliable data grew rapidly, prompting the need for a dedicated big data platform.
The customer journey was broken into six stages—awareness, education, selection, payment, usage, and upsell—highlighting the varied data needs across business, operations, and functional departments.
03 Big Data Construction Solution
Three architectural options were evaluated: (1) Alibaba Cloud’s MaxCompute + Hologres + Flink + DataWorks, (2) an open‑source EMR stack, and (3) building a self‑managed Hadoop cluster. The team chose the first option for its lower operational overhead despite higher cloud costs.
The resulting architecture follows a Lambda model with separate offline and real‑time layers: MaxCompute handles batch processing, Hologres handles real‑time serving, and MySQL and Elasticsearch hold auxiliary data. DataWorks is used for intelligent data modeling and automatically generates physical tables and ETL code.
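To make the Lambda idea concrete, here is a minimal sketch of the serving step, where a precomputed batch view is combined with events that arrived after the last batch run. It is plain Python and does not use the MaxCompute or Hologres APIs; the function name, the field names (company_id, ts, amount), and the cutoff logic are assumptions made for illustration, not details from the original talk.

```python
from collections import defaultdict


def merge_views(batch_view, realtime_events, batch_cutoff):
    """Combine a batch view with events newer than the batch cutoff.

    batch_view: dict mapping a hypothetical company_id to total spend
    realtime_events: iterable of dicts with company_id, ts, amount
    batch_cutoff: timestamp of the last event included in the batch view
    """
    merged = defaultdict(float, batch_view)
    for event in realtime_events:
        # Events at or before the cutoff are already counted in the batch view.
        if event["ts"] > batch_cutoff:
            merged[event["company_id"]] += event["amount"]
    return dict(merged)


batch_view = {"c1": 100.0, "c2": 40.0}
realtime_events = [
    {"company_id": "c1", "ts": 5, "amount": 10.0},   # older than the cutoff, skipped
    {"company_id": "c1", "ts": 12, "amount": 25.0},
    {"company_id": "c3", "ts": 15, "amount": 7.5},
]
result = merge_views(batch_view, realtime_events, batch_cutoff=10)
print(result)
```

The cutoff check is what keeps the two layers from double counting; in a real deployment that boundary would come from the batch job's own watermark rather than a hard-coded value.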
Key practices include separating the data warehouse (ODS/DWD) from data analysis (DWS/ADS) teams, establishing SOP‑driven governance processes, and adopting a “build‑while‑govern” approach rather than post‑hoc governance.
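One way to make the team split enforceable is to derive ownership from the table's layer. The sketch below assumes tables are named with a layer prefix such as dwd_ or ads_; that naming convention, the team labels, and the function name are hypothetical and are not stated in the source.

```python
# Hypothetical mapping from warehouse layer to owning team,
# following the ODS/DWD vs. DWS/ADS split described above.
LAYER_OWNERS = {
    "ods": "data_warehouse",
    "dwd": "data_warehouse",
    "dws": "data_analysis",
    "ads": "data_analysis",
}


def owning_team(table_name):
    """Return the team responsible for a table, based on its layer prefix."""
    layer = table_name.split("_", 1)[0].lower()
    if layer not in LAYER_OWNERS:
        raise ValueError(f"Table {table_name!r} has no recognized layer prefix")
    return LAYER_OWNERS[layer]


print(owning_team("dwd_trip_order"))
print(owning_team("ads_customer_spend"))
```

Rejecting unprefixed tables at creation time fits the "build-while-govern" idea: the rule is checked when the table is introduced rather than cleaned up afterwards.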
04 Big Data Application Scenarios
Two flagship use cases are presented:
Customer Data Platform (CDP): consolidates public and private data sources, unifies customer IDs, and builds layered profiles (static, semi‑dynamic, behavioral) with tags (basic, rule‑based, mined) to enable personalized marketing and analytics.
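The customer ID unification step can be sketched with a union-find structure: any two records that share an identifier value are treated as the same customer. This is a common technique for identity resolution, not necessarily the method FenbeiTong uses, and the identifier fields (crm_id, phone, app_uid, email) are invented for the example.

```python
class UnionFind:
    """Minimal union-find over string keys, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def unify_ids(records):
    """Group identifiers that co-occur in any record into one customer."""
    uf = UnionFind()
    for record in records:
        ids = [f"{field}:{value}" for field, value in record.items() if value]
        if ids:
            uf.find(ids[0])  # register records that carry a single identifier
        for other in ids[1:]:
            uf.union(ids[0], other)
    groups = {}
    for key in list(uf.parent):
        groups.setdefault(uf.find(key), set()).add(key)
    return list(groups.values())


records = [
    {"crm_id": "A1", "phone": "555"},
    {"app_uid": "u9", "phone": "555"},            # shares a phone with A1
    {"crm_id": "B2", "email": "x@example.com"},
]
customers = unify_ids(records)
print(customers)
```

Each resulting group would then receive a single unified customer ID, to which the profile layers and tags can be attached.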
Customer Experience Management (CEM/VOC): collects multi‑channel feedback, applies AI techniques (ASR, NLP) to analyze sentiment and issue severity, and scores touchpoints to provide actionable insights for product, support, and operations teams.
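Touchpoint scoring could look roughly like the sketch below, which takes feedback items that already carry a sentiment value and an issue severity (as an upstream ASR/NLP step might produce) and rolls them up per touchpoint. The scoring formula, the sentiment range of -1 to 1, the positive severity weights, and the touchpoint names are all assumptions for illustration; the source does not describe the actual scoring rule.

```python
from collections import defaultdict


def score_touchpoints(feedback):
    """Return a 0-100 score per touchpoint from severity-weighted sentiment.

    Each feedback item is a dict with:
      touchpoint: name of the journey step (hypothetical labels)
      sentiment:  float in [-1, 1], assumed to come from an NLP model
      severity:   positive number; more severe issues weigh more
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # [weighted sentiment, total weight]
    for item in feedback:
        weight = float(item["severity"])
        totals[item["touchpoint"]][0] += item["sentiment"] * weight
        totals[item["touchpoint"]][1] += weight
    # Map the weighted mean from [-1, 1] onto a 0-100 scale.
    return {
        touchpoint: round(50 * (1 + weighted / weight), 1)
        for touchpoint, (weighted, weight) in totals.items()
    }


feedback = [
    {"touchpoint": "booking", "sentiment": -0.8, "severity": 3},
    {"touchpoint": "booking", "sentiment": 0.4, "severity": 1},
    {"touchpoint": "reimbursement", "sentiment": 0.6, "severity": 1},
]
scores = score_touchpoints(feedback)
print(scores)
```

Weighting by severity means one serious complaint pulls a touchpoint's score down more than several mild compliments pull it up, which is one reasonable way to surface the issues that product and support teams should look at first.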
The article also contrasts B2B and B2C data characteristics, noting that B2B data volumes are typically in the terabyte range, requiring high‑quality modeling and rapid job turnaround, while B2C demands massive scale and different architectural considerations.
Future directions include deeper real‑time and HTAP capabilities, lake‑warehouse integration, handling unstructured data (audio, text), and further unifying batch‑stream processing to reduce operational complexity.
Overall, the case study provides practical guidance for enterprises embarking on big data platform construction, covering strategic decisions, technical stack selection, governance, and real‑world applications.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
