One‑Stop Big Data Platform Construction: Practices from WeBank, Beike, and iQIYI
This article shares practical notes on building a one‑stop big data platform, outlining essential functions such as data extraction, cleaning, storage, analysis, governance, and security, and presents implementation case studies from WeBank, Beike, and iQIYI to illustrate real‑world architectures and solutions.
This note records the author’s experience of building a department‑wide data and operations platform from scratch, emphasizing the explosive growth of structured, semi‑structured, and unstructured data and the need for a unified big data platform.
The article draws on publicly available cases and aims to provide a roadmap for constructing an end‑to‑end data platform.
A one-stop big data platform should offer complete data governance, including extraction, cleaning, storage, analysis, sharing, security, and operational monitoring, so that users can efficiently apply data to core systems and accelerate business innovation. Its key capabilities are:
One‑stop data governance: data warehouse construction, diverse extraction task scheduling, business and real‑time data ingestion, cleaning, and storage, plus efficient data retrieval for varied query needs.
Data lineage analysis: ensures quality of data aggregation and traceability, while also assessing data value.
Smart data catalog: customizable cataloging, tagging, rapid asset search, and a global knowledge base for business understanding.
Data visualization: a product built on a visual grammar that lets users analyze data by drag-and-drop, without programming.
Data privacy: masking rules to transform sensitive information and protect privacy.
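The masking rules mentioned in the last item can be sketched as a small rule table keyed by column name. This is a minimal illustration only: the column names, rule functions, and the `mask_record` helper are hypothetical and do not come from any of the platforms discussed in this article.

```python
import hashlib
import re

def mask_phone(value: str) -> str:
    """Keep the first 3 and last 4 digits and star out the middle."""
    digits = re.sub(r"\D", "", value)
    if len(digits) < 8:
        return "*" * len(digits)
    return digits[:3] + "*" * (len(digits) - 7) + digits[-4:]

def mask_email(value: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = value.partition("@")
    if not sep:
        return "*" * len(value)
    return local[:1] + "***" + sep + domain

def hash_id(value: str) -> str:
    """Replace an identifier with a truncated SHA-256 digest.

    Assumption for illustration: a plain hash is enough here. A production
    system would normally use a keyed hash so values cannot be brute-forced.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

# Hypothetical rule table: column name -> masking function.
MASKING_RULES = {
    "phone": mask_phone,
    "email": mask_email,
    "id_number": hash_id,
}

def mask_record(record: dict) -> dict:
    """Return a copy of the record with every sensitive string column masked."""
    masked = {}
    for column, value in record.items():
        rule = MASKING_RULES.get(column)
        if rule is not None and isinstance(value, str):
            masked[column] = rule(value)
        else:
            masked[column] = value
    return masked
```

Keeping the rules in a table rather than in the query logic means a new sensitive column can be covered by adding one entry, which is roughly what "masking rules" implies at the platform level.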
The following sections illustrate typical implementations.
WeBank One‑Stop Big Data Platform Construction
WeBank needed a platform that supports massive data volumes, a seamless user experience, financial-grade reliability and security, independent control over the technology stack, and low cost.
WeBank built the WeDataSphere suite, unifying compute and storage engines and developing proprietary tools such as the Linkis middleware for unified computation entry and scheduling.
Linkis isolates storage/compute engines from application clients, handling permission control, multi‑tenant isolation, multi‑engine support, and elastic resource scaling, while sharing user permissions, variables, and functions across tools, greatly improving development and operation efficiency.
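The idea of a single entry point that sits between clients and engines can be sketched in a few lines. The code below is not the Linkis API; the `Gateway` class, its fields, and the `${run_date}` variable are all hypothetical names chosen to illustrate permission checks, per-user shared variables, and routing to more than one engine.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Callable, Dict, Set

@dataclass
class Gateway:
    """Hypothetical unified entry point in front of several compute engines."""
    # engine name -> function that executes code and returns a result string
    engines: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    # user -> set of engine names the user may use
    permissions: Dict[str, Set[str]] = field(default_factory=dict)
    # user -> variables shared across all tools for that user
    variables: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def register_engine(self, name: str, runner: Callable[[str], str]) -> None:
        self.engines[name] = runner

    def submit(self, user: str, engine: str, code: str) -> str:
        """Check permissions, substitute shared variables, then route the job."""
        if engine not in self.engines:
            raise KeyError(f"unknown engine: {engine}")
        if engine not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not use {engine}")
        # safe_substitute leaves unknown ${...} placeholders untouched
        rendered = Template(code).safe_substitute(self.variables.get(user, {}))
        return self.engines[engine](rendered)
```

Because clients only ever call `submit`, an engine can be replaced or added without changing client code, which is the decoupling the paragraph above describes.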
Operations and management are integrated into the Managis component, which handles underlying tool maintenance, monitoring aggregation, cluster deployment, scaling, and automated fault handling.
The platform supports a wide range of banking scenarios, from offline risk analysis to real‑time fraud detection, transaction queries, operational reporting, batch reconciliation, and regulatory reporting.
For data-warehouse workloads, WeBank also supports business analytics, customer profiling, and model training, and it integrates SAS-compatible tools through the self-developed QuickML platform.
Beike One‑Stop Data Development Platform Practice
Beike’s platform has evolved through multiple iterations into a comprehensive solution covering data management, integration, scheduling, quality, and external data services.
Data management provides a unified metadata model, asset management, and lifecycle coverage, offering data entry, navigation, search, and lifecycle control.
Data integration quickly brings unconnected data sources into the platform, supporting MySQL, Oracle, SQL Server, TiDB, MongoDB, Kafka, and covering over 99% of business data ingestion scenarios via configurable automation.
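Configuration-driven ingestion of this kind can be illustrated with a small sketch that turns a source description into a job specification. The field names, the `ods_` table prefix, and the rule that Kafka sources run as streams while the others run as batches are assumptions made for the example, not details of Beike's platform.

```python
from dataclasses import dataclass

# Source types listed in the article, normalized to lowercase identifiers.
SUPPORTED_TYPES = {"mysql", "oracle", "sqlserver", "tidb", "mongodb", "kafka"}

@dataclass(frozen=True)
class IngestionJob:
    """Hypothetical job specification produced from a source configuration."""
    source_type: str
    source_name: str
    target_table: str
    mode: str  # "batch" or "stream"

def build_job(config: dict) -> IngestionJob:
    """Validate a source configuration and derive an ingestion job from it."""
    source_type = config["type"].lower()
    if source_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported source type: {source_type}")
    # Assumption: message queues are ingested as streams, databases as batches.
    mode = "stream" if source_type == "kafka" else "batch"
    target_table = f"ods_{source_type}_{config['name']}"
    return IngestionJob(source_type, config["name"], target_table, mode)
```

With this shape, onboarding a new source means writing a configuration entry rather than a new pipeline, which is what makes high coverage of ingestion scenarios plausible.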
Data quality features include robust task monitoring and alerting. The platform also offers data subscription, exchange, and metric services, and plans to add further asset management as well as encryption and masking.
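Task monitoring and alerting can be reduced to a simple rule check over task run records. The sketch below assumes a hypothetical record layout (`name`, `status`, `deadline`, `finished_at`) and three alert conditions; the actual rules used by Beike are not described in the source.

```python
from datetime import datetime
from typing import Dict, List, Optional

def check_tasks(tasks: List[Dict], now: datetime) -> List[str]:
    """Return one alert message per task that failed or missed its deadline."""
    alerts = []
    for task in tasks:
        name = task["name"]
        finished_at: Optional[datetime] = task.get("finished_at")
        if task.get("status") == "failed":
            alerts.append(f"{name}: failed")
        elif finished_at is None and now > task["deadline"]:
            alerts.append(f"{name}: not finished by deadline")
        elif finished_at is not None and finished_at > task["deadline"]:
            alerts.append(f"{name}: finished late")
    return alerts
```

In practice the returned messages would be handed to whatever notification channel the team uses; the check itself stays independent of that channel.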
iQIYI One‑Stop Data Middle Platform Construction
iQIYI’s data middle platform addresses pain points in production, unified data warehousing, and big‑data capabilities (development, governance, services).
The architecture integrates these three dimensions, focusing on development (platformized, visualized data development), operations (task management, stability, timeliness monitoring), quality (validation to avoid issues), and governance (audit, lineage monitoring).
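Lineage monitoring rests on being able to walk the dependency graph between tables. As a minimal sketch, assuming lineage is stored as a mapping from each table to its direct upstream tables (the table names below are made up), all transitive upstream sources can be found with a breadth-first traversal:

```python
from collections import deque
from typing import Dict, Set

# Hypothetical lineage graph: table -> set of direct upstream tables.
LINEAGE: Dict[str, Set[str]] = {
    "report_daily": {"dw_orders", "dw_users"},
    "dw_orders": {"ods_orders"},
    "dw_users": {"ods_users"},
}

def upstream(table: str, lineage: Dict[str, Set[str]]) -> Set[str]:
    """Return every table that the given table depends on, directly or not."""
    seen: Set[str] = set()
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, ()):
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen
```

The same traversal over the inverted graph gives the downstream tables affected when a source changes, which is the usual basis for impact analysis and audit.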
Various ingestion methods are provided to support diverse application scenarios.
In summary, big data platform construction has evolved from building data warehouses and development platforms in isolation to a hybrid, integrated architecture that delivers end-to-end data extraction, cleaning, storage, analysis, sharing, security, and operational monitoring.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
