
Beike's Data Development Platform: Evolution, Architecture, and Future Outlook

In this talk, Beike senior engineer Yang Zongqiang traces the evolution of the company's data development platform: the background, three architecture upgrades, the platform's main features (metadata management, data integration, scheduling, and quality assurance), and future directions for building an enterprise‑grade big‑data system.

DataFunTalk

01 Background

Initially, Beike's data volume was small and business teams handled their own data needs. As the business grew, data requirements became more complex, and in 2014 the company set up a big‑data department to study and develop data solutions, focusing on property, user, and behavior data.

Property data: a property dictionary built up since 2008, now holding over 200 million property records.

User data: buyers, tenants, owners, agents, and later brand and renovation personnel.

Behavior data: online browsing and offline viewing activities.

Design principles: cost reduction, efficiency improvement, and standardization.

02 Exploration Journey

The first stage (2014) used the Hadoop ecosystem (Hadoop, Hive, Sqoop) with a layered data‑warehouse model (ingestion, warehouse, reporting). This approach suffered from inflexible custom development, simplistic scheduling (Zeus + Python + Shell), and data‑security issues.

The subsequent platformization stage introduced a data‑management platform and an ad‑hoc query platform, integrating metadata management, data quality, security, and a unified scheduling system.

Advantages: resolved warehouse bottlenecks, enabled business‑driven data product development, and provided fast, visualized troubleshooting.

Remaining challenges: increased task load, resource contention, and difficulty controlling data‑development quality.

03 Platform Overview

Data Management

Unified metadata model covering relational, non‑relational, log, and semi‑structured data.

End‑to‑end data lineage and asset management.

Capability to expose data assets via data maps and APIs.
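To make the idea of a unified metadata model with lineage more concrete, here is a minimal sketch in Python. It is not Beike's implementation: the `SourceKind`, `Dataset`, and `Catalog` names, the fields, and the sample table names are all assumptions chosen for illustration. It registers datasets of different kinds in one catalog and walks the upstream edges to answer "what does this table depend on?".

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceKind(Enum):
    # The four kinds of data the unified model is said to cover.
    RELATIONAL = "relational"
    NON_RELATIONAL = "non_relational"
    LOG = "log"
    SEMI_STRUCTURED = "semi_structured"


@dataclass(frozen=True)
class Dataset:
    # Hypothetical minimal metadata record; real platforms store far more.
    name: str
    kind: SourceKind
    owner: str


@dataclass
class Catalog:
    datasets: dict = field(default_factory=dict)
    upstream: dict = field(default_factory=dict)  # name -> set of direct parents

    def register(self, dataset, parents=()):
        """Record a dataset and the datasets it is derived from."""
        self.datasets[dataset.name] = dataset
        self.upstream.setdefault(dataset.name, set()).update(parents)

    def lineage(self, name):
        """Return every transitive upstream dataset of `name`."""
        seen, stack = set(), [name]
        while stack:
            for parent in self.upstream.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative table names only.
catalog = Catalog()
catalog.register(Dataset("ods.house", SourceKind.RELATIONAL, "warehouse"))
catalog.register(Dataset("ods.click_log", SourceKind.LOG, "warehouse"))
catalog.register(
    Dataset("dw.house_views", SourceKind.RELATIONAL, "warehouse"),
    parents=["ods.house", "ods.click_log"],
)
catalog.register(
    Dataset("rpt.daily_views", SourceKind.RELATIONAL, "bi"),
    parents=["dw.house_views"],
)
print(sorted(catalog.lineage("rpt.daily_views")))
# → ['dw.house_views', 'ods.click_log', 'ods.house']
```

Keeping lineage as plain parent edges means the same traversal, run over reversed edges, would also support impact analysis (which downstream assets a change would affect).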

Data Integration

Supports MySQL, Oracle, SQL Server, TiDB, MongoDB, Kafka, etc., achieving >99% coverage of business data ingestion with configurable, automated pipelines, including data migration and split scenarios.
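A configurable pipeline usually means that an ingestion job is described as data and validated before it runs. The sketch below shows one possible shape for that, assuming a hypothetical `IngestionJob` record and `validate` function; the field names and the two load modes are illustrative assumptions, and only the list of source systems comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Source systems named in the article; the lowercase keys are an assumption.
SUPPORTED_SOURCES = {"mysql", "oracle", "sqlserver", "tidb", "mongodb", "kafka"}
LOAD_MODES = ("full", "incremental")


@dataclass(frozen=True)
class IngestionJob:
    # Hypothetical job description; not the platform's actual schema.
    source_type: str
    source_table: str
    target_table: str
    mode: str = "full"
    incremental_column: Optional[str] = None


def validate(job):
    """Return a list of human-readable problems; an empty list means the job is valid."""
    errors = []
    if job.source_type not in SUPPORTED_SOURCES:
        errors.append(f"unsupported source type: {job.source_type}")
    if job.mode not in LOAD_MODES:
        errors.append(f"unknown load mode: {job.mode}")
    if job.mode == "incremental" and not job.incremental_column:
        errors.append("incremental mode requires incremental_column")
    return errors


ok_job = IngestionJob("mysql", "house", "ods.house")
bad_job = IngestionJob("redis", "sessions", "ods.sessions", mode="incremental")
print(validate(ok_job))   # → []
print(validate(bad_job))
```

Validating the configuration up front lets a platform reject a bad job at submission time instead of failing partway through a load.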

Job Scheduling

Provides visual workflow configuration, dependency management, alerting, and scheduling algorithms to prioritize critical jobs and reduce mean‑time‑to‑recovery from hours to minutes.
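The talk does not describe the scheduling algorithm itself. One common way to combine dependency management with prioritization of critical jobs is a topological sort (Kahn's algorithm) that uses a priority queue for the set of runnable jobs. The sketch below illustrates that technique under stated assumptions: the `schedule` function, the job names, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq


def schedule(jobs, deps):
    """Order jobs so prerequisites run first and, among runnable jobs,
    the one with the lowest priority number runs first.

    jobs: dict mapping job name -> priority (lower is more critical)
    deps: dict mapping job name -> iterable of prerequisite job names
          (every name is assumed to appear in `jobs`)
    """
    remaining = {job: len(set(deps.get(job, ()))) for job in jobs}
    children = {job: [] for job in jobs}
    for job, parents in deps.items():
        for parent in set(parents):
            children[parent].append(job)

    # Jobs with no unfinished prerequisites are ready to run.
    ready = [(jobs[job], job) for job, count in remaining.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, job = heapq.heappop(ready)
        order.append(job)
        for child in children[job]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (jobs[child], child))

    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order


jobs = {"ingest_house": 1, "ingest_logs": 2, "core_report": 0, "adhoc_export": 5}
deps = {
    "core_report": ["ingest_house", "ingest_logs"],
    "adhoc_export": ["ingest_house"],
}
print(schedule(jobs, deps))
# → ['ingest_house', 'ingest_logs', 'core_report', 'adhoc_export']
```

In this example `adhoc_export` becomes runnable before `core_report`, but the critical report still runs ahead of it as soon as its inputs are done.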

Data Quality

Implements SQL syntax validation, execution‑plan analysis, runtime monitoring, timeliness checks, and accuracy verification to ensure reliable data delivery.
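Two of the checks listed above, timeliness and accuracy, are easy to illustrate. The following is a minimal sketch, not the platform's actual rules: the function names, the deadline-based timeliness rule, and the row-count reconciliation with a relative tolerance are assumptions.

```python
from datetime import datetime


def is_on_time(finished_at, deadline):
    """Timeliness: the job must finish no later than its deadline."""
    return finished_at <= deadline


def row_count_matches(source_rows, target_rows, tolerance=0.0):
    """Accuracy: target row count must be within `tolerance` (a fraction)
    of the source row count."""
    if source_rows == 0:
        return target_rows == 0
    return abs(source_rows - target_rows) / source_rows <= tolerance


def run_checks(finished_at, deadline, source_rows, target_rows, tolerance=0.0):
    """Return the names of the checks that failed."""
    failures = []
    if not is_on_time(finished_at, deadline):
        failures.append("timeliness")
    if not row_count_matches(source_rows, target_rows, tolerance):
        failures.append("accuracy")
    return failures


deadline = datetime(2024, 1, 1, 6, 0)
print(run_checks(datetime(2024, 1, 1, 5, 30), deadline, 1000, 995, tolerance=0.01))
# → []
print(run_checks(datetime(2024, 1, 1, 7, 0), deadline, 1000, 900, tolerance=0.01))
# → ['timeliness', 'accuracy']
```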

Data Open‑Access

Offers self‑service Ad‑hoc queries, BI visualizations, API‑based data services, and change‑notification mechanisms to deliver data to downstream applications and users.

04 Summary & Outlook

Asset‑based data management and full‑link tracking enhance data value.

Encryption, masking, and sensitive‑data monitoring protect data throughout its lifecycle.

Standardized components form a reusable enterprise‑grade big‑data platform.

Future work includes integrating IDE capabilities, advanced data‑governance, and AI‑driven management to further improve development efficiency and security.
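As a small illustration of the masking mentioned in the summary, the sketch below hides the middle of a phone number while keeping enough digits for a reader to recognize it. The `mask_phone` function and its defaults are assumptions for illustration; the talk does not specify which masking rules the platform applies.

```python
def mask_phone(phone, keep_prefix=3, keep_suffix=4):
    """Replace the middle characters of `phone` with '*'.

    Values too short to keep both ends are masked entirely.
    """
    value = phone.strip()
    if len(value) <= keep_prefix + keep_suffix:
        return "*" * len(value)
    hidden = len(value) - keep_prefix - keep_suffix
    # Slice from an explicit index so keep_suffix=0 keeps no trailing characters.
    return value[:keep_prefix] + "*" * hidden + value[len(value) - keep_suffix:]


print(mask_phone("13812345678"))  # → 138****5678
```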

