Evolution of the Beike Data Platform: From Traditional Real‑Estate Business to Integrated Big Data Solutions
This article shares how Beike, a 20‑year‑old traditional real‑estate company, built and evolved its data platform to support digital transformation, describing the business background, data characteristics, user groups, application scenarios, platform challenges, the Odin analysis platform, data‑as‑asset management, and future directions.
Guest: Zhang Qi, Head of Big Data Product at Beike
Editor: Li Caiwei
Platform: DataFunTalk
Introduction: This article explains how data products help a 20‑year‑old traditional enterprise undergo digital transformation, how the data platform evolves during this process, the difficulties encountered, and the thinking and practice behind product construction, with a focus on Beike's data platform evolution and governance.
01 – Business and Data Background
1. Business and Data Characteristics
Beike is an industry‑internet company whose core business is house buying and selling. Home transactions are low‑frequency and long‑cycle, and most of them still take place offline. Over time, many offline steps have been moved online through the Beike app and store systems, covering listings, leads, agents, and stores.
From a data perspective, Beike holds comprehensive nationwide data on properties, listings, agent behavior, online and offline leads, and stores. The data is high‑dimensional and complex, and much of it is collected offline, so it becomes available with a delay (for example, signing and viewing records are not synchronized to the system in real time).
In summary, Beike's data is multi‑type, high‑complexity, offline‑centric, and delayed.
2. Beike Data Users
Data‑application users (≈300k agents) use data for daily management and operations.
Data‑research users (data engineers and analysts, more than 1,000 staff across roughly 100 cities) perform deep analysis, reporting, and model building.
Company‑level users focus on measuring data value, building a data ecosystem, and standardizing data solutions for franchise brands.
3. Data Application Scenarios
Management: Strategic use of data to convey directives, monitor performance, and enforce standards across headquarters and cities.
Operations: Tactical, fine‑grained use of data at the store, agent, and lead level (e.g., monitoring agent viewings).
Brand: Providing standardized data services to franchise brands through integrated systems.
02 – Evolution of the Beike Data Platform
1. Past
① Inherited Platforms
In 2018 Zhang Qi joined Beike and took over two platforms:
Metric Platform: Built on Kylin, defining measures and dimensions, generating cubes, and allowing users to create reports based on pre‑defined metrics.
Data Management Platform: Underlying data capabilities for ingestion, processing, scheduling, and service delivery.
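The Metric Platform's approach of defining measures and dimensions and then generating cubes can be illustrated with a minimal sketch. This is not Kylin's API or Beike's implementation; the rows, the `build_cube` and `query` helpers, and the `viewings` measure are all hypothetical, chosen only to show what pre-aggregating a measure over every combination of dimensions looks like.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: two dimensions (city, store) and one measure (viewings).
ROWS = [
    {"city": "Beijing", "store": "A", "viewings": 3},
    {"city": "Beijing", "store": "B", "viewings": 5},
    {"city": "Shanghai", "store": "C", "viewings": 2},
]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the sum of `measure` for every subset of `dimensions`."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, dims, key):
    """Answer a report query by lookup; the raw rows are not scanned again."""
    return cube[tuple(dims)].get(tuple(key), 0)


cube = build_cube(ROWS, ["city", "store"], "viewings")
print(query(cube, ["city"], ["Beijing"]))  # 8
print(query(cube, [], []))                 # 10
```

In this sketch, a query on a dimension that was never defined (for example `["agent"]`) raises a `KeyError`, because reports can only use what was pre-built. That is a toy version of the dependence on pre-defined metrics.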
Problems identified: low efficiency, poor metric reuse, heavy manual SQL, lack of row‑level permissions, data silos, and security risks.
② Issues
Efficiency: Reporting depends on pre‑built metrics, which creates bottlenecks; cities lack row‑level permissions, leading to duplicated effort and data silos.
Platform: Kylin cannot cover all scenarios, and its high technical barrier makes the platform hard to use.
Quality: Explosion of metrics and events (tens of thousands) creates maintenance burden and reduces data trust.
Security: Data can be downloaded and shared without watermarks, posing leakage risks.
③ User Experience
Users spend 70‑80% of their time on permission acquisition, data processing, and validation rather than analysis.
④ Requirements
Integrate all capabilities into a single platform to lower the barrier for users.
Move offline data processing online while ensuring quality and security.
2. Platform Evolution
① Platform Roadmap
To address efficiency, quality, and security, the company launched the Odin analysis platform at the end of 2018, eventually covering 70‑80% of cities.
② Odin Analysis Platform
Key capabilities added:
Data Ingestion: Expanded beyond Kylin to include Druid, Presto, Impala, and direct Hive/CSV access.
Data Modeling: Integrated modeling and exploration, allowing analysts to build models online.
Visualization: Drag‑and‑drop visual configuration for dashboards.
Application: Configurable mobile, portal, and cockpit outputs.
③ Data Asset Management
Row‑level permission system for fine‑grained data access.
Workspace isolation for resource control and cost management.
Rule engine to enforce query limits, cost allocation, and optimization.
Metadata management with scoring for assets, tasks, storage, and compute.
Comprehensive data monitoring and governance (before, during, and after processing).
IDE for development (still partially offline).
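Two of the capabilities listed above, row‑level permissions and a rule engine that enforces query limits, can be sketched together. The talk does not describe how Beike implements either one, so everything here is an assumption made for illustration: the `User` grant model keyed by city, the `MAX_ROWS_PER_QUERY` limit, and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    cities: frozenset  # cities this user may see (hypothetical grant model)


# Hypothetical rows of a shared dataset.
ROWS = [
    {"city": "Beijing", "agent": "a1", "viewings": 3},
    {"city": "Shanghai", "agent": "a2", "viewings": 5},
    {"city": "Beijing", "agent": "a3", "viewings": 1},
]

MAX_ROWS_PER_QUERY = 2  # hypothetical limit enforced by a rule


class RuleViolation(Exception):
    """Raised when a query breaks a platform rule."""


def apply_row_permissions(rows, user):
    """Keep only the rows whose city has been granted to the user."""
    return [r for r in rows if r["city"] in user.cities]


def check_row_limit(rows, limit=MAX_ROWS_PER_QUERY):
    """Reject result sets larger than the configured limit."""
    if len(rows) > limit:
        raise RuleViolation(f"result has {len(rows)} rows, limit is {limit}")
    return rows


def run_query(rows, user):
    """Filter by permission first, then enforce the rule on what remains."""
    return check_row_limit(apply_row_permissions(rows, user))


beijing_analyst = User("bj_analyst", frozenset({"Beijing"}))
print(run_query(ROWS, beijing_analyst))
```

Because the filter is applied inside the platform, a city analyst can work on the shared dataset without receiving a separate extract, which is one plausible reading of how row‑level permissions reduce duplicated effort. A real system would more likely push the filter into the query engine than filter rows in application code.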
④ Data Factory
Work started in 2018 with permission monitoring, followed by a metadata graph, data openness, workspaces, and a rule engine.
⑤ Effects
City‑level analysts gain more time for analysis, with improved efficiency, quality, and security (no local data leakage).
⑥ Difficulties
Historical inertia – migrating users from legacy workflows.
Organizational alignment – top‑down strategy and cross‑city coordination.
Operations – scaling data culture, standards, and governance across hundreds of cities.
3. Current State
The platform now runs on a Hadoop ecosystem, offering data ingestion, development, management, exploration, lightweight modeling, visualization, and metric capabilities, delivered via mobile apps, mini‑programs, large screens, and portals.
03 – Future Vision
Beike aims to move from basic data usage to intelligent, knowledge‑driven interactions: automated insights, risk warnings, and conversational data that guide business decisions, while also raising analyst competency and fostering data‑driven habits.
In summary, the future plan is to transform raw data into actionable knowledge, building a unified data product system for the industry‑internet context.
Thank you for listening.
