
Beike DMP Platform: Architecture, Implementation Challenges, and Business Impact

This article describes the Data Management Platform (DMP) that Beike launched in May 2018: its overall architecture, data collection, data processing, real-time profiling, storage choices, application scenarios, measured performance, and planned future work.

Beike Product & Technology

1. Background: To better understand real user needs, provide differentiated services, and run fine-grained user operations, Beike launched a DMP platform in May 2018. It collects diverse user data, tags user interests, and enables personalized recommendation, search, content guidance, and targeted advertising and push messaging.

2. Challenges: The platform needed to unify user identities, handle massive volumes of real-time behavior data, estimate audience sizes within seconds, and compute complex audiences within minutes.

3. Implementation:

3.1 Overall Architecture

3.2 Data Collection Layer: Collects online and offline user behavior; a unified tracking specification ("Luopan") was introduced in early 2018 to provide a solid data foundation.

3.3 Data Processing Layer: Builds a wide table (topic table) that flattens the data, unifies user identities across device and account identifiers (IMEI, IDFA, app-generated IDs, UCID), and generates three types of tags: basic/behavioral tags, preference scores, and predictive labels produced by classification and clustering algorithms.
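The article does not say how identities are merged. As a minimal sketch, one common approach is to treat every identifier as a node and merge identifiers that were observed together, using a union-find structure. The class name, the `unify` helper, and the sample identifiers below are all hypothetical and not taken from Beike's implementation.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers such as ("imei", "A1") or ("ucid", "U100")."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        # Register unseen identifiers as their own root.
        self.parent.setdefault(node, node)
        # Path halving keeps later lookups short.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the smaller root so the result is deterministic.
            keep, drop = sorted((root_a, root_b))
            self.parent[drop] = keep

    def groups(self):
        clusters = defaultdict(set)
        for node in list(self.parent):
            clusters[self.find(node)].add(node)
        return list(clusters.values())


def unify(observations):
    """Each observation lists identifiers seen together (assumed: one login or session)."""
    graph = IdentityGraph()
    for ids in observations:
        graph.find(ids[0])
        for other in ids[1:]:
            graph.union(ids[0], other)
    return graph


# Made-up observations: two devices log in to the same account, one device never logs in.
observations = [
    [("imei", "A1"), ("ucid", "U100")],
    [("idfa", "B2"), ("ucid", "U100")],
    [("imei", "C3")],
]
graph = unify(observations)
print(len(graph.groups()))  # 2
```

In this sketch the Android and iOS devices end up in one group because both were seen with the same UCID, while the anonymous device stays separate until it is observed with a shared identifier.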

3.4 Real-time Profiling: Spark Streaming consumes behavior data, writes it to HBase wide tables with atomic counter updates, and caches real-time preferences in Redis, so that profiles are updated within seconds.
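The flow in this step can be sketched without the real infrastructure. The example below is an assumption-laden illustration: `InMemoryProfileStore` stands in for the HBase wide table, a plain dict stands in for Redis, and the event fields (`user_id`, `action`, `district`) are hypothetical. It shows one micro-batch being pre-aggregated, applied as counter increments, and used to refresh the cached preferences.

```python
from collections import Counter, defaultdict


class InMemoryProfileStore:
    """Stand-in for the wide-table store; increment() mimics an atomic counter update."""

    def __init__(self):
        self.rows = defaultdict(Counter)

    def increment(self, user_id, column, amount):
        self.rows[user_id][column] += amount
        return self.rows[user_id][column]


def process_micro_batch(events, store, cache, top_n=3):
    # Pre-aggregate inside the batch so each (user, column) needs one increment.
    deltas = Counter(
        (e["user_id"], f'{e["action"]}:{e["district"]}') for e in events
    )
    touched = set()
    for (user_id, column), amount in deltas.items():
        store.increment(user_id, column, amount)
        touched.add(user_id)
    # Refresh the cached top preferences only for users seen in this batch.
    for user_id in touched:
        cache[user_id] = [col for col, _ in store.rows[user_id].most_common(top_n)]
    return touched


store, cache = InMemoryProfileStore(), {}
batch_1 = [
    {"user_id": "u1", "action": "view", "district": "haidian"},
    {"user_id": "u1", "action": "view", "district": "haidian"},
    {"user_id": "u1", "action": "view", "district": "chaoyang"},
]
batch_2 = [
    {"user_id": "u1", "action": "view", "district": "chaoyang"},
    {"user_id": "u1", "action": "view", "district": "chaoyang"},
]
process_micro_batch(batch_1, store, cache)
first_ranking = list(cache["u1"])
process_micro_batch(batch_2, store, cache)
print(first_ranking, cache["u1"])
```

Using increments rather than read-modify-write is what lets several streaming workers update the same row safely; in a real deployment that guarantee would have to come from the store itself, not from this stand-in.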

3.5 Application Data Storage Layer: Uses ClickHouse, MongoDB, and HBase for different workloads. Spark jobs import Hive data into ClickHouse, where bitmap operations support fast audience estimation; data is also synced to MongoDB for push services and to HBase/Redis for low-latency personalized services.
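To illustrate why bitmaps make audience estimation fast, here is a small sketch that uses Python integers as bitsets. It shows only the idea (one bit per user, set algebra as bitwise operations, audience size as a bit count) and is not ClickHouse's implementation. The tag names and user indexes are made up, and the sketch assumes each user has already been mapped to a dense integer index.

```python
def build_bitmap(user_indexes):
    """Set one bit per user index."""
    bitmap = 0
    for index in user_indexes:
        bitmap |= 1 << index
    return bitmap


def cardinality(bitmap):
    """Number of users in the bitmap (count of set bits)."""
    return bin(bitmap).count("1")


def estimate_audience(tag_bitmaps, include_all, exclude_any=()):
    """Size of the audience matching every tag in include_all and none in exclude_any."""
    result = tag_bitmaps[include_all[0]]
    for tag in include_all[1:]:
        result &= tag_bitmaps[tag]      # intersection
    for tag in exclude_any:
        result &= ~tag_bitmaps[tag]     # set difference
    return cardinality(result)


# Hypothetical tags and user indexes.
tag_bitmaps = {
    "city:beijing": build_bitmap([0, 1, 2, 5, 8]),
    "intent:buy": build_bitmap([1, 2, 3, 8, 9]),
    "active_7d": build_bitmap([2, 4, 8]),
    "already_pushed": build_bitmap([8]),
}

print(estimate_audience(tag_bitmaps, ["city:beijing", "intent:buy"]))  # 3
print(estimate_audience(
    tag_bitmaps,
    ["city:beijing", "intent:buy", "active_7d"],
    exclude_any=["already_pushed"],
))  # 1
```

Because each tag's bitmap is built once at import time, an estimate for a new tag combination needs only a few bitwise operations and a bit count, with no scan over user rows.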

3.6 Application Layer: Provides functionalities such as audience definition, audience insight, tag management, tag marketplace, look‑alike audience expansion, and push messaging, all powered by the underlying data platform.
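The article does not describe how look-alike expansion is computed. One simple way to think about it, shown here purely as an assumption, is to average the preference vectors of a seed audience and rank the remaining users by cosine similarity to that centroid. The user IDs, vectors, and function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_audience(profiles, seed_ids, k):
    """Return the k non-seed users whose vectors are closest to the seed centroid."""
    seeds = [profiles[uid] for uid in seed_ids]
    centroid = [sum(column) / len(seeds) for column in zip(*seeds)]
    scored = [
        (cosine(vector, centroid), uid)
        for uid, vector in profiles.items()
        if uid not in seed_ids
    ]
    # Highest similarity first; break ties by user ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uid for _, uid in scored[:k]]


# Made-up three-dimensional preference scores per user.
profiles = {
    "u1": [0.9, 0.1, 0.0],
    "u2": [0.8, 0.2, 0.0],
    "u3": [0.85, 0.15, 0.05],
    "u4": [0.0, 0.1, 0.9],
    "u5": [0.1, 0.9, 0.0],
}
expanded = expand_audience(profiles, {"u1", "u2"}, k=2)
print(expanded)  # ['u3', 'u5']
```

A production system would more likely train a classifier on seed versus non-seed users; the centroid approach is used here only because it fits in a few lines.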

4. Effects: The platform processes billions of events per day, has each day's data ready by 10 a.m., and serves over 400 million API calls per day with an average response time of 5 ms, enabling personalized services across DSP advertising, recommendation, search, homepage layout, and more.
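To put the API figures in perspective, the following back-of-the-envelope calculation converts the daily call volume into an average request rate and, via Little's law, an average number of in-flight requests. It assumes traffic is spread evenly over 24 hours, which real traffic is not, so peak load would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_calls = 400_000_000   # "over 400 million", so treat this as a lower bound
avg_latency_s = 0.005       # 5 ms average response time

avg_qps = daily_calls / SECONDS_PER_DAY
# Little's law: average concurrency = arrival rate * average time in system.
avg_in_flight = avg_qps * avg_latency_s

print(round(avg_qps), round(avg_in_flight, 1))  # 4630 23.1
```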

5. Summary: After two years of iteration, Beike's DMP has become a core platform supporting a wide range of scenarios, delivering personalized services and refined operations.

6. Outlook: Future work includes deepening tag coverage with predictive models, continuously improving effectiveness, and further platformization of the DMP.

