
Beike DMP Platform: Architecture, Implementation Challenges, and Business Impact

This article describes the Data Management Platform (DMP) that Beike has been building since May 2018. It covers the overall architecture, data collection, data processing, real-time profiling, storage, application scenarios, measured performance, and future directions.

Beike Product & Technology

1. Background: To better understand users' real needs, provide differentiated services, and run refined user operations, Beike launched a DMP in May 2018. The platform collects diverse user data and tags user interests. It supports personalized recommendation, search, content guidance, targeted advertising, and push messaging.

2. Challenges: The platform had to unify user identities across devices and accounts, handle massive volumes of real-time behavior data, return audience-size estimates within seconds, and finish complex audience computations within minutes.

3. Implementation:

3.1 Overall Architecture

3.2 Data Collection Layer: This layer collects users' online and offline behavior. In early 2018 Beike introduced a unified tracking specification called "Luopan", which gives the layer a consistent and reliable data foundation.
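
A tracking specification mainly makes every producer emit events with the same required fields, and malformed events are rejected at the edge. The source does not publish the "Luopan" schema. In the minimal sketch below, the field names (`event_id`, `event_type`, `device_id`, `timestamp_ms`, `ucid`, `properties`) and the validation rules are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical required fields; the real "Luopan" specification is not given in the source.
REQUIRED_FIELDS = ("event_id", "event_type", "device_id", "timestamp_ms")


@dataclass
class TrackingEvent:
    event_id: str
    event_type: str          # e.g. "page_view" or "click" (illustrative values)
    device_id: str           # raw device identifier such as an IMEI or IDFA
    timestamp_ms: int        # client event time in epoch milliseconds
    ucid: str = ""           # logged-in account id; empty for anonymous users
    properties: Dict[str, str] = field(default_factory=dict)


def parse_event(raw: dict) -> TrackingEvent:
    """Validate a raw event against the shared schema instead of patching it downstream."""
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    ts = int(raw["timestamp_ms"])
    if ts <= 0:
        raise ValueError("timestamp_ms must be positive")
    return TrackingEvent(
        event_id=str(raw["event_id"]),
        event_type=str(raw["event_type"]),
        device_id=str(raw["device_id"]),
        timestamp_ms=ts,
        ucid=str(raw.get("ucid", "")),
        # Normalize free-form properties to strings so every consumer sees one type.
        properties={str(k): str(v) for k, v in raw.get("properties", {}).items()},
    )
```

Rejecting bad events at parse time keeps the wide tables built downstream free of per-source special cases.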

3.3 Data Processing Layer: This layer builds wide (topic) tables that flatten the raw data. It unifies user identities across devices and accounts (IMEI, IDFA, app-generated IDs, and UCID). It then generates three types of tags: basic and behavioral tags, preference-score tags, and predictive tags derived with classification and clustering algorithms.
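
The source does not say how identities are unified. One common approach treats each identifier as a graph node and links identifiers that appear in the same event, for example a device ID and a UCID in one login. Each connected component then represents one user. The sketch below assumes that approach, uses union-find to find the components, and names the unified key `one_id`. That name and the tie-breaking rule are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]  # (id_type, value), e.g. ("imei", "...") or ("ucid", "...")


class IdGraph:
    """Union-find over typed identifiers observed together in the same events."""

    def __init__(self) -> None:
        self.parent: Dict[Node, Node] = {}

    def find(self, x: Node) -> Node:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def link(self, a: Node, b: Node) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def groups(self) -> List[Set[Node]]:
        buckets: Dict[Node, Set[Node]] = defaultdict(set)
        for node in list(self.parent):
            buckets[self.find(node)].add(node)
        return list(buckets.values())


def build_one_ids(pairs: Iterable[Tuple[Node, Node]]) -> Dict[Node, str]:
    """Map every identifier to a unified user key (the 'one_id' name is hypothetical)."""
    graph = IdGraph()
    for a, b in pairs:
        graph.link(a, b)
    mapping: Dict[Node, str] = {}
    for group in graph.groups():
        ucids = sorted(value for id_type, value in group if id_type == "ucid")
        # Prefer the account id when the group has one; otherwise use the smallest
        # identifier so the key is deterministic across runs.
        one_id = "ucid:" + ucids[0] if ucids else ":".join(min(group))
        for node in group:
            mapping[node] = one_id
    return mapping
```

A production system also needs rules for oversized components, such as shared devices that link many accounts. This sketch deliberately leaves those rules out.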

3.4 Real-time Profiling: Spark Streaming consumes behavior data and writes it to HBase wide tables, where counts are updated atomically. Real-time preferences are cached in Redis, so user profiles reflect new behavior within seconds.
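
The in-process sketch below approximates the per-batch update loop. A dict guarded by a lock stands in for the HBase wide table, where the server-side Increment operation gives the real atomicity. A plain dict stands in for Redis. The scoring rule (each category's share of a user's total activity, top k kept) is a hypothetical choice, because the source does not specify how preferences are scored.

```python
import threading
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class CounterTable:
    """Stand-in for an HBase wide table: row = user id, column = "category:value".

    A lock plays the role of HBase's atomic Increment, so concurrent
    updates to the same cell cannot be lost.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
        self._lock = threading.Lock()

    def increment(self, row: str, column: str, amount: int = 1) -> int:
        with self._lock:
            self._rows[row][column] += amount
            return self._rows[row][column]

    def row(self, row: str) -> Dict[str, int]:
        with self._lock:
            return dict(self._rows.get(row, {}))


def top_preferences(counts: Dict[str, int], k: int = 3) -> List[Tuple[str, float]]:
    """Score each column by its share of the user's activity and keep the top k (hypothetical rule)."""
    total = sum(counts.values())
    if total == 0:
        return []
    scored = sorted(((col, n / total) for col, n in counts.items()), key=lambda p: (-p[1], p[0]))
    return scored[:k]


def process_micro_batch(events: Iterable[Tuple[str, str]], table: CounterTable,
                        cache: Dict[str, List[Tuple[str, float]]], k: int = 3) -> None:
    """events: (user_id, category) pairs from one streaming micro-batch."""
    touched = set()
    for user_id, category in events:
        table.increment(user_id, category)
        touched.add(user_id)
    # Refresh cached preferences only for users whose counters changed in this batch.
    for user_id in touched:
        cache[user_id] = top_preferences(table.row(user_id), k)
```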

3.5 Application Data Storage Layer: This layer uses ClickHouse, MongoDB, and HBase, each for a different workload. Spark jobs import Hive data into ClickHouse, where bitmap operations make audience estimation fast. Data is also synced to MongoDB for push services and to HBase and Redis for low-latency personalized services.
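
Bitmap-based estimation stores each tag value as a bitmap of user IDs. An audience rule then becomes AND, OR, and AND-NOT operations followed by a population count. ClickHouse provides Roaring-bitmap functions such as `groupBitmapState`, `bitmapAnd`, and `bitmapCardinality`. The source does not say exactly which ones Beike used. The sketch below shows the same set algebra with Python integers as bitmaps. It assumes user IDs have already been mapped to small, dense, non-negative integers, and the tag names in the usage are made up.

```python
from typing import Dict, Iterable, Optional, Sequence


def build_bitmap(user_ids: Iterable[int]) -> int:
    """Set bit `uid` for every user id; assumes small, dense, non-negative ids."""
    bits = 0
    for uid in user_ids:
        bits |= 1 << uid
    return bits


def cardinality(bitmap: int) -> int:
    """Population count: the number of users in the bitmap."""
    return bin(bitmap).count("1")


def estimate_audience(index: Dict[str, int], must: Sequence[str] = (),
                      any_of: Sequence[str] = (), must_not: Sequence[str] = ()) -> int:
    """Count users having all `must` tags, at least one `any_of` tag, and no `must_not` tag."""
    if not must and not any_of:
        raise ValueError("at least one positive condition is required")
    result: Optional[int] = None
    for tag in must:                      # AND: users carrying every tag
        result = index[tag] if result is None else result & index[tag]
    if any_of:                            # OR: users carrying at least one tag
        union = 0
        for tag in any_of:
            union |= index[tag]
        result = union if result is None else result & union
    for tag in must_not:                  # AND NOT: drop excluded users
        result &= ~index[tag]
    return cardinality(result)
```

For example, with hypothetical tags `city:bj` = {1, 2, 3, 5}, `intent:buy` = {2, 3, 4}, `intent:rent` = {5, 6}, and `agent` = {3}, the rule "in `city:bj`, buying or renting, not an `agent`" selects users {2, 5}, so the estimate is 2. The cost scales with bitmap size rather than with the number of rows scanned, which makes second-level estimates plausible.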

3.6 Application Layer: Provides functionalities such as audience definition, audience insight, tag management, tag marketplace, look‑alike audience expansion, and push messaging, all powered by the underlying data platform.
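
The source lists look-alike audience expansion but does not describe the algorithm. A simple stand-in, which is not necessarily Beike's method, ranks non-seed users by the cosine similarity between their tag vector and the seed audience's average vector. The sparse tag-weight profiles in this sketch are a hypothetical representation.

```python
import math
from typing import Dict, Iterable, List

Vector = Dict[str, float]  # sparse mapping: tag -> weight


def centroid(vectors: List[Vector]) -> Vector:
    """Average of sparse vectors; missing tags count as zero."""
    acc: Dict[str, float] = {}
    for v in vectors:
        for tag, w in v.items():
            acc[tag] = acc.get(tag, 0.0) + w
    n = len(vectors)
    return {tag: w / n for tag, w in acc.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def lookalike(seed_ids: Iterable[str], profiles: Dict[str, Vector], k: int) -> List[str]:
    """Return up to k non-seed users most similar to the seed audience's average profile."""
    seeds = set(seed_ids)
    center = centroid([profiles[u] for u in seeds if u in profiles])
    candidates = [(cosine(v, center), u) for u, v in profiles.items() if u not in seeds]
    # Highest similarity first; break ties by user id for a stable result.
    candidates.sort(key=lambda p: (-p[0], p[1]))
    return [u for _, u in candidates[:k]]
```

A trained classifier with seeds as positives and sampled users as negatives usually expands audiences better. The centroid approach is shown only because it is short and easy to verify.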

4. Effects: The platform processes billions of events per day and completes its daily data output by 10 a.m. It serves over 400 million API calls per day with an average response time of 5 ms. These services power personalization across DSP advertising, recommendation, search, homepage layout, and more.
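
The serving figures can be converted into an average request rate, and Little's law then gives the average number of requests in flight. This uses only the numbers stated above and assumes traffic is spread evenly across the day. Real peak rates are higher.

```latex
\bar{\lambda} = \frac{4 \times 10^{8}\ \text{requests/day}}{86\,400\ \text{s/day}} \approx 4\,630\ \text{requests/s}

L = \bar{\lambda}\, W \approx 4\,630\ \text{s}^{-1} \times 0.005\ \text{s} \approx 23\ \text{requests in flight}
```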

5. Summary: After two years of iteration, Beike's DMP has become a core platform supporting a wide range of scenarios, delivering personalized services and refined operations.

6. Outlook: Future work includes deepening tag coverage with predictive models, continuously improving effectiveness, and further platformization of the DMP.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

As Beike's official product and technology account, we are committed to building a platform for sharing Beike's product and technology insights, targeting internet/O2O developers and product professionals. We share high-quality original articles, tech salon events, and recruitment information weekly. Welcome to follow us.
