Beike DMP Platform: Architecture, Implementation Challenges, and Business Impact
The article describes Beike's Data Management Platform (DMP), in development since May 2018, covering its overall architecture, data collection, processing, real-time profiling, storage solutions, application scenarios, achieved performance metrics, and future development directions.
1. Background: To better understand real user needs, provide differentiated services, and achieve refined user operations, Beike launched a DMP platform in May 2018 that collects diverse user data, tags interests, and enables personalized recommendation, search, content guidance, and precise advertising or push messaging.
2. Challenges: The platform needed to unify user identities, handle massive volumes of real-time behavior data, return audience-size estimates within seconds, and compute complex audiences within minutes.
3. Implementation:
3.1 Overall Architecture
3.2 Data Collection Layer: Collects online and offline user behavior; a unified tracking specification ("Luopan") was introduced in early 2018 to provide a solid data foundation.
3.3 Data Processing Layer: Builds a wide table (topic table) to flatten data, unifies user identity across identifiers (IMEI, IDFA, app-generated IDs, UCID), and generates three types of tags: basic/behavioral tags, preference scores, and predictive labels produced by classification and clustering algorithms.
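The summary does not say how identifiers are merged. One common approach, shown here only as a minimal sketch, is a union-find structure: whenever two identifiers are observed together (for example a device ID on a logged-in session), they are linked, and every identifier in a linked group resolves to the same user. The class name, the identifier formats, and the sample observations are all hypothetical.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over device and account identifiers (illustrative only)."""

    def __init__(self):
        self._parent = {}

    def find(self, ident):
        """Return the representative identifier for ident's group."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path straight at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def link(self, a, b):
        """Record that identifiers a and b belong to the same user."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def groups(self):
        """Return the identifier sets, one set per unified user."""
        out = defaultdict(set)
        for ident in list(self._parent):
            out[self.find(ident)].add(ident)
        return list(out.values())


# Hypothetical co-occurrence observations: (identifier, identifier).
observations = [
    ("imei:860001", "ucid:1001"),   # Android device seen with a logged-in account
    ("idfa:AAAA-01", "ucid:1001"),  # iOS device seen with the same account
    ("imei:860002", "appid:x9"),    # a different, anonymous user
]

graph = IdentityGraph()
for left, right in observations:
    graph.link(left, right)
```

Here the two devices that share `ucid:1001` collapse into one user, while the anonymous device stays separate. A production system would also need rules for shared devices and stale links, which this sketch ignores.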
3.4 Real-time Profiling: Uses Spark Streaming to consume behavior data, stores it in HBase wide tables, updates counts atomically, and caches real-time preferences in Redis, so that profiles reflect new behavior within seconds.
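The flow described above (consume a batch of events, increment per-tag counters atomically, refresh a cache of top preferences) can be sketched in plain Python. This is an assumption-laden stand-in, not the platform's code: a lock-protected dictionary plays the role of the HBase wide table, a second dictionary plays the role of Redis, and `apply_batch` plays the role of one streaming micro-batch. All names and sample tags are hypothetical.

```python
import threading
from collections import defaultdict


class RealtimeProfile:
    """In-memory stand-in for the counter table and preference cache."""

    def __init__(self):
        self._lock = threading.Lock()
        # user_id -> tag -> count (stands in for an HBase wide table)
        self._counts = defaultdict(lambda: defaultdict(int))
        # user_id -> ranked (tag, count) list (stands in for Redis)
        self._cache = {}

    def apply_batch(self, events, top_n=3):
        """Apply one micro-batch of (user_id, tag) events."""
        touched = set()
        with self._lock:  # makes each batch's increments atomic
            for user_id, tag in events:
                self._counts[user_id][tag] += 1
                touched.add(user_id)
            # Refresh the cached preferences only for users seen in this batch.
            for user_id in touched:
                ranked = sorted(
                    self._counts[user_id].items(),
                    key=lambda kv: (-kv[1], kv[0]),  # count desc, then tag name
                )
                self._cache[user_id] = ranked[:top_n]

    def preferences(self, user_id):
        """Read the cached top preferences for a user."""
        return self._cache.get(user_id, [])


profile = RealtimeProfile()
profile.apply_batch([
    ("u1", "district:haidian"),
    ("u1", "district:haidian"),
    ("u1", "rooms:2"),
    ("u2", "rooms:3"),
])
top_u1 = profile.preferences("u1")
```

Reads hit only the cache, which is why serving can stay fast even while counters are being updated.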
3.5 Application Data Storage Layer: Utilizes ClickHouse, MongoDB, and HBase for different workloads; Spark jobs import Hive data into ClickHouse, use bitmap operations for fast audience estimation, and sync data to MongoDB for push services and to HBase/Redis for low‑latency personalized services.
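To illustrate why bitmaps make audience estimation fast, the sketch below uses Python integers as bitmaps: each tag maps to an integer whose bit i is set when the user with index i carries that tag. Audience rules then reduce to bitwise AND, OR, and AND-NOT, and the audience size is a bit count. ClickHouse provides its own bitmap functions (for example `bitmapAnd` and `bitmapCardinality`), but the summary does not show the actual queries, so the tags, user indexes, and helper names here are hypothetical.

```python
def build_bitmaps(user_tags):
    """Map each tag to an int whose bit i is set when user index i has the tag."""
    bitmaps = {}
    for user_index, tags in user_tags.items():
        for tag in tags:
            bitmaps[tag] = bitmaps.get(tag, 0) | (1 << user_index)
    return bitmaps


def cardinality(bitmap):
    """Count the users (set bits) in a non-negative bitmap."""
    return bin(bitmap).count("1")


# Hypothetical users, keyed by a dense integer index.
user_tags = {
    0: {"city:beijing", "intent:buy"},
    1: {"city:beijing", "intent:rent"},
    2: {"city:shanghai", "intent:buy"},
    3: {"city:beijing", "intent:buy", "active:7d"},
}
bitmaps = build_bitmaps(user_tags)

# "In Beijing AND intends to buy" -> intersection.
beijing_buyers = bitmaps["city:beijing"] & bitmaps["intent:buy"]

# "... AND NOT active in the last 7 days" -> intersection with the complement.
dormant_beijing_buyers = beijing_buyers & ~bitmaps["active:7d"]

# "Intends to buy OR rent" -> union.
buy_or_rent = bitmaps["intent:buy"] | bitmaps["intent:rent"]

sizes = {
    "beijing_buyers": cardinality(beijing_buyers),
    "dormant_beijing_buyers": cardinality(dormant_beijing_buyers),
    "buy_or_rent": cardinality(buy_or_rent),
}
```

Because each rule touches one precomputed bitmap per tag rather than scanning user rows, the cost depends on the number of tags in the rule, which is what makes estimates within seconds plausible at large scale.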
3.6 Application Layer: Provides functionalities such as audience definition, audience insight, tag management, tag marketplace, look‑alike audience expansion, and push messaging, all powered by the underlying data platform.
4. Effects: The platform processes billions of events per day, has each day's data ready by 10 a.m., and serves more than 400 million API calls per day at an average response time of 5 ms. This enables personalized services across DSP advertising, recommendation, search, homepage layout, and more.
5. Summary: After two years of iteration, Beike's DMP has become a core platform supporting a wide range of scenarios, delivering personalized services and refined operations.
6. Outlook: Future work includes deepening tag coverage with predictive models, continuously improving effectiveness, and further platformization of the DMP.
Beike Product & Technology
As Beike's official product and technology account, we are committed to building a platform for sharing Beike's product and technology insights, targeting internet/O2O developers and product professionals. We share high-quality original articles, tech salon events, and recruitment information weekly. Welcome to follow us.
