
Design and Implementation of 58.com Commercial DMP Platform

This talk presents the architecture, feature extraction, storage, real-time computation, monitoring, and optimization strategies of 58.com’s commercial DMP platform, detailing business requirements, system design across data, storage, compute, and service layers, and future plans for unified services and advanced analytics.

DataFunTalk

The presentation introduces the 58.com commercial DMP (Data Management Platform), a unified system that integrates scattered multi‑source data, standardizes and segments it, and provides feature data for online advertising and other business applications.

Business Requirements : The product technology team needs to combine user click features, context features, and internal ad‑library features to support online ad recommendation, ranking, decoration, as well as other commercial marketing and merchant platforms.

Feature Requirements : Fast, convenient definition of feature extraction logic; fusion of historical features with real‑time (seconds‑level) features within a limited time window; rapid rollout; and support for experimental iteration.
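
As a rough illustration of fusing historical and real-time features within a time window, the sketch below merges offline counts with recent events. The function name, the data shapes, and the count-based feature representation are assumptions made for this example; the talk does not describe the platform's actual data model.

```python
from collections import Counter


def fuse_features(historical, realtime_events, now, window_seconds):
    """Merge offline counts with real-time events that fall inside the window.

    historical: dict of feature name -> count from the offline batch (assumed shape).
    realtime_events: iterable of (timestamp_seconds, feature_name) pairs (assumed shape).
    """
    fused = Counter(historical)
    for ts, name in realtime_events:
        # Keep only events that are not in the future and not older than the window.
        if 0 <= now - ts <= window_seconds:
            fused[name] += 1
    return dict(fused)


events = [(100, "click:car"), (10, "click:car"), (95, "view:suv")]
print(fuse_features({"click:car": 5}, events, now=100, window_seconds=30))
```

Events older than the window are simply ignored here; a production system would more likely expire them from storage.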

Platform Architecture :

Data layer: unified ingestion from Kafka, ESB, HDFS, APIs, etc., with cleaning and transformation.

Storage layer: KV stores (Redis, self‑developed wtable) for high‑throughput reads/writes.

Compute layer: Spark, Flink, Storm, Spark‑Streaming engines; operator SDKs hide heterogeneous computation.

Service layer: ID‑Mapping, routing, experiment, and process modules for data isolation, traffic distribution, A/B testing, and business decoupling.

Monitoring layer: task, service, storage health monitoring with custom alerts.

Platform Functions include behavior ingestion, feature storage, feature extraction, and feature service, providing rich, ordered feature data and metadata management for both online and offline use.

Metadata Management distinguishes C‑end (traffic‑related) and B‑end (advertiser‑related) tags.

Feature Extraction Process covers both offline batch extraction (multi‑day aggregation) and real‑time extraction (Importer + Operator SDK), supporting plug‑in deployment.
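
The offline multi-day aggregation step could look roughly like the sketch below, which sums per-user feature counts over a trailing window of days. The row layout (day, user_id, feature, count) and the function name are hypothetical; in practice this logic would run as a Spark or Hive job rather than in plain Python.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_days(daily_counts, end_day, num_days):
    """Sum per-user feature counts over the num_days ending at end_day (inclusive)."""
    days = {end_day - timedelta(days=i) for i in range(num_days)}
    totals = defaultdict(int)
    for day, user_id, feature, count in daily_counts:
        if day in days:
            totals[(user_id, feature)] += count
    return dict(totals)


rows = [
    (date(2024, 1, 3), "u1", "click:car", 2),
    (date(2024, 1, 2), "u1", "click:car", 1),
    (date(2023, 12, 25), "u1", "click:car", 9),  # outside a 7-day window
]
print(aggregate_days(rows, end_day=date(2024, 1, 3), num_days=7))
```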

Computation Framework consists of NODE (abstracts heterogeneous data/computation, offers Spark, Hive, Flink), Module (topology parsing), and Operator (exposes behavior2Feature, mergeFeature, feature2Attribute APIs).
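
To make the three Operator APIs more concrete, here is a minimal sketch of how such a pipeline might fit together. Only the API names come from the talk; the signatures, the `Feature` class, and the event fields are assumptions, and the functions use snake_case equivalents of behavior2Feature, mergeFeature, and feature2Attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical feature record: per-user counts keyed by action and category."""
    user_id: str
    counts: dict = field(default_factory=dict)


def behavior_to_feature(event):
    """Turn one raw behavior event into a single-count feature."""
    key = f"{event['action']}:{event['category']}"
    return Feature(user_id=event["user_id"], counts={key: 1})


def merge_feature(old, new):
    """Combine two features of the same user by summing their counts."""
    if old.user_id != new.user_id:
        raise ValueError("cannot merge features of different users")
    merged = dict(old.counts)
    for key, value in new.counts.items():
        merged[key] = merged.get(key, 0) + value
    return Feature(user_id=old.user_id, counts=merged)


def feature_to_attribute(feature):
    """Derive a queryable attribute: the key with the highest count."""
    if not feature.counts:
        return None
    # Sorting first makes ties resolve to the alphabetically smallest key.
    return max(sorted(feature.counts), key=feature.counts.get)


events = [
    {"user_id": "u1", "action": "click", "category": "car"},
    {"user_id": "u1", "action": "click", "category": "car"},
    {"user_id": "u1", "action": "view", "category": "suv"},
]
state = Feature(user_id="u1")
for event in events:
    state = merge_feature(state, behavior_to_feature(event))
print(feature_to_attribute(state))
```

Keeping the three steps as separate pure functions is one way to let the same business logic run on different engines, which matches the talk's stated goal of hiding heterogeneous computation behind the SDK.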

Real‑time Computing Challenges such as stability under traffic spikes, Flink framework issues, and data transmission delays were addressed with back‑pressure mechanisms, blacklist/failure handling, custom monitoring, and Flume optimizations.
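
The general idea behind back-pressure can be shown with a bounded buffer: when the consumer falls behind, the producer is told to slow down instead of piling up work without limit. This is a generic illustration using Python's standard `queue` module, not Flink's built-in mechanism and not the platform's actual code; the class and method names are hypothetical.

```python
import queue


class BoundedBuffer:
    """Bounded buffer that reports back-pressure instead of growing without limit."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, item, timeout=0.01):
        """Try to enqueue; return False if the buffer stays full for `timeout` seconds."""
        try:
            self._q.put(item, timeout=timeout)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, max_items):
        """Remove and return up to max_items items without blocking."""
        items = []
        while len(items) < max_items:
            try:
                items.append(self._q.get_nowait())
            except queue.Empty:
                break
        return items


buf = BoundedBuffer(capacity=2)
print(buf.offer("a"), buf.offer("b"), buf.offer("c"))  # third offer hits the limit
print(buf.drain(1))
print(buf.offer("c"))  # space is available again after draining
```

A producer that receives False can pause, retry later, or route the record to a failure path, which is the kind of decision the blacklist and failure handling mentioned above would cover.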

Optimization Solutions include parallel service calls, ID‑Mapping caching, compression headers, lazy‑load adjustments, read‑write merge optimizations, bulk‑load for offline features, and sharding improvements, reducing timeout probability from ~3% to <1% and cutting offline import time from 3 hours to 0.5 hours.
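
Two of these optimizations, parallel service calls and ID‑Mapping caching, can be sketched with the Python standard library. The functions `map_id` and `fetch_features` are hypothetical stand-ins for remote calls, and the in-process `lru_cache` stands in for whatever cache the platform actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

CALLS = {"id_mapping": 0}  # counts how often the uncached lookup really runs


@lru_cache(maxsize=10_000)
def map_id(device_id):
    """Hypothetical ID-Mapping lookup; cached so repeated requests skip the call."""
    CALLS["id_mapping"] += 1
    return f"user-{device_id}"


def fetch_features(user_id, source):
    """Hypothetical per-source feature fetch (a remote call in a real system)."""
    return {f"{source}_score": len(user_id)}


def get_all_features(device_id, sources):
    """Resolve the user once, then query all feature sources in parallel."""
    user_id = map_id(device_id)
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        parts = list(pool.map(lambda s: fetch_features(user_id, s), sources))
    merged = {}
    for part in parts:
        merged.update(part)
    return merged


print(get_all_features("d1", ["click", "ad"]))
print(get_all_features("d1", ["click", "ad"]))
print(CALLS["id_mapping"])  # the second request is served from the cache
```

With parallel calls, request latency approaches that of the slowest source rather than the sum of all sources, which is one plausible contributor to the lower timeout rate. For scale, the reported drop in offline import time from 3 hours to 0.5 hours is a 6x speedup.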

DMP Experiment Platform follows a Google‑style hierarchical experiment framework, handling traffic routing, block conditions, and sink labeling for downstream analysis.
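
In a layered experiment framework, each layer typically hashes the user ID with a layer-specific salt, so assignments in one layer are independent of assignments in another. The sketch below shows that idea under stated assumptions: the layer names, the experiment names, the bucket count, and the use of MD5 are all illustrative choices, not details from the talk.

```python
import hashlib


def bucket(user_id, layer, num_buckets=100):
    """Map a user to a bucket; salting with the layer name decorrelates layers."""
    digest = hashlib.md5(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def assign(user_id, layers):
    """Pick at most one experiment per layer.

    layers: dict of layer name -> list of (experiment_name, start, end),
    where each experiment owns the half-open bucket range [start, end).
    """
    labels = {}
    for layer, experiments in layers.items():
        b = bucket(user_id, layer)
        for name, start, end in experiments:
            if start <= b < end:
                labels[layer] = name
                break
    return labels


layers = {
    "recall": [("recall_base", 0, 50), ("recall_new", 50, 100)],
    "ranking": [("rank_base", 0, 90), ("rank_new", 90, 100)],
}
print(assign("u1", layers))
```

The returned labels correspond to what the talk calls sink labeling: they would be attached to the request logs so that downstream analysis can group results by experiment.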

Applications :

Online recommendation feed for second‑hand cars, requiring feature processing with seconds‑level latency.

Audience services for marketing, supporting tag‑based crowd creation and query, with Redis caching for low‑latency lookups.
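
Tag-based crowd creation can be sketched as set operations over an inverted index from tag to user IDs, with results cached for repeated queries. The class name, the tag names, and the in-memory dict standing in for the Redis cache are assumptions made for this example.

```python
class AudienceService:
    """Builds crowds from tags; a dict stands in for the Redis cache."""

    def __init__(self, tag_index):
        self._tag_index = tag_index  # tag -> set of user IDs
        self._cache = {}
        self.cache_hits = 0

    def create_crowd(self, include, exclude=()):
        """Return users that carry every `include` tag and no `exclude` tag."""
        key = (tuple(sorted(include)), tuple(sorted(exclude)))
        if key in self._cache:
            self.cache_hits += 1
            return self._cache[key]
        users = None
        for tag in include:
            members = self._tag_index.get(tag, set())
            users = set(members) if users is None else users & members
        users = users or set()
        for tag in exclude:
            users -= self._tag_index.get(tag, set())
        self._cache[key] = frozenset(users)
        return self._cache[key]


index = {
    "city:beijing": {"u1", "u2", "u3"},
    "intent:used_car": {"u2", "u3", "u4"},
    "role:dealer": {"u3"},
}
service = AudienceService(index)
print(sorted(service.create_crowd(["city:beijing", "intent:used_car"], ["role:dealer"])))
```

Normalizing the cache key by sorting the tags means the same crowd definition hits the cache regardless of the order in which tags were supplied.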

Future Plans aim to build a unified OneService offering user portrait, tag management, open APIs, and audience management, and to integrate Doris for more real‑time, multidimensional analysis.

