Big Data 7 min read

Evolution of Dazhong Dianping’s Data Platform (2012‑2014): Key Lessons for Growing Big Data Teams

This article chronicles the step‑by‑step evolution of Dazhong Dianping’s data platform from 2012 to 2014, detailing changes in data models, storage and compute architecture, scheduling, monitoring, and data‑driven applications, offering practical insights for teams building early‑stage big‑data infrastructures.

21CTO
21CTO
21CTO
Evolution of Dazhong Dianping’s Data Platform (2012‑2014): Key Lessons for Growing Big Data Teams

1.0 (2012.07)

Data: Primarily supported user reporting needs, established basic underlying models, and used Python for model computation.

Architecture: Storage and compute on Greenplum with dual‑cluster hot standby; data transferred from MySQL/SQLServer via DBA‑generated files, parsed nightly and loaded into Greenplum; scheduling via Quartz with dependency checks stored in tables; monitoring relied on user‑program alerts via email and phone.

Data Applications: Report data emailed to users; users could query data through a custom SQL web tool.

2.0 (2013.04)

Data: Introduced clear model layering: ODS (raw source data), DW (cleaned, transformed historical data), DM (data marts for departmental or thematic analysis), RPT (user‑facing reports). Built three core models (traffic, group‑buy, information) and derived data marts; developed the Canaan computation framework and custom UDFs.

Architecture: Storage and compute on Hive; Greenplum used as a cache for small‑scale fast queries and report storage. Scheduling integrated with the Canaan framework, enabling rapid task addition and automatic dependency import. Central metadata repository for warehouse metadata; ACL system for data access permissions. Implemented an offline heterogeneous data transfer tool (Wormhole) inspired by Alibaba DataX, with a visual UI and integration with scheduling and metadata systems.

Monitoring: With task count exceeding 2000, visual monitoring dashboards were built.

Data Applications: Operational tools with user‑defined SQL stored in Hive; KPI calculations via Hive with results stored in Greenplum for reporting; a convenient Hive Web UI surpassing the native Hive Web Interface.

3.0 (2013.12)

Data: Established higher‑level data marts, linking layers such as traffic and group‑buy; created user‑centric and merchant‑centric thematic marts; collaborated with the algorithm team to build a recommendation system; provided frameworks and tools for external data developers.

Architecture: Added MySQL and HBase to support online services; exposed data access via API, query engine, and RPC services; introduced Shark for ad‑hoc queries, isolating Shark/Spark clusters from Hadoop/Hive for stability; implemented data quality checks based on user‑specified conditions.

Data Products: Supported dashboards.

4.0 (2014.12)

Data: Continuously expanded and refined data models; standardized data sources such as app logs and channels; enhanced the data development platform, enabling over 100 external department developers.

Architecture: Built a Redis Cluster for real‑time recommendation and user profiling; upgraded Hadoop to YARN; introduced Storm for real‑time computation; launched a Kafka‑like distributed messaging system integrated with a logging framework for low‑cost, fast log ingestion; established a metadata center.

Data Products: Released proprietary data products including operational effectiveness evaluation and traffic analysis tools.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

data engineeringData PlatformData WarehouseETLBig Data Architecture
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.