
Construction and Practice of a Site-wide User Behavior Data Warehouse at 58.com

This article systematically describes the challenges, design principles, modeling methods, layered architecture, implementation steps, and standards used in building a comprehensive user behavior data warehouse for 58.com, highlighting practical experiences and future improvement directions.

58 Tech

Background – With the rapid growth of 58.com’s business, increasing data analysis and application demands have created significant challenges for data warehouse construction, including high integration costs, scattered business knowledge, vague data quality definitions, and inconsistent development standards.

Big Data Modeling Overview – Data modeling is essential for building a subject‑oriented, integrated, time‑variant, non‑volatile data warehouse. The article outlines the benefits of modeling (performance, cost, efficiency, quality) and reviews typical methodologies such as ER model, dimensional model, Data Vault, and Anchor model.
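
As a minimal illustration of the dimensional model mentioned above, the sketch below joins a fact table of behavior events to a page dimension and aggregates one measure by a dimension attribute. All table contents, field names (`page_key`, `clicks`, `business_line`), and the `clicks_by` helper are hypothetical and are not taken from 58.com's actual models.

```python
from collections import defaultdict

# Hypothetical dimension table: one row per page, keyed by a surrogate key.
dim_page = {
    1: {"page_type": "list", "business_line": "housing"},
    2: {"page_type": "detail", "business_line": "housing"},
    3: {"page_type": "detail", "business_line": "jobs"},
}

# Hypothetical fact table: one row per user behavior event,
# holding a foreign key into the dimension plus a numeric measure.
fact_events = [
    {"user_id": "u1", "page_key": 1, "clicks": 2},
    {"user_id": "u1", "page_key": 2, "clicks": 1},
    {"user_id": "u2", "page_key": 3, "clicks": 4},
    {"user_id": "u3", "page_key": 2, "clicks": 3},
]


def clicks_by(attribute):
    """Sum the clicks measure grouped by one attribute of the page dimension."""
    totals = defaultdict(int)
    for event in fact_events:
        dim_row = dim_page[event["page_key"]]  # fact-to-dimension lookup
        totals[dim_row[attribute]] += event["clicks"]
    return dict(totals)


if __name__ == "__main__":
    print(clicks_by("business_line"))
    print(clicks_by("page_type"))
```

The same fact rows can be sliced by any dimension attribute without changing the fact table, which is the property that makes this style convenient for analysis.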

Site-wide User Behavior Data Warehouse Practice – The implementation follows widely‑adopted dimensional modeling, adapted to 58’s specific business characteristics, to create a clear, layered architecture that supports both integration and analytical needs.

Construction Principles – The warehouse is built on principles of business‑driven data, analysis‑friendly design, minimal impact from business changes, high cohesion and loose coupling, a solid foundational data layer, and clear separation between integration (ODS), detailed (DWD), and analytical (APP) layers.

Layered Architecture – The ODS layer serves as the preparation zone and supplies raw data to the DWD layer; the DWD layer holds detailed, long‑term data for downstream applications; the APP layer delivers fast‑query, analysis‑oriented datasets built with dimensional modeling.
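
The flow between the three layers can be sketched roughly as follows. This example uses Python's built-in `sqlite3` module as a stand-in for Hive, purely so it runs anywhere; the table names (`ods_user_event`, `dwd_user_event`, `app_event_daily`), columns, and cleaning rules are assumptions made for illustration, not the article's real schema.

```python
import sqlite3


def build_layers():
    """Build toy ODS, DWD, and APP tables and return the APP-layer rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- ODS: raw data landed as-is (hypothetical columns).
    CREATE TABLE ods_user_event (raw_user TEXT, raw_event TEXT, raw_ts TEXT);
    INSERT INTO ods_user_event VALUES
      (' u1 ', 'CLICK', '2024-01-01 10:00:00'),
      ('u1',   'click', '2024-01-01 10:05:00'),
      ('u2',   'View',  '2024-01-01 11:00:00'),
      (NULL,   'click', '2024-01-01 12:00:00');

    -- DWD: cleaned, standardized detail rows kept for downstream use.
    CREATE TABLE dwd_user_event AS
    SELECT TRIM(raw_user)   AS user_id,
           LOWER(raw_event) AS event_type,
           DATE(raw_ts)     AS event_date
    FROM ods_user_event
    WHERE raw_user IS NOT NULL;

    -- APP: pre-aggregated, query-friendly dataset.
    CREATE TABLE app_event_daily AS
    SELECT event_date,
           event_type,
           COUNT(*)                AS event_cnt,
           COUNT(DISTINCT user_id) AS user_cnt
    FROM dwd_user_event
    GROUP BY event_date, event_type;
    """)
    rows = conn.execute(
        "SELECT event_date, event_type, event_cnt, user_cnt "
        "FROM app_event_daily ORDER BY event_type"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in build_layers():
        print(row)
```

Each layer reads only from the layer directly beneath it, so a change in raw log format would be absorbed in the ODS-to-DWD step rather than reaching the APP tables.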

Model Implementation Process – The process includes thorough business research, overall architecture design, detailed design, code development and testing, data diff checks (field‑level and metric‑level), data acceptance (coverage, linkage, logical validation), documentation, and production deployment with operational monitoring and SLA guarantees.
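
The article does not show how its diff checks are implemented. One plausible reading of field‑level versus metric‑level diffs is sketched below, comparing an old and a new version of the same table held as lists of dictionaries. The function names, the `key` and `tolerance` parameters, and the sample rows are all hypothetical.

```python
def field_level_diff(old_rows, new_rows, key):
    """Return {key_value: {field: (old, new)}} for shared keys whose fields differ."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}
    diffs = {}
    for k in old_by_key.keys() & new_by_key.keys():
        old_row, new_row = old_by_key[k], new_by_key[k]
        changed = {
            field: (old_row[field], new_row.get(field))
            for field in old_row
            if old_row[field] != new_row.get(field)
        }
        if changed:
            diffs[k] = changed
    return diffs


def metric_level_diff(old_rows, new_rows, metric, tolerance=0.0):
    """Compare the total of one numeric metric between the two versions."""
    old_total = sum(row[metric] for row in old_rows)
    new_total = sum(row[metric] for row in new_rows)
    return {
        "old": old_total,
        "new": new_total,
        "within_tolerance": abs(new_total - old_total) <= tolerance,
    }


if __name__ == "__main__":
    old = [{"id": 1, "pv": 10, "city": "bj"}, {"id": 2, "pv": 5, "city": "sh"}]
    new = [{"id": 1, "pv": 10, "city": "bj"}, {"id": 2, "pv": 7, "city": "sh"}]
    print(field_level_diff(old, new, key="id"))
    print(metric_level_diff(old, new, metric="pv"))
```

A field‑level diff pinpoints which rows and columns changed, while a metric‑level diff is a cheaper aggregate check that suits large tables; this sketch ignores keys present on only one side, which a real check would also need to report.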

Warehouse Construction Standards – The article specifies table naming conventions, field naming and type rules, and coding guidelines for HQL, emphasizing low‑storage types, consistent naming, avoidance of Hive reserved words, and performance‑oriented join and partition strategies.
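
Naming rules like these are often enforced with a small lint step. The sketch below assumes a convention of `<layer>_<snake_case_name>` for tables and lowercase snake_case for fields; that pattern, the `check_table` helper, and the short reserved‑word set (a small illustrative subset, not Hive's full list) are assumptions, since the summary does not spell out the exact rules.

```python
import re

# Assumed convention: layer prefix followed by a lowercase snake_case name.
TABLE_NAME = re.compile(r"^(ods|dwd|app)_[a-z][a-z0-9]*(_[a-z0-9]+)*$")
FIELD_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Illustrative subset of words that are reserved in Hive; not exhaustive.
RESERVED_SUBSET = {"date", "timestamp", "user", "order", "group", "table", "select"}


def check_table(table_name, fields):
    """Return a list of human-readable naming problems (empty if none)."""
    problems = []
    if not TABLE_NAME.match(table_name):
        problems.append(f"table '{table_name}' does not match <layer>_<snake_case>")
    for field in fields:
        if not FIELD_NAME.match(field):
            problems.append(f"field '{field}' is not lowercase snake_case")
        elif field in RESERVED_SUBSET:
            problems.append(f"field '{field}' is a reserved word")
    return problems


if __name__ == "__main__":
    print(check_table("dwd_user_event_di", ["user_id", "event_type"]))
    print(check_table("UserEvents", ["date", "userId"]))
```

Running such a check before deployment would catch inconsistent names early, before downstream queries come to depend on them.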

Summary and Outlook – The initial construction has achieved a reasonable data architecture, reduced business impact, established a basic knowledge system, and set up preliminary data quality monitoring. Future work will focus on expanding data coverage, enhancing data completeness, and continuously improving data quality and stability.


Author – Lu Yazhou, Senior Big Data Development Engineer, 58.com Commercial Product Technology Department.
