Data Governance and Application for Behavior Analysis: Modeling Methods, Architecture, and Practical Cases
This article summarizes a talk by the data-ecosystem team at Qifu Technology on how they govern and apply data for behavior analysis. It covers common analysis scenarios, data-warehouse modeling methods and their trade-offs, the concepts and overall architecture of behavior-centric analytics, the key system components, and three concrete analyses: retention, funnel, and path analysis.
1. Common Data Analysis Scenarios – Distinguishes business data (the results recorded after a user acts) from behavior data (the user actions themselves). Business data drives metric-based decisions, while behavior data is used to find and optimize the weak steps in a user journey.
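2. Data-Warehouse Modeling Methods – Describes the typical pipeline, in which user-space logs flow into the ODS (operational data store) layer, then the DWD (data warehouse detail) layer, and finally into topic tables, and explains how these tables support reporting, feature mining, machine learning, and identity systems.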
3. Advantages of Data‑Warehouse Modeling – Mature methodology, proven technology stack, strong vendor support, abundant talent, low adoption resistance, and suitability for multi‑dimensional metric analysis.
4. Disadvantages of Data-Warehouse Modeling – A long construction chain from raw logs to usable tables, difficulty keeping data consistent across layers, complex schema extensions, heterogeneous engineering components, a poor fit for time-ordered behavior data, and pre-aggregation too rigid to answer questions it was not designed for.
5. Concepts for Behavior‑Centric Analysis – Introduces user space, user event sequences, event abstraction, and user‑group computation as new abstractions to handle behavior data.
6. Overall Architecture for Behavior Analysis – Presents a layered architecture: columnar storage with custom optimizations (Bloom filters, delta encoding), file and column metadata, ID mapping (OneID), a query cache, result aggregation, and a query access layer, all designed to minimize data loading and communication overhead.
7. Key System Components – Details the column-storage design, metadata management that indexes files by time, OneID for cross-device identity, a caching layer with versioned keys, and the user data access layer, which orchestrates metadata lookup, data loading, parallel computation, and aggregation.
8. Practical Analysis Examples – Demonstrates retention analysis (calculating daily new users and active users), funnel analysis (strict and non-strict sequences of actions such as play, favorite, purchase, and download), and path analysis (using event in-degree and out-degree with depth-first traversal).
9. Review – Summarizes the differences between behavior and metric analysis, the pros and cons of data‑warehouse modeling, the new concepts for behavior analysis, the multi‑layer system architecture, the technologies used at each layer, and the three main analysis scenarios supported.
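To make the layered pipeline in section 2 concrete, here is a minimal Python sketch. The log format and the function names `to_ods`, `to_dwd`, and `to_topic` are hypothetical, and real ODS and DWD layers are warehouse tables rather than Python lists. It shows raw lines landing in ODS, being validated and typed in DWD, and being pre-aggregated into a topic table.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw log format: "user_id|event_name|epoch_seconds".
RAW_LOGS = [
    "u1|play|1700000000",
    "u1|purchase|1700000300",
    "u2|play|1700003600",
    "garbled-line",
]


def to_ods(lines):
    """ODS: land raw lines with minimal processing (split into fields only)."""
    return [line.split("|") for line in lines]


def to_dwd(ods_rows):
    """DWD: drop malformed rows and convert fields into typed, cleaned records."""
    rows = []
    for fields in ods_rows:
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # malformed record; a real pipeline would log or quarantine it
        user_id, event, ts = fields
        day = datetime.fromtimestamp(int(ts), tz=timezone.utc).date().isoformat()
        rows.append({"user_id": user_id, "event": event, "day": day})
    return rows


def to_topic(dwd_rows):
    """Topic table: pre-aggregated event counts per (day, event)."""
    counts = defaultdict(int)
    for row in dwd_rows:
        counts[(row["day"], row["event"])] += 1
    return dict(counts)


topic = to_topic(to_dwd(to_ods(RAW_LOGS)))
print(topic)  # {('2023-11-14', 'play'): 2, ('2023-11-14', 'purchase'): 1}
```

The topic table answers "how many plays per day" cheaply, but it has already discarded event order and per-user detail, which is the inflexible pre-aggregation problem listed in section 4.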
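The user event sequence and user-group abstractions from section 5 can be illustrated with a small sketch, assuming each event is a hypothetical `(user_id, timestamp, event_name)` tuple. A sequence is a per-user, time-ordered list of events; a user group is modeled here as the set of users whose sequence satisfies a condition, so groups combine with ordinary set operations. This is an illustration of the ideas, not the team's data model.

```python
from collections import defaultdict

# Hypothetical event records: (user_id, timestamp, event_name), arriving unordered.
EVENTS = [
    ("u1", 30, "purchase"), ("u1", 10, "play"), ("u2", 5, "play"),
    ("u3", 7, "favorite"), ("u3", 9, "purchase"),
]


def build_sequences(events):
    """Group events by user and order each user's events by timestamp."""
    per_user = defaultdict(list)
    for user, ts, name in events:
        per_user[user].append((ts, name))
    return {user: [name for _, name in sorted(evts)] for user, evts in per_user.items()}


def user_group(sequences, predicate):
    """A user group: the set of users whose event sequence satisfies `predicate`."""
    return {user for user, seq in sequences.items() if predicate(seq)}


seqs = build_sequences(EVENTS)
players = user_group(seqs, lambda s: "play" in s)
buyers = user_group(seqs, lambda s: "purchase" in s)
print(seqs["u1"])        # ['play', 'purchase']
print(players & buyers)  # {'u1'}
print(buyers - players)  # {'u3'}
```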
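Delta encoding, one of the columnar-storage optimizations in section 6, pays off on sorted columns such as event timestamps: consecutive values are close together, so the stored differences are small numbers that compress well. A minimal sketch; the function names are illustrative rather than taken from the talk.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Recover the original values with a running sum over the deltas."""
    return list(accumulate(deltas))


timestamps = [1700000000, 1700000003, 1700000010, 1700000011]
encoded = delta_encode(timestamps)
print(encoded)  # [1700000000, 3, 7, 1]
```

Column formats typically follow this step with variable-length integers or bit-packing so that the small deltas actually occupy fewer bytes; the sketch stops at the delta step.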
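The Bloom filter in section 6 lets a query skip files that definitely do not contain a value, such as a particular user ID. The sketch below is a generic Bloom filter, not the team's implementation; the bit-array size, the number of hashes, and the SHA-256-based hashing are assumptions chosen for clarity rather than speed.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, a small rate of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive `num_hashes` bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# One filter per file, built from the user IDs stored in that file.
bf = BloomFilter()
for i in range(50):
    bf.add(f"u{i}")
print(bf.might_contain("u7"))  # True
```

Because a `False` answer is always correct, it is a safe reason to skip loading a file; a `True` answer only means the file must be read to confirm.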
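Section 7 mentions metadata that indexes files by time. Assuming each file's metadata records the minimum and maximum event timestamp it contains (a common approach, not confirmed by the talk), a query over a time range only needs to open files whose range overlaps the query. The `FileMeta` type and file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    path: str
    min_ts: int  # earliest event timestamp in the file
    max_ts: int  # latest event timestamp in the file


def files_for_range(catalog, start, end):
    """Return files whose [min_ts, max_ts] interval overlaps the query range [start, end]."""
    return [f.path for f in catalog if f.min_ts <= end and f.max_ts >= start]


CATALOG = [
    FileMeta("part-0001", 0, 99),
    FileMeta("part-0002", 100, 199),
    FileMeta("part-0003", 200, 299),
]
print(files_for_range(CATALOG, 150, 210))  # ['part-0002', 'part-0003']
```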
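Section 7 lists OneID for cross-device identity without describing the algorithm. One common way to build such a mapping is union-find: whenever two identifiers are observed together (for example, a device logging into an account), their sets are merged, and every identifier in a set resolves to one representative ID. The sketch below assumes that approach; the `OneID` class and the identifier strings are hypothetical.

```python
class OneID:
    """Union-find over identifiers; each connected set represents one person."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        """Return the representative ID for `ident`, compressing the path as we go."""
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[ident] != root:
            nxt = self.parent[ident]
            self.parent[ident] = root
            ident = nxt
        return root

    def link(self, a, b):
        """Record that identifiers `a` and `b` were observed for the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


ids = OneID()
ids.link("account:42", "device:A")  # device A logged into account 42
ids.link("account:42", "device:B")  # device B logged into the same account
ids.find("cookie:C")                # an identifier never linked to anything
print(ids.find("device:A") == ids.find("device:B"))  # True
print(ids.find("cookie:C") == ids.find("device:A"))  # False
```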
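For the caching layer with versioned keys in section 7, one plausible reading is that the cache key includes a data version, so when new data lands and the version changes, old results are simply never looked up again. A minimal sketch under that assumption, with a hypothetical `QueryCache` class and a made-up query shape:

```python
import hashlib
import json


class QueryCache:
    """Result cache keyed by (data version, normalized query)."""

    def __init__(self):
        self.store = {}

    @staticmethod
    def key(query, data_version):
        # sort_keys makes logically equal queries serialize identically.
        payload = json.dumps(query, sort_keys=True)
        return hashlib.sha256(f"{data_version}:{payload}".encode()).hexdigest()

    def get_or_compute(self, query, data_version, compute):
        k = self.key(query, data_version)
        if k not in self.store:
            self.store[k] = compute(query)
        return self.store[k]


calls = []


def compute(query):
    calls.append(query)  # record each real computation
    return len(query["events"])


cache = QueryCache()
cache.get_or_compute({"events": ["play"]}, 1, compute)
cache.get_or_compute({"events": ["play"]}, 1, compute)  # cache hit
cache.get_or_compute({"events": ["play"]}, 2, compute)  # new data version, recomputed
print(len(calls))  # 2
```

Entries from older versions become unreachable rather than deleted, so a production cache would still need an eviction policy.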
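The user data access layer in section 7 orchestrates metadata lookup, data loading, parallel computation, and aggregation. The sketch below chains those four steps over in-memory stand-ins for files; `FILES`, `FILE_EVENTS`, and the use of a thread pool are assumptions for illustration. Each worker produces a partial count for one file, and the partials are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: each "file" holds (user_id, event_name) rows.
FILES = {
    "part-0001": [("u1", "play"), ("u2", "play")],
    "part-0002": [("u1", "purchase"), ("u3", "play")],
    "part-0003": [("u2", "download")],
}
# Hypothetical column metadata: the set of event names present in each file.
FILE_EVENTS = {path: {event for _, event in rows} for path, rows in FILES.items()}


def lookup_files(event_filter):
    """Metadata lookup: keep only files that contain at least one requested event."""
    return sorted(path for path, events in FILE_EVENTS.items() if events & event_filter)


def load_and_count(path, event_filter):
    """Load one file and compute a partial result: event counts within that file."""
    partial = Counter()
    for _user, event in FILES[path]:
        if event in event_filter:
            partial[event] += 1
    return partial


def run_query(event_filter):
    """Orchestrate lookup, parallel load-and-compute, and final aggregation."""
    paths = lookup_files(event_filter)
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(lambda p: load_and_count(p, event_filter), paths))
    total = Counter()
    for partial in partials:
        total.update(partial)  # aggregation: merge per-file partial results
    return dict(total)


print(lookup_files({"play", "purchase"}))  # ['part-0001', 'part-0002']
print(run_query({"play", "purchase"}))     # {'play': 3, 'purchase': 1}
```

Summing partial counts works because counts are additive; a distinct-user metric would instead need the per-file user sets to be merged, because the same user can appear in several files.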
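Retention analysis in section 8 compares the users who were new on a given day with the users who are active N days later. A minimal sketch, assuming activity arrives as hypothetical `(user_id, day_index)` pairs and that a user counts as new on the first day they appear:

```python
from collections import defaultdict


def retention(activity, day, n):
    """Day-n retention of the users who were new on `day`."""
    first_seen = {}
    active_on = defaultdict(set)
    for user, d in activity:
        active_on[d].add(user)
        if user not in first_seen or d < first_seen[user]:
            first_seen[user] = d
    cohort = {u for u, d in first_seen.items() if d == day}  # new users on `day`
    if not cohort:
        return 0.0
    return len(cohort & active_on[day + n]) / len(cohort)


ACTIVITY = [("u1", 0), ("u2", 0), ("u3", 0), ("u1", 1), ("u3", 1), ("u4", 1), ("u2", 7)]
print(retention(ACTIVITY, day=0, n=1))  # 0.6666666666666666
print(retention(ACTIVITY, day=0, n=7))  # 0.3333333333333333
```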
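For the funnel analysis in section 8, the summary does not define "strict" and "non-strict". The sketch below assumes a common convention: a non-strict funnel only requires the steps to occur in order, with other events allowed in between, while a strict funnel requires them to occur as consecutive events. Both compute, per user, how many steps were completed, and the results are then counted per step. Function names and data are hypothetical.

```python
def funnel_depth(sequence, steps, strict=False):
    """Return how many funnel steps, in order, this event sequence completes."""
    best = 0
    if strict:
        # Strict: the steps must appear as consecutive events.
        for start in range(len(sequence)):
            depth = 0
            while (depth < len(steps) and start + depth < len(sequence)
                   and sequence[start + depth] == steps[depth]):
                depth += 1
            best = max(best, depth)
        return best
    # Non-strict: the steps must appear in order; other events may sit between them.
    for event in sequence:
        if best < len(steps) and event == steps[best]:
            best += 1
    return best


def funnel_counts(sequences, steps, strict=False):
    """Number of users reaching at least each step."""
    depths = [funnel_depth(seq, steps, strict) for seq in sequences.values()]
    return [sum(d > i for d in depths) for i in range(len(steps))]


STEPS = ["play", "favorite", "purchase", "download"]
SEQS = {
    "u1": ["play", "favorite", "purchase", "download"],
    "u2": ["play", "search", "favorite", "purchase"],
    "u3": ["favorite", "play"],
}
print(funnel_counts(SEQS, STEPS))               # [3, 2, 2, 1]
print(funnel_counts(SEQS, STEPS, strict=True))  # [3, 1, 1, 1]
```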
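Path analysis in section 8 treats events as nodes and transitions between consecutive events as edges. The sketch below builds that graph, computes each event's weighted in-degree and out-degree, and enumerates paths from a start event with a depth-limited depth-first traversal. How the talk combines the degrees with the traversal (for example, to prune rare branches) is not described, so here they are simply computed side by side; all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def transition_graph(sequences):
    """Count transitions between consecutive events across all users."""
    edges = Counter()
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            edges[(src, dst)] += 1
    return edges


def degrees(edges):
    """Weighted out-degree and in-degree of each event node."""
    out_deg, in_deg = Counter(), Counter()
    for (src, dst), count in edges.items():
        out_deg[src] += count
        in_deg[dst] += count
    return out_deg, in_deg


def paths_from(edges, start, max_depth):
    """Depth-first enumeration of acyclic paths from `start`, up to `max_depth` edges."""
    adj = defaultdict(list)
    for (src, dst), count in edges.items():
        adj[src].append((dst, count))
    results = []

    def dfs(node, path):
        if len(path) > 1:
            results.append(list(path))
        if len(path) - 1 == max_depth:
            return
        for nxt, _count in sorted(adj[node]):
            if nxt not in path:  # skip cycles
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return results


SEQS = [["home", "play", "purchase"], ["home", "play", "favorite"], ["home", "search", "play"]]
edges = transition_graph(SEQS)
out_deg, in_deg = degrees(edges)
print(out_deg["home"], in_deg["play"])  # 3 3
print(paths_from(edges, "home", 1))     # [['home', 'play'], ['home', 'search']]
```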
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.