User Portrait Tagging: Construction, Feature Processing, and Evaluation
This article is a guide to building user portrait tags, from basic attribute tags to business and strategy tags. It covers data collection methods, feature engineering techniques such as cleaning, time decay, and smoothing, and evaluation metrics for cohesion and stability. It is aimed at data product managers and analysts.
The presentation begins with an overview of the growing need for deep user understanding in digital transformation, introducing the concept of user portrait tags and their importance across various business scenarios.
1. Portrait Tag Introduction covers basic attribute tags (e.g., gender, age, OS, city) built via user input, event tracking, model prediction, or third‑party data, and their applications in daily analysis and modeling.
2. Business‑Oriented Tags discusses tags strongly or weakly linked to KPI goals, such as high/low activity users, and how they are constructed based on KPI proximity or composite behavior calculations to support targeted operations and differentiated strategies.
3. Strategy‑Oriented Tags explains tags designed for specific interventions (e.g., red‑packet‑sensitive users, repeat‑purchase groups) and how uplift models, purchase‑cycle predictions, and binary classifiers are used to maximize ROI.
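As a rough illustration of what an uplift model estimates for a tag such as "red-packet-sensitive users", the sketch below computes per-segment uplift as the conversion rate of treated users minus that of control users. It assumes data from a randomized experiment; the record layout and the name `uplift_by_segment` are hypothetical, and this is the simplest stand-in for a fitted uplift model, not the speaker's method.

```python
def uplift_by_segment(records):
    """records: iterable of (segment, treated, converted) tuples.

    Returns {segment: treated conversion rate - control conversion rate}.
    Segments missing either a treated or a control group are skipped.
    """
    counts = {}
    for segment, treated, converted in records:
        # For each segment keep [user count, conversions] per treatment arm.
        arms = counts.setdefault(segment, {True: [0, 0], False: [0, 0]})
        arms[bool(treated)][0] += 1
        arms[bool(treated)][1] += int(converted)

    uplift = {}
    for segment, arms in counts.items():
        (n_treated, c_treated), (n_control, c_control) = arms[True], arms[False]
        if n_treated == 0 or n_control == 0:
            continue
        uplift[segment] = c_treated / n_treated - c_control / n_control
    return uplift


# Hypothetical experiment: segment A responds to the red packet, B does not.
records = (
    [("A", True, True)] * 6 + [("A", True, False)] * 4
    + [("A", False, True)] * 2 + [("A", False, False)] * 8
    + [("B", True, True)] * 5 + [("B", True, False)] * 5
    + [("B", False, True)] * 5 + [("B", False, False)] * 5
)
uplift = uplift_by_segment(records)
```

Segments with clearly positive uplift are the natural candidates for the intervention, because spending budget on users who would convert anyway lowers ROI.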
Feature Processing and Tag Evaluation
(1) Data Cleaning includes outlier detection (box plots, Attribute Value Frequency (AVF)), outlier treatment (capping or flooring at chosen percentiles), and missing‑value imputation based on each metric's definition.
(2) Time Decay Processing applies RFM‑style weighting (Recency, Frequency, Monetary) to give recent behavior higher influence, with formulas illustrated in the original slides.
(3) Smoothing uses logarithmic transformation to reduce head‑tail effects and improve data distinguishability, as shown by before‑and‑after distribution images.
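The three processing steps above can be sketched with the Python standard library alone. The slide formulas are not reproduced in this summary, so the exponential half-life decay below is an assumed form, and the function names, the 30-day half-life, and the sample data are all illustrative.

```python
import math
import statistics
from datetime import date


def iqr_fences(values, k=1.5):
    """Box-plot fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_floor(values, lower_pct=1, upper_pct=99):
    """Clip each value into the [lower_pct, upper_pct] percentile range."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


def decay_weight(event_day, as_of, half_life_days=30.0):
    """Assumed exponential decay: the weight halves every half_life_days."""
    age_days = max((as_of - event_day).days, 0)
    return 0.5 ** (age_days / half_life_days)


def decayed_total(events, as_of, half_life_days=30.0):
    """Sum of (day, value) events, each scaled by its recency weight."""
    return sum(v * decay_weight(d, as_of, half_life_days) for d, v in events)


def log_smooth(values):
    """log(1 + x) compresses the long tail; assumes non-negative inputs."""
    return [math.log1p(v) for v in values]


# Hypothetical order counts with one extreme value.
orders = [1, 2, 2, 3, 3, 3, 4, 4, 5, 120]
low, high = iqr_fences(orders)
outliers = [v for v in orders if v < low or v > high]
capped = cap_floor(orders, lower_pct=10, upper_pct=90)

# Two equal purchases, one today and one 30 days (one half-life) ago.
events = [(date(2024, 1, 31), 100.0), (date(2024, 1, 1), 100.0)]
score = decayed_total(events, as_of=date(2024, 1, 31))

smoothed = log_smooth(capped)
```

Here the older purchase contributes half as much as the recent one, and the log transform preserves the ordering of users while narrowing the gap between the head and the tail.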
Tag Evaluation focuses on two key standards:
Cohesion: measured by Silhouette Coefficient, ensuring intra‑group similarity and inter‑group separation.
Stability: assessed via Coefficient of Variation, checking that segmentation criteria and results remain consistent over time.
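Both standards can be computed in a few lines. The sketch below implements the mean Silhouette Coefficient with Euclidean distance and the Coefficient of Variation as population standard deviation over mean. The sample points, tag names, and weekly shares are hypothetical, and tracking the weekly share of a tag is only one assumed way to apply the stability check.

```python
import math
import statistics


def silhouette_score(points, labels):
    """Mean silhouette over all points; points are equal-length tuples."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two groups")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for single-member groups
            continue
        a = mean_dist(i, own)  # average distance within the point's own group
        b = min(mean_dist(i, members)  # average distance to the nearest other group
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / abs(mean)


# Cohesion: activity scores for four users under a sensible and a poor split.
points = [(1.0,), (2.0,), (10.0,), (11.0,)]
good = silhouette_score(points, ["low", "low", "high", "high"])
bad = silhouette_score(points, ["low", "high", "low", "high"])

# Stability: share of users carrying the "high activity" tag in each week.
stable_cv = coefficient_of_variation([0.20, 0.21, 0.19, 0.20])
volatile_cv = coefficient_of_variation([0.05, 0.40, 0.10, 0.30])
```

A silhouette close to 1 indicates tight, well-separated groups, while a value below 0 suggests users sit closer to another group than to their own. A low coefficient of variation across periods suggests the tag's coverage is not drifting.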
The Q&A section addresses practical questions about calculating cohesion for activity groups, defining activity thresholds, the computational complexity of time decay, and the broader use of tag evaluation in long‑term tag design.
The session concludes with thanks and references to related articles.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.