User Portrait Algorithms: From Ontology‑Based Methods to Deep Learning and Future Directions
This article provides a comprehensive overview of user portrait algorithms: their historical development, ontology‑based traditional approaches, deep‑learning enhancements, representation‑learning techniques such as lookalike modeling, active‑learning‑driven iteration, and the integration of world knowledge from large models. It closes with a discussion of current challenges and future research directions.
Introduction – The talk focuses on user portrait algorithms, aiming to present a macro view of their development history, current status, and future prospects.
1. User Portrait Overview – A user portrait is a machine‑readable, human‑understandable structured description of a user, useful for personalization, strategic decisions, and business analysis. Portraits can be classified by data source (social‑general vs. domain‑specific) and by temporal dimension (static, dynamic, life‑stage). Domain‑specific portraits further split into semi‑static and dynamic types, encompassing behavior, interest, and intent models.
2. Ontology‑Based Traditional Portraits – Early non‑deep‑learning portraits relied on knowledge graphs derived from ontologies, which define entities, attributes, relations, and axioms. Ontologies are encoded in formats like RDF/OWL and built by domain experts. Simple ontology examples illustrate hierarchical tagging (e.g., movie → genre → sub‑genre). Traditional weighting used TF‑IDF, which is simple but insensitive to ontology structure and temporal dynamics. An improved method updates leaf‑node weights based on user behavior, propagates to parent nodes with a decay factor, and incorporates time‑decay windows to capture short‑ and long‑term interests.
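The improved weighting scheme described above can be sketched as follows. This is a minimal illustration, not the talk's exact formulation: the ontology, the per‑day time‑decay factor, and the parent‑propagation decay factor are all hypothetical placeholders.

```python
# Sketch of ontology-aware interest weighting: each behavior event updates a
# leaf tag's weight (discounted by how long ago it happened), and the update
# propagates to ancestor nodes with a per-level decay factor.
# The tag tree and decay constants below are illustrative assumptions.

PARENT = {            # child -> parent in a toy movie-interest ontology
    "sci-fi": "movie",
    "comedy": "movie",
    "movie": None,
}

TIME_DECAY_PER_DAY = 0.95   # older behavior contributes less (assumed value)
PARENT_DECAY = 0.5          # contribution shrinks per level upward (assumed)

def update_weights(weights, events, now_day):
    """events: list of (leaf_tag, strength, event_day)."""
    for tag, strength, day in events:
        # Time-decayed contribution of this behavior event.
        contrib = strength * (TIME_DECAY_PER_DAY ** (now_day - day))
        node, factor = tag, 1.0
        # Propagate up the ontology, decaying at each level.
        while node is not None:
            weights[node] = weights.get(node, 0.0) + contrib * factor
            factor *= PARENT_DECAY
            node = PARENT.get(node)
    return weights
```

Using a short decay window for "short‑term interest" and a longer one for "long‑term interest" yields the two views the talk contrasts, from the same event stream.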
3. Deep Learning Enhancements – Deep learning brings stronger user representation (metric learning), end‑to‑end modeling, multimodal data handling, and cost‑effective data acquisition. Structured‑label prediction can be performed with C‑HMCNN, which flattens hierarchical labels and applies a loss that enforces hierarchical consistency. Lookalike modeling benefits from three representation paradigms: multi‑class supervised learning, auto‑encoders, and graph‑based methods (GNN/GCN) that can be built from implicit user‑item graphs.
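The hierarchical‑consistency idea behind C‑HMCNN‑style prediction can be illustrated with a small post‑processing sketch: a parent label's score is forced to be at least the maximum of its descendants' scores, so the flattened per‑node predictions never violate the hierarchy. The label tree is hypothetical, and this shows only the max‑constraint step, not the full training loss.

```python
# Max-constraint enforcement for hierarchical multi-label scores: a node's
# constrained score is the max of its own raw score and all descendants'.
# The two-level tag tree below is an illustrative assumption.

CHILDREN = {
    "movie": ["sci-fi", "comedy"],
    "sci-fi": [],
    "comedy": [],
}

def constrained_scores(raw):
    """raw: dict node -> sigmoid score in [0, 1]; returns consistent scores."""
    out = {}
    def score(node):
        if node not in out:
            child_max = max((score(c) for c in CHILDREN.get(node, [])),
                            default=0.0)
            out[node] = max(raw[node], child_max)
        return out[node]
    for node in raw:
        score(node)
    return out
```

With this constraint in place, thresholding the scores can never predict a sub‑genre without also predicting its parent genre.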
4. Active Learning for Portrait Iteration – To reduce labeling cost, an active‑learning loop uses Bayesian networks with dropout retained at inference time to estimate prediction uncertainty. Samples with high uncertainty are sent for manual annotation, while confident predictions are accepted automatically, iteratively improving the portrait model.
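The uncertainty‑routing loop can be sketched as below. Keeping dropout active at inference and sampling repeatedly (Monte‑Carlo dropout) makes the spread of predictions a proxy for uncertainty; the toy stochastic model and the routing threshold are assumptions for illustration.

```python
import random
import statistics

def mc_dropout_predict(model, x, n_samples=30):
    """MC dropout: sample the model repeatedly with dropout still active;
    mean approximates the prediction, spread approximates uncertainty."""
    preds = [model(x) for _ in range(n_samples)]
    return statistics.fmean(preds), statistics.pstdev(preds)

def route(model, x, threshold=0.1):
    """High-uncertainty samples go to manual annotation; confident
    predictions are auto-accepted into the portrait."""
    _, std = mc_dropout_predict(model, x)
    return "annotate" if std > threshold else "auto-accept"

# Toy "network" whose single unit is randomly dropped, standing in for a
# real model with dropout layers left in train mode at inference.
def toy_model(x):
    return x * (0.0 if random.random() < 0.5 else 1.0)
```

In the full loop, the "annotate" bucket is labeled by humans and fed back into training, so the model's uncertain regions shrink with each iteration.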
5. Leveraging Large‑Model World Knowledge – Large language models can annotate or predict portraits by prompting them with user interaction histories or product titles, producing detailed, albeit unstructured, analyses. Post‑processing (entity extraction, relation mapping) aligns these outputs with existing ontologies, enriching portrait data without extensive manual effort.
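The post‑processing step of aligning unstructured large‑model output with an existing ontology can be as simple as alias matching, sketched below. The alias table and sample text are hypothetical; a production system would use proper entity extraction and relation mapping rather than substring search.

```python
# Minimal sketch: map mentions in an LLM's free-text portrait analysis onto
# ontology tags via a known-alias table. Aliases and tags are assumptions.

ALIASES = {
    "science fiction": "sci-fi",
    "sci-fi": "sci-fi",
    "stand-up": "comedy",
    "comedy": "comedy",
}

def extract_tags(llm_text):
    """Return the ontology tags whose aliases appear in the LLM output."""
    text = llm_text.lower()
    return sorted({tag for alias, tag in ALIASES.items() if alias in text})
```

The extracted tags can then flow through the same weighting and propagation machinery as behavior‑derived tags, enriching the portrait without manual labeling.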
6. Summary & Outlook – Key challenges include unifying virtual IDs across devices, identifying the primary user behind shared accounts, predicting intent in real time across scenarios, transitioning from closed to open ontologies, improving accuracy and interpretability, and integrating large models effectively.
Q&A – The speaker addresses practical concerns such as AB‑testing strategies for portraits, dropout usage in Bayesian networks, privacy‑preserving large‑model deployment, and additional large‑model applications beyond labeling.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.