User Profiling Algorithms: From Ontology‑Based Methods to Deep Learning and Large Model Integration
This article provides a comprehensive overview of user profiling algorithms, covering the evolution from ontology‑based traditional methods to modern deep‑learning approaches, including structured label prediction, representation learning, active learning, and large‑model integration, while discussing challenges, practical applications, and future research directions.
User profiling (user portrait) is a machine‑readable, structured description of users that supports personalization, strategic decision‑making, and business analysis. Profiles can be divided into social‑general (static and dynamic) and domain‑specific categories, each further classified by time granularity and conceptual dimensions such as behavior, interest, and intent models.
Traditional ontology‑based profiling constructs an ontology of entities, attributes, relations, and axioms (typically expressed in RDF/OWL). Early methods weighted tags with TF‑IDF, but this is sensitive to tag granularity and ignores temporal dynamics. An improved weight‑update scheme initializes leaf‑node weights to zero and updates them with a behavior function that assigns implicit‑feedback scores (e.g., click < add‑to‑cart < order). Parent‑node weights are then updated with a decay factor, so a child's signal propagates up the hierarchy with diminishing influence.
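The update scheme can be sketched as follows. The behavior scores, the decay value, and the tag names are illustrative assumptions, not figures from the article:

```python
# Hedged sketch of the hierarchical tag-weight update: a leaf tag gets an
# implicit-feedback score, and a decayed share of that score is propagated
# to each ancestor. Scores and DECAY are illustrative, not from the source.

BEHAVIOR_SCORE = {"click": 1.0, "add_to_cart": 3.0, "order": 5.0}  # assumed ordering
DECAY = 0.5  # assumed fraction of a child's update passed to its parent

def update_weights(weights, parent_of, leaf_tag, behavior):
    """Add the behavior score to a leaf tag, then walk up the ontology
    tree, applying a decayed update at each ancestor."""
    delta = BEHAVIOR_SCORE[behavior]
    node = leaf_tag
    while node is not None:
        weights[node] = weights.get(node, 0.0) + delta
        node = parent_of.get(node)  # None at the root ends the walk
        delta *= DECAY
    return weights

# Hypothetical three-level tag hierarchy for illustration.
weights = {}
parent_of = {"running_shoes": "shoes", "shoes": "apparel", "apparel": None}
update_weights(weights, parent_of, "running_shoes", "order")
update_weights(weights, parent_of, "running_shoes", "click")
```

An "order" followed by a "click" on the same leaf thus leaves the leaf with the full combined score while each ancestor holds a geometrically smaller share.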
When tags are missing or users are cold‑started, matrix factorization or K‑nearest‑neighbor collaborative filtering can complete the label matrix, often with non‑negative constraints. Alternative machine‑learning approaches (e.g., KNN classification) also serve for interest inference.
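A minimal sketch of such label-matrix completion, using non-negative matrix factorization with multiplicative updates restricted to observed entries (the masking scheme and hyperparameters are assumptions, not the article's exact method):

```python
import numpy as np

def masked_nmf(V, mask, k=2, iters=500, eps=1e-9):
    """Complete a user x tag weight matrix with non-negative matrix
    factorization; only observed entries (mask == 1) drive the fit,
    and the reconstruction W @ H fills in the missing ones."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative throughout.
        WH = W @ H
        W *= ((mask * V) @ H.T) / ((mask * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ (mask * V)) / (W.T @ (mask * WH) + eps)
    return W @ H
```

On a low-rank weight matrix with one hidden entry, the reconstruction recovers a plausible value for the missing tag weight, which is exactly the cold-start completion described above.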
Deep learning brings stronger user representations (e.g., via metric learning), end‑to‑end modeling, native handling of multimodal data, and more cost‑effective data acquisition. A representative model, C‑HMCNN, flattens hierarchical labels into a multi‑label prediction problem and trains with a loss that enforces hierarchical consistency: a confident leaf prediction constrains its ancestors, so a child's score can never exceed its parent's.
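The consistency constraint can be illustrated with a small post-processing sketch in the spirit of C-HMCNN: a parent's score is taken as the max of its own raw score and its descendants', so a confident leaf always implies its ancestors. The tag names and scores are hypothetical:

```python
def constrained_score(raw, children, node, memo=None):
    """C-HMCNN-style consistency sketch: a node's final score is the max
    of its raw score and the final scores of its children, computed
    recursively over the label hierarchy."""
    memo = {} if memo is None else memo
    if node in memo:
        return memo[node]
    score = raw[node]
    for child in children.get(node, []):
        score = max(score, constrained_score(raw, children, child, memo))
    memo[node] = score
    return score

# Hypothetical hierarchy: apparel -> shoes -> running_shoes.
raw = {"apparel": 0.2, "shoes": 0.3, "running_shoes": 0.9}
children = {"apparel": ["shoes"], "shoes": ["running_shoes"]}
```

Here a 0.9 score on the leaf lifts both ancestors to 0.9, so the flattened multi-label output never predicts a child without its parent.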
Lookalike techniques rely on robust representation learning, which can be obtained via supervised multi‑class classification, auto‑encoders, or graph neural networks (GNN/GCN). GNNs can be built from explicit user–item bipartite graphs or from inferred graphs (e.g., co‑purchase relations), and the resulting embeddings enable scalable approximate nearest‑neighbor search with graph‑based indexes such as HNSW, NSG, or SSG.
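A minimal lookalike retrieval sketch: average the seed users' embeddings into a centroid and rank all users by cosine similarity. A brute-force scan stands in here for the graph ANN index (HNSW/NSG/SSG) that would be used at production scale; the embeddings are synthetic:

```python
import numpy as np

def lookalike(seed_ids, embeddings, top_k=3):
    """Rank non-seed users by cosine similarity to the centroid of the
    seed users' (L2-normalized) embeddings. Brute-force stand-in for a
    graph-based approximate nearest-neighbor index."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = normed[seed_ids].mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = normed @ centroid
    sims[seed_ids] = -np.inf  # exclude the seed audience itself
    return np.argsort(-sims)[:top_k]
```

Swapping the scan for an HNSW index changes only the retrieval step; the seed-centroid formulation stays the same.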
Active learning reduces labeling cost by training a Bayesian neural network with dropout kept active at inference time (Monte‑Carlo dropout) to estimate prediction uncertainty. Samples with high uncertainty are routed to manual annotation, while confident predictions are accepted automatically, forming an iterative labeling loop.
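A toy sketch of the uncertainty estimate, assuming a single-layer scorer for illustration: dropout stays on at inference, the same input is scored T times, and the spread of the outputs serves as the routing signal:

```python
import numpy as np

def mc_dropout_predict(x, W, b, T=100, p=0.5, rng=None):
    """Monte-Carlo dropout sketch: run T stochastic forward passes with
    dropout active and return (mean score, std). A high std flags the
    sample for manual annotation. The one-layer net is illustrative."""
    if rng is None:
        rng = np.random.default_rng(0)
    preds = []
    for _ in range(T):
        mask = rng.random(x.shape) > p            # drop inputs with prob p
        h = (x * mask) / (1 - p)                  # inverted-dropout scaling
        preds.append(1.0 / (1.0 + np.exp(-(h @ W + b))))  # sigmoid score
    preds = np.array(preds)
    return preds.mean(), preds.std()
```

In the loop described above, samples whose std exceeds a threshold go to annotators; the rest are auto-labeled with the mean score.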
Large‑model world knowledge can augment profiling by generating detailed annotations from user interaction histories or product titles. However, such outputs are unstructured free text and require downstream entity and relationship extraction, followed by alignment with existing ontologies.
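The alignment step can be as simple as fuzzy-matching an extracted entity against the existing tag vocabulary. This is a deliberately minimal sketch with a hypothetical tag list and a string-similarity stand-in for real entity linking:

```python
import difflib

# Hypothetical ontology tag vocabulary for illustration.
ONTOLOGY_TAGS = ["running shoes", "hiking boots", "yoga mat"]

def align_to_ontology(extracted_entity, tags=ONTOLOGY_TAGS, cutoff=0.6):
    """Map a free-text entity produced by a large model onto the closest
    existing ontology tag; return None when nothing is similar enough,
    signaling a candidate for open-ontology expansion."""
    match = difflib.get_close_matches(extracted_entity.lower(), tags,
                                      n=1, cutoff=cutoff)
    return match[0] if match else None
```

Entities that fail to align are exactly the cases that motivate the closed-to-open ontology transition discussed below.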
The article concludes with open challenges: improving accuracy through unified identity resolution, handling shared‑account scenarios, real‑time intent prediction across scenes, transitioning from closed to open ontologies, enhancing the interpretability of deep‑learning‑based profiles, and better integration of large models into profiling pipelines.
A Q&A section addresses practical concerns such as AB‑testing strategies for profiles, dropout usage in Bayesian networks, privacy‑preserving large‑model deployment, and additional large‑model use cases beyond labeling.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.