
User Portrait Scenarios and Technical Implementation Solutions

This article gives an overview of user-portrait applications across industries such as consumer-facing (ToC) internet services, e-commerce, security and finance. It explains product functions such as tag metadata management, single-user profiling, crowd selection and SOP-driven messaging, and details the underlying big-data pipeline, ETL scheduling, data-warehouse layers and technology stack needed to support these use cases.

DataFunTalk

The session, presented by data-architecture expert Zhao Hongtian, introduces the concept of user portraits and is organized into three parts: common application scenarios, product functions, and technical implementation solutions.

Common Application Scenarios – Different industries collect different data sources, so their portrait needs differ. Consumer-facing (ToC) internet products draw on registration information, strong-authentication (verified identity) data and in-app behavior to drive content recommendation and SOP marketing. E-commerce uses similar data for personalized VIP services and targeted messaging. Security applications combine identity, travel and facial-recognition data to raise real-time risk alerts. Finance aggregates registration, behavior and device data into 360° risk profiles used to evaluate credit risk.

Specific WeChat-Based Scenarios – Channel QR codes embed unique identifiers, so users who scan a code are automatically tagged with their acquisition source. These tags then trigger personalized welcome messages. SOP (Standard Operating Procedure) pushes deliver timed content to tag-defined user segments. Together, these scenarios require the platform to support both analysis-oriented queries and high-concurrency API calls.
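The channel-QR and SOP flow described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: it assumes the scan event delivers the QR code's embedded scene ID, and every name here (`CHANNEL_TAGS`, `WELCOME_TEMPLATES`, `SOP_PLAN`, `handle_scan`, `due_sop_messages`) is hypothetical. A real deployment would receive scan events from the WeChat platform and store tags in the tag service rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical mapping from a QR code's embedded scene ID to an acquisition-channel tag.
CHANNEL_TAGS = {
    "offline_store_01": "channel:offline_store",
    "livestream_0815": "channel:livestream",
}

# Hypothetical welcome templates keyed by channel tag.
WELCOME_TEMPLATES = {
    "channel:offline_store": "Thanks for visiting our store! Here is your in-store coupon.",
    "channel:livestream": "Welcome from the livestream! The replay link is below.",
}
DEFAULT_WELCOME = "Welcome! Tell us what you are interested in."

# Hypothetical SOP plan: (days since the user was tagged, required tag, message).
SOP_PLAN = [
    (1, "channel:livestream", "Day 1: highlights from the livestream"),
    (3, "channel:livestream", "Day 3: limited-time offer for livestream viewers"),
    (1, "channel:offline_store", "Day 1: store opening hours and directions"),
]


@dataclass
class UserProfile:
    user_id: str
    tags: set = field(default_factory=set)


def handle_scan(profile: UserProfile, scene_id: str) -> str:
    """Tag the user with the channel encoded in the QR code and pick a welcome message."""
    tag = CHANNEL_TAGS.get(scene_id)
    if tag is None:
        return DEFAULT_WELCOME  # unknown code: no channel tag, generic greeting
    profile.tags.add(tag)
    return WELCOME_TEMPLATES.get(tag, DEFAULT_WELCOME)


def due_sop_messages(tags: set, tagged_on: date, today: date) -> list:
    """Return the SOP messages scheduled for today for any of the user's tags."""
    elapsed = (today - tagged_on).days
    return [msg for offset, tag, msg in SOP_PLAN if tag in tags and offset == elapsed]


user = UserProfile("u1")
print(handle_scan(user, "livestream_0815"))  # prints the livestream welcome text
print(due_sop_messages(user.tags, date(2024, 1, 1), date(2024, 1, 4)))
# ['Day 3: limited-time offer for livestream viewers']
```

Keeping the channel-to-tag mapping and the SOP plan as data, rather than code, is one way to let operators add new QR codes or push steps without redeploying.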

Portrait Product Functions – The platform provides tag metadata management (hierarchical tag catalogs), single-user portrait lookup, crowd selection by combining tags, crowd analysis with multi-dimensional reports, behavior analysis (retention, funnel, distribution), and tag-driven SOP automation for personalized messaging.
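Crowd selection by tag combination reduces to set algebra over each tag's audience. The sketch below is a hedged illustration, assuming each tag's audience is stored as a bitmap of integer user IDs (here a plain Python `int`); production systems often use compressed bitmaps instead, for example ClickHouse's bitmap functions such as `bitmapAnd`. The tag names, `TAG_INDEX` and `select_crowd` are hypothetical.

```python
# Hypothetical crowd-selection sketch: each tag's audience is a bitmap of user IDs,
# stored here as a Python int where bit i is set if user i carries the tag.

def to_bitmap(user_ids) -> int:
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def from_bitmap(bitmap: int) -> list:
    """Decode a non-negative bitmap back into a sorted list of user IDs."""
    ids, uid = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


# Hypothetical tag index over 10 users (IDs 0-9).
ALL_USERS = to_bitmap(range(10))
TAG_INDEX = {
    "gender:female": to_bitmap([1, 2, 5, 7]),
    "city:shanghai": to_bitmap([2, 3, 5, 8]),
    "vip:gold": to_bitmap([5, 8]),
    "churn_risk:high": to_bitmap([2]),
}


def select_crowd(all_of=(), any_of=(), none_of=()) -> list:
    """AND the all_of tags, OR the any_of tags, then remove users with any none_of tag."""
    result = ALL_USERS
    for tag in all_of:
        result &= TAG_INDEX.get(tag, 0)  # unknown tag -> empty audience
    if any_of:
        union = 0
        for tag in any_of:
            union |= TAG_INDEX.get(tag, 0)
        result &= union
    for tag in none_of:
        result &= ~TAG_INDEX.get(tag, 0)  # clear the excluded users' bits
    return from_bitmap(result)


# Female users in Shanghai, excluding high churn risk:
print(select_crowd(all_of=["gender:female", "city:shanghai"], none_of=["churn_risk:high"]))  # [5]
```

Bitmaps make each combination a handful of bitwise operations regardless of how many tags a user carries, which is why they are a common choice for interactive crowd selection.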

Technical Implementation – The data flow begins by collecting behavior logs and user attributes from business databases and crawlers into the ODS (operational data store) layer. Tags are computed in the DWS layer, mostly as offline T+1 batch jobs; statistical tags dominate, and algorithmic (model-based) tags are used only occasionally. Results are loaded into OLAP engines, Redis/HBase for low-latency lookups, and ClickHouse for multi-dimensional analysis, and are exposed via APIs to both internal tools and high-traffic client-facing services.
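A T+1 statistical-tag job can be illustrated with the following sketch. In the pipeline described above this logic would typically run as Hive or Spark SQL over the ODS tables; the pure-Python version only shows the shape of the computation. The ODS row layout `(user_id, event_type, event_date)`, the 30-day window, the `frequent_buyer` threshold and all tag names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def compute_statistical_tags(events, biz_date: date, window_days: int = 30,
                             frequent_threshold: int = 3) -> dict:
    """T+1 batch: aggregate ODS events in the window ending on biz_date into per-user tags.

    events: iterable of (user_id, event_type, event_date) rows (hypothetical ODS schema).
    """
    window_start = biz_date - timedelta(days=window_days - 1)
    order_counts = defaultdict(int)
    last_active = {}
    for user_id, event_type, event_date in events:
        if not (window_start <= event_date <= biz_date):
            continue  # outside the statistics window
        if event_type == "order":
            order_counts[user_id] += 1
        last_active[user_id] = max(last_active.get(user_id, event_date), event_date)

    # One DWS-style row per active user: raw metrics plus rule-derived tags.
    rows = {}
    for user_id, last_date in last_active.items():
        tags = {f"active_{window_days}d"}
        if order_counts[user_id] >= frequent_threshold:
            tags.add("frequent_buyer")
        rows[user_id] = {
            "order_cnt": order_counts[user_id],
            "last_active": last_date,
            "tags": tags,
        }
    return rows


ods_events = [
    ("u1", "order", date(2024, 3, 28)),
    ("u1", "order", date(2024, 3, 29)),
    ("u1", "order", date(2024, 3, 30)),
    ("u1", "view", date(2024, 3, 31)),
    ("u2", "view", date(2024, 3, 15)),
    ("u3", "order", date(2024, 1, 1)),  # falls outside the 30-day window
]
dws_rows = compute_statistical_tags(ods_events, biz_date=date(2024, 3, 31))
print(sorted(dws_rows["u1"]["tags"]))  # ['active_30d', 'frequent_buyer']
```

Parameterizing the job by `biz_date` means a failed day can be rerun or backfilled without touching other partitions.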

ETL scheduling orchestrates tag generation, tag validation, crowd computation and wide-table creation, and feeds downstream services such as Redis, ClickHouse and Elasticsearch. The technology stack combines big-data components (Hive, Spark, HBase, ClickHouse, Elasticsearch) with application services (Java/Scala microservices, REST APIs). Key challenges include query latency for real-time portrait retrieval and handling high-concurrency API traffic.
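The ordering constraints in this scheduling chain form a dependency graph. The sketch below models them with Python's standard `graphlib.TopologicalSorter` (Python 3.9+) and skips everything downstream of a failed task, so that, for example, unvalidated tags never reach Redis or ClickHouse. The task names and the exact dependency edges are assumptions for illustration; a production setup would use a real scheduler rather than this in-process loop.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DAG = {
    "tag_compute": {"ods_ingest"},
    "tag_validate": {"tag_compute"},
    "crowd_compute": {"tag_validate"},
    "wide_table": {"tag_validate"},
    "sync_redis": {"wide_table"},
    "sync_clickhouse": {"wide_table", "crowd_compute"},
    "sync_es": {"wide_table"},
}


def run_pipeline(dag, run_task):
    """Run tasks in dependency order; skip everything downstream of a failed task.

    run_task(name) -> bool is a caller-supplied callable returning True on success.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    finished, failed_or_skipped = [], set()
    while sorter.is_active():
        for task in sorter.get_ready():
            upstream = dag.get(task, set())
            if upstream & failed_or_skipped:
                failed_or_skipped.add(task)  # an upstream task failed: skip this one
            elif run_task(task):
                finished.append(task)
            else:
                failed_or_skipped.add(task)
            sorter.done(task)  # mark as processed so dependents become ready
    return finished, failed_or_skipped


# Simulate a validation failure: nothing downstream of tag_validate runs.
finished, skipped = run_pipeline(DAG, run_task=lambda task: task != "tag_validate")
print(finished)  # ['ods_ingest', 'tag_compute']
```

Calling `done()` even for skipped tasks keeps the sorter progressing, while the upstream check propagates the skip down the graph.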

Q&A Highlights – Data security is enforced through permission-based API access controls. Integrating a tag/portrait middle platform with an existing recommendation system requires a clear tag hierarchy and tag governance. Offline tags cover most use cases; real-time tags are reserved for scenarios that need immediate pushes.
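The two Q&A points, permission-based API access and offline tags supplemented by real-time ones, can be combined in a single portrait-lookup sketch. It assumes tag keys of the form `category:name` and a per-caller allow-list of categories; `PERMISSIONS`, `get_portrait` and the caller and tag names are hypothetical, and the talk does not specify how its permission model is actually structured.

```python
# Hypothetical per-caller permissions: tag categories each API client may read.
PERMISSIONS = {
    "recommendation-service": {"interest", "behavior"},
    "risk-service": {"interest", "behavior", "credit"},
}


def get_portrait(caller: str, offline_tags: dict, realtime_tags: dict) -> dict:
    """Merge offline (T+1) and real-time tags, then keep only categories the caller may read.

    Tag keys are assumed to look like "category:name".
    """
    allowed = PERMISSIONS.get(caller)
    if allowed is None:
        raise PermissionError(f"unknown caller: {caller}")
    merged = {**offline_tags, **realtime_tags}  # real-time values override offline ones
    return {key: value for key, value in merged.items() if key.split(":", 1)[0] in allowed}


offline = {"interest:sports": 0.8, "credit:score_band": "B", "behavior:last_login": "2024-03-30"}
realtime = {"behavior:last_login": "2024-03-31"}
print(get_portrait("recommendation-service", offline, realtime))
# {'interest:sports': 0.8, 'behavior:last_login': '2024-03-31'}
```

Filtering by category at the API boundary keeps sensitive tags (here, credit data) out of services that have no need for them, even though all tags live in the same store.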

The presentation concludes with thanks and a reminder to like, share, and follow for future content.
