Databases 13 min read

How ClickHouse Powers Real-Time User Profiling for Millions of Users

This article explains how Suning built a massive user‑tag platform using ClickHouse and RoaringBitmap to enable second‑level user profiling for over 600 million users, covering tag generation, storage, query architecture, and performance optimizations for marketing campaigns.

Suning Technology
Suning Technology
Suning Technology
How ClickHouse Powers Real-Time User Profiling for Millions of Users

Business Scenario

To run a marketing campaign such as a Double 11 promotion, Suning needs to select a target audience of about ten million users from a pool of over 600 million, then analyze their profiles to ensure they match the desired characteristics.

Tag Selection and Estimation

Users are filtered by combining attribute tags (e.g., gender, age, membership level) and behavior tags (e.g., browsing, adding to cart, purchasing). The platform estimates the size of the resulting audience in real time.

Example tag logic: users aged 25‑36 AND with "smart home" characteristics, EXCLUDING those who spent less than 10 CNY in the past 30 days.

{ {用户年龄 25-36} ∩ {智能家居人群} } - {30天消费小于10元}

Technical Challenges

Large number of audience packages.

Each package contains a massive user base.

Real‑time computation of results is difficult.

Previous Solution (ElasticSearch)

The original system stored tags in a wide table in ElasticSearch, requiring batch insertion and later association before querying, which caused delays when upstream data pipelines lagged.

Why ClickHouse?

ClickHouse is an open‑source columnar database optimized for OLAP queries, offering superior query performance. Suning integrated RoaringBitmap into ClickHouse to provide efficient bitmap operations for tag storage and analysis.

Bitmap Integration

Each tag is stored as a bitmap of user IDs, enabling fast set operations (union, intersection, difference) while reducing storage space.

Bitmap Integration Diagram
Bitmap Integration Diagram

New Architecture Based on ClickHouse

The architecture consists of a ClickHouse Manager for cluster and metadata management, Spark jobs that generate and import tag data from HDFS into ClickHouse, and a Proxy layer that translates user queries into ClickHouse SQL.

Tag tables use a row‑to‑column design where each tag name becomes a partition and the uv column stores a bitmap of user IDs.

CREATE TABLE ch_label_string (
  labelname String,   -- tag name
  labelvalue String,  -- tag value
  uv AggregateFunction(groupBitmap, UInt64)  -- user ID bitmap
) ENGINE = AggregatingMergeTree()
PARTITION BY labelname
ORDER BY (labelname, labelvalue)
SETTINGS index_granularity = 128;

Data is sharded by user ID, allowing parallel import at ~1.5 million rows/second per node. Queries run on a single shard, and results are summed across shards, exploiting ClickHouse’s distributed parallelism.

Query Flow

Using a WITH clause, the system builds bitmap objects for the selected audience, then applies bitmap functions for set operations. Distributed proxy tables route queries to the appropriate shards without storing data themselves.

Query Flow Diagram
Query Flow Diagram
-- Example query
CREATE TABLE ch_agent_user (agentname String) ENGINE = MergeTree() PARTITION BY agentname ORDER BY (agentname) SETTINGS index_granularity = 8192;
CREATE TABLE ch_agent_dist_user AS ch_agent_user ENGINE = Distributed('cluster_test','test','ch_agent_user',cityHash64(agentname));
SELECT sum(user_number) AS user_number
FROM ch_agent_dist_user
RIGHT JOIN (
  WITH (
    SELECT groupBitmapState(userid) AS users0 FROM ch_label_string WHERE labelname = 'T'
  ) AS users0
  SELECT 'agent' AS agentname, bitmapCardinality(users0) AS user_number
) USING (agentname) SETTINGS enable_scalar_subquery_optimization = 0;

User Profiling Evolution

Three architecture versions were implemented:

V1 – Wide Table: A single table with many feature columns; suffered from schema changes and slow group‑by on large audiences.

V2 – Bitmap: Each feature value stored as a bitmap; improved performance but still slow for very large audiences due to disk deserialization.

V3 – Join Table Engine: Tags stored in ClickHouse’s Join engine, keeping bitmaps in memory; reduced average query time to 1‑2 seconds for tens of millions of users.

Summary

Integrating RoaringBitmap into ClickHouse and leveraging the Join table engine dramatically improved the tag platform’s data analysis capabilities, achieving sub‑2‑second query latency, parallel tag generation, near‑real‑time updates, and reduced hardware costs compared to the previous ElasticSearch solution.

Future Plans

Upgrade to 64‑bit RoaringBitmap.

Introduce a smarter caching layer to boost query concurrency.

Support offline generation of ClickHouse data files for faster tag readiness.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLClickHouseuser profilingBitmapTag Platform
Suning Technology
Written by

Suning Technology

Official Suning Technology account. Explains cutting-edge retail technology and shares Suning's tech practices.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.