Databases 14 min read

How ClickHouse Enables Millisecond‑Scale User Profiling for Hundreds of Millions

This article explains how Suning built a high‑performance user‑tag platform on ClickHouse, replacing Elasticsearch with bitmap‑based storage and a new architecture that delivers sub‑second profiling queries for over 600 million users, detailing the design, implementation, and future enhancements.

dbaplus Community
dbaplus Community
dbaplus Community
How ClickHouse Enables Millisecond‑Scale User Profiling for Hundreds of Millions

Background

Marketing campaigns require precise audience selection, which relies on user profiling—creating a tag‑based model from user attributes, behaviors, and business scenarios.

Business Scenario

During a Double‑11 promotion, Suning needed to issue 10 million appliance coupons. The system first filtered users matching specific tags, estimated the audience size, and then generated a user‑ID list for automated marketing.

Technical Challenges

Large number of audience packages.

Each package contains a massive set of user IDs.

Real‑time computation of results across distributed nodes.

Previous Solution (Elasticsearch Wide Table)

Tags were stored in a wide table in Elasticsearch, with one row per user and many tag columns. Data insertion required waiting for all business data, and queries involved heavy aggregations, leading to latency and operational delays.

ClickHouse Integration

Suning replaced Elasticsearch with ClickHouse, integrating RoaringBitmap to store user IDs as compressed bitmap objects. This enables efficient set operations (union, intersection, difference) directly in the database.

Bitmap Table Schema

CREATE TABLE ch_label_string (
  labelname String,               -- tag name
  labelvalue String,              -- tag value
  uv AggregateFunction(groupBitmap, UInt64) -- bitmap of user IDs
) ENGINE = AggregatingMergeTree()
PARTITION BY labelname
ORDER BY (labelname, labelvalue)
SETTINGS index_granularity = 128;

The uv column stores a bitmap of user IDs, dramatically reducing storage and accelerating set calculations.

Data Partitioning and Sharding

Tags are partitioned by name, and data is sharded by user ID, allowing parallel import of up to 1.5 million rows per second per node. Queries run on a single shard, returning only the count of matching users, which the coordinator sums.

Query Flow

Using a WITH clause, the system builds bitmap objects for each tag, then applies set operations to compute the final audience. Distributed proxy tables route queries to the appropriate shards without moving data.

Example SQL

-- Distributed proxy table
CREATE TABLE ch_agent_user (
  agentname String
) ENGINE = MergeTree()
PARTITION BY agentname
ORDER BY (agentname)
SETTINGS index_granularity = 8192;

CREATE TABLE ch_agent_dist_user AS ch_agent_user
ENGINE = Distributed('cluster_test', 'test', 'ch_agent_user', cityHash64(agentname));

SELECT sum(user_number) AS user_number
FROM ch_agent_dist_user
RIGHT JOIN (
  WITH (
    SELECT groupBitmapState(userid) AS users0 FROM ch_label_string WHERE labelname = 'T'
  ) AS users0
  SELECT 'agent' AS agentname, bitmapCardinality(users0) AS user_number
) USING (agentname) SETTINGS enable_scalar_subquery_optimization = 0;

The query forces execution of the WITH subquery on every node (by disabling scalar subquery optimization) to ensure up‑to‑date bitmap calculations.

Architecture Evolution

V1 – Wide Table: Simple schema but required table alterations for new features and suffered from heavy GROUP BY on large audiences.

V2 – Bitmap Mode: Stored each tag’s user set as a bitmap; improved latency to ~5‑10 seconds but still limited by disk‑based bitmap deserialization.

V3 – Join Table Engine: Leveraged ClickHouse’s Join engine to keep bitmap tables in memory, reducing query time to 1‑2 seconds for tens of millions of users.

Benefits of the New Architecture

Parallel tag construction accelerates data readiness.

Concurrent HDFS‑to‑ClickHouse imports speed up availability.

Average query latency under 2 seconds; complex queries under 5 seconds.

Near‑real‑time tag updates.

User‑friendly tag expressions and SQL improve maintainability.

Hardware resource usage reduced by roughly 50 % compared to Elasticsearch.

Future Plans

Upgrade RoaringBitmap to a 64‑bit version.

Introduce a smarter caching layer to boost ClickHouse query concurrency.

Support offline generation of ClickHouse data files to further speed up tag readiness.

Conclusion

By integrating bitmap functionality and the Join table engine into ClickHouse, Suning’s tag platform achieved massive performance gains, enabling rapid, scalable user profiling for marketing and recommendation services.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big DataClickHouseuser profilingBitmapOLAPTag Platform
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.