
How ByteDance Scales Ad Targeting with ClickHouse: Architecture & Optimizations

This article explains how ByteDance leverages ClickHouse for ad audience estimation, profiling, and analytics, detailing the challenges of massive user‑level set operations, the evolution from a simple tag‑uid table to Bitmap64 with RoaringBitmap, and the extensive engineering optimizations that cut query latency, storage, and CPU usage dramatically.

Volcano Engine Developer Services

Business Background

Advertising is a core revenue source for many internet companies, and ByteDance uses ClickHouse as the backbone for large‑scale online analysis in its DMP/CDP and other ad services. The main use cases are audience estimation, audience profiling, and statistical analysis, all of which must return results at interactive speed (the latency target is listed under Challenges).

Challenges

Massive audience data (billions of users) and a huge number of tags.

Complex set operations (intersection, union, complement) that can involve hundreds of tag groups.

Strict query latency requirements (typically <5 seconds).

Technical Solution V1

The first version stores a two‑column table (tag_id, uid) with a primary key on tag_id. Set operations are expressed with IN (intersection) and OR (union). To improve performance, two optimizations were applied:

Maximize parallelism by pushing computation down to nodes and reducing data shuffling.

Accelerate COUNT(DISTINCT) via hash function tweaks and approximate algorithms (e.g., uniqHLL12).
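To illustrate why an approximate algorithm can stand in for an exact COUNT(DISTINCT), here is a minimal HyperLogLog sketch in Python. It is a textbook version written for this article, not ClickHouse's uniqHLL12 implementation; the function name `hll_estimate`, the SHA-1 hash, and the choice of 2^12 registers are assumptions made for the example.

```python
import hashlib
import math

def hll_estimate(items, p=12):
    """Estimate the number of distinct items using 2**p HyperLogLog registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Assumption: take the first 64 bits of SHA-1 as the hash value.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                   # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)      # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)          # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw

# Every uid appears twice; duplicates must not change the estimate.
uids = list(range(50_000)) * 2
estimate = hll_estimate(uids)
```

The memory cost is fixed at 4096 small registers regardless of how many uids are scanned, which is the trade that makes an approximate count cheaper than holding an exact distinct set.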

Example query for the audience expression A & (B | C), where A, B, and C stand for tag ids:

SELECT count(DISTINCT uid)
FROM tag_uid_map
WHERE tag_id = A
  AND uid IN (
    SELECT DISTINCT uid
    FROM tag_uid_map
    WHERE tag_id = B OR tag_id = C
  )

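Data were sharded by uid (e.g., odd/even) across multiple machines. Because every row for a given uid lives on exactly one shard, each node can evaluate the whole expression on its local data, and the per‑node distinct counts can simply be summed to get the final result.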
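The shard-and-sum idea can be sketched in a few lines of Python. This is an illustration only: the helper names `shard_rows` and `local_count`, the two-shard split, and the sample rows are all assumptions, and plain Python lists stand in for the per-node tables.

```python
from collections import defaultdict

NUM_SHARDS = 2  # assumption: an odd/even split by uid

def shard_rows(rows, num_shards=NUM_SHARDS):
    """Route each (tag_id, uid) row to a shard chosen by uid."""
    shards = defaultdict(list)
    for tag_id, uid in rows:
        shards[uid % num_shards].append((tag_id, uid))
    return shards

def local_count(rows):
    """Count distinct uids matching A & (B | C) within one shard."""
    in_a = {uid for tag, uid in rows if tag == "A"}
    in_b_or_c = {uid for tag, uid in rows if tag in ("B", "C")}
    return len(in_a & in_b_or_c)

rows = [
    ("A", 1), ("A", 2), ("A", 3), ("A", 4),
    ("B", 2), ("B", 5),
    ("C", 3), ("C", 4), ("C", 4),   # duplicate row on purpose
]

shards = shard_rows(rows)
# Each shard counts independently; uids never span shards, so the sum is exact.
total = sum(local_count(shard) for shard in shards.values())
```

Summing works only because the sharding key is the same column being counted. If rows were sharded by tag_id instead, the same uid could be counted on several nodes and the partial counts could not be added.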

Technical Solution V2

The second version replaces the detailed table with a Bitmap64 column that stores a RoaringBitmap for each tag. This reduces storage, speeds up set operations, and simplifies SQL (no sub‑queries needed). The architecture introduces a new read‑and‑process pipeline that partitions data by uid, assigns each partition to a dedicated stream, and processes streams in parallel.
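A rough Python sketch of the partition-per-stream pipeline described above follows. It assumes four partitions and a standard thread pool; the names `partition_tag_sets` and `process_partition` are hypothetical, and Python sets stand in for the per-tag bitmaps. The source does not describe the internals of the real pipeline, so treat this as an outline of the idea rather than the implementation.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_tag_sets(tag_to_uids, n=NUM_PARTITIONS):
    """Split every tag's uid set into n partitions by uid % n."""
    parts = [dict() for _ in range(n)]
    for tag, uids in tag_to_uids.items():
        for p in range(n):
            parts[p][tag] = {u for u in uids if u % n == p}
    return parts

def process_partition(part):
    """Evaluate A & (B | C) on one partition and return its cardinality."""
    return len(part["A"] & (part["B"] | part["C"]))

tags = {
    "A": set(range(0, 100)),
    "B": set(range(50, 150)),
    "C": set(range(0, 10)),
}

partitions = partition_tag_sets(tags)
# One task per partition, mirroring "one stream per partition".
with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    partial_counts = list(pool.map(process_partition, partitions))

total = sum(partial_counts)
```

Because the partitions are disjoint by uid, no partition needs data from another, so the only coordination step is the final sum.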

Key improvements include:

Space savings with RoaringBitmap (≈1/3 of original storage).

Faster computation thanks to native bitmap intersect/union operations.

More intuitive SQL without nested queries.
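The following Python sketch shows, in simplified form, how a chunked bitmap supports the same A & (B | C) expression without any sub-query. It borrows only the chunking idea from RoaringBitmap (high bits select a container, the low 16 bits select a bit inside it). Real RoaringBitmap picks between several container encodings, whereas here every container is just a Python integer used as a bit mask. All function names are made up for this example.

```python
def build(uids):
    """Build a chunked bitmap: {high bits: bit mask over the low 16 bits}."""
    containers = {}
    for uid in uids:
        key, low = uid >> 16, uid & 0xFFFF
        containers[key] = containers.get(key, 0) | (1 << low)
    return containers

def bitmap_and(a, b):
    """Intersect container by container; keys missing on either side drop out."""
    out = {}
    for key in a.keys() & b.keys():
        bits = a[key] & b[key]
        if bits:
            out[key] = bits
    return out

def bitmap_or(a, b):
    """Union container by container."""
    out = dict(a)
    for key, bits in b.items():
        out[key] = out.get(key, 0) | bits
    return out

def cardinality(bm):
    """Number of set bits across all containers."""
    return sum(bin(bits).count("1") for bits in bm.values())

# uids wider than 32 bits work too, since the key simply keeps all high bits.
tag_a = build([1, 2, 3, 70_000, 5_000_000_000])
tag_b = build([2, 70_000])
tag_c = build([3, 5_000_000_000, 9])

result = bitmap_and(tag_a, bitmap_or(tag_b, tag_c))
audience_size = cardinality(result)
```

The intersection skips every container key that appears on only one side, which is one reason sparse tags intersect cheaply in a chunked layout.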

Additional engineering optimizations were applied:

Parallel execution at thread granularity, custom input streams, and a ParallelBitMapProcessor thread pool.

Block size reduction (8192 → 128 rows) to limit read amplification for bitmap columns.

Periodic merges and secondary indexes for precise data location.

Three‑layer caching: read‑level cache, intermediate‑result cache via a multiCount UDF, and final‑result cache.
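As an illustration of the intermediate-result layer, the sketch below memoizes sub-expression results so that a second query sharing (B | C) does not recompute it. This is a guess at the general shape only: the source does not describe how the multiCount UDF or its cache works, and the tuple-based expression format, the `evaluate` function, and the sample tags are assumptions. Python sets again stand in for bitmaps.

```python
# Sets stand in for per-tag bitmaps (sample data, not from the source).
tag_bitmaps = {"A": {1, 2, 3, 4}, "B": {2, 5}, "C": {3, 4}, "D": {1, 2, 9}}

intermediate_cache = {}            # sub-expression -> result set
stats = {"hits": 0, "misses": 0}

def evaluate(expr):
    """Evaluate a tag id or an (op, left, right) tuple, caching non-leaf results."""
    if isinstance(expr, str):      # leaf: a tag id
        return tag_bitmaps[expr]
    if expr in intermediate_cache:
        stats["hits"] += 1
        return intermediate_cache[expr]
    stats["misses"] += 1
    op, left, right = expr
    lhs, rhs = evaluate(left), evaluate(right)
    if op == "and":
        result = lhs & rhs
    elif op == "or":
        result = lhs | rhs
    else:
        raise ValueError(f"unknown operator: {op}")
    intermediate_cache[expr] = result
    return result

b_or_c = ("or", "B", "C")
count_1 = len(evaluate(("and", "A", b_or_c)))   # computes and caches B | C
count_2 = len(evaluate(("and", "D", b_or_c)))   # reuses the cached B | C
```

A production cache would also need invalidation when tag data is re-imported and a canonical form for expressions, so that (B | C) and (C | B) map to the same key. Neither is handled here.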

Results

After deploying V2, storage dropped to about one‑third of its previous size, import time fell by ~66%, and query latency improved markedly: average, p99, and maximum latencies all decreased sharply, and most queries now finish well under 5 seconds. CPU usage fell noticeably, and PageCache usage dropped by more than 100 GB.

Future Work

Further compute‑ and data‑layer optimizations to reduce read amplification and exploit new hardware.

Smarter caching that automatically extracts common sub‑expressions, potentially using machine‑learning techniques.

Extending expression capabilities to support multi‑dimensional tags and richer UDFs.

Summary

The article details ByteDance’s use of ClickHouse for ad audience estimation, profiling, and analytics, describing the evolution from a simple tag‑uid schema to a Bitmap64‑based solution with extensive performance, storage, and caching optimizations that achieved substantial latency and resource reductions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
