Databases 18 min read

User Retention, Funnel, and Session Analysis in ClickHouse Using Bitmap and Retention Functions

The article explains how to perform efficient user retention, funnel, and session analysis on large ClickHouse datasets by replacing costly multi‑table joins with bitmap compression, the built‑in retention function, windowFunnel, and high‑order array functions, providing practical SQL examples and performance insights.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
User Retention, Funnel, and Session Analysis in ClickHouse Using Bitmap and Retention Functions

Retention analysis measures how many users who performed an event on a given day repeat another event on subsequent days, e.g., 20% of users who clicked a banner on 20200701 clicked the app sign‑in on 20200702.

Traditional multi‑table joins in ClickHouse are inefficient for large‑scale retention queries because they require costly joins on massive logs.

Two optimized solutions are presented: using ClickHouse’s built‑in retention aggregate function, and a bitmap‑based approach with RoaringBitmap that compresses user‑event data and speeds up queries.

The bitmap method stores user IDs in bitmaps per event, dramatically reducing storage from billions of rows to tens of thousands of rows, as illustrated by the example tables table_oper_bit and table_attribute_bit .

Retention queries can then be written with retention(cond1, cond2, …) or with bitmap functions such as bitmapCardinality and bitmapAnd to compute user overlap.

For funnel analysis, ClickHouse provides the windowFunnel function, which evaluates ordered event sequences within a time window; example queries show how to calculate conversion rates for paths like “browse → click → order → pay”.

Advanced path analysis can be built with high‑order array functions (e.g., arrayJoin , arrayCompact , arraySort , arrayFilter ) to reconstruct user journeys, identify key steps, and count distinct users per path.

Session analysis is also covered, defining a session as a series of actions within a configurable idle timeout (e.g., 30 minutes) and providing SQL examples that split event streams into sessions and compute metrics such as session count and average depth.

Performance tests on a 12‑core, 125 GB ClickHouse cluster show sub‑second query times for retention over millions of users, while noting that real‑time analysis and complex joins remain challenging.

SQLdata analysisClickHouseBitMapsessionretention
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.