Why HyperLogLog Misses 100M Daily Active Users and How Bitmap Solves It
The article dissects an Alibaba interview question on counting 100 million daily active users, showing why HyperLogLog’s error and lack of per‑user state make it unsuitable, and presents a detailed Bitmap‑based architecture—including sharding, pre‑computation, and ClickHouse integration—to achieve precise, high‑performance analytics.
Problem
An interview question describes a system with 5 billion registered users and 100 million daily active users (DAU). The system must provide a real‑time large‑screen dashboard and also determine whether a specific user is logged in.
Why HyperLogLog is unsuitable
HyperLogLog estimates cardinality with a standard relative error of 0.81 %. For 100 million DAU that translates to a typical deviation of roughly ±810 000 users, which rules it out wherever the dashboard must display an exact figure.
Redis HyperLogLog provides only PFADD, PFCOUNT and PFMERGE. There is no command to query the presence of an individual element, so per‑user login status cannot be determined.
Bitmap‑based precise solution
Redis strings can be used as bitmaps where each bit represents a user’s login state (1 = logged in, 0 = offline). The user ID is used as the bit offset. SETBIT key offset 1 – marks a user as logged in. GETBIT key offset – returns 1 if the user is logged in, 0 otherwise (O(1) time). BITCOUNT key – counts all bits set to 1, yielding an exact DAU count with no error.
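The semantics of SETBIT, GETBIT, and BITCOUNT can be sketched in pure Python, with a bytearray standing in for the Redis string (the class and method names here are illustrative, not Redis client calls):

```python
# Minimal in-memory sketch of Redis bitmap semantics (SETBIT/GETBIT/BITCOUNT).
# A bytearray stands in for the Redis string; a user ID is the bit offset.

class Bitmap:
    def __init__(self, size_bits: int):
        self.buf = bytearray((size_bits + 7) // 8)

    def setbit(self, offset: int, value: int) -> None:
        byte, bit = divmod(offset, 8)
        if value:
            self.buf[byte] |= 1 << (7 - bit)    # Redis numbers bits MSB-first
        else:
            self.buf[byte] &= ~(1 << (7 - bit))

    def getbit(self, offset: int) -> int:
        byte, bit = divmod(offset, 8)
        return (self.buf[byte] >> (7 - bit)) & 1

    def bitcount(self) -> int:
        return sum(bin(b).count("1") for b in self.buf)

dau = Bitmap(1_000_000)
dau.setbit(42, 1)         # user 42 logs in
dau.setbit(99_999, 1)     # user 99999 logs in
print(dau.getbit(42))     # 1 — logged in
print(dau.getbit(7))      # 0 — offline
print(dau.bitcount())     # 2 — exact DAU, no estimation error
```

Unlike HyperLogLog, the per-user state is directly addressable, so the "is this user online?" query and the aggregate count come from the same structure.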
For a continuous ID space of 5 billion users the bitmap occupies roughly 625 MB (5 × 10⁹ bits ÷ 8), plus a small amount of key metadata. Note that this exceeds Redis's 512 MB limit for a single string value, which is one more reason the bitmap must be sharded across multiple keys.
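The memory figures follow directly from the one-bit-per-ID arithmetic:

```python
# One bit per user ID: ID-space width -> bytes -> MB.
ids = 5_000_000_000
bytes_needed = ids // 8               # 625,000,000 bytes
print(bytes_needed / 1_000_000)       # 625.0 MB for the full 5B ID space

active = 100_000_000
print(active // 8 / 1_000_000)        # 12.5 MB if the ID space were only 100M wide
```

The cost is driven by the width of the ID space, not by how many users are actually active — which is exactly why sparse IDs (covered later) are a problem.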
Sharding to avoid a big‑key hotspot
shard_index = user_id % 1024
key = bitmap:dau:{date}:{shard_index}
offset = user_id / 1024 (integer division)
This distributes load across 1024 keys, each of which can be stored on a different cluster node.
Each shard then holds about 4.9 million bits (~600 KB), well below typical big-key warning thresholds, while total memory stays around 625 MB.
Failure of a single shard does not affect the whole metric.
Aggregations can be performed by summing BITCOUNT across all shards.
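The routing scheme above can be captured in one small helper (key naming follows the article's pattern; the function name is illustrative):

```python
SHARDS = 1024

def route(user_id: int, date: str):
    """Map a user ID to its shard key and bit offset within that shard."""
    shard_index = user_id % SHARDS          # which of the 1024 keys
    offset = user_id // SHARDS              # bit position inside that key
    key = f"bitmap:dau:{date}:{shard_index}"
    return key, offset

key, offset = route(5_000_000_123, "2024-06-01")
print(key)      # bitmap:dau:2024-06-01:635
print(offset)   # 4882812
```

Because the modulus and the integer division partition the ID space without overlap, every user ID lands on exactly one (key, offset) pair, so summing BITCOUNT across shards double-counts nothing.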
Industrial‑grade multi‑layer architecture
Core layer (bitmap sharding) : Handles per‑user login flags and exact DAU counting.
Screen layer (pre‑computation + cache) : A scheduled job (e.g., every 5 seconds via XXL‑Job) runs BITCOUNT on all shards, sums the results, and stores the total in dau:{date}:total with a short TTL (≈10 s). The dashboard reads this key directly, achieving sub‑millisecond latency.
Fallback : If the pre‑computation job lags, the system can temporarily display the HyperLogLog estimate (0.81 % error) with a clear disclaimer.
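The screen-layer job reduces to "sum BITCOUNT over all shards, cache the total." A simulation of that loop, with in-memory byte strings standing in for the Redis shard values (the cache key follows the article's naming; in production the SET would carry the ~10 s TTL):

```python
# Simulated pre-computation job: sum bit counts across all shards and
# cache the total under a short-TTL key for the dashboard to read.
shards = {f"bitmap:dau:2024-06-01:{i}": b"" for i in range(1024)}
shards["bitmap:dau:2024-06-01:0"] = bytes([0b10100000])   # 2 active users
shards["bitmap:dau:2024-06-01:7"] = bytes([0b11110000])   # 4 active users

def bitcount(raw: bytes) -> int:
    return sum(bin(b).count("1") for b in raw)

cache = {}                       # stands in for Redis; SET dau:{date}:total EX 10
def precompute_total(date: str) -> int:
    total = sum(bitcount(v) for v in shards.values())
    cache[f"dau:{date}:total"] = total
    return total

print(precompute_total("2024-06-01"))   # 6 — exact DAU across all shards
```

The dashboard never touches the 1024 shard keys directly; it reads the single cached total, which is why the read path stays sub-millisecond regardless of shard count.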
Multi‑dimensional analytics with ClickHouse
Login events (including dimensions such as region, device, birth year) are published to Kafka.
Flink consumes the stream, cleans it, and writes enriched records to ClickHouse.
ClickHouse’s native Bitmap type and functions (bitmapAnd, bitmapCardinality) enable fast multi‑dimensional aggregation (e.g., DAU for users in Hangzhou on iOS born in the 1990s).
Sparse ID (Snowflake) pitfalls and solutions
Using raw Snowflake IDs as bitmap offsets would require a bitmap sized to the maximum 64‑bit ID (~9.2 × 10¹⁸ bits), i.e. on the order of an exabyte of memory, almost all of it wasted on the sparse gaps between IDs.
ID mapping + bitmap (preferred) : Maintain a mapping table (MySQL or Redis) from Snowflake ID to a continuous integer. Login flow: lookup the mapping → use the continuous integer as bitmap offset. Memory for 100 million users ≈12 MB.
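A sketch of the mapping flow, with an in-memory dict standing in for the MySQL/Redis mapping table (the allocator and function names here are illustrative; a real table would use an auto-increment column or a Redis counter):

```python
import itertools

# Snowflake ID -> dense integer mapping; the counter plays the role of
# an auto-increment column in the real mapping table.
_next_dense = itertools.count()
mapping = {}

def dense_id(snowflake_id: int) -> int:
    """Return the continuous integer for a Snowflake ID, allocating on first sight."""
    if snowflake_id not in mapping:
        mapping[snowflake_id] = next(_next_dense)
    return mapping[snowflake_id]

# Login flow: sparse 64-bit IDs collapse into small, dense bitmap offsets.
offsets = [dense_id(sid) for sid in (7214561098231111680, 7214561098231111999)]
print(offsets)   # [0, 1] — usable directly as bitmap offsets
```

Because offsets are allocated densely in arrival order, the bitmap size tracks the number of distinct users ever seen, not the 64-bit ID range.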
Sharded hash (fallback) : Store login flags in Redis hashes. Use HSET to record login, HEXISTS to query status, and HLEN on each shard to obtain the total. Memory ≈3–4 GB for 100 million users. Suitable when a mapping table cannot be introduced.
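The hash fallback maps naturally onto dicts (HSET ≈ item assignment, HEXISTS ≈ `in`, HLEN ≈ `len`); a minimal simulation, with an illustrative shard count:

```python
SHARDS = 64   # illustrative shard count for the hash fallback

# One dict per shard stands in for one Redis hash.
hashes = [dict() for _ in range(SHARDS)]

def login(user_id: int) -> None:
    hashes[user_id % SHARDS][user_id] = 1        # HSET shard user_id 1

def is_online(user_id: int) -> bool:
    return user_id in hashes[user_id % SHARDS]   # HEXISTS shard user_id

def total_dau() -> int:
    return sum(len(h) for h in hashes)           # sum of HLEN over all shards

for uid in (7214561098231111680, 7214561098231111999, 42):
    login(uid)
print(is_online(42), is_online(43))   # True False
print(total_dau())                    # 3
```

The hash stores each full user ID as a field, which is why it costs gigabytes where the dense bitmap costs megabytes — the trade is memory for not needing the mapping table.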
Final architecture summary
Bitmap sharding provides exact per‑user state and error‑free DAU counting.
Pre‑computed aggregates cached for sub‑millisecond dashboard reads.
ClickHouse handles flexible, high‑performance multi‑dimensional analytics.
HyperLogLog is retained only as a temporary fallback when the pre‑computation pipeline stalls.