
Why HyperLogLog Misses 100M Daily Active Users and How Bitmap Solves It

The article dissects an Alibaba interview question on counting 100 million daily active users, showing why HyperLogLog’s error and lack of per‑user state make it unsuitable, and presents a detailed Bitmap‑based architecture—including sharding, pre‑computation, and ClickHouse integration—to achieve precise, high‑performance analytics.

Tech Freedom Circle

Problem

An interview question describes a system with 5 billion registered users and 100 million daily active users (DAU). The system must provide a real‑time large‑screen dashboard and also determine whether a specific user is logged in.

Why HyperLogLog is unsuitable

HyperLogLog estimates cardinality with a standard relative error of 0.81 % (the figure for Redis's implementation). For 100 million DAU that is a margin of roughly ±810,000 users, which rules it out wherever the dashboard must report an exact count.

Redis HyperLogLog provides only PFADD, PFCOUNT and PFMERGE. There is no command to query the presence of an individual element, so per‑user login status cannot be determined.

Bitmap‑based precise solution

Redis strings can be used as bitmaps where each bit represents a user's login state (1 = logged in, 0 = offline), with the user ID as the bit offset.

SETBIT key offset 1 – marks a user as logged in.

GETBIT key offset – returns 1 if the user is logged in, 0 otherwise (O(1) time).

BITCOUNT key – counts all bits set to 1, yielding an exact DAU count with no error.
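The three commands can be modeled in a few lines of pure Python. This is a local stand-in for the Redis bitmap, not production code: in practice each call maps 1:1 to SETBIT / GETBIT / BITCOUNT on a Redis string.

```python
# Minimal in-memory model of the Redis bitmap commands used above.
class Bitmap:
    def __init__(self):
        self.buf = bytearray()

    def setbit(self, offset: int, value: int) -> None:
        byte, bit = divmod(offset, 8)
        if byte >= len(self.buf):                 # grow like a Redis string
            self.buf.extend(b"\x00" * (byte - len(self.buf) + 1))
        if value:
            self.buf[byte] |= 0x80 >> bit         # Redis numbers bits MSB-first
        else:
            self.buf[byte] &= ~(0x80 >> bit)

    def getbit(self, offset: int) -> int:
        byte, bit = divmod(offset, 8)
        if byte >= len(self.buf):
            return 0
        return (self.buf[byte] >> (7 - bit)) & 1

    def bitcount(self) -> int:                    # BITCOUNT over the whole key
        return sum(bin(b).count("1") for b in self.buf)

dau = Bitmap()
dau.setbit(42, 1)            # user 42 logs in
dau.setbit(1_000_000, 1)     # user 1,000,000 logs in
print(dau.getbit(42))        # 1 -> logged in
print(dau.getbit(43))        # 0 -> offline
print(dau.bitcount())        # 2 -> exact DAU
```

Note that GETBIT answers the per-user question HyperLogLog cannot, and BITCOUNT answers the aggregate question with zero error.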

For a continuous ID space of 5 billion users a single bitmap would occupy 5 × 10⁹ bits ≈ 625 MB. That is acceptable for a modern Redis cluster, but it exceeds the 512 MB cap on a single Redis string, which is one more reason to shard.

Sharding to avoid a big‑key hotspot

shard_index = user_id % 1024
key = bitmap:dau:{date}:{shard_index}
offset = user_id / 1024  // integer division

Distributes load across 1024 keys, each of which can live on a different cluster node; dividing the offset by 1024 also keeps every per-key offset well below Redis's 2³² − 1 SETBIT limit.

Each shard holds ~4.9 million bits (~610 KB), keeping total memory around 625 MB.

Failure of a single shard does not affect the whole metric.

Aggregations can be performed by summing BITCOUNT across all shards.
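The sharding scheme above can be sketched as follows. The key naming and the 1024-shard count come from the article; a plain dict of Python ints stands in for the Redis bitmaps, so this illustrates the routing and aggregation logic only.

```python
# Sketch of the sharding scheme: user_id -> (shard key, offset),
# plus DAU aggregation by summing per-shard bit counts.
SHARDS = 1024

def shard_key(user_id: int, date: str) -> tuple:
    shard_index = user_id % SHARDS
    offset = user_id // SHARDS              # integer division
    return f"bitmap:dau:{date}:{shard_index}", offset

store = {}                                  # key -> bitmap as a big int

def mark_login(user_id: int, date: str) -> None:
    key, offset = shard_key(user_id, date)
    store[key] = store.get(key, 0) | (1 << offset)   # SETBIT key offset 1

def dau(date: str) -> int:
    prefix = f"bitmap:dau:{date}:"          # sum BITCOUNT across all shards
    return sum(bin(v).count("1") for k, v in store.items() if k.startswith(prefix))

for uid in (7, 1031, 4_999_999_999):
    mark_login(uid, "2024-06-01")
print(dau("2024-06-01"))                    # 3
```

Users 7 and 1031 both land in shard 7 (offsets 0 and 1), showing how the modulo spreads IDs while the division keeps offsets dense within each shard.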

Industrial‑grade multi‑layer architecture

Core layer (bitmap sharding): handles per‑user login flags and exact DAU counting.

Screen layer (pre‑computation + cache): a scheduled job (e.g., every 5 seconds via XXL‑Job) runs BITCOUNT on all shards, sums the results, and stores the total in dau:{date}:total with a short TTL (≈10 s). The dashboard reads this key directly, achieving sub‑millisecond latency.
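A minimal sketch of the screen layer, with the scheduler, per-shard counts, and cache all simulated locally; in production the job would run under XXL-Job and read/write Redis, and the key name dau:{date}:total follows the article.

```python
# Recurring job: sum BITCOUNT over all shards, cache under a short TTL.
import time

shard_bitcounts = {i: 0 for i in range(1024)}   # stand-in for BITCOUNT per shard
cache = {}                                      # key -> (value, expiry timestamp)

def precompute_total(date: str, ttl: float = 10.0) -> int:
    total = sum(shard_bitcounts.values())       # aggregate across shards
    cache[f"dau:{date}:total"] = (total, time.time() + ttl)
    return total

def read_dashboard(date: str):
    hit = cache.get(f"dau:{date}:total")
    if hit and hit[1] > time.time():
        return hit[0]                           # cheap cached read
    return None                                 # miss -> fall back (e.g., HLL estimate)

shard_bitcounts[3] = 120
shard_bitcounts[9] = 80
precompute_total("2024-06-01")
print(read_dashboard("2024-06-01"))             # 200
```

The dashboard never touches the shards directly; a stale or missing cache entry is the trigger for the HyperLogLog fallback described next.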

Fallback : If the pre‑computation job lags, the system can temporarily display the HyperLogLog estimate (0.81 % error) with a clear disclaimer.

Multi‑dimensional analytics with ClickHouse

Login events (including dimensions such as region, device, birth year) are published to Kafka.

Flink consumes the stream, cleans it, and writes enriched records to ClickHouse.

ClickHouse’s native Bitmap type and functions (bitmapAnd, bitmapCardinality) enable fast multi‑dimensional aggregation (e.g., DAU for users in Hangzhou on iOS born in the 1990s).
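A hypothetical query along those lines might look like the following. The table name login_events and the column names are assumptions for illustration; the bitmap functions are real ClickHouse built-ins.

```sql
-- Sketch: exact DAU for iOS users in Hangzhou born in the 1990s,
-- intersecting two user-ID bitmaps built from the enriched events.
SELECT bitmapCardinality(
         bitmapAnd(
           (SELECT groupBitmapState(user_id) FROM login_events
             WHERE event_date = today() AND region = 'Hangzhou'),
           (SELECT groupBitmapState(user_id) FROM login_events
             WHERE event_date = today()
               AND device = 'iOS'
               AND birth_year BETWEEN 1990 AND 1999)
         )) AS dau;
```

Because the intersection happens on compressed bitmaps rather than row scans, adding a dimension is a cheap bitmapAnd instead of a new pre-aggregation.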

Sparse ID (Snowflake) pitfalls and solutions

Using raw Snowflake IDs as bitmap offsets would require address space proportional to the maximum 64‑bit ID (~9.2 × 10¹⁸ bits, on the order of an exabyte of almost entirely empty bitmap), and offsets that large are impossible anyway given Redis's 2³² − 1 per‑key bit limit.

ID mapping + bitmap (preferred) : Maintain a mapping table (MySQL or Redis) from Snowflake ID to a continuous integer. Login flow: lookup the mapping → use the continuous integer as bitmap offset. Memory for 100 million users ≈12 MB.
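The mapping flow can be sketched like this. A dict stands in for the MySQL/Redis mapping table; in Redis the allocation step could be an INCR counter plus HSETNX, but that wiring is an assumption, not the article's prescribed implementation.

```python
# "ID mapping + bitmap": allocate a dense integer per sparse Snowflake
# ID on first login, then use the dense ID as the bitmap offset.
id_map = {}          # snowflake_id -> dense id (the mapping table)
next_dense = 0
dau_bitmap = 0       # Python int as a stand-in bitmap

def dense_id(snowflake_id: int) -> int:
    global next_dense
    if snowflake_id not in id_map:      # first login ever: allocate next slot
        id_map[snowflake_id] = next_dense
        next_dense += 1
    return id_map[snowflake_id]

def login(snowflake_id: int) -> None:
    global dau_bitmap
    dau_bitmap |= 1 << dense_id(snowflake_id)

login(7234102347112448001)              # huge, sparse Snowflake IDs...
login(7234102351234567890)
login(7234102347112448001)              # repeat login is idempotent
print(bin(dau_bitmap).count("1"))       # 2 -> exact DAU, offsets stay dense
```

Because the dense IDs are allocated sequentially, the bitmap stays as compact as the active-user count, which is where the ≈12 MB figure for 100 million users comes from.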

Sharded hash (fallback): store login flags in sharded Redis hashes. Use HSET to record a login, HEXISTS to query status, and sum HLEN across shards to obtain the total. Memory is ≈3–4 GB for 100 million users. Suitable when a mapping table cannot be introduced.
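The fallback can be sketched the same way, with dicts standing in for the Redis hashes; each call is annotated with the Redis command it would correspond to, and the 1024-shard count mirrors the bitmap sharding above.

```python
# "Sharded hash" fallback: HSET to record, HEXISTS to query,
# sum of HLEN over shards for the total.
SHARDS = 1024
hashes = [dict() for _ in range(SHARDS)]

def hset_login(user_id: int) -> None:
    hashes[user_id % SHARDS][user_id] = 1        # HSET dau:{shard} user_id 1

def hexists_login(user_id: int) -> bool:
    return user_id in hashes[user_id % SHARDS]   # HEXISTS dau:{shard} user_id

def total_dau() -> int:
    return sum(len(h) for h in hashes)           # sum of HLEN over shards

hset_login(7234102347112448001)                  # raw Snowflake IDs work directly
hset_login(42)
print(hexists_login(42))                         # True
print(hexists_login(43))                         # False
print(total_dau())                               # 2
```

The hash keys store the full 64-bit ID per user instead of a single bit, which is exactly why this costs gigabytes where the mapped bitmap costs megabytes.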

Final architecture summary

Bitmap sharding provides exact per‑user state and error‑free DAU counting.

Pre‑computed aggregates cached for sub‑millisecond dashboard reads.

ClickHouse handles flexible, high‑performance multi‑dimensional analytics.

HyperLogLog is retained only as a temporary fallback when the pre‑computation pipeline stalls.

Tags: Architecture, HyperLogLog, sharding, Redis, ClickHouse, Bitmap, Daily Active Users
Written by

Tech Freedom Circle

Crazy Maker Circle (Tech Freedom Architecture Circle): a community of tech enthusiasts, experts, and high‑performance fans. Many senior masters, architects, and hobbyists here have achieved tech freedom; another wave of go‑getters is hustling hard toward it.
