
Mastering Redis for Massive Data Statistics: Bitmap, HyperLogLog, Sets, and Sorted Sets

This article explains how to choose the right Redis data structures—Bitmap, HyperLogLog, Set, List, and Sorted Set—to efficiently handle massive statistical scenarios such as UV counting, ranking, and aggregation in high‑traffic mobile applications.

Su San Talks Tech

Jun 18, 2021


In mobile applications we often need to map a key to a collection of data and then compute statistics on, or sort, that collection.

Typical use cases include determining a user’s login status, counting daily sign‑ins for millions of users, tracking new and retained users, measuring unique visitors (UV), displaying the latest comment list, and generating music play‑count rankings.

Because the number of users and visits can reach millions or even billions, we must select collection types that can efficiently handle massive data volumes.

Four statistical types are commonly used:

Binary state statistics

Aggregate statistics

Sorted statistics

Cardinality statistics

Beyond the basic Redis types (String, List, Hash, Set, and Sorted Set, also called Zset), we use two extended data structures, Bitmap and HyperLogLog, to implement these statistics.
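The article does not show Bitmap commands, so here is a minimal pure-Python model of how a Redis Bitmap (a String addressed bit by bit through SETBIT, GETBIT, and BITCOUNT) can record binary state such as daily sign-ins. It is a sketch of the command semantics, not a Redis client; the key name `sign:89757:202106` and the one-bit-per-day layout are illustrative assumptions.

```python
class Bitmap:
    """In-memory model of a Redis Bitmap: a byte string addressed bit by bit."""

    def __init__(self) -> None:
        self.data = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        byte, bit = divmod(offset, 8)
        if byte >= len(self.data):
            # Like SETBIT, grow the string with zero bytes when the offset is past the end.
            self.data.extend(bytes(byte + 1 - len(self.data)))
        mask = 0x80 >> bit  # offset 0 is the most significant bit of the first byte
        previous = 1 if self.data[byte] & mask else 0
        if value:
            self.data[byte] |= mask
        else:
            self.data[byte] &= ~mask & 0xFF
        return previous  # SETBIT replies with the bit's previous value

    def getbit(self, offset: int) -> int:
        byte, bit = divmod(offset, 8)
        if byte >= len(self.data):
            return 0  # bits beyond the end read as 0, as with GETBIT
        return 1 if self.data[byte] & (0x80 >> bit) else 0

    def bitcount(self) -> int:
        # Like BITCOUNT without a range: the number of bits set to 1.
        return sum(bin(b).count("1") for b in self.data)


# Hypothetical layout: one Bitmap per user per month (e.g. key sign:89757:202106),
# with offset = day of month - 1.
june = Bitmap()
for day in (1, 2, 5, 18):
    june.setbit(day - 1, 1)

print(june.getbit(17))  # 1: signed in on June 18
print(june.bitcount())  # 4: sign-ins this month
print(len(june.data))   # 3: bytes allocated so far
```

Under this layout a full month needs at most 31 bits (4 bytes) per user, which is why Bitmaps suit yes/no state at large scale.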

Cardinality Statistics

Cardinality statistics count the number of distinct elements in a collection, commonly used for unique user (UV) calculations.

The most direct approach is a Set, which stores each element at most once, so a repeat visit does not change the count. However, for very large UV counts a Set consumes a lot of memory, and exact counting is often unnecessary.

Redis provides the HyperLogLog structure for approximate cardinality estimation with a standard error of 0.81%, sufficient for UV counting.
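The 0.81% figure is the standard HyperLogLog error bound evaluated at Redis's register count. Assuming the dense layout of 2^14 = 16384 registers of 6 bits each (the layout Redis uses), the same numbers also explain the roughly 12 KB memory cap per key:

```latex
\begin{aligned}
\sigma &\approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{\sqrt{2^{14}}} = \frac{1.04}{128} \approx 0.0081 = 0.81\% \\
\text{memory} &= 2^{14} \times 6\ \text{bits} = 98\,304\ \text{bits} = 12\,288\ \text{bytes} = 12 \times 1024\ \text{bytes}
\end{aligned}
```

For small cardinalities Redis uses a sparse encoding that takes far less than this.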

Website UV via Set

When a user visits a page, we add the user ID to a Set:

SADD RedisWhySoFast:uv 89757

We then retrieve the UV with:

SCARD RedisWhySoFast:uv
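A minimal in-memory model of the SADD/SCARD behavior relied on here, as a sketch of the command semantics rather than a Redis client; the visitor IDs other than 89757 are hypothetical.

```python
class SetStore:
    """Minimal in-memory model of Redis SADD/SCARD semantics (not a Redis client)."""

    def __init__(self) -> None:
        self.keys: dict[str, set[str]] = {}

    def sadd(self, key: str, *members: str) -> int:
        members_set = self.keys.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before  # SADD replies with the number of new members

    def scard(self, key: str) -> int:
        return len(self.keys.get(key, set()))


store = SetStore()
# Each entry is one page view; users 89757 and 10001 visit twice (hypothetical IDs).
for user_id in ["89757", "10001", "89757", "10002", "10001"]:
    store.sadd("RedisWhySoFast:uv", user_id)

print(store.scard("RedisWhySoFast:uv"))  # 3
```

Because SADD ignores members that are already present, SCARD is the UV directly; the cost is that every ID is stored in full.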

Website UV via Hash

We can also use a Hash, storing the user ID as the field and setting the value to 1. Repeated visits simply overwrite the same field, so each user is counted once.

Record a visit with:

HSET redisCluster:uv userId:89757 1

Count the UV with:

HLEN redisCluster:uv
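A minimal in-memory model of the HSET/HLEN behavior described above (a sketch, not a Redis client; the second user ID is hypothetical):

```python
class HashStore:
    """Minimal in-memory model of Redis HSET/HLEN semantics (not a Redis client)."""

    def __init__(self) -> None:
        self.keys: dict[str, dict[str, str]] = {}

    def hset(self, key: str, field: str, value: str) -> int:
        fields = self.keys.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value  # a repeat visit overwrites the same field
        return int(is_new)     # HSET replies with the number of fields added

    def hlen(self, key: str) -> int:
        return len(self.keys.get(key, {}))


uv = HashStore()
print(uv.hset("redisCluster:uv", "userId:89757", "1"))  # 1: first visit adds the field
print(uv.hset("redisCluster:uv", "userId:89757", "1"))  # 0: repeat visit only overwrites
uv.hset("redisCluster:uv", "userId:10001", "1")
print(uv.hlen("redisCluster:uv"))  # 2
```

One reason to prefer a Hash over a Set is that the value slot can hold something useful: using HINCRBY on the same field would turn it into a per-user visit counter while HLEN still reports the UV.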

HyperLogLog: The Best Fit for UV Counting

Using HyperLogLog, each user ID is added with PFADD and the approximate UV is obtained with PFCOUNT. Multiple HyperLogLog structures can be merged with PFMERGE to combine statistics from different pages.

PFADD RedisSyncPrinciple:uv userID1 userID2 userID3

PFCOUNT RedisSyncPrinciple:uv

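PFMERGE mergedUV RedisData MySQLData

PFCOUNT mergedUV

In the source example this final PFCOUNT returned 4, the approximate number of distinct user IDs across the two merged keys.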

Syntax for merging:

PFMERGE destkey sourcekey [sourcekey ...]
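To see why PFADD, PFCOUNT, and PFMERGE behave as described, here is a toy HyperLogLog in Python. It assumes 2^14 registers as Redis does, but uses SHA-1 as a convenient 64-bit hash and the textbook estimator with a linear-counting correction for small cardinalities; Redis's own hash function and bias corrections differ, so its numbers will not match this sketch exactly.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers; p = 14 matches Redis's register count."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def _hash(self, item: str) -> int:
        # Any well-mixed 64-bit hash works; SHA-1 is used here only for convenience.
        return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

    def add(self, *items: str) -> None:
        """Like PFADD: fold each item into the registers."""
        for item in items:
            h = self._hash(item)
            index = h & (self.m - 1)  # low p bits pick a register
            rest = h >> self.p        # the remaining 64 - p bits
            # Position of the lowest set bit (1-based); an all-zero remainder gets the max.
            rank = (rest & -rest).bit_length() if rest else 64 - self.p + 1
            self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        """Like PFCOUNT: estimate the number of distinct items added."""
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        empty = self.registers.count(0)
        if estimate <= 2.5 * m and empty:
            # Small-range correction (linear counting) based on untouched registers.
            estimate = m * math.log(m / empty)
        return round(estimate)

    def merge(self, *others: "HyperLogLog") -> "HyperLogLog":
        """Like PFMERGE: the register-wise maximum summarizes the union."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(r) for r in zip(self.registers, *(o.registers for o in others))]
        return merged


# Hypothetical traffic: 6,000 visitors on each of two pages, 2,000 of them on both.
page_a = HyperLogLog()
page_a.add(*(f"user{i}" for i in range(6000)))
page_b = HyperLogLog()
page_b.add(*(f"user{i}" for i in range(4000, 10000)))

merged = page_a.merge(page_b)
print(merged.count())  # approximately 10000 distinct visitors
```

Merging takes the register-wise maximum, so merging two pages yields exactly the registers you would get by adding every ID to a single structure; users who visited both pages are not double-counted.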

Sorted Statistics

Of Redis's four collection types (List, Hash, Set, and Sorted Set), only List and Sorted Set are ordered. A List preserves insertion order, which suits message queues, latest-item lists, and simple rankings. A Sorted Set orders elements by a numeric score, which makes it ideal for leaderboards based on play counts, likes, and similar metrics.

Latest Comment List

Using a List, we push each new comment onto the head with LPUSH and read a range with LRANGE. This works well when pagination is not required or updates are infrequent.

LPUSH commentList 1 2 3 4 5 6

LRANGE commentList 0 4

When new comments are pushed between page requests, every existing item shifts down by one position, so the next page can repeat an item that was already shown; removals shift items the other way and can cause an item to be skipped.

Therefore, Lists are suitable only for low‑frequency updates or when only the first few items are needed.
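The pagination problem is easy to reproduce with a minimal in-memory model of LPUSH and LRANGE (a sketch of the command semantics, not a Redis client). The extra comment "7" is hypothetical.

```python
class ListStore:
    """Minimal in-memory model of Redis LPUSH/LRANGE semantics (not a Redis client)."""

    def __init__(self) -> None:
        self.keys: dict[str, list[str]] = {}

    def lpush(self, key: str, *values: str) -> int:
        items = self.keys.setdefault(key, [])
        for value in values:
            items.insert(0, value)  # each value goes to the head in turn
        return len(items)           # LPUSH replies with the new list length

    def lrange(self, key: str, start: int, stop: int) -> list[str]:
        items = self.keys.get(key, [])
        n = len(items)
        if start < 0:
            start = max(start + n, 0)  # negative indexes count from the tail
        if stop < 0:
            stop += n
        if stop < 0 or start > stop:
            return []
        return items[start:stop + 1]  # stop is inclusive, as in LRANGE


comments = ListStore()
comments.lpush("commentList", "1", "2", "3", "4", "5", "6")
print(comments.lrange("commentList", 0, 4))  # ['6', '5', '4', '3', '2']

page1 = comments.lrange("commentList", 0, 2)  # ['6', '5', '4']
comments.lpush("commentList", "7")            # a new comment arrives before page 2 is requested
page2 = comments.lrange("commentList", 3, 5)  # ['4', '3', '2']: '4' is shown again
```

Because LRANGE addresses items by position, any push at the head silently moves the page boundaries.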

Leaderboard with Sorted Set

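We store music IDs in a Sorted Set, using the play count as the score. Each play increments the score with ZINCRBY. The top-N songs are retrieved with ZREVRANGE, which lists members from highest to lowest score; ZRANGEBYSCORE instead returns the songs whose play counts fall within a given range, in ascending order.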

ZADD musicTop 100000000 青花瓷 8999999 花田错

ZINCRBY musicTop 1 青花瓷

ZREVRANGE musicTop 0 0 WITHSCORES
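A minimal in-memory model of the Sorted Set commands used for the leaderboard, as a sketch of their semantics rather than a Redis client. It replays the example above and also shows a ZRANGEBYSCORE range query.

```python
class ZSetStore:
    """Minimal in-memory model of a few Redis Sorted Set commands (not a Redis client)."""

    def __init__(self) -> None:
        self.keys: dict[str, dict[str, float]] = {}

    def zadd(self, key: str, mapping: dict[str, float]) -> int:
        zset = self.keys.setdefault(key, {})
        added = sum(1 for member in mapping if member not in zset)
        zset.update({member: float(score) for member, score in mapping.items()})
        return added  # ZADD replies with the number of new members

    def zincrby(self, key: str, increment: float, member: str) -> float:
        zset = self.keys.setdefault(key, {})
        zset[member] = zset.get(member, 0.0) + increment  # missing members start at 0
        return zset[member]

    def zrevrange(self, key: str, start: int, stop: int, withscores: bool = False) -> list:
        zset = self.keys.get(key, {})
        # Highest score first; equal scores fall back to reverse lexicographic order.
        ordered = sorted(zset.items(), key=lambda item: (item[1], item[0]), reverse=True)
        n = len(ordered)
        if start < 0:
            start = max(start + n, 0)
        if stop < 0:
            stop += n
        if stop < 0 or start > stop:
            return []
        window = ordered[start:stop + 1]  # stop is inclusive
        return window if withscores else [member for member, _ in window]

    def zrangebyscore(self, key: str, min_score: float, max_score: float) -> list[str]:
        zset = self.keys.get(key, {})
        hits = [(score, member) for member, score in zset.items() if min_score <= score <= max_score]
        return [member for _, member in sorted(hits)]  # ascending by score, then member


charts = ZSetStore()
charts.zadd("musicTop", {"青花瓷": 100000000, "花田错": 8999999})
charts.zincrby("musicTop", 1, "青花瓷")  # one more play
print(charts.zrevrange("musicTop", 0, 0, withscores=True))  # [('青花瓷', 100000001.0)]
print(charts.zrangebyscore("musicTop", 0, 9000000))          # ['花田错']
```

Since Redis 6.2, ZRANGE with the REV option covers the same need as ZREVRANGE.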

Aggregation Statistics

Set operations allow us to compute intersections (common friends), differences (daily new users), and unions (total new users).

Intersection – Common Friends

SINTERSTORE user:commonFriends user:codeGeek user:guru

Difference – Daily New Users

SDIFFSTORE user:new user:20210602 user:20210601

Union – Total New Users

SUNIONSTORE userid:new user:20210602 user:20210601
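The three *STORE commands above can be modeled with Python set operations. This is a sketch of their semantics (the destination is overwritten, the reply is the size of the result), not a Redis client, and all set contents are hypothetical.

```python
# Hypothetical data: two friend lists and two daily visitor sets.
sets: dict[str, set[str]] = {
    "user:codeGeek": {"alice", "bob", "carol"},
    "user:guru": {"bob", "carol", "dave"},
    "user:20210601": {"u1", "u2", "u3"},
    "user:20210602": {"u2", "u3", "u4", "u5"},
}


def _get(key: str) -> set[str]:
    return sets.get(key, set())  # a missing key behaves like an empty set


def _store(destination: str, result: set[str]) -> int:
    # *STORE commands overwrite the destination; an empty result leaves no key behind.
    if result:
        sets[destination] = result
    else:
        sets.pop(destination, None)
    return len(result)  # the reply is the size of the resulting set


def sinterstore(destination: str, *keys: str) -> int:
    return _store(destination, set.intersection(*(_get(k) for k in keys)))


def sdiffstore(destination: str, first: str, *others: str) -> int:
    return _store(destination, _get(first).difference(*(_get(k) for k in others)))


def sunionstore(destination: str, *keys: str) -> int:
    return _store(destination, set().union(*(_get(k) for k in keys)))


print(sinterstore("user:commonFriends", "user:codeGeek", "user:guru"))  # 2
print(sdiffstore("user:new", "user:20210602", "user:20210601"))         # 2
print(sunionstore("userid:new", "user:20210602", "user:20210601"))      # 5
```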

These operations take time proportional to the size of the sets involved, and because Redis executes commands on a single main thread, running them on large sets can block every other client. To avoid this, dedicate a separate cluster to aggregation, or fetch the raw data and perform the calculations on the client side.
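A sketch of the client-side option, assuming the redis-py client: its `sscan_iter(name, count=...)` method walks a Set with SSCAN a batch at a time instead of running one long SDIFFSTORE. The function name and the `FakeClient` stand-in are hypothetical; with a real server you would pass something like `redis.Redis(decode_responses=True)` instead.

```python
from typing import Iterator


def new_users_client_side(client, today_key: str, yesterday_key: str, batch: int = 1000) -> set:
    """Compute SDIFF(today, yesterday) in the application instead of inside Redis.

    `client` is anything with a redis-py style sscan_iter(name, count=...) method.
    """
    # SSCAN walks the set incrementally, so no single command has to touch every member.
    # The SCAN family may return a member more than once, so collect into Python sets.
    yesterday = set(client.sscan_iter(yesterday_key, count=batch))
    return {member for member in client.sscan_iter(today_key, count=batch) if member not in yesterday}


class FakeClient:
    """Hypothetical stand-in so the sketch runs without a Redis server."""

    def __init__(self, data: dict[str, set[str]]) -> None:
        self.data = data

    def sscan_iter(self, name: str, match=None, count=None) -> Iterator[str]:
        members = sorted(self.data.get(name, set()))
        step = count or 10
        for i in range(0, len(members), step):
            yield from members[i:i + step]  # hand members back one batch at a time


client = FakeClient({
    "user:20210601": {"u1", "u2", "u3"},
    "user:20210602": {"u2", "u3", "u4", "u5"},
})
print(sorted(new_users_client_side(client, "user:20210602", "user:20210601", batch=2)))  # ['u4', 'u5']
```

The trade-off is network traffic and client memory: both sets travel to the application, so this fits cases where keeping Redis responsive matters more than transfer cost.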


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
