Mastering Redis for Massive Data Statistics: Bitmap, HyperLogLog, Sets, and Sorted Sets
This article explains how to choose among Redis data structures (Bitmap, HyperLogLog, Set, List, and Sorted Set) to handle large-scale statistics such as UV counting, ranking, and aggregation in high-traffic mobile applications.
In mobile applications we often need to store a key that maps to a collection of data, and then compute statistics over that collection or sort it.
Typical use cases include determining a user’s login status, counting daily sign-ins for millions of users, tracking new and retained users, measuring unique visitors (UV), displaying the latest comment list, and ranking songs by play count.
Because the number of users and visits can reach millions or even billions, we must choose collection types that handle such volumes efficiently.
Four statistical types are commonly used:
Binary state statistics
Aggregate statistics
Sorted statistics
Cardinality statistics
Beyond the basic Redis types (String, Set, Sorted Set/Zset, List, Hash), we also use two extended data structures, Bitmap and HyperLogLog, to implement these statistics. Both are stored as String values under the hood.
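Binary state statistics, such as whether a given user signed in on a given day, map naturally onto a Bitmap: one bit per user, where 1 means "signed in". At one bit per user, 100 million users cost 100,000,000 bits, or about 12.5 MB per day. The sketch below only models what SETBIT, GETBIT, BITCOUNT and BITOP AND do on a String value, using a Python bytearray so it runs without a Redis server. Using the numeric user ID directly as the bit offset, one bitmap per day, and the names `ToyBitmap`, `bitop_and` and `signin_0601` are assumptions for illustration.

```python
class ToyBitmap:
    """Models SETBIT/GETBIT/BITCOUNT on a bytearray (no Redis server needed).

    Like Redis, offset 0 is the most significant bit of the first byte.
    """

    def __init__(self) -> None:
        self.data = bytearray()

    def setbit(self, offset: int, value: int) -> int:
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            # Grow with zero bytes, as Redis does when SETBIT goes past the end.
            self.data.extend(bytes(byte_index + 1 - len(self.data)))
        mask = 0x80 >> bit_index
        old = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= 0xFF ^ mask
        return old  # SETBIT replies with the bit's previous value

    def getbit(self, offset: int) -> int:
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0  # bits past the end read as 0
        return 1 if self.data[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self) -> int:
        """Number of bits set to 1 across the whole value."""
        return sum(bin(byte).count("1") for byte in self.data)


def bitop_and(*bitmaps: ToyBitmap) -> ToyBitmap:
    """Models BITOP AND: shorter inputs are treated as zero-padded."""
    result = ToyBitmap()
    length = max(len(b.data) for b in bitmaps)
    result.data = bytearray(length)
    for i in range(length):
        byte = 0xFF
        for b in bitmaps:
            byte &= b.data[i] if i < len(b.data) else 0
        result.data[i] = byte
    return result


# Hypothetical sign-in data: bit offset = user ID, one bitmap per day.
signin_0601 = ToyBitmap()
signin_0602 = ToyBitmap()
for user_id in (3, 10, 42):
    signin_0601.setbit(user_id, 1)
for user_id in (10, 42, 99):
    signin_0602.setbit(user_id, 1)

print(signin_0601.bitcount())                          # 3 sign-ins on 06-01
print(bitop_and(signin_0601, signin_0602).bitcount())  # 2 users signed in on both days
```

ANDing daily bitmaps is how "signed in every day" or retained-user counts fall out of a single BITCOUNT.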
Cardinality Statistics
Cardinality statistics count the number of distinct elements in a collection, commonly used for unique user (UV) calculations.
The most direct way is to use a Set, which stores each element only once, so repeat visits by the same user are ignored. However, a Set keeps every member in memory, so for very large UV counts it consumes a lot of memory, and UV reporting rarely needs an exact figure anyway.
Redis provides the HyperLogLog structure for approximate cardinality estimation with a standard error of 0.81%, sufficient for UV counting.
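The 0.81% figure follows from HyperLogLog's standard error of about 1.04/√m for m registers. Redis uses m = 2^14 = 16384 registers of 6 bits each, which also explains why a HyperLogLog key takes at most 12 KB no matter how many users it has seen. For comparison, an exact Set holding a purely illustrative 100 million user IDs stored as 8-byte integers would need at least 800 MB for the raw IDs alone, before Redis's per-member overhead:

```latex
\begin{aligned}
\sigma &\approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{\sqrt{2^{14}}} = \frac{1.04}{128} = 0.008125 \approx 0.81\% \\
\text{dense HLL size} &= 2^{14} \times 6\ \text{bits} = 98304\ \text{bits} = 12288\ \text{bytes} = 12\ \text{KB} \\
\text{raw Set IDs} &\geq 10^{8} \times 8\ \text{bytes} = 8 \times 10^{8}\ \text{bytes} \approx 800\ \text{MB}
\end{aligned}
```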
Website UV via Set
When a user visits a page, we add the user ID to a Set:
SADD RedisWhySoFast:uv 89757
We then retrieve the UV with:
SCARD RedisWhySoFast:uv
Website UV via Hash
We can also use a Hash, storing the user ID as the field and setting the value to 1. Repeated visits simply overwrite the value.
A visit is recorded with HSET, and UV is counted with HLEN:
HSET redisCluster:uv userId:89757 1
HLEN redisCluster:uv
HyperLogLog as the King Solution
Using HyperLogLog, each user ID is added with PFADD and the approximate UV is obtained with PFCOUNT. Multiple HyperLogLog structures can be merged with PFMERGE to combine statistics from different pages.
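To make PFADD, PFCOUNT and PFMERGE less of a black box, here is a toy HyperLogLog in Python. It follows the textbook algorithm: the top 14 bits of a 64-bit hash pick one of 16384 registers, the register keeps the largest "position of the first 1-bit" seen in the remaining bits, and the count is a harmonic-mean estimate with linear counting for small sets. Merging is a register-wise maximum. The SHA-1-based hash and the class name are assumptions; Redis itself uses MurmurHash64A, a sparse encoding for small keys, and a refined estimator, so its exact numbers will differ.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Textbook HyperLogLog with 2**p registers (p=14 gives Redis's 16384)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: first 8 bytes of SHA-1 as a 64-bit hash (Redis uses MurmurHash64A).
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> None:
        """Roughly what PFADD does for one element."""
        x = self._hash64(item)
        q = 64 - self.p
        index = x >> q                    # top p bits pick a register
        rest = x & ((1 << q) - 1)         # remaining q bits
        rank = q - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> int:
        """Roughly what PFCOUNT does: harmonic-mean estimate of the cardinality."""
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction: linear counting on the empty registers.
            estimate = m * math.log(m / zeros)
        return round(estimate)

    def merge(self, *others: "ToyHyperLogLog") -> None:
        """Roughly what PFMERGE does: keep the maximum of each register."""
        for other in others:
            self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


# Hypothetical visitors: users 0-59999 saw page A, users 40000-99999 saw page B.
page_a, page_b = ToyHyperLogLog(), ToyHyperLogLog()
for i in range(60_000):
    page_a.add(f"user{i}")
for i in range(40_000, 100_000):
    page_b.add(f"user{i}")

merged = ToyHyperLogLog()
merged.merge(page_a, page_b)
print(page_a.count(), merged.count())  # roughly 60000 and 100000
```

Because merging takes register-wise maxima, the 20,000 users who saw both pages are not double-counted, which is why PFMERGE gives a site-wide UV rather than a sum of page UVs.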
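PFADD RedisSyncPrinciple:uv userID1 userID2 userID3
PFCOUNT RedisSyncPrinciple:uv
PFMERGE mergedUV RedisData MySQLData
PFCOUNT mergedUV // approximate number of distinct IDs across RedisData and MySQLData
Here RedisData and MySQLData are existing HyperLogLog keys, for example the UV counters of two other pages. Syntax for merging:
PFMERGE destkey sourcekey [sourcekey ...]
Sorted Statistics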
Of Redis's four collection types (List, Hash, Set, and Sorted Set), only List and Sorted Set are ordered. A List preserves insertion order, which suits message queues, latest-item lists, and simple rankings. A Sorted Set orders elements by a numeric score, which makes it ideal for leaderboards based on play count, likes, and similar metrics.
Latest Comment List
Using a List, we push new comments to the head with LPUSH and retrieve a range with LRANGE. This works well when pagination is not required or updates are infrequent.
LPUSH commentList 1 2 3 4 5 6
LRANGE commentList 0 4
Because LPUSH inserts each value at the head, LRANGE commentList 0 4 returns the five newest comments: 6, 5, 4, 3, 2. When a new comment is pushed between two page requests, every existing comment shifts one position toward the tail, so the next page repeats an item that was already shown (and deletions between requests can make items go missing).
Therefore, Lists are suitable only when updates are infrequent or when only the first few items are needed.
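To see the pagination problem concretely, the sketch below models LPUSH and LRANGE with a Python list (index 0 as the head), fetches a first page of three comments, pushes one new comment, and then fetches the second page. It models the command semantics only and is not a Redis client; the class name and the page size of three are assumptions.

```python
from typing import List


class ToyList:
    """Models LPUSH/LRANGE on a Python list, with index 0 as the head."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def lpush(self, *values: str) -> int:
        for value in values:
            self.items.insert(0, value)  # each value becomes the new head
        return len(self.items)           # LPUSH replies with the new length

    def lrange(self, start: int, stop: int) -> List[str]:
        # LRANGE's stop index is inclusive; negative indexes are not modeled here.
        return self.items[start:stop + 1]


comments = ToyList()
comments.lpush("1", "2", "3", "4", "5", "6")
page1 = comments.lrange(0, 2)  # ['6', '5', '4']
comments.lpush("7")            # a new comment arrives between the two requests
page2 = comments.lrange(3, 5)  # ['4', '3', '2']: comment '4' is shown twice
```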
Leaderboard with Sorted Set
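We store songs in a Sorted Set, using each song's play count as its score. Each play increments the score with ZINCRBY. The top-N songs are retrieved with ZREVRANGE, which returns members from the highest score down; ZRANGEBYSCORE instead selects the songs whose play counts fall within a given score range.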
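The commands in this section can be modeled with a dictionary to make their results concrete. In this sketch, ZREVRANGE orders equal scores by member in descending order, matching Redis's documented tie-breaking, but negative indexes, LIMIT and exclusive ranges are not modeled. The class and method names are hypothetical, and sorting on every read is only for clarity; Redis keeps the order incrementally, so an update costs O(log N).

```python
from typing import Dict, List, Tuple, Union


class ToySortedSet:
    """Models ZADD/ZINCRBY/ZREVRANGE/ZRANGEBYSCORE with a dict (no server needed)."""

    def __init__(self) -> None:
        self.scores: Dict[str, float] = {}

    def zadd(self, *score_member_pairs: Union[float, str]) -> int:
        """ZADD key score member [score member ...]; returns how many members are new."""
        added = 0
        for score, member in zip(score_member_pairs[::2], score_member_pairs[1::2]):
            if member not in self.scores:
                added += 1
            self.scores[member] = float(score)
        return added

    def zincrby(self, increment: float, member: str) -> float:
        """Adds increment to the member's score (starting from 0) and returns the new score."""
        self.scores[member] = self.scores.get(member, 0.0) + increment
        return self.scores[member]

    def zrevrange(self, start: int, stop: int) -> List[Tuple[str, float]]:
        """Highest score first, ties by member descending; stop is inclusive.
        Returns (member, score) pairs, i.e. the WITHSCORES form."""
        ordered = sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ordered[start:stop + 1]

    def zrangebyscore(self, min_score: float, max_score: float) -> List[str]:
        """Members with min_score <= score <= max_score, lowest score first."""
        ordered = sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]))
        return [member for member, score in ordered if min_score <= score <= max_score]


music_top = ToySortedSet()
music_top.zadd(100000000, "青花瓷", 8999999, "花田错")
music_top.zincrby(1, "青花瓷")
print(music_top.zrevrange(0, 0))               # [('青花瓷', 100000001.0)]
print(music_top.zrangebyscore(0, 10_000_000))  # ['花田错']
```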
ZADD musicTop 100000000 青花瓷 8999999 花田错
ZINCRBY musicTop 1 青花瓷
ZREVRANGE musicTop 0 0 WITHSCORES // returns 青花瓷 with score 100000001
Aggregation Statistics
Set operations allow us to compute intersections (common friends), differences (daily new users), and unions (total new users).
Intersection – Common Friends
SINTERSTORE user:commonFriends user:codeGeek user:guru
Difference – Daily New Users
SDIFFSTORE user:new user:20210602 user:20210601
Union – Total New Users
SUNIONSTORE userid:new user:20210602 user:20210601
Redis executes commands on a single main thread, and these set operations take time that grows with the size of the Sets involved, so on large datasets they can block the instance and delay every other client. To avoid this, dedicate a separate cluster to aggregation, or fetch the raw data and perform the calculations on the client side.
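For the client-side option, the idea is to read each Set's members (SMEMBERS, or SSCAN in batches for very large Sets) and combine them with the client language's own set operators, so the CPU cost lands on the application rather than on Redis. In the sketch below, hard-coded, hypothetical member lists stand in for those reads so it runs without a server; the variable names and IDs are invented and only mimic the keys used above.

```python
from typing import Set

# Hypothetical members, standing in for SMEMBERS/SSCAN results.
code_geek_friends: Set[str] = {"alice", "bob", "carol"}  # user:codeGeek
guru_friends: Set[str] = {"bob", "carol", "dave"}        # user:guru
users_20210601: Set[str] = {"u1", "u2", "u3"}            # user:20210601
users_20210602: Set[str] = {"u2", "u3", "u4", "u5"}      # user:20210602

common_friends = code_geek_friends & guru_friends  # like SINTER
new_users = users_20210602 - users_20210601        # like SDIFF: in 06-02 but not 06-01
all_users = users_20210602 | users_20210601        # like SUNION

print(sorted(common_friends))  # ['bob', 'carol']
print(sorted(new_users))       # ['u4', 'u5']
print(len(all_users))          # 5
```

The trade-off is network transfer: every member of every input Set crosses the wire, so this suits Sets that are big enough to stall Redis but still fit comfortably in application memory.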
Su San Talks Tech
Su San, who has worked at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
