Using Redis Data Structures for Efficient Large‑Scale Statistics: Cardinality, Sorting, and Aggregation
The article explains how to choose appropriate Redis data structures—such as Bitmap, HyperLogLog, Set, List, Hash, and Sorted Set—to efficiently handle massive statistical scenarios like UV counting, ranking, and set‑based aggregation, while providing concrete command examples and performance considerations.
In mobile‑app business scenarios we often need to associate a key with a large collection of data and perform statistical sorting on that collection. Typical use cases include checking a user’s login status, counting 7‑day consecutive sign‑ins for hundreds of millions of users, daily new‑user and retention statistics, UV (Unique Visitor) counting, latest comment lists, and music‑play ranking.
Because the number of users and visits can reach millions or even billions, we must select collection types that can efficiently handle such scale.
Four statistical types are introduced: binary state statistics, aggregate statistics, sorted statistics, and cardinality statistics.
Cardinality Statistics
Cardinality statistics count the number of distinct elements in a collection, commonly used for UV calculation.
The naive approach uses a Set, which stores each element only once no matter how many times it is added. For massive traffic, however, a plain Set consumes excessive memory, and UV counting rarely needs an exact figure.
Redis provides the HyperLogLog data structure, an approximate distinct‑count algorithm with a standard error of 0.81% and a fixed memory footprint (≈12 KB) regardless of the number of elements.
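The 12 KB and 0.81% figures can be checked with a little arithmetic. The sketch below assumes the commonly documented dense layout of 16,384 registers at 6 bits each and the usual HyperLogLog error approximation of 1.04/sqrt(m); neither number is stated in this article, so treat both as assumptions.

```python
import math

# Assumed layout (not stated in the article): 2**14 registers, 6 bits each.
REGISTERS = 2 ** 14
BITS_PER_REGISTER = 6

# Total size of the register array in bytes.
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Usual approximation of the relative standard error for m registers.
std_error = 1.04 / math.sqrt(REGISTERS)

print(f"dense size: {dense_bytes} bytes ({dense_bytes / 1024:.0f} KB)")
print(f"standard error: {std_error:.4%}")
```

The size depends only on the register count, never on how many elements were added, which is why the footprint stays fixed.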
Typical commands:
PFADD mypage:uv userID1 userID2 userID3
PFCOUNT mypage:uv
Multiple HyperLogLog structures can be merged with PFMERGE to obtain a combined cardinality.
PFMERGE mergedKey hll1 hll2
Website UV via Set
Using a Set, each user ID is added once per day:
SADD RedisWhyFast:uv 89757
The UV is obtained with SCARD:
SCARD RedisWhyFast:uv
Website UV via Hash
Alternatively, store the user ID as a hash field and set its value to 1 on each visit.
HSET redisCluster:uv userId:89757 1
UV is then the hash length:
HLEN redisCluster:uv
HyperLogLog as the Preferred Solution
When the number of unique visitors reaches tens of millions, a Set or Hash would consume prohibitive memory, while HyperLogLog keeps memory usage constant.
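To see why memory stays constant, here is a toy HyperLogLog in pure Python. It is a minimal sketch of the textbook algorithm, not Redis's implementation, which differs in details such as hashing and encoding. The class name `ToyHyperLogLog`, the SHA-1 based hash, and the `user:N` IDs are illustrative assumptions.

```python
import hashlib
import math

P = 14               # index bits: 2**14 = 16384 registers
M = 1 << P
W_BITS = 64 - P      # remaining hash bits used to compute the rank


def _hash64(value: str) -> int:
    """64-bit hash taken from SHA-1 (an assumption made for this sketch)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyHyperLogLog:
    def __init__(self) -> None:
        # One small counter per register; the size never changes afterwards.
        self.registers = bytearray(M)

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h & (M - 1)                      # low P bits choose the register
        w = h >> P                             # remaining W_BITS bits
        rank = W_BITS - w.bit_length() + 1     # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / M)
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * M and zeros:
            # Small-range correction (linear counting).
            return round(M * math.log(M / zeros))
        return round(raw)


hll = ToyHyperLogLog()
for i in range(100_000):
    hll.add(f"user:{i}")
    hll.add(f"user:{i}")   # repeat visits do not change the estimate

estimate = hll.count()
print(estimate, len(hll.registers))
```

However many IDs are added, the structure holds the same 16,384 registers, whereas a Set or Hash keeps one entry per distinct user.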
Sorted Statistics
Redis offers four collection types: List, Set, Hash, and Sorted Set. List and Sorted Set preserve order.
List: ordered by insertion order, suitable for message queues, latest‑item lists, simple leaderboards.
Sorted Set: ordered by a numeric score, ideal for leaderboards based on play count, likes, etc.
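The two orderings can be contrasted with a small in-memory model. This is a sketch only: the helpers `lpush`, `zincrby`, and `zrevrange` are plain-Python stand-ins named after the Redis commands, and the `song:*` IDs are made up for illustration.

```python
# Plain-Python stand-ins for a Redis List and a Sorted Set (illustration only).
latest = []        # List: newest first, like LPUSH
play_counts = {}   # Sorted Set: member -> score


def lpush(items, value):
    items.insert(0, value)


def zincrby(scores, increment, member):
    scores[member] = scores.get(member, 0) + increment
    return scores[member]


def zrevrange(scores, start, stop):
    """Members from highest to lowest score; inclusive, non-negative indices."""
    ranked = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [member for member, _ in ranked[start:stop + 1]]


for song in ["song:a", "song:b", "song:c"]:
    lpush(latest, song)

zincrby(play_counts, 9, "song:a")
zincrby(play_counts, 5, "song:b")
zincrby(play_counts, 2, "song:c")
zincrby(play_counts, 1, "song:c")   # song:c now has 3 plays

by_score = zrevrange(play_counts, 0, 1)
print(latest)     # ['song:c', 'song:b', 'song:a']  (insertion order, newest first)
print(by_score)   # ['song:a', 'song:b']            (highest score first)
```

The list reflects only when an item arrived, while the score-ordered view changes whenever a score is incremented.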
Latest Comment List (List)
Use LPUSH to insert new comments at the head and LRANGE to fetch a range.
LPUSH commentList 1 2 3 4 5 6
LRANGE commentList 0 4
Lists are unsuitable for high‑frequency updates with pagination because inserted elements shift existing indices, causing duplicate or missing items on subsequent pages.
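The index shift is easy to reproduce with an in-memory model. In this sketch, `lpush` and `lrange` are plain-Python stand-ins for the Redis commands, and the page size of 3 is an arbitrary choice for illustration.

```python
comments = []  # stand-in for the Redis list commentList


def lpush(items, *values):
    # Like LPUSH, each value is pushed onto the head in turn.
    for value in values:
        items.insert(0, value)


def lrange(items, start, stop):
    # Like LRANGE, both ends are inclusive (non-negative indices only here).
    return items[start:stop + 1]


lpush(comments, 1, 2, 3, 4, 5, 6)   # list is now 6 5 4 3 2 1
page_1 = lrange(comments, 0, 2)     # [6, 5, 4]

lpush(comments, 7)                  # a new comment arrives between page requests
page_2 = lrange(comments, 3, 5)     # [4, 3, 2]

repeated = set(page_1) & set(page_2)
print(page_1, page_2, repeated)     # comment 4 shows up on both pages
```

Comment 4 appears on both pages because the new head element pushed every existing item one position down.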
Leaderboard (Sorted Set)
Store music IDs in a Sorted Set where the score is the play count. Increment the score with ZINCRBY, and retrieve the top N with ZREVRANGE. To select members within a score range, use ZRANGEBYSCORE, which returns them in ascending score order.
ZADD musicTop 100000000 青花瓷 8999999 花田错
ZINCRBY musicTop 1 青花瓷
ZREVRANGE musicTop 0 9 WITHSCORES
Aggregate Statistics
Aggregate statistics involve set operations such as intersection, difference, and union.
Intersection – Common Friends
SINTERSTORE commonFriends user:alice user:bob
Difference – Daily New Users
SDIFFSTORE newUsers user:20210602 user:20210601
Union – Total New Users Over Two Days
SUNIONSTORE totalNew user:20210602 user:20210601
Because set operations can be costly on large datasets, it is recommended to offload aggregation to a dedicated Redis cluster or perform the computation on the client side to avoid blocking the primary service.
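Client-side aggregation amounts to ordinary set algebra once the members have been read from Redis (for example with SMEMBERS). The sketch below uses made-up member names and hard-coded data in place of real reads.

```python
# Hypothetical data; in practice each set would be fetched from Redis first.
alice_friends = {"carol", "dave", "erin"}
bob_friends = {"dave", "erin", "frank"}
users_0601 = {"u1", "u2", "u3"}
users_0602 = {"u2", "u3", "u4"}

# Intersection, as in SINTERSTORE commonFriends user:alice user:bob
common_friends = alice_friends & bob_friends

# Difference, as in SDIFFSTORE newUsers user:20210602 user:20210601.
# Operand order matters: members of the first set that are absent from the second.
new_users = users_0602 - users_0601

# Union, as in SUNIONSTORE totalNew user:20210602 user:20210601
total_users = users_0601 | users_0602

print(sorted(common_friends))   # ['dave', 'erin']
print(sorted(new_users))        # ['u4']
print(sorted(total_users))      # ['u1', 'u2', 'u3', 'u4']
```

Note that the difference identifies genuinely new users only when the second set contains everyone seen before that day; if it holds a single day's visitors, a returning user who skipped that day would also be counted as new.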
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.