Choosing Appropriate Redis Data Structures for Large‑Scale Statistics: Cardinality, Sorting, and Aggregation
This article explains how to select Redis data structures such as Bitmap, HyperLogLog, Set, List, Sorted Set, and Hash to efficiently handle massive statistical scenarios like user login status, UV counting, ranking, and set aggregation, while providing concrete command examples and best‑practice recommendations.
Cardinality Statistics
For counting unique elements (e.g., UV), a plain Set stores every member, so its memory use grows with the number of users and becomes expensive at millions of users. Redis therefore offers the probabilistic HyperLogLog structure, which estimates cardinality with a standard error of about 0.81% while using at most about 12KB per key, regardless of cardinality.
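The 0.81% and 12KB figures can be reproduced with a little arithmetic. The sketch below assumes the dense HyperLogLog layout of 16384 registers at 6 bits each and the usual 1.04 / sqrt(m) standard-error formula; the function names are illustrative only.

```python
import math

REGISTERS = 2 ** 14        # assumed: 16384 registers in the dense encoding
BITS_PER_REGISTER = 6      # assumed: 6 bits per register


def hll_standard_error(registers: int = REGISTERS) -> float:
    """Standard error of a HyperLogLog estimate: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(registers)


def hll_register_bytes(registers: int = REGISTERS,
                       bits: int = BITS_PER_REGISTER) -> int:
    """Size of the register array in bytes (excluding any header)."""
    return registers * bits // 8


error = hll_standard_error()
print(f"standard error: {error:.4%}")                     # 0.8125%
print(f"register array: {hll_register_bytes()} bytes")    # 12288 bytes
print(f"typical deviation at 1,000,000 UVs: ~{round(1_000_000 * error)}")
```

In other words, a page with one million true unique visitors would typically be reported within roughly eight thousand of the real value, which is acceptable for UV dashboards but not for billing-grade counts.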
Typical workflow: PFADD key userID1 userID2 ... to add IDs, then PFCOUNT key to retrieve the approximate unique count. Multiple HyperLogLog structures can be merged with PFMERGE dest source1 source2 ... , and the merged result reflects the union of the original sets.
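The workflow above might look like this from application code. This is a minimal sketch that assumes a redis-py style client object is passed in; the key names and helper functions are hypothetical and not part of the original article.

```python
# Assumed usage (needs the redis-py package and a running server):
#   import redis
#   client = redis.Redis(decode_responses=True)

def record_visits(client, key, user_ids):
    """PFADD key id1 id2 ...: register visitors in the HyperLogLog at key."""
    ids = list(user_ids)
    if ids:                      # PFADD with no elements is pointless here
        client.pfadd(key, *ids)


def unique_visitors(client, key):
    """PFCOUNT key: approximate number of distinct IDs seen so far."""
    return client.pfcount(key)


def merge_days(client, dest, *day_keys):
    """PFMERGE dest src1 src2 ...: union several days, then count the result."""
    client.pfmerge(dest, *day_keys)
    return client.pfcount(dest)
```

For example, after `record_visits(client, "uv:2024-05-01", ids)` for each day, `merge_days(client, "uv:week", "uv:2024-05-01", "uv:2024-05-02")` would return the approximate number of distinct visitors across both days. A user who appears on both days is counted once.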
Sorted Statistics
Redis provides ordered collections: List (insertion order) and Sorted Set (score‑based order). Lists are suitable for simple recent‑item feeds, while Sorted Sets are ideal for leaderboards where scores (e.g., play counts) change frequently.
Examples:
Insert a comment at the head of a list with LPUSH key comment and retrieve a range using LRANGE key start stop .
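Maintain a music ranking by adding songs with ZADD musicTop score song , incrementing scores via ZINCRBY musicTop 1 song , and fetching the top N with ZREVRANGE musicTop 0 N‑1 WITHSCORES . On Redis 6.2 and later, ZREVRANGE is deprecated and ZRANGE musicTop 0 N‑1 REV WITHSCORES is the equivalent call.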
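Both examples can be wrapped in small helpers. The sketch below assumes a redis-py style client; the function names and the default of ten comments are illustrative choices, not something the article prescribes.

```python
# Assumed usage (needs the redis-py package and a running server):
#   import redis
#   client = redis.Redis(decode_responses=True)

def add_comment(client, key, comment):
    """LPUSH key comment: the newest comment goes to the head of the list."""
    client.lpush(key, comment)


def recent_comments(client, key, count=10):
    """LRANGE key 0 count-1: the most recent comments, newest first."""
    return client.lrange(key, 0, count - 1)


def add_song(client, board, song, score=0):
    """ZADD board score song: register a song with an initial score."""
    client.zadd(board, {song: score})


def record_play(client, board, song):
    """ZINCRBY board 1 song: bump the play count by one."""
    return client.zincrby(board, 1, song)


def top_songs(client, board, n):
    """ZREVRANGE board 0 n-1 WITHSCORES: highest scores first."""
    return client.zrevrange(board, 0, n - 1, withscores=True)
```

The List keeps positions by insertion, so paging through it while new comments arrive can shift items between pages. The Sorted Set orders by score, so a leaderboard stays correct however often scores change.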
Aggregation Statistics
Redis Set operations enable intersection, union, and difference calculations, useful for scenarios such as finding common friends, daily new users, or total new users across days.
Examples:
Common friends: SINTERSTORE dest setA setB .
Daily new users: SDIFFSTORE newUsers day2Set day1Set , which keeps the members of day2Set that are absent from day1Set (the argument order matters).
Total new users over two days: SUNIONSTORE totalNew day1Set day2Set .
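The three commands above map directly onto client calls. The following is a sketch assuming a redis-py style client; the helper and key names are hypothetical. Each STORE variant writes the result to a destination key and returns the size of that result.

```python
# Assumed usage (needs the redis-py package and a running server):
#   import redis
#   client = redis.Redis(decode_responses=True)

def common_friends(client, dest, user_a_key, user_b_key):
    """SINTERSTORE dest a b: members present in both sets."""
    return client.sinterstore(dest, user_a_key, user_b_key)


def daily_new_users(client, dest, today_key, earlier_key):
    """SDIFFSTORE dest today earlier: in today's set but not the earlier one.

    Order matters: the first set is the one members are kept from.
    """
    return client.sdiffstore(dest, today_key, earlier_key)


def total_users(client, dest, *day_keys):
    """SUNIONSTORE dest k1 k2 ...: every member seen in any of the sets."""
    return client.sunionstore(dest, *day_keys)
```

Swapping the two source keys in `daily_new_users` would instead return users who appeared earlier but not today, which is a churn-style figure rather than a new-user count.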
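Set aggregation can be costly on large datasets: Redis executes commands on a single thread, so a long-running intersection, union, or difference delays every other request on that instance. It is therefore recommended to offload these calculations to a dedicated Redis cluster or to perform them on the client side after fetching the raw data.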
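The client-side option could be sketched as follows, assuming a redis-py style client. The helper names are hypothetical, and iterating with SSCAN (via `sscan_iter`) rather than a single SMEMBERS call is a choice made here so that the server returns the members in batches.

```python
# Assumed usage (needs the redis-py package and a running server):
#   import redis
#   client = redis.Redis(decode_responses=True)

def fetch_members(client, key):
    """Read a Redis Set into a Python set incrementally with SSCAN."""
    return set(client.sscan_iter(key))


def new_users_client_side(client, today_key, earlier_key):
    """Compute the difference in the application instead of in Redis."""
    today = fetch_members(client, today_key)
    earlier = fetch_members(client, earlier_key)
    return today - earlier


def common_members_client_side(client, key_a, key_b):
    """Compute the intersection in the application instead of in Redis."""
    return fetch_members(client, key_a) & fetch_members(client, key_b)
```

The trade-off is network transfer and application memory: every member crosses the wire, so this suits cases where protecting Redis latency matters more than bandwidth.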
Additional Data Types
The article also mentions using Bitmap for bit‑level statistics and HyperLogLog for approximate cardinality, extending beyond the five basic Redis types.
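For the Bitmap case, a login-status tracker might look like the sketch below. It assumes a redis-py style client, one Bitmap key per day, and numeric user IDs used directly as bit offsets; all of these are illustrative assumptions rather than details given in the article.

```python
# Assumed usage (needs the redis-py package and a running server):
#   import redis
#   client = redis.Redis()

def mark_login(client, day_key, user_id):
    """SETBIT day_key user_id 1: flag this user as logged in for the day."""
    client.setbit(day_key, user_id, 1)


def has_logged_in(client, day_key, user_id):
    """GETBIT day_key user_id: True if the user's bit is set."""
    return client.getbit(day_key, user_id) == 1


def logins_that_day(client, day_key):
    """BITCOUNT day_key: number of users whose bit is set."""
    return client.bitcount(day_key)
```

Because the user ID is the offset, this layout works best when IDs are dense integers. Sparse or very large IDs make the Bitmap grow to cover the highest offset used.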
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn