How to Count Website Visits with Redis: Hash, Bitset, and HyperLogLog
This article explains three Redis-based techniques for counting a page's daily unique visitors: hash tables, bitsets, and the HyperLogLog probabilistic algorithm. For each, it covers the commands involved, the implementation steps, and the advantages and limitations for high-traffic sites.
Background
Pinduoduo, with hundreds of millions of users, needs an efficient way to count the daily unique visitors of each page. Redis, a fast in-memory data store, offers several data structures that can be used for this purpose.
Method 1: Using a Hash
Create one Redis hash per page per day, keyed by a composite of the page URI and the date (e.g., URI + date). Each field in the hash represents one visitor. When a user visits:
If the user is logged in, use their user ID as the field.
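If the user is not logged in, generate a random identifier once, keep it on the client (for example, in a cookie), and reuse it as the field on later visits; a fresh identifier per request would count every visit as a new visitor.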
Write the field with a constant value (e.g., 1) using HSET. A repeat visit overwrites the same field, so each visitor is counted once. To obtain a page's daily unique visitor count, call HLEN on its hash key.
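The steps above can be modeled without a server. The following is a minimal in-memory sketch, assuming a hypothetical key layout `<uri>:<YYYY-MM-DD>`, in which a dict of dicts stands in for the Redis hashes: `record_visit` mirrors `HSET key visitor_id 1` and `unique_visitors` mirrors `HLEN key`. The class and method names are illustrative only, not part of any Redis client.

```python
from collections import defaultdict
from datetime import date
from typing import Dict


class HashVisitCounter:
    """In-memory stand-in for one Redis hash per page per day.

    Each key maps to {visitor_id: 1}, mirroring `HSET key visitor_id 1`;
    the size of that dict plays the role of `HLEN key`.
    """

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, int]] = defaultdict(dict)

    @staticmethod
    def key(uri: str, day: date) -> str:
        # Hypothetical key layout: "<uri>:<YYYY-MM-DD>".
        return f"{uri}:{day.isoformat()}"

    def record_visit(self, uri: str, day: date, visitor_id: str) -> None:
        # Like HSET: writing an existing field overwrites it, so repeat
        # visits by the same visitor do not inflate the count.
        self._hashes[self.key(uri, day)][visitor_id] = 1

    def unique_visitors(self, uri: str, day: date) -> int:
        # Like HLEN: number of fields under the key (0 if the key is absent).
        return len(self._hashes.get(self.key(uri, day), {}))


counter = HashVisitCounter()
today = date(2024, 1, 1)
counter.record_visit("/item/42", today, "user:1001")  # logged-in user
counter.record_visit("/item/42", today, "user:1001")  # same user again
counter.record_visit("/item/42", today, "anon:7f3a")  # cookie-based visitor ID
print(counter.unique_visitors("/item/42", today))  # 2
```

Against a real server, redis-py's `r.hset(key, visitor_id, 1)` and `r.hlen(key)` issue the same two commands.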
Pros: Simple to implement, fast queries, high accuracy.
Cons: Memory grows with the number of distinct visitors (one field each) and with the number of page/day keys; for sites with billions of page views, the hashes can become too large.
Method 2: Using a Bitset
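A bitset (Redis calls it a bitmap and stores it in an ordinary string) uses one bit per possible user ID: the bit at offset N is 1 if the user with numeric ID N visited. Storing a visitor's ID as a 32-bit integer takes 32 bits, while the bitmap spends a single bit per user, so memory drops by up to 32× when user IDs are dense.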
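A worked example of where the "up to 32×" comes from, assuming a hypothetical range of N = 100 million user IDs, k of which visit on a given day, and ignoring Redis key and encoding overhead:

```latex
\begin{aligned}
N &= 10^{8} && \text{(possible user IDs } 0,\dots,N-1\text{)}\\
\text{bitmap} &= \tfrac{N}{8} = 1.25\times10^{7}\ \text{bytes} \approx 12.5\ \text{MB} && \text{(fixed by } N\text{)}\\
\text{list of } k \text{ 32-bit IDs} &= 4k\ \text{bytes} && (k = N \Rightarrow 400\ \text{MB} = 32 \times \text{bitmap})\\
\text{bitmap is smaller} &\iff \tfrac{N}{8} < 4k \iff k > \tfrac{N}{32} = 3.125\times10^{6}
\end{aligned}
```

So 32× is the best case, reached only when every ID visits; if fewer than about 3% of the ID range visits, a plain list of IDs would be smaller.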
Redis provides SETBIT to mark a user as a visitor (SETBIT key user_id 1) and GETBIT to check whether a specific user visited. BITCOUNT on the day's key returns the number of set bits, i.e., the unique visitor count.
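To make the command semantics concrete, here is a minimal in-memory model of a Redis bitmap in Python. It follows Redis's bit ordering (offset 0 is the most significant bit of the first byte) and its zero-padding on SETBIT, but the `Bitmap` class itself is hypothetical, written for illustration rather than taken from any client library.

```python
class Bitmap:
    """In-memory model of a Redis string used as a bitmap.

    Uses Redis bit order: offset 0 is the most significant bit of byte 0.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    @staticmethod
    def _locate(offset: int):
        if offset < 0:
            raise ValueError("offset must be non-negative")
        byte_index, bit_index = divmod(offset, 8)
        return byte_index, 0x80 >> bit_index

    def setbit(self, offset: int, value: int) -> int:
        """Like SETBIT: set or clear one bit and return its previous value."""
        if value not in (0, 1):
            raise ValueError("value must be 0 or 1")
        byte_index, mask = self._locate(offset)
        if byte_index >= len(self._buf):
            # Like Redis, zero-pad the string up to the byte holding the offset.
            self._buf.extend(bytes(byte_index + 1 - len(self._buf)))
        old = 1 if self._buf[byte_index] & mask else 0
        if value:
            self._buf[byte_index] |= mask
        else:
            self._buf[byte_index] &= 0xFF ^ mask
        return old

    def getbit(self, offset: int) -> int:
        """Like GETBIT: bits beyond the end of the string read as 0."""
        byte_index, mask = self._locate(offset)
        if byte_index >= len(self._buf):
            return 0
        return 1 if self._buf[byte_index] & mask else 0

    def bitcount(self) -> int:
        """Like BITCOUNT without a range: number of bits set to 1."""
        return sum(bin(b).count("1") for b in self._buf)

    def nbytes(self) -> int:
        """Length of the underlying string in bytes."""
        return len(self._buf)


visits = Bitmap()  # one bitmap per page per day
for user_id in (5, 1001, 5, 42):
    visits.setbit(user_id, 1)
print(visits.bitcount())  # 3
print(visits.getbit(42), visits.getbit(7))  # 1 0
print(visits.nbytes())  # 126: sized by the largest offset, not the visitor count
```

The last line shows the trade-off discussed next: three visitors cost 126 bytes because user 1001 forces the string out to byte 125. With redis-py the equivalent calls are `r.setbit(key, user_id, 1)`, `r.getbit(key, user_id)`, and `r.bitcount(key)`.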
Pros: Very low memory footprint; suitable for massive traffic.
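Cons: Memory is determined by the largest user ID rather than by the number of visitors, so with sparse IDs the bitset can use more memory than a hash. Anonymous users also need an extra mapping from their random identifiers to integer offsets.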
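One way to provide that extra mapping, sketched here as an assumption rather than something the article specifies: registered users keep their numeric ID as the offset, and each new anonymous token is handed the next free offset above a reserved boundary. `OffsetAllocator` and `anon_base` are hypothetical names; in production the table would have to live in shared storage (for example, a Redis hash plus an INCR counter), which this in-memory sketch does not attempt.

```python
from typing import Dict, Optional


class OffsetAllocator:
    """Hypothetical helper: maps visitors to bitmap offsets.

    Registered users keep their numeric ID as the offset. Anonymous visitors
    (identified by a random token, e.g. from a cookie) get sequential offsets
    starting at anon_base, an assumed boundary above every registered ID.
    """

    def __init__(self, anon_base: int) -> None:
        self._anon_base = anon_base
        self._next_offset = anon_base
        self._token_offsets: Dict[str, int] = {}

    def offset_for(self, user_id: Optional[int], token: Optional[str]) -> int:
        if user_id is not None:
            if not 0 <= user_id < self._anon_base:
                raise ValueError("registered user ID outside the reserved range")
            return user_id
        if token is None:
            raise ValueError("an anonymous visit needs a token")
        if token not in self._token_offsets:
            # First time this token is seen: hand out the next free offset.
            self._token_offsets[token] = self._next_offset
            self._next_offset += 1
        return self._token_offsets[token]


alloc = OffsetAllocator(anon_base=1_000_000)
print(alloc.offset_for(42, None))          # 42
print(alloc.offset_for(None, "cookie-a"))  # 1000000
print(alloc.offset_for(None, "cookie-b"))  # 1000001
print(alloc.offset_for(None, "cookie-a"))  # 1000000
```

Sequential allocation keeps anonymous offsets dense, but the token table grows with every new anonymous visitor, which eats into the bitset's memory advantage.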
Method 3: Using a Probabilistic Algorithm (HyperLogLog)
When exact counts are not required, Redis's built-in HyperLogLog offers an approximate cardinality estimator that uses at most about 12 KB per key, no matter how many distinct items are added. The workflow is:
On each visit, execute PFADD with a unique identifier (e.g., user ID or random token).
To retrieve the estimated unique visitor count, run PFCOUNT.
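The two steps above can be illustrated with a self-contained HyperLogLog in Python. This is the classic algorithm (Flajolet et al.) with linear counting for small cardinalities, using 2^14 registers like Redis. It is a stand-in, not Redis's implementation: it hashes with SHA-1 and uses the original estimator, whereas Redis uses its own 64-bit hash and a refined estimator, so its numbers will not match PFCOUNT exactly.

```python
import hashlib
import math


class HyperLogLog:
    """Classic HyperLogLog with linear counting for small cardinalities.

    Uses 2**14 registers like Redis, but SHA-1 as the hash and the original
    estimator, so results will not match PFCOUNT exactly.
    """

    P = 14          # bits of the hash used to pick a register
    M = 1 << P      # 16384 registers

    def __init__(self) -> None:
        self.registers = [0] * self.M

    @staticmethod
    def _hash64(item: str) -> int:
        # First 8 bytes of SHA-1 as a 64-bit integer (stand-in hash function).
        return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")

    def add(self, item: str) -> bool:
        """Like PFADD with one element: True if a register was updated."""
        h = self._hash64(item)
        index = h >> (64 - self.P)              # top 14 bits choose the register
        rest = h & ((1 << (64 - self.P)) - 1)   # remaining 50 bits
        # Position (1-based) of the leftmost 1-bit in the 50-bit remainder;
        # an all-zero remainder gets the maximum value, 51.
        rank = (64 - self.P) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank
            return True
        return False

    def count(self) -> int:
        """Like PFCOUNT on one key: estimated number of distinct items."""
        m = self.M
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: linear counting over empty registers.
            return round(m * math.log(m / zeros))
        return round(raw)


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"user:{i}")
    hll.add(f"user:{i}")  # a repeat visit cannot raise any register
print(hll.count())  # an estimate near 50000, not an exact count
```

Each register only remembers the largest rank it has seen, which is why duplicates are free and why the sketch cannot say whether a particular user visited. With redis-py, the server-side equivalents are `r.pfadd(key, visitor_id)` and `r.pfcount(key)`.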
PFCOUNT's estimate has a standard error of 0.81%, which is acceptable for large-scale analytics.
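Both headline numbers follow from Redis's HyperLogLog layout of m = 2^14 registers of 6 bits each (dense representation), together with the standard-error formula 1.04/√m for a HyperLogLog with m registers:

```latex
\begin{aligned}
m &= 2^{14} = 16384\ \text{registers}\\
\text{standard error} &\approx \frac{1.04}{\sqrt{m}} = \frac{1.04}{128} = 0.008125 \approx 0.81\%\\
\text{dense memory} &= 16384 \times 6\ \text{bits} = 98304\ \text{bits} = 12288\ \text{bytes} = 12\ \text{KB}
\end{aligned}
```

For a page with 10 million true visitors, one standard error corresponds to roughly ±81,000.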
Pros: Extremely small memory consumption; ideal for sites with hundreds of millions of users.
Cons: You cannot check whether a particular user visited, and the count is an estimate rather than an exact figure.
Conclusion
Redis provides three practical approaches for counting a page's daily unique visitors: hashes for exact counts whose memory grows with the number of visitors, bitsets for exact counts in very little memory when user IDs are dense, and HyperLogLog for approximate counts in at most about 12 KB per key. The choice depends on traffic volume, accuracy requirements, and available resources.
ITPUB
