Mastering Redis HyperLogLog: Efficient Cardinality Estimation for Big Data

This article explains Redis HyperLogLog, its underlying principles, memory efficiency, typical use cases like UV/PV counting, and provides practical command examples (PFADD, PFCOUNT, PFMERGE) to perform high‑performance cardinality estimation on massive datasets.

Architecture & Thinking

1 Introduction

Beyond its core data types, Redis offers specialized structures such as Bitmap, Geo, and HyperLogLog, each aimed at a specific statistical problem. This article focuses on HyperLogLog, which is designed for cardinality estimation, that is, counting the distinct elements of a set (unique IPs or unique users, for example) while using very little memory.

2 About HyperLogLog

HyperLogLog is a probabilistic Redis data structure for approximate distinct‑count calculations. Exact counting has to remember every element it has seen, so memory grows linearly with the number of distinct elements. HyperLogLog instead keeps a fixed‑size sketch (at most 12 KB per key in Redis) that can estimate cardinalities of up to 2⁶⁴ distinct values with a standard error of about 0.81%.
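To make the fixed‑size idea concrete, here is a toy HyperLogLog in Python. It is only a sketch under stated assumptions: the class name `ToyHyperLogLog`, the SHA‑1 based 64‑bit hash, and the simple linear‑counting correction for small sets are illustrative choices, not Redis's actual implementation. What it shares with the description above is the shape of the idea: a fixed array of registers whose size never grows, no matter how many elements are added.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Illustrative HyperLogLog sketch; NOT how Redis implements it internally."""

    def __init__(self, precision: int = 14) -> None:
        self.p = precision
        self.m = 1 << precision            # number of registers (16384 when p = 14)
        self.registers = [0] * self.m      # fixed size, independent of input volume

    def add(self, item: str) -> None:
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                 # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


if __name__ == "__main__":
    hll = ToyHyperLogLog()
    for i in range(200_000):
        hll.add(f"user-{i}")
    print("estimated distinct users:", hll.count())
```

Adding the same element twice hashes to the same register and rank, so duplicates never change the estimate; that is the property the UV use case relies on.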

Typical scenarios include daily unique visitor (UV) counting, page view (PV) aggregation, and merging statistics across multiple keys.

2.1 Practical Use Cases

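Count daily page views (PV), where every visit counts. Because HyperLogLog only counts distinct elements, this works only if each visit is added as a unique element (a request ID, for example); a plain counter is usually the simpler tool for PV.

Count daily unique visitors (UV), where multiple visits by the same user in a day count as one.

Merge multiple keys to obtain an aggregate unique count across different site sections, with visitors seen in several sections counted once.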

2.2 Efficiency and Scale

Storing raw IP addresses for 10 million daily visitors would require roughly 143 MB, assuming 15 bytes per address (the longest dotted‑decimal IPv4 string). In contrast, a HyperLogLog key occupies at most 12 KB regardless of the number of elements, because it stores only a compact sketch of 16,384 buckets (registers) of 6 bits each. The standard error of the estimate is 1.04 / √m, where m is the bucket count, which gives 0.008125 (about 0.81%).

<code>10,000,000 * 15 / (1024 * 1024) ≈ 143.05 MB</code>
<code>1.04 / sqrt(16384) = 0.008125</code>
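The same arithmetic can be checked in a few lines of Python. This is a quick verification sketch; the 6 bits per register figure is the width used by Redis's dense HyperLogLog encoding, and the variable names are just for illustration.

```python
import math

REGISTERS = 16_384        # m, the bucket count
BITS_PER_REGISTER = 6     # register width in Redis's dense encoding

# Size of the sketch itself: 16384 registers * 6 bits, expressed in bytes.
sketch_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard error of the estimate: 1.04 / sqrt(m).
standard_error = 1.04 / math.sqrt(REGISTERS)

# Raw storage for 10 million IPs at 15 bytes each, in units of 1024 * 1024 bytes.
raw_mb = 10_000_000 * 15 / (1024 * 1024)

print(f"sketch size:    {sketch_bytes} bytes ({sketch_bytes // 1024} KB)")
print(f"standard error: {standard_error:.6f} ({standard_error:.2%})")
print(f"raw IP storage: {raw_mb:.2f} MB")
```

The register math is where the 12 KB figure comes from: 16,384 × 6 bits is 12,288 bytes.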

3 HyperLogLog Commands

Redis provides three commands for working with HyperLogLog: PFADD, PFCOUNT, and PFMERGE.

3.1 PFADD – Add Elements

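PFADD adds one or more elements to a HyperLogLog structure, creating the key if it does not exist. It returns 1 if the sketch changed (so the estimated count may have changed) and 0 otherwise.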

<code>redis> PFADD key element [element ...]</code>

Example:

<code># Add IPs to the HyperLogLog that tracks visitors to the Baidu site
redis> PFADD baidu:ip_address "192.168.0.1" "192.168.0.2" "192.168.0.3"
(integer) 1

# Adding an IP that is already present leaves the sketch unchanged, so PFADD returns 0
redis> PFADD baidu:ip_address "192.168.0.3"
(integer) 0</code>
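From application code, the same call might look like the Python sketch below. It assumes a redis-py style client that exposes `pfadd(name, *values)`; the function name `record_visitors` and the per-day key layout `site:ip_address:day` are hypothetical choices made for this example, not something Redis prescribes.

```python
from typing import Iterable


def record_visitors(client, site: str, day: str, ips: Iterable[str]) -> bool:
    """Add visitor IPs to the day's HyperLogLog.

    Returns True if PFADD reported that the sketch changed, which means
    at least one of the IPs had probably not been seen before.
    """
    ips = list(ips)
    if not ips:
        return False                       # nothing to record
    key = f"{site}:ip_address:{day}"       # hypothetical key layout
    return client.pfadd(key, *ips) == 1
```

With redis-py installed and a server running, `client` could be created with `redis.Redis(host="localhost", port=6379)` and the function called as `record_visitors(client, "baidu", "2024-03-01", ["192.168.0.1"])`.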

3.2 PFCOUNT – Estimate Cardinality

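PFCOUNT returns the approximate number of unique elements in a HyperLogLog. When given several keys, it returns the approximate cardinality of their union, so elements present in more than one key are counted once.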

<code>redis> PFCOUNT key [key ...]</code>

Example:

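<code># Sample output for a key that already holds about a million distinct IPs;
# right after the three-address PFADD example above, the result would be 3
redis> PFCOUNT baidu:ip_address
(integer) 1034546</code>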
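Because PFCOUNT accepts several keys, a multi-day unique-visitor figure can be read straight from per-day keys without storing a merged copy. The Python sketch below assumes a redis-py style client with `pfcount(*sources)`; the helper names and the `site:ip_address:YYYY-MM-DD` key layout are hypothetical.

```python
from datetime import date, timedelta


def daily_keys(site: str, last_day: date, days: int) -> list[str]:
    """Keys for the `days` days ending on `last_day`, oldest first."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return [
        f"{site}:ip_address:{(last_day - timedelta(days=offset)).isoformat()}"
        for offset in range(days - 1, -1, -1)
    ]


def unique_visitors(client, site: str, last_day: date, days: int = 7) -> int:
    """Approximate distinct visitors over the window (union of the daily sketches)."""
    return client.pfcount(*daily_keys(site, last_day, days))
```

A visitor who shows up on three of the seven days is still counted once, since the result is the cardinality of the union rather than the sum of the daily counts.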

3.3 PFMERGE – Merge Sketches

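PFMERGE combines multiple HyperLogLog structures into a single one that approximates the union of the sources, so overlapping elements are counted once. If the destination key already exists, its contents are included in the merge.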

<code>redis> PFMERGE destkey sourcekey [sourcekey ...]</code>

Example merging Baidu and Taobao IP counts:

<code># Add IPs for Baidu
redis> PFADD baidu:ip_address "192.168.0.1" "192.168.0.2" "192.168.0.3"

# Add IPs for Taobao
redis> PFADD taobao:ip_address "192.168.0.3" "192.168.0.4" "192.168.0.5"

# Merge and deduplicate
redis> PFMERGE total:ip_address baidu:ip_address taobao:ip_address
OK

# Resulting count is 5 unique IPs
redis> PFCOUNT total:ip_address
(integer) 5</code>
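The merge-then-count sequence can be wrapped in a small helper. This Python sketch assumes a redis-py style client with `pfmerge(dest, *sources)` and `pfcount(*sources)`; the function name `merge_unique_counts` is made up for the example.

```python
def merge_unique_counts(client, dest: str, *sources: str) -> int:
    """Merge the source HyperLogLogs into `dest` and return its estimated cardinality.

    Note: if `dest` already exists, PFMERGE folds its current contents into
    the result, so reuse a destination key only when that is intended.
    """
    if not sources:
        raise ValueError("at least one source key is required")
    client.pfmerge(dest, *sources)     # stores the union sketch under `dest`
    return client.pfcount(dest)        # reads the stored union's estimate
```

Compared with passing several keys to PFCOUNT, this variant persists the merged sketch, which can be worthwhile when the combined figure is read often.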

4 Conclusion

HyperLogLog provides a memory‑efficient way to estimate cardinality, which makes it a good fit for daily UV counting, unique‑IP statistics, and aggregating distinct counts across multiple keys. With PFADD, PFCOUNT, and PFMERGE, developers can handle massive datasets with a standard error of about 0.81% while each key stays within 12 KB.

Written by

Architecture & Thinking