Understanding Redis: Core Data Types, Persistence, Replication, and Common Pitfalls
This guide covers Redis fundamentals: NoSQL concepts, the five core data types and their internal storage structures, the three special data types, transactions, RDB and AOF persistence, publish/subscribe, master-slave replication, Sentinel failover, and strategies for preventing cache penetration, breakdown, and avalanche.
What is NoSQL?
NoSQL ("Not Only SQL") refers to a family of non‑relational databases that store data without a fixed schema and can scale horizontally. Unlike traditional relational databases that keep data in rows and columns with a uniform structure, NoSQL systems such as Redis, MongoDB, HBase and Neo4j support flexible data models (key‑value, document, column‑family, graph) and are better suited for the massive, high‑concurrency workloads of modern web applications.
What is Redis?
Redis (Remote Dictionary Server) is an open‑source, in‑memory key‑value store written in ANSI C. It provides network access, optional persistence to disk, and a rich set of native data structures (strings, hashes, lists, sets, sorted sets, bitmaps, HyperLogLog, geospatial indexes). Like Memcached it caches data in RAM for speed, but it also supports replication, transactions, Lua scripting and high‑availability features (Sentinel, Cluster).
Redis basic data types
String
Strings are the simplest type and can store up to 512 MB. They may contain plain text, JSON, binary blobs or numeric strings.
Typical uses
Cache layer for fast reads.
Real‑time counters (e.g., INCR).
Session storage shared across multiple application servers.
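Counters work because Redis parses a string value as a 64-bit signed integer when INCR or INCRBY runs. It returns an error if the value is not an integer or if the result would overflow. The C sketch below approximates that behavior on a plain buffer. The function name `incr_string` and its return codes are hypothetical, and the sketch does not reproduce every rule of Redis's own integer parser.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: add `by` to the decimal integer stored in buf.
 * Returns 0 on success, -1 if buf does not hold a 64-bit integer,
 * -2 if the result would overflow (the two INCRBY error cases). */
int incr_string(char *buf, size_t cap, int64_t by) {
    assert(buf != NULL && cap >= 21);  /* room for "-9223372036854775808" */
    if (buf[0] == '\0' || isspace((unsigned char)buf[0]))
        return -1;                     /* empty or padded value */
    errno = 0;
    char *end;
    long long cur = strtoll(buf, &end, 10);
    if (*end != '\0' || errno == ERANGE)
        return -1;                     /* trailing junk or outside 64 bits */
    if ((by > 0 && cur > INT64_MAX - by) || (by < 0 && cur < INT64_MIN - by))
        return -2;                     /* increment would overflow */
    snprintf(buf, cap, "%" PRId64, (int64_t)cur + by);
    return 0;
}
```

The overflow check runs before the addition, so the sketch never relies on signed overflow. This matches Redis's behavior of leaving the stored value unchanged when INCRBY fails.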
List
Lists are ordered collections of strings. Commands LPUSH / RPUSH insert at the head/tail, while LPOP / RPOP remove elements. They are ideal for implementing queues and stacks.
Typical uses
Message queues (e.g., LPUSH + BRPOP).
Paginated feeds where order matters.
Set
Sets store unique, unordered strings. Internally Redis chooses between an intset (compact integer array) for small integer‑only collections and a hash table for larger or mixed data.
Typical uses
Tag management and grouping.
Finding mutual friends (set intersection).
Counting distinct IP addresses.
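Finding mutual friends comes down to a set intersection. This minimal sketch assumes each user's friend IDs are kept as a sorted, duplicate-free int64_t array, much like an intset. Under that assumption, a merge-style walk finds the common IDs in O(n + m). The name `intersect_sorted` is hypothetical. Redis's own SINTER takes a different route: it iterates over the smallest set and checks each member against the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intersect two ascending, duplicate-free arrays (e.g. friend IDs).
 * Writes common elements to out (capacity >= min(na, nb)) and returns the count. */
size_t intersect_sorted(const int64_t *a, size_t na,
                        const int64_t *b, size_t nb, int64_t *out) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;            /* advance the side with the smaller value */
        else if (a[i] > b[j]) j++;
        else { out[k++] = a[i]; i++; j++; }  /* present in both */
    }
    return k;
}
```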
Sorted Set (Zset)
Sorted sets combine uniqueness with a floating-point score that determines ordering. Small collections are stored as a ziplist. Larger ones use a skiplist paired with a hash table for member lookup, which provides O(log N) range queries.
Typical uses
Leaderboards (e.g., video view counts).
Priority queues where higher scores mean higher priority.
Hash
Hashes map field names to values, making them perfect for representing objects or rows.
Typical uses
Storing object attributes (e.g., user profiles).
Reducing round‑trips by keeping related fields together.
Underlying storage structures
String storage – SDS
Redis uses Simple Dynamic Strings (SDS) instead of C strings. SDS stores the length of the string, allowing O(1) length queries, automatic buffer expansion and binary‑safe operations.
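```c
#include <stdint.h>

/* Header used for lengths below 256 bytes (Redis also defines sdshdr5/16/32/64). */
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;         /* used length */
    uint8_t alloc;       /* allocated space, excluding header and terminating NUL */
    unsigned char flags; /* low 3 bits store the header type */
    char buf[];          /* actual bytes, followed by a NUL terminator */
};
```
When a string is at most 44 bytes long, Redis stores it with the embstr encoding, which places the redisObject header and the SDS in a single allocation. Longer strings use the raw encoding, in which the object and the SDS are allocated separately. An embstr value is treated as read-only, so the first modification (for example, APPEND) converts it to raw.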
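The sketch below mimics the SDS layout to make the O(1) length and binary-safety points concrete. The header sits immediately before the character buffer, so the length is read from the header instead of by scanning for a NUL byte. The names `mini_sdshdr8`, `mini_sds_new`, `mini_sds_len` and `mini_sds_free` are hypothetical. `__attribute__((__packed__))` is a GCC/Clang extension, the same attribute Redis uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified header modeled on sdshdr8; packed so buf follows flags directly. */
struct __attribute__((__packed__)) mini_sdshdr8 {
    uint8_t len;
    uint8_t alloc;
    unsigned char flags;
    char buf[];
};

typedef char *mini_sds;   /* callers see a pointer to buf, as with real SDS */

/* Allocate header + payload + NUL in one block and return a pointer to buf. */
mini_sds mini_sds_new(const void *data, uint8_t len) {
    struct mini_sdshdr8 *h = malloc(sizeof *h + (size_t)len + 1);
    if (h == NULL) return NULL;
    h->len = len;
    h->alloc = len;
    h->flags = 1;                 /* SDS_TYPE_8 in Redis */
    memcpy(h->buf, data, len);    /* payload may contain '\0' bytes */
    h->buf[len] = '\0';           /* still usable with C string functions */
    return h->buf;
}

/* O(1): read the stored length from the header just before buf. */
size_t mini_sds_len(const mini_sds s) {
    assert(s != NULL);
    const struct mini_sdshdr8 *h = (const void *)(s - sizeof(struct mini_sdshdr8));
    return h->len;
}

void mini_sds_free(mini_sds s) {
    if (s != NULL) free(s - sizeof(struct mini_sdshdr8));
}
```

With the string `"ab\0cd"`, `mini_sds_len` reports 5 while `strlen` stops at 2. That difference is what binary safety means here.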
List storage – ziplist and quicklist
Before Redis 3.2 a list was stored either as a ziplist (compact, contiguous memory) or as a doubly linked list. A ziplist is efficient for small lists (by default, every element at most 64 bytes and at most 512 elements). However, inserting or deleting costs O(N) because the contiguous buffer must be reallocated and its contents shifted.
Since Redis 3.2 the default implementation is a quicklist, a doubly linked list whose nodes are small ziplists. This hybrid keeps most of the ziplist's memory savings and still keeps insertions cheap, because an insert only reallocates one small node rather than one large buffer.
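```c
typedef struct quicklist {
    struct quicklistNode *head, *tail;
    unsigned long count;        /* total entries across all ziplists */
    unsigned long len;          /* number of quicklistNodes */
    int fill : 16;              /* per-node size limit (list-max-ziplist-size) */
    unsigned int compress : 16; /* nodes left uncompressed at each end (list-compress-depth); 0 = off */
} quicklist;

typedef struct quicklistNode {
    struct quicklistNode *prev, *next;
    unsigned char *zl;                   /* ziplist data, or a quicklistLZF when compressed */
    unsigned int sz;                     /* ziplist size in bytes */
    unsigned int count : 16;             /* number of entries in this ziplist */
    unsigned int encoding : 2;           /* RAW (1) or LZF (2) */
    unsigned int container : 2;          /* NONE (1) or ZIPLIST (2) */
    unsigned int recompress : 1;         /* temporarily decompressed for access */
    unsigned int attempted_compress : 1; /* node too small to compress */
    unsigned int extra : 10;             /* reserved bits */
} quicklistNode;
```
Set storage – intset and hash table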
If a set contains only integers and has no more than 512 elements (the set-max-intset-entries default), Redis uses an intset, which is a sorted integer array. Otherwise it converts the set to a regular hash table.
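The sorted layout makes membership checks cheap, because an intset can use binary search. A new value only forces an encoding upgrade when it needs a wider type. The minimal sketch below keeps values widened to int64_t for readability. A real intset instead packs them at the current width and upgrades the whole array; it never downgrades. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest element width (in bytes) that can hold v, like INT16/INT32/INT64 encodings. */
uint8_t intset_value_width(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return 8;
    if (v < INT16_MIN || v > INT16_MAX) return 4;
    return 2;
}

/* Binary search over sorted values. Returns 1 and sets *pos if v is found;
 * otherwise returns 0 and sets *pos to the index where v would be inserted. */
int intset_search(const int64_t *vals, size_t n, int64_t v, size_t *pos) {
    assert(pos != NULL);
    size_t lo = 0, hi = n;                 /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (vals[mid] < v) lo = mid + 1;
        else if (vals[mid] > v) hi = mid;
        else { *pos = mid; return 1; }
    }
    *pos = lo;
    return 0;
}
```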
```c
typedef struct intset {
    uint32_t encoding;  /* INTSET_ENC_INT16, INT32 or INT64: bytes per element */
    uint32_t length;    /* number of elements */
    int8_t contents[];  /* sorted values, stored at the width given by encoding */
} intset;
```
Sorted set storage – ziplist and skiplist
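Small sorted sets (by default, at most 128 members of at most 64 bytes each) are stored as a ziplist, where each element occupies two consecutive entries: the member, then the score. Larger sets use a skiplist ordered by score, paired with a hash table that maps members to scores. This gives O(log N + M) range queries, where M is the number of returned elements, and O(1) score lookups by member.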
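The skiplist idea fits in a few dozen lines of C. This toy version keeps only forward pointers; it has no backward pointer or span, so it cannot compute ranks. Nodes are ordered by score and then by member name. Node heights are drawn with probability 0.25 per extra level, capped at 32, which matches the constants Redis uses in zslRandomLevel. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAXLEVEL 32
#define TOY_P 0.25

typedef struct toy_node {
    const char *member;          /* borrowed pointer, not copied */
    double score;
    struct toy_node *forward[];  /* forward[i] = next node on level i */
} toy_node;

typedef struct {
    toy_node *header;            /* sentinel node with TOY_MAXLEVEL pointers */
    int level;                   /* highest level currently in use */
} toy_skiplist;

toy_node *toy_node_new(int level, double score, const char *member) {
    toy_node *n = malloc(sizeof *n + (size_t)level * sizeof(toy_node *));
    if (n == NULL) return NULL;
    n->member = member;
    n->score = score;
    for (int i = 0; i < level; i++) n->forward[i] = NULL;
    return n;
}

/* Each extra level is kept with probability TOY_P, up to TOY_MAXLEVEL. */
int toy_random_level(void) {
    int level = 1;
    while (level < TOY_MAXLEVEL && (double)rand() < TOY_P * RAND_MAX)
        level++;
    return level;
}

/* Order by score, then by member, as sorted sets do. */
int toy_before(const toy_node *n, double score, const char *member) {
    return n->score < score || (n->score == score && strcmp(n->member, member) < 0);
}

int toy_init(toy_skiplist *zsl) {
    zsl->level = 1;
    zsl->header = toy_node_new(TOY_MAXLEVEL, 0.0, "");
    return zsl->header != NULL;
}

/* Insert a member assumed not to be present yet; returns 0 on allocation failure. */
int toy_insert(toy_skiplist *zsl, double score, const char *member) {
    assert(zsl->header != NULL);
    toy_node *update[TOY_MAXLEVEL];
    toy_node *x = zsl->header;
    for (int i = zsl->level - 1; i >= 0; i--) {
        while (x->forward[i] && toy_before(x->forward[i], score, member))
            x = x->forward[i];
        update[i] = x;               /* last node before the new one on level i */
    }
    int level = toy_random_level();
    toy_node *n = toy_node_new(level, score, member);
    if (n == NULL) return 0;
    for (int i = zsl->level; i < level; i++) update[i] = zsl->header;
    if (level > zsl->level) zsl->level = level;
    for (int i = 0; i < level; i++) {
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    return 1;
}

/* First node with score >= min: the O(log N) search that starts a range query. */
const toy_node *toy_first_in_range(const toy_skiplist *zsl, double min) {
    const toy_node *x = zsl->header;
    for (int i = zsl->level - 1; i >= 0; i--)
        while (x->forward[i] && x->forward[i]->score < min)
            x = x->forward[i];
    return x->forward[0];
}

void toy_free(toy_skiplist *zsl) {
    toy_node *x = zsl->header;
    while (x != NULL) {
        toy_node *next = x->forward[0];
        free(x);
        x = next;
    }
    zsl->header = NULL;
}
```

After `toy_first_in_range` finds the start, a range query just follows `forward[0]` until the score exceeds the upper bound. That walk is where the extra O(M) comes from.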
```c
typedef struct zskiplistNode {
    robj *obj;                           /* member object */
    double score;                        /* sorting score */
    struct zskiplistNode *backward;      /* previous node on level 0 */
    struct zskiplistLevel {
        struct zskiplistNode *forward;   /* next node on this level */
        unsigned int span;               /* nodes skipped; used to compute ranks */
    } level[];
} zskiplistNode;
```
Hash storage – hash table
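Small hashes (by default, at most 512 fields with values of at most 64 bytes) are stored as a ziplist of alternating field and value entries. Larger hashes use a dictionary (hash table) that holds two tables, ht[0] and ht[1]. The second table is used only during incremental rehashing, when buckets are moved from ht[0] to ht[1] a few at a time and rehashidx records the progress.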
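Incremental rehashing is easier to follow in code. The toy table below uses integer keys, the key itself as the hash, and chaining within buckets. It starts a rehash when the load factor reaches 1, then moves one bucket from ht[0] to ht[1] on each insert. New keys go straight to ht[1], and lookups check both tables. This is a sketch of the technique, not Redis's dict.c, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct toy_entry {
    uint64_t key;
    int64_t val;
    struct toy_entry *next;              /* chaining within a bucket */
} toy_entry;

typedef struct {
    toy_entry **table;
    unsigned long size;                  /* bucket count, a power of two */
    unsigned long used;                  /* stored entries */
} toy_table;

typedef struct {
    toy_table ht[2];                     /* ht[1] is only used while rehashing */
    long rehashidx;                      /* next bucket of ht[0] to move; -1 = idle */
} toy_dict;

static int toy_table_init(toy_table *t, unsigned long size) {
    assert(size > 0 && (size & (size - 1)) == 0);
    t->table = calloc(size, sizeof *t->table);
    t->size = t->table ? size : 0;
    t->used = 0;
    return t->table != NULL;
}

int toy_dict_init(toy_dict *d) {
    d->ht[1] = (toy_table){0};
    d->rehashidx = -1;
    return toy_table_init(&d->ht[0], 4);
}

/* Move one non-empty bucket from ht[0] to ht[1]; swap tables when ht[0] is empty. */
void toy_dict_rehash_step(toy_dict *d) {
    if (d->rehashidx == -1) return;
    if (d->ht[0].used > 0) {
        while (d->ht[0].table[d->rehashidx] == NULL) d->rehashidx++;
        toy_entry *e = d->ht[0].table[d->rehashidx];
        while (e != NULL) {
            toy_entry *next = e->next;
            unsigned long idx = e->key & (d->ht[1].size - 1);
            e->next = d->ht[1].table[idx];
            d->ht[1].table[idx] = e;
            d->ht[0].used--;
            d->ht[1].used++;
            e = next;
        }
        d->ht[0].table[d->rehashidx++] = NULL;
    }
    if (d->ht[0].used == 0) {            /* done: ht[1] becomes the main table */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        d->ht[1] = (toy_table){0};
        d->rehashidx = -1;
    }
}

/* Insert without a duplicate check; start a rehash at load factor 1. */
int toy_dict_add(toy_dict *d, uint64_t key, int64_t val) {
    toy_dict_rehash_step(d);             /* piggyback one step of rehashing */
    if (d->rehashidx == -1 && d->ht[0].used >= d->ht[0].size) {
        if (!toy_table_init(&d->ht[1], d->ht[0].size * 2)) return 0;
        d->rehashidx = 0;
    }
    toy_table *t = (d->rehashidx == -1) ? &d->ht[0] : &d->ht[1];
    toy_entry *e = malloc(sizeof *e);
    if (e == NULL) return 0;
    e->key = key;
    e->val = val;
    unsigned long idx = key & (t->size - 1);
    e->next = t->table[idx];
    t->table[idx] = e;
    t->used++;
    return 1;
}

/* During a rehash a key may live in either table, so check both. */
toy_entry *toy_dict_find(const toy_dict *d, uint64_t key) {
    for (int i = 0; i < 2; i++) {
        const toy_table *t = &d->ht[i];
        if (t->size == 0) continue;
        for (toy_entry *e = t->table[key & (t->size - 1)]; e != NULL; e = e->next)
            if (e->key == key) return e;
    }
    return NULL;
}
```

Spreading the move over many operations keeps each insert cheap. Without it, one unlucky insert would pay for copying the entire table.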
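```c
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* next entry in the same bucket (chaining) */
} dictEntry;

typedef struct dictht {
    dictEntry **table;          /* bucket array */
    unsigned long size;         /* number of buckets (a power of two) */
    unsigned long sizemask;     /* size - 1, maps a hash to a bucket */
    unsigned long used;         /* number of stored entries */
} dictht;

typedef struct dict {
    dictType *type;             /* hash, compare and free callbacks */
    void *privdata;
    dictht ht[2];               /* ht[1] is only used while rehashing */
    long rehashidx;             /* -1 when not rehashing */
    int iterators;              /* number of safe iterators currently running */
} dict;
```
Three special Redis data types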
Geospatial (sorted set with Geohash)
Geospatial data (longitude, latitude, name) are stored in a sorted set where the score is a 52‑bit Geohash integer. This enables efficient radius queries via GEORADIUS and GEORADIUSBYMEMBER.
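The score is built by quantizing longitude and latitude to 26 bits each and interleaving the bits, so nearby points tend to share a score prefix. The sketch below is intended to mirror that scheme. Latitude is limited to the Web Mercator range ±85.05112878, and longitude bits go on the odd positions. `geohash52` is a hypothetical name; Redis's actual encoder lives in geohash.c.

```c
#include <assert.h>
#include <stdint.h>

#define GEO_LAT_MIN -85.05112878
#define GEO_LAT_MAX 85.05112878
#define GEO_LON_MIN -180.0
#define GEO_LON_MAX 180.0
#define GEO_STEP 26                      /* bits per axis -> 52-bit score */

/* Map v in [min, max] to a 26-bit cell index. */
static uint32_t quantize(double v, double min, double max) {
    double offset = (v - min) / (max - min);
    uint32_t cells = 1u << GEO_STEP;
    uint32_t q = (uint32_t)(offset * cells);
    return q >= cells ? cells - 1 : q;   /* v == max falls into the last cell */
}

/* Interleave: latitude bits on even positions, longitude bits on odd positions. */
uint64_t geohash52(double lon, double lat) {
    assert(lon >= GEO_LON_MIN && lon <= GEO_LON_MAX);
    assert(lat >= GEO_LAT_MIN && lat <= GEO_LAT_MAX);
    uint32_t x = quantize(lat, GEO_LAT_MIN, GEO_LAT_MAX);
    uint32_t y = quantize(lon, GEO_LON_MIN, GEO_LON_MAX);
    uint64_t bits = 0;
    for (int i = 0; i < GEO_STEP; i++) {
        bits |= (uint64_t)((x >> i) & 1u) << (2 * i);
        bits |= (uint64_t)((y >> i) & 1u) << (2 * i + 1);
    }
    return bits;
}
```

At (0, 0) both axes land exactly in the middle cell (2^25), so only bits 50 and 51 are set. Because the result fits in 52 bits, a double score can represent it exactly.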
HyperLogLog (cardinality estimation)
HyperLogLog provides approximate distinct-element counting using at most 12 KB of memory per key. It can estimate cardinalities up to 2⁶⁴ with a standard error of 0.81%.
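A simplified HyperLogLog shows where the 12 KB comes from. There are 16384 registers, and each one remembers the highest "first 1 bit" position seen among the hashes routed to it. This sketch differs from Redis in several ways:
- It stores one byte per register (16 KB) instead of 6 bits.
- It uses a splitmix64-style mixer in place of MurmurHash64A.
- It applies only the raw estimator, without small-range or bias corrections, so it is only accurate for cardinalities well above the register count.

All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HLL_P 14                         /* 2^14 = 16384 registers */
#define HLL_M (1u << HLL_P)
#define HLL_Q (64 - HLL_P)               /* hash bits left after picking a register */

typedef struct { uint8_t reg[HLL_M]; } hll;   /* one byte per register for clarity */

/* splitmix64-style mixer used as a stand-in 64-bit hash. */
static uint64_t mix64(uint64_t z) {
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

void hll_init(hll *h) { memset(h->reg, 0, sizeof h->reg); }

void hll_add(hll *h, uint64_t item) {
    assert(h != NULL);
    uint64_t x = mix64(item);
    uint32_t idx = (uint32_t)(x & (HLL_M - 1));   /* low P bits pick a register */
    x >>= HLL_P;
    x |= 1ULL << HLL_Q;                            /* sentinel bit bounds the loop */
    uint8_t rank = 1;
    while ((x & 1) == 0) { rank++; x >>= 1; }      /* trailing zeros + 1 */
    if (rank > h->reg[idx]) h->reg[idx] = rank;
}

/* Raw estimate: alpha * m^2 / sum(2^-register). */
double hll_count(const hll *h) {
    double sum = 0.0;
    for (uint32_t j = 0; j < HLL_M; j++)
        sum += 1.0 / (double)(1ULL << h->reg[j]);
    double m = (double)HLL_M;
    double alpha = 0.7213 / (1.0 + 1.079 / m);
    return alpha * m * m / sum;
}
```

Adding the same item twice leaves every register unchanged. That is why repeated visits from one IP never inflate the count.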
Bitmaps
Bitmaps are strings interpreted at the bit level, allowing fast set/clear operations on individual bits. They are useful for tracking binary states such as user activity, login status, or attendance.
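The bit addressing used by SETBIT, GETBIT and BITCOUNT can be sketched on a fixed-size buffer, for example one bit per user ID to record who logged in today. As in Redis, offset 0 is the most significant bit of the first byte. Unlike Redis, this sketch does not grow the buffer when an offset is past the end. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SETBIT-style write: offset 0 is the most significant bit of byte 0.
 * Returns the previous bit value, as SETBIT does. */
int bitmap_setbit(uint8_t *bm, size_t nbytes, size_t offset, int value) {
    assert(offset / 8 < nbytes);
    uint8_t mask = (uint8_t)(0x80u >> (offset % 8));
    int old = (bm[offset / 8] & mask) != 0;
    if (value) bm[offset / 8] |= mask;
    else bm[offset / 8] &= (uint8_t)~mask;
    return old;
}

/* GETBIT-style read; offsets past the end read as 0. */
int bitmap_getbit(const uint8_t *bm, size_t nbytes, size_t offset) {
    if (offset / 8 >= nbytes) return 0;
    return (bm[offset / 8] >> (7 - offset % 8)) & 1;
}

/* BITCOUNT over the whole buffer, clearing the lowest set bit each step. */
size_t bitmap_count(const uint8_t *bm, size_t nbytes) {
    size_t n = 0;
    for (size_t i = 0; i < nbytes; i++)
        for (uint8_t b = bm[i]; b != 0; b &= (uint8_t)(b - 1)) n++;
    return n;
}
```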
Redis transactions
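A transaction groups multiple commands between MULTI and EXEC. The commands are queued and then executed sequentially when EXEC is called, with no commands from other clients interleaved. If a command fails while being queued (for example, an unknown command or a wrong number of arguments), EXEC rejects the whole transaction. If a command fails at execution time (for example, a list operation on a string key), only that command returns an error. The remaining commands still run, and nothing is rolled back.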
Redis does not provide automatic rollback because most errors are programming mistakes that should be caught during development, keeping the engine simple and fast.
Persistence mechanisms
RDB (snapshot)
RDB creates a compressed binary snapshot of the dataset at configured intervals. The main process forks a child that writes the snapshot to a temporary file; once complete, the temporary file replaces the old dump.rdb.
Trigger methods:
SAVE – synchronous; blocks all clients until the snapshot is written.
BGSAVE – asynchronous; a forked child process writes the snapshot.
Automatic triggers based on the save configuration (e.g., save 900 1 takes a snapshot once more than 900 seconds have passed since the last save and at least one key has changed).
A full replication sync and FLUSHALL also generate snapshots.
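The automatic trigger can be expressed as a small predicate. This sketch assumes the three classic default rules (save 900 1, save 300 10, save 60 10000) and approximates the periodic check Redis performs. A rule fires when the number of changes since the last save reaches its threshold and more than the configured number of seconds has elapsed. `save_rule` and `should_bgsave` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One "save <seconds> <changes>" line from the configuration. */
typedef struct {
    time_t seconds;
    long long changes;
} save_rule;

/* Return 1 if any rule is satisfied: enough changes since the last save
 * and more than `seconds` elapsed since it. */
int should_bgsave(const save_rule *rules, size_t n,
                  time_t now, time_t lastsave, long long dirty) {
    assert(now >= lastsave);
    for (size_t i = 0; i < n; i++)
        if (dirty >= rules[i].changes && now - lastsave > rules[i].seconds)
            return 1;
    return 0;
}
```

The three rules together give quick snapshots under heavy write load and infrequent snapshots when the dataset is mostly idle.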
Advantages: compact files, fast restart, low memory overhead. Drawbacks: possible data loss between snapshots and fork overhead for large datasets.
AOF (Append‑Only File)
AOF logs every write command in the Redis protocol format. On restart the log is replayed to reconstruct the dataset.
```
appendonly yes
appendfilename "appendonly.aof"
# appendfsync options: always, everysec, no
appendfsync everysec
```
The AOF can be rewritten (compacted) with BGREWRITEAOF, which forks a child that writes the minimal set of commands needed to rebuild the current dataset.
Advantages: higher durability, configurable through appendfsync (with everysec, at most about one second of writes can be lost). Drawbacks: larger files and slower recovery than RDB.
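The following sketch shows what "Redis protocol format" means in practice. It encodes a command as a RESP array of bulk strings, which is the form write commands take in the AOF. It assumes NUL-free text arguments because it relies on strlen. `resp_encode` is a hypothetical helper, not part of any Redis client library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode argv as a RESP array of bulk strings.
 * Returns the number of bytes written, or -1 if out is too small. */
int resp_encode(char *out, size_t cap, int argc, const char *const *argv) {
    assert(out != NULL && argc > 0);
    size_t used = 0;
    int n = snprintf(out, cap, "*%d\r\n", argc);          /* array header */
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n", len, argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;   /* would truncate */
        used += (size_t)n;
    }
    return (int)used;
}
```

For `SET k v1`, the encoding is `*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$2\r\nv1\r\n` (28 bytes). Replaying the AOF means feeding such records back through the normal command parser.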
RDB vs AOF comparison
Startup priority: RDB – low, AOF – high (when both are enabled, Redis loads the AOF).
File size: RDB – small, AOF – large.
Recovery speed: RDB – fast, AOF – slow.
Data safety: RDB – possible loss, AOF – configurable (always/everysec).
Publish/Subscribe
Redis implements a message‑passing model where publishers send messages to channels and subscribers receive them. Subscriptions are stored in a dictionary mapping channel names to linked lists of client connections.
```c
struct redisServer {
    /* ... other fields ... */
    dict *pubsub_channels;  /* channel -> list of subscribed clients */
    list *pubsub_patterns;  /* list of pubsubPattern (client, pattern) entries */
};
```
Pattern subscriptions use glob-style wildcards (*, ?) and are stored in a list of pubsubPattern structures, each linking a client to a pattern.
Master‑Slave replication
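Replication copies data from a master to one or more slaves. It starts with a full sync, in which the master sends an RDB snapshot followed by the write commands it buffered while producing and transferring that snapshot. It then continues with incremental sync, in which the master forwards every new write command to its slaves.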
Full sync steps
Slave sends PSYNC ? -1 to request synchronization.
Master replies with FULLRESYNC <runid> <offset> and starts a BGSAVE to generate an RDB file.
Master streams the RDB file to the slave.
Once the RDB file has been sent, the master sends the write commands it buffered during the snapshot and transfer. The slave applies them after loading the RDB.
After the full sync, the master forwards new write commands to all slaves (incremental sync).
Sentinel mechanism
Sentinel provides high‑availability by monitoring masters and slaves, performing automatic failover, and acting as a configuration service for clients.
Key components
Monitoring: periodic INFO and PING checks.
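Subjective/Objective down detection: a Sentinel marks a node subjectively down (SDOWN) when it gets no valid reply within down-after-milliseconds. A master becomes objectively down (ODOWN) when at least quorum Sentinels report it as down.
Leader election: Sentinels use a Raft-like voting process (SENTINEL IS-MASTER-DOWN-BY-ADDR) to elect a leader, which needs votes from a majority of the Sentinels.
Failover: the leader selects the best slave (lowest slave-priority, then largest replication offset, then smallest run ID) and promotes it with SLAVEOF NO ONE. It then reconfigures the remaining slaves to replicate from the new master.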
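The selection rule and the objective-down check can be sketched as plain functions. This sketch assumes the candidate replicas have already been filtered; real Sentinels also skip replicas that are disconnected, subjectively down, or out of touch with the master for too long. As in Sentinel, a priority of 0 means the replica is never eligible. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *run_id;
    int priority;              /* slave-priority; 0 = never promote */
    long long repl_offset;     /* replication offset processed */
} replica_info;

/* 1 if a is preferred over b: lower priority, then larger offset, then smaller run ID. */
static int better(const replica_info *a, const replica_info *b) {
    if (a->priority != b->priority) return a->priority < b->priority;
    if (a->repl_offset != b->repl_offset) return a->repl_offset > b->repl_offset;
    return strcmp(a->run_id, b->run_id) < 0;
}

/* Index of the replica to promote, or -1 if none is eligible. */
int select_replica(const replica_info *r, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (r[i].priority == 0) continue;
        if (best == -1 || better(&r[i], &r[best])) best = (int)i;
    }
    return best;
}

/* Objective down: at least `quorum` Sentinels report the master as down. */
int objectively_down(int sentinels_agreeing, int quorum) {
    assert(quorum > 0);
    return sentinels_agreeing >= quorum;
}
```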
Cache problems and mitigations
Cache penetration
Occurs when requests query keys that exist in neither the cache nor the database, so every such request falls through to the database.
Mitigation strategies:
Input validation (e.g., reject negative IDs).
Cache negative results with a short TTL (e.g., 30 seconds).
Use a Bloom filter to pre‑filter nonexistent keys.
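A Bloom filter answers either "definitely absent" or "possibly present". IDs that were never stored can therefore be rejected before touching the cache or the database. This minimal sketch makes the following assumptions:
- a fixed 65,536-bit array;
- 7 probes derived by double hashing from FNV-1a followed by a splitmix64-style mixer;
- keys passed as raw bytes.

All names and sizes are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 65536u   /* m: size of the bit array (illustrative) */
#define BLOOM_K 7u          /* k: probes per key (illustrative) */

typedef struct { uint8_t bits[BLOOM_BITS / 8]; } bloom;

static uint64_t fnv1a64(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV-1a 64-bit prime */
    }
    return h;
}

static uint64_t mix64(uint64_t z) {          /* splitmix64 finalizer */
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

void bloom_init(bloom *b) { memset(b->bits, 0, sizeof b->bits); }

/* Probe i uses bit (h1 + i * h2) mod m; h2 is odd, so the k probes are distinct. */
void bloom_add(bloom *b, const void *key, size_t len) {
    assert(b != NULL && key != NULL);
    uint64_t h1 = mix64(fnv1a64(key, len)), h2 = mix64(h1) | 1;
    for (uint64_t i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + i * h2) % BLOOM_BITS;
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* 0 = definitely never added; 1 = possibly added (false positives allowed). */
int bloom_maybe_contains(const bloom *b, const void *key, size_t len) {
    uint64_t h1 = mix64(fnv1a64(key, len)), h2 = mix64(h1) | 1;
    for (uint64_t i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + i * h2) % BLOOM_BITS;
        if ((b->bits[bit / 8] & (1u << (bit % 8))) == 0) return 0;
    }
    return 1;
}
```

A request for an ID that the filter reports as absent can be answered immediately, without a cache or database lookup. A "possibly present" answer falls through to the normal read path.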
Cache breakdown (cache stampede)
Happens when a hot key expires and many concurrent requests miss the cache, overwhelming the DB.
Solutions:
Never expire hot keys.
Apply rate limiting and circuit breaking.
Use a mutex (e.g., SETNX, or SET with the NX and PX options so the lock also expires) so that only one request rebuilds the cache while the others wait and retry.
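The mutex approach relies on an atomic "set if absent, with expiry" operation, which Redis provides as SET key value NX PX <milliseconds>. The sketch below models a single lock key in process memory to show three rules:
- Acquisition fails while an unexpired holder exists.
- The TTL frees the lock if its holder crashes.
- Release succeeds only for the current owner. In Redis this check-and-delete is typically done in a Lua script.

`rebuild_lock` and its functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-process stand-in for one Redis key used as a rebuild lock. */
typedef struct {
    char owner[32];        /* empty string = unlocked */
    int64_t expires_ms;    /* absolute expiry time */
} rebuild_lock;

/* Mimics SET lock <owner> NX PX <ttl>: succeeds only if unset or expired. */
int lock_try_acquire(rebuild_lock *l, const char *owner, int64_t now_ms, int64_t ttl_ms) {
    assert(strlen(owner) < sizeof l->owner && ttl_ms > 0);
    if (l->owner[0] != '\0' && now_ms < l->expires_ms) return 0;  /* still held */
    strcpy(l->owner, owner);
    l->expires_ms = now_ms + ttl_ms;
    return 1;
}

/* Release only if the caller still owns an unexpired lock. */
int lock_release(rebuild_lock *l, const char *owner, int64_t now_ms) {
    if (l->owner[0] == '\0' || now_ms >= l->expires_ms || strcmp(l->owner, owner) != 0)
        return 0;
    l->owner[0] = '\0';
    return 1;
}
```

The caller that acquires the lock reloads the value from the database and writes it back to the cache. Every other caller sleeps briefly and re-reads the cache instead of querying the database.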
Cache avalanche
When many keys expire simultaneously, the DB experiences a massive surge of traffic.
Preventive measures:
Randomize TTLs to avoid synchronized expiration.
Distribute hot data across multiple cache nodes.
Keep critical data permanently cached.
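Randomizing TTLs is a small change at the point where keys are written. This minimal sketch assumes the C library's rand() is an acceptable source of jitter here, although it has a slight modulo bias. `ttl_with_jitter` is a hypothetical helper; its result would be passed to EXPIRE or to SET with the EX option.

```c
#include <assert.h>
#include <stdlib.h>

/* TTL = base + random jitter in [0, max_jitter], so keys cached together
 * do not all expire at the same moment. */
int ttl_with_jitter(int base_seconds, int max_jitter_seconds) {
    assert(base_seconds > 0 && max_jitter_seconds >= 0);
    return base_seconds + rand() % (max_jitter_seconds + 1);
}
```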
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
