Databases 35 min read

Deep Dive into Redis: Data Structures, Persistence, and High Availability

This comprehensive guide explores Redis internals, covering its core data types and their low‑level implementations, persistence mechanisms like RDB and AOF, performance tricks, common pitfalls such as cache avalanche and split‑brain, distributed locking strategies, expiration and eviction policies, high‑availability architectures, and rate‑limiting techniques.

Java Backend Technology
Java Backend Technology
Java Backend Technology
Deep Dive into Redis: Data Structures, Persistence, and High Availability

1. Basic Types and Underlying Implementation

String

Purpose: simple key‑value storage, setnx for distributed lock, atomic counters, global unique IDs.

Suitable for simple key‑value storage, setnx key value for distributed lock, counters (atomic), distributed global unique ID.

Underlying: In C, String is a SDS (simple dynamic string) wrapping a char[]. An SDS can store up to 512 MB.

struct sdshdr{
  unsigned int len; // length of char[]
  unsigned int free; // unused bytes in the buffer
  char buf[]; // data storage
}

Redis wraps SDS into a RedisObject which has two roles: indicate the type (String) and hold a pointer to the SDS.

When executing SET name sowhat, Redis creates two RedisObject instances – one for the key and one for the value – both of type REDIS_STRING, with SDS storing "name" and "sowhat" respectively.

Optimizations:

When SDS size exceeds 1 MB, extra space is pre‑allocated. SDS uses lazy free: freed space is kept for future reuse.

List

Implementation can be found in adlist.h; it is a doubly linked list with a maximum length of 2^32‑1.

lpush + lpop = stack (LIFO) lpush + rpop = queue (FIFO) lpush + ltrim = capped collection lpush + brpop = message queue

Lists are often used for simple message queues; for larger workloads dedicated MQs like RabbitMQ are preferred.

Hash

Hashes store related fields together (e.g., a shopping cart). In Redis a hash has a single key (always a string) and the value can be any of the five data types.

dictEntry

Actual data node containing key, value, and next pointer.

dictht

Array of dictEntry pointers, size, sizemask (size‑1), and used count.

dict

Contains dictType (functions for key/value handling), rehashidx (‑1 means no rehash), two hash tables (dictht), and iterators.

Set

Similar to Java's HashSet; implemented via t.set.c. No value, only keys.

int setTypeAdd(robj *subject, robj *value){
  long long llval;
  if(subject->encoding == REDIS_ENCODING_HT){
    if(dictAdd(subject->ptr,value,NULL) == DICT_OK){
      incrRefCount(value);
      return 1;
    }
    ...
  }
}

ZSet

Ordered sets use a skip‑list (multi‑level linked list) for fast range queries.

Redis Geo

Geo uses GeoHash to map 2‑D coordinates to strings for area partitioning and distance queries.

HyperLogLog

A probabilistic data structure for cardinality estimation, using Bernoulli process, buckets, and harmonic mean. It can count up to 2^64 distinct elements with ~0.81% error using only 12 KB.

Bitmap

Bitmaps map a single bit to an element state, limiting memory usage. In Redis they are implemented on top of strings; the offset range is up to 2^32‑1.

2. Persistence

RDB

Periodically snapshots the dataset to a compressed binary file. Advantages: fast recovery, suitable for cold backups. Disadvantages: data loss possible if crash occurs during snapshot; forked child process doubles memory usage during save.

Commands: SAVE blocks the main thread; BGSAVE forks a child and continues serving clients.

AOF

Appends every write command to a log file. Advantages: better durability (fsync every second). Disadvantages: larger file size, slower recovery for large datasets.

Rewrite process creates a compact AOF by replaying the current dataset into commands.

Both RDB and AOF first write to a temporary file and then rename it atomically.

Recovery

On startup Redis checks for an AOF file first (more complete); if absent it loads the RDB snapshot.

Recommendation

Use RDB for fast recovery after a crash, then AOF to fill missing data – the “best of both worlds” approach.

3. Why Redis Is So Fast

In‑Memory

All data resides in memory, making operations hundreds of times faster than disk‑based storage.

Efficient Data Structures

Various specialized encodings (raw, ziplist, intset, skiplist) adapt to size and element count, minimizing memory and CPU usage.

Rich Encodings

String: raw or integer encoding. List: ziplist for short lists, otherwise linkedlist. Hash: ziplist for small fields, otherwise hashtable. Set: intset for small integer sets, otherwise hashtable. ZSet: ziplist for small sorted sets, otherwise skiplist.

Thread Model

Uses I/O multiplexing; a single thread handles all client connections, avoiding context switches.

Multithreading (Redis 6.0+)

Network I/O can be offloaded to multiple threads, improving throughput while command execution remains single‑threaded.

4. Common Problems

Cache Avalanche

Massive key expiration at the same time overwhelms the database.

Add random jitter to TTL. Distribute hot keys across multiple cache nodes. Set critical keys to never expire.

Cache Penetration

Requests for non‑existent keys bypass the cache and hit the DB.

Validate request parameters. Rate‑limit per IP. Cache null results with short TTL. Use Bloom filter.

Cache Breakdown

Hot key expires, causing a surge of DB traffic.

Solution: keep hot keys permanent and use mutex locks.

Split‑Brain

Network partition causes multiple masters; Redis can mitigate with min-replicas-to-write and min-replicas-max-lag settings.

Transactions

Redis transactions are sequential and exclusive but lack isolation levels and atomicity guarantees across multiple commands.

Correct Development Steps

Before launch: high‑availability (master‑slave + Sentinel or Cluster). During launch: local cache + Hystrix. After launch: persistence with RDB + AOF.

5. Distributed Locks

Zookeeper

Uses persistent, ephemeral, and sequential nodes; watches the predecessor node to avoid herd effect.

Redis

Simple lock with SETNX (or SET key value NX EX ttl) and unlock via Lua script to ensure atomic check‑and‑delete.

Redisson

Java client that wraps Redis primitives and provides a high‑level distributed lock implementation.

6. Expiration and Eviction Strategies

Expiration

Redis combines lazy deletion (key checked on access) with periodic scanning.

Eviction Policies

volatile‑lru volatile‑ttl volatile‑random allkeys‑lru allkeys‑random no‑eviction

7. High Availability

Master‑Slave Replication

Full sync on initial connection (RDB snapshot + command buffer) followed by incremental command replication.

Sentinel

Monitors masters and slaves, notifies administrators, performs automatic failover, and updates client configuration.

Redis Cluster

Uses 16384 virtual slots (CRC16) to shard data across masters; each master can have replicas; gossip protocol maintains cluster state.

8. Rate Limiting

SetNX / ZSet

Fixed window with SETNX + TTL; sliding window with a sorted set where timestamps are scores.

Leaky Bucket

Requests fill a bucket; tokens leak at a constant rate; overflow requests are rejected.

Token Bucket

Tokens are added at a fixed rate; each request consumes a token; burst traffic is allowed up to the bucket capacity.

9. Common Knowledge Points

Prefer SCAN over KEYS to avoid blocking.

Use pipelines to batch multiple commands.

Tool bigkeys scans for large keys.

Enable slow‑log for performance debugging.

Memory fragmentation can be reduced with activedefrag yes or by restarting.

10. End

For more deep dives into Redis internals, stay tuned.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

high availabilityredisPersistencerate limiting
Java Backend Technology
Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.