Understanding Redis Sorted‑Set Implementation: From Singly Linked List to Skiplist
This article explains how Redis implements its Sorted‑Set using a skiplist, starting with the basics of a singly linked list, analyzing insertion and query complexities, deriving the skiplist structure and its O(log N) performance, and comparing it with hash tables, balanced trees, and B‑trees.
Background – The article begins with an ordered singly linked list, demonstrates insertion and search operations, and shows their O(N) time complexity, motivating the need for a more efficient structure.
1. Singly Linked List
A singly linked list node is defined as:
typedef struct node {
int data; // data field
struct LinkNode *next; // pointer to next node
} LinkNode;Insertion of values 10, 20, 30 creates a list, and inserting 25 is illustrated step‑by‑step, confirming the O(N) insertion cost.
Search for a value (e.g., 25) follows a similar linear scan, also O(N).
2. Skiplist Formation
To reduce complexity, a skiplist adds multi‑level index nodes on top of an ordered list. By promoting every second node to a higher level, the search path shortens from 6 comparisons to 4, and with additional levels can approach O(log N) time.
The skiplist height is determined probabilistically; each node’s level is generated by:
int zslRandomLevel(void) {
int level = 1;
while ((random() & 0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
level += 1;
return (level < ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}Average height with p = 1/4 is about 1.33.
3. Redis Sorted‑Set Underlying Structures
Redis stores a Sorted‑Set as a combination of a hash table and a skiplist. The skiplist node is defined as:
typedef struct zskiplistNode {
sds ele; // element value
double score; // sorting score
struct zskiplistNode *backward; // backward pointer
struct zskiplistLevel {
struct zskiplistNode *forward; // forward pointer
unsigned long span; // span to next node
} level[];
} zskiplistNode;The skiplist itself holds header, tail, length, and current level:
typedef struct zskiplist {
struct zskiplistNode *header, *tail;
unsigned long length;
int level;
} zskiplist;Creation of a new skiplist initializes these fields:
zskiplist *zslCreate(void) {
int j;
zskiplist *zsl = zmalloc(sizeof(*zsl));
zsl->level = 1;
zsl->length = 0;
zsl->header = zslCreateNode(ZSKIPLIST_MAXLEVEL, 0, NULL);
for (j = 0; j < ZSKIPLIST_MAXLEVEL; j++) {
zsl->header->level[j].forward = NULL;
zsl->header->level[j].span = 0;
}
zsl->header->backward = NULL;
zsl->tail = NULL;
return zsl;
}4. Skiplist Time Complexity
Search, insertion, and deletion all run in O(log N) expected time because the number of levels is proportional to log N and each level reduces the remaining distance by a constant factor.
5. Insertion Process
Insertion consists of four steps: locate the position, generate a random level, adjust pointers/spans, and update backward links. Core code:
zskiplistNode *zslInsert(zskiplist *zsl, double score, sds ele) {
zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
unsigned int rank[ZSKIPLIST_MAXLEVEL];
int i, level;
serverAssert(!isnan(score));
x = zsl->header;
for (i = zsl->level-1; i >= 0; i--) {
rank[i] = (i == zsl->level-1) ? 0 : rank[i+1];
while (x->level[i].forward &&
(x->level[i].forward->score < score ||
(x->level[i].forward->score == score &&
sdscmp(x->level[i].forward->ele, ele) < 0))) {
rank[i] += x->level[i].span;
x = x->level[i].forward;
}
update[i] = x;
}
level = zslRandomLevel();
if (level > zsl->level) {
for (i = zsl->level; i < level; i++) {
rank[i] = 0;
update[i] = zsl->header;
update[i]->level[i].span = zsl->length;
}
zsl->level = level;
}
x = zslCreateNode(level, score, ele);
for (i = 0; i < level; i++) {
x->level[i].forward = update[i]->level[i].forward;
update[i]->level[i].forward = x;
x->level[i].span = update[i]->level[i].span - (rank[0] - rank[i]);
update[i]->level[i].span = (rank[0] - rank[i]) + 1;
}
for (i = level; i < zsl->level; i++)
update[i]->level[i].span++;
x->backward = (update[0] == zsl->header) ? NULL : update[0];
if (x->level[0].forward)
x->level[0].forward->backward = x;
else
zsl->tail = x;
zsl->length++;
return x;
}6. Query Process
Finding the rank of an element or retrieving a range uses the same forward‑span traversal logic. Example for getting an element by rank:
zskiplistNode *zslGetElementByRank(zskiplist *zsl, unsigned long rank) {
zskiplistNode *x = zsl->header;
unsigned long traversed = 0;
int i;
for (i = zsl->level-1; i >= 0; i--) {
while (x->level[i].forward && (traversed + x->level[i].span) <= rank) {
traversed += x->level[i].span;
x = x->level[i].forward;
}
if (traversed == rank) return x;
}
return NULL;
}Range queries (ZRANGE/ZREVRANGE) walk the list from the start or tail using the level‑0 forward/backward pointers, providing excellent cache locality.
7. Deletion Process
Deletion mirrors insertion: locate the node, update spans and forward pointers, fix backward links, and possibly shrink the skiplist level.
int zslDelete(zskiplist *zsl, double score, sds ele, zskiplistNode **node) {
zskiplistNode *update[ZSKIPLIST_MAXLEVEL], *x;
int i;
x = zsl->header;
for (i = zsl->level-1; i >= 0; i--) {
while (x->level[i].forward &&
(x->level[i].forward->score < score ||
(x->level[i].forward->score == score &&
sdscmp(x->level[i].forward->ele, ele) < 0))) {
x = x->level[i].forward;
}
update[i] = x;
}
x = x->level[0].forward;
if (x && score == x->score && sdscmp(x->ele, ele) == 0) {
zslDeleteNode(zsl, x, update);
if (!node) zslFreeNode(x);
else *node = x;
return 1;
}
return 0;
}
void zslDeleteNode(zskiplist *zsl, zskiplistNode *x, zskiplistNode **update) {
int i;
for (i = 0; i < zsl->level; i++) {
if (update[i]->level[i].forward == x) {
update[i]->level[i].span += x->level[i].span - 1;
update[i]->level[i].forward = x->level[i].forward;
} else {
update[i]->level[i].span -= 1;
}
}
if (x->level[0].forward)
x->level[0].forward->backward = x->backward;
else
zsl->tail = x->backward;
while (zsl->level > 1 && zsl->header->level[zsl->level-1].forward == NULL)
zsl->level--;
zsl->length--;
}8. Comparison with Other Data Structures
Skiplist vs. hash table: hash tables give O(1) look‑up but lack ordering and range queries. Skiplist vs. balanced trees (e.g., red‑black tree): both provide O(log N) operations, but skiplist is simpler to implement and offers better cache locality for sequential scans. Skiplist vs. B‑tree: B‑trees are optimized for disk‑based storage; skiplist uses less memory per node, making it a better fit for an in‑memory database like Redis.
Conclusion & References
The article walks from basic linked‑list concepts through skiplist theory to the concrete Redis source code, giving readers a thorough understanding of how Redis implements its Sorted‑Set with high performance. References include the book “Redis 5 Design and Source Code Analysis” and official Redis source files.
Xueersi Online School Tech Team
The Xueersi Online School Tech Team, dedicated to innovating and promoting internet education technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.