
Redis String Data Structure: Implementation, Encoding Formats, and Operations

This article explains Redis string basics, its mutable SDS implementation, common commands, internal memory layout, and the three encoding formats (int, embstr, raw) that determine how strings are stored and optimized in the database.

Big Data Technology & Architecture

Introduction

Redis offers five fundamental data structures; the string type is the simplest and most widely used. Although simple on the surface, its internal design is highly refined.

Basic Overview

Unlike Java strings, Redis strings are mutable. They are implemented as Simple Dynamic Strings (SDS), whose internal structure resembles Java's ArrayList: a byte array with pre‑allocated spare capacity that reduces how often memory must be reallocated. While the string is shorter than 1 MB, each expansion doubles the space; beyond 1 MB, each expansion adds only 1 MB. The maximum string length is 512 MB.
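The growth rule can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Redis's actual code: the names `grown_capacity`, `SDS_MAX_PREALLOC`, and `MAX_STRING_BYTES` are chosen here for illustration, and the sketch doubles the length required after the append, which is one reading of "doubles the space".

```python
SDS_MAX_PREALLOC = 1024 * 1024          # 1 MB threshold described in the text
MAX_STRING_BYTES = 512 * 1024 * 1024    # 512 MB maximum string length


def grown_capacity(current_len: int, extra: int) -> int:
    """Return the capacity to reserve when appending `extra` bytes.

    Hypothetical helper: below 1 MB the required length is doubled,
    at or above 1 MB a flat 1 MB of spare space is added.
    """
    needed = current_len + extra
    if needed > MAX_STRING_BYTES:
        raise ValueError("string would exceed the 512 MB limit")
    if needed < SDS_MAX_PREALLOC:
        return needed * 2                 # small strings: double
    return needed + SDS_MAX_PREALLOC      # large strings: add 1 MB


if __name__ == "__main__":
    print(grown_capacity(5, 6))                  # small append
    print(grown_capacity(SDS_MAX_PREALLOC, 1))   # append that crosses 1 MB
```

The spare space means that a run of small appends usually triggers far fewer reallocations than appends.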

Typical string operations include setting, getting, and batch commands that reduce network overhead:

> set name test
OK
> get name
"test"
> mset name1 test1 name2 test2
OK
> mget name1 name2
1) "test1"
2) "test2"
> del name
(integer) 1

Redis strings can also hold integers and support atomic increment operations. The value must fit in a signed 64‑bit integer, that is, the range –2⁶³ to 2⁶³‑1; a value outside this range is stored as an ordinary string, and attempting to increment it returns an error. Because a string is a sequence of 8‑bit bytes, it can also be used as a bitmap.

> set foo 1
OK
> get foo
"1"
> incr foo
(integer) 2
> get foo
"2"
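The range check behind an increment can be modeled as follows. This is a sketch of the behavior, not Redis's implementation: the function name `incr` and the error messages are illustrative, and Python's `int()` is more permissive than a real server parser (it tolerates surrounding whitespace, for example).

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def incr(stored: str, delta: int = 1) -> str:
    """Model an increment on a stored string; raise ValueError on failure."""
    try:
        value = int(stored)
    except ValueError:
        raise ValueError("value is not an integer") from None
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("value is outside the signed 64-bit range")
    result = value + delta
    if not INT64_MIN <= result <= INT64_MAX:
        raise ValueError("increment would overflow")
    return str(result)   # the result is stored back as a string


if __name__ == "__main__":
    print(incr("1"))
```

Incrementing the largest signed 64-bit value fails in this model instead of wrapping around.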

Internal Principles

Basic Implementation

The core structure of a Redis string is shown below (the original diagram is omitted). The content field holds the actual bytes and is terminated by a 0x0 byte that is not counted in len.

struct SDS{
    T capacity;        // array capacity
    T len;             // actual length
    byte flags;        // flag bits, low three indicate type
    byte[] content;    // array content
}

Both capacity and len are generic types rather than plain int to allow Redis to use the smallest possible integer type for each string, minimizing memory waste.
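To make the saving concrete, the sketch below picks the narrowest unsigned integer width that can hold a given length and reports the resulting header size. It is a simplified model of the idea: `sds_header_bytes` is a hypothetical helper, it assumes widths of 1, 2, 4, and 8 bytes, and it ignores any more compact header variants Redis may use.

```python
def sds_header_bytes(length: int) -> int:
    """Header size (capacity + len + flags) for a string of `length` bytes."""
    if length < 0:
        raise ValueError("length must be non-negative")
    for width in (1, 2, 4, 8):               # uint8, uint16, uint32, uint64
        if length < 2 ** (8 * width):
            return 2 * width + 1             # two counters plus one flags byte
    raise ValueError("length does not fit in 64 bits")


if __name__ == "__main__":
    for n in (44, 300, 70_000):
        print(n, sds_header_bytes(n))
```

Under this model a short string pays 3 bytes of header, where fixed 64-bit counters would cost 17.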

Encoding Formats

Redis strings can be stored using three encoding formats: int, embstr, and raw. Understanding these formats requires knowledge of the RedisObject header that precedes every Redis value.

struct RedisObject{
    int4 type;        // data type (5 kinds)
    int4 encoding;    // internal encoding (int, embstr, raw, …)
    int24 lru;        // LRU information for memory eviction
    int32 refcount;   // reference count
    void *ptr;        // pointer to actual data
}
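Adding up the field widths gives the size of this header. The sketch below does the arithmetic; the 64-bit pointer width is an assumption about the platform, and `robj_header_bytes` is a name introduced here for illustration.

```python
# Field widths in bits, taken from the struct above.
# The 64-bit pointer is an assumption (a 64-bit build).
ROBJ_FIELD_BITS = {
    "type": 4,
    "encoding": 4,
    "lru": 24,
    "refcount": 32,
    "ptr": 64,
}


def robj_header_bytes(field_bits=ROBJ_FIELD_BITS) -> int:
    """Total header size in bytes, assuming the fields pack without padding."""
    total_bits = sum(field_bits.values())
    if total_bits % 8 != 0:
        raise ValueError("fields do not fill whole bytes")
    return total_bits // 8


if __name__ == "__main__":
    print(robj_header_bytes())
```

The first three fields share a single 32-bit word, so the header comes to 16 bytes under these assumptions.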

int Encoding

When the stored value fits into a 64‑bit signed integer, Redis uses the int encoding, enabling fast atomic increment operations. Small non‑negative integers (0 through 9999 in the stock build) are served from a pool of shared objects, avoiding extra allocations.

> set foo 1
OK
> object encoding foo
"int"
> debug object foo
Value at:0x7f44b020aca0 refcount:2147483647 encoding:int serializedlength:2 lru:14691591 lru_seconds_idle:72588

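If a second key, for example foo2, is also set to 1, both foo and foo2 point to the same shared object address. The refcount of 2147483647 (the largest 32‑bit signed value) in the output above marks the object as shared.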
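The sharing idea can be illustrated with a small pool of pre-built objects. This is a conceptual sketch only: `IntObject` and `SharedIntPool` are hypothetical names, and the pool size is passed in as a parameter instead of being tied to any particular Redis build.

```python
class IntObject:
    """Stand-in for a Redis object that holds an integer value."""

    def __init__(self, value: int):
        self.value = value


class SharedIntPool:
    """Pre-allocates objects for 0..size-1 and hands out the same instance."""

    def __init__(self, size: int):
        self._shared = [IntObject(i) for i in range(size)]

    def get(self, value: int) -> IntObject:
        if 0 <= value < len(self._shared):
            return self._shared[value]    # shared: no new allocation
        return IntObject(value)           # outside the pool: fresh object


if __name__ == "__main__":
    pool = SharedIntPool(size=10)
    print(pool.get(1) is pool.get(1))       # same object both times
    print(pool.get(50) is pool.get(50))     # two separate objects
```

Two keys holding the same small integer then cost one object rather than two.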

embstr Encoding

For short strings (length ≤ 44 bytes), Redis uses the embstr (embedded string) encoding. The SDS is placed immediately after the RedisObject header, so a single malloc call allocates both in one contiguous memory block.

Diagram illustrating the embedded layout (image omitted).

raw Encoding

For longer strings (length > 44 bytes), Redis switches to the raw encoding. In this case the RedisObject and the SDS are allocated separately, so their memory addresses are not contiguous.

Diagram illustrating the separate allocation (image omitted).
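Putting the three encodings together, the selection can be approximated as below. This is a simplified model, not Redis's code: `choose_encoding` is a hypothetical function, and the rule that only canonical decimal text (no leading zeros, no plus sign) counts as an integer is an assumption about the parser, not something stated above.

```python
import re

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
EMBSTR_LIMIT = 44                     # bytes of content, from the text
# Assumption: only canonical decimal integers qualify for the int encoding.
_CANONICAL_INT = re.compile(rb"0|-?[1-9][0-9]*")


def choose_encoding(value: bytes) -> str:
    """Return "int", "embstr", or "raw" for the given string value."""
    # A signed 64-bit integer needs at most 20 characters in decimal.
    if len(value) <= 20 and _CANONICAL_INT.fullmatch(value):
        if INT64_MIN <= int(value) <= INT64_MAX:
            return "int"
    if len(value) <= EMBSTR_LIMIT:
        return "embstr"
    return "raw"


if __name__ == "__main__":
    for sample in (b"1", b"007", b"a" * 44, b"a" * 45):
        print(len(sample), choose_encoding(sample))
```

On a real server, `object encoding <key>` reports the encoding actually chosen, which is the authoritative check.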

Thoughts

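The boundary between embstr and raw is 44 bytes because of how the allocator hands out memory. jemalloc, Redis’s default allocator, serves small requests from fixed size classes such as 8, 16, 32, and 64 bytes. The RedisObject header takes 16 bytes (4 + 4 + 24 bits, a 32‑bit refcount, and a 64‑bit pointer) and the smallest SDS header takes 3 bytes (1‑byte capacity, 1‑byte len, and 1‑byte flags), so even an empty string needs 19 bytes plus the terminating 0x0 and already requires the 32‑byte class. Redis treats a string as embstr as long as the whole object fits in the 64‑byte class, which leaves 64 − 16 − 3 − 1 = 44 bytes for content. A string with 45 or more bytes of content would spill into a larger size class and is stored as raw instead.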

Thus, the practical limit for an embstr string is 44 bytes of content.
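The same arithmetic can be written as a tiny function, which makes it easy to see what other allocation sizes would allow. The byte counts are assumptions derived from the structs shown earlier (a 16-byte RedisObject header on a 64-bit build and a 3-byte SDS header with 1-byte counters); `max_embstr_content` is a name introduced here.

```python
ROBJ_HEADER_BYTES = 16   # 4 + 4 + 24 bits, 32-bit refcount, 64-bit pointer
SDS_HEADER_BYTES = 3     # 1-byte capacity, 1-byte len, 1-byte flags
NUL_BYTES = 1            # terminating 0x0, not counted in len


def max_embstr_content(alloc_bytes: int) -> int:
    """Content bytes that fit in one allocation of `alloc_bytes`."""
    overhead = ROBJ_HEADER_BYTES + SDS_HEADER_BYTES + NUL_BYTES
    return max(0, alloc_bytes - overhead)


if __name__ == "__main__":
    for size_class in (32, 64):
        print(size_class, max_embstr_content(size_class))
```

A 32-byte allocation would leave room for only 12 bytes of content, while the 64-byte class yields the 44-byte limit.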

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
