Mastering Redis for Big Data: Architecture, Code Samples, and Performance Hacks

This article walks through the NewLife.Redis library’s two‑layer architecture, demonstrates basic and advanced usage with C# examples, shows pressure‑testing results, and shares practical tips such as GetAll/SetAll, pipelines, and serialization tricks for high‑throughput big‑data scenarios.

dbaplus Community

Dec 2, 2018

Redis Encapsulation Architecture

NewLife.Redis implements the full Redis protocol. The core Redis functionality resides in NewLife.Core under the NewLife.Caching namespace, which provides two main classes:

Redis – implements the basic Redis commands.

RedisClient – represents a single TCP connection to a Redis server.

A connection pool holds many RedisClient instances, allowing concurrent access. The higher‑level FullRedis class extends Redis with advanced features (e.g., pipelines, batch operations).

Basic Usage Example (Test1)

The sample Program.cs demonstrates typical operations:

XTrace.UseConsole(); // enable console logging
var redis = new FullRedis("localhost");
redis.Set("key1", "string value"); // string stored directly
redis.Set("key2", new Person{ Name="Bob", Age=30}); // object JSON‑serialized
var val = redis.Get<Person>("key2"); // deserialized back to object
redis.Set("temp", 123, expire: 60); // expiration in seconds

// Dictionary (hash) usage
redis.Set("user:1", new { Id=1, Name="Alice"});
var name = redis.Get<string>("user:1", "Name");

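// List as a queue ("job-1" is a placeholder item)
redis.LPush("queue", "job-1"); // push at the head
var next = redis.RPop<string>("queue"); // pop from the tail, so the oldest item comes out first

// Set for deduplication ("order-1001" is a placeholder ID)
redis.SAdd("orderIds", "order-1001"); // adding the same ID again has no effect
var count = redis.SCard("orderIds"); // number of distinct IDs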

Key points:

String values are stored as raw bytes; non‑string objects are JSON‑serialized by default.

Expiration time is specified in seconds.

Redis hashes (dictionary) allow field‑level access without retrieving the whole object.

Lists implement FIFO queues; Sets provide O(1) deduplication and cardinality.

Pressure Testing

A multi‑threaded benchmark runs Get, Set, Remove and Increment operations. On a typical workstation the single‑threaded test reaches ~600 k operations per second (Ops) and multi‑threaded runs exceed 1 M Ops.

Important benchmark parameters:

ThreadCount – splits the workload into independent groups.

rand – when true, keys and values are generated randomly.

batch – the batch size; when greater than 1, the test uses GetAll / SetAll to reduce round‑trips.
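To make the role of the thread count concrete, here is a minimal sketch of a harness that splits Set operations across worker threads. It is not the library's benchmark: a ConcurrentDictionary stands in for the server so the code runs without Redis, and threadCount and opsPerThread are hypothetical names chosen for illustration. The numbers it prints say nothing about Redis itself.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Stand-in for the cache under test; NOT a Redis client.
var store = new ConcurrentDictionary<string, int>();

const int threadCount = 4;        // hypothetical: number of independent worker groups
const int opsPerThread = 100_000; // hypothetical: Set operations issued by each worker
long completed = 0;

var watch = Stopwatch.StartNew();
var workers = Enumerable.Range(0, threadCount).Select(t => new Thread(() =>
{
    // Each worker writes its own key range, so the groups never touch the same key.
    for (var i = 0; i < opsPerThread; i++)
    {
        store[$"bench:{t}:{i}"] = i;
    }
    Interlocked.Add(ref completed, opsPerThread);
})).ToList();

workers.ForEach(w => w.Start());
workers.ForEach(w => w.Join());
watch.Stop();

var opsPerSecond = completed / watch.Elapsed.TotalSeconds;
Console.WriteLine($"{completed:N0} ops in {watch.ElapsedMilliseconds} ms ({opsPerSecond:N0} ops/s)");
```

Giving each worker its own key range keeps the groups independent, so the measurement reflects throughput rather than contention on shared keys.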

Advanced Functions to Boost Performance

GetAll() / SetAll() – batch multiple keys into a single command, so many keys share one network round‑trip instead of paying the latency once per key. Using these methods can raise throughput from tens of thousands to hundreds of thousands of Ops.
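The effect of batching on the number of round-trips can be sketched without a server. The helper below is hypothetical (ToBatches is not part of NewLife.Redis); it only splits a key list into groups, each of which would be sent as one batched request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: split a key list into batches of at most batchSize keys.
List<string[]> ToBatches(IEnumerable<string> keys, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    return keys.Chunk(batchSize).ToList(); // Enumerable.Chunk requires .NET 6 or later
}

var keys = Enumerable.Range(0, 1000).Select(i => $"key:{i}").ToList();
var batches = ToBatches(keys, 100);

// One request per key versus one request per batch.
Console.WriteLine($"unbatched round-trips: {keys.Count}");    // 1000
Console.WriteLine($"batched round-trips:   {batches.Count}"); // 10
```

Each element of batches could then be passed to a batched read in one call, assuming the client accepts a collection of keys.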

Pipeline – groups several commands into one network packet. The library provides:

StartPipeline() / StopPipeline() – manual control.

AutoPipeline – automatically flushes when a configurable threshold is reached.
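The threshold-based flushing idea can be modeled in a few lines. This is an in-memory sketch, not the library's pipeline: threshold, Enqueue and Flush are hypothetical names, and each entry in packets stands for one network write.

```csharp
using System;
using System.Collections.Generic;

const int threshold = 3;            // hypothetical flush threshold
var pending = new List<string>();   // commands queued but not yet sent
var packets = new List<string[]>(); // each entry stands for one network write

// Send everything queued so far as a single write.
void Flush()
{
    if (pending.Count == 0) return;
    packets.Add(pending.ToArray());
    pending.Clear();
}

// Queue a command; flush automatically once the threshold is reached.
void Enqueue(string command)
{
    pending.Add(command);
    if (pending.Count >= threshold) Flush();
}

for (var i = 1; i <= 7; i++)
{
    Enqueue($"SET key{i} {i}");
}
Flush(); // manual stop: send whatever is still queued

Console.WriteLine($"{packets.Count} writes for 7 commands"); // 3 writes: 3 + 3 + 1
```

The final explicit Flush matters: without it, commands below the threshold would sit in the buffer indefinitely.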

Add / Replace – conditional insertion and replacement. Add inserts only if the key does not exist (returns false otherwise). Replace overwrites an existing key and returns the previous value; if the key is absent nothing is done. These primitives are useful for implementing distributed locks.
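As a rough illustration of how an insert-if-absent primitive yields a lock, the sketch below uses a ConcurrentDictionary as a stand-in for the server. Add here only mirrors the semantics described above; TryAcquire and Release are hypothetical helpers, not NewLife.Redis methods.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// In-memory stand-in for the server; NOT the NewLife.Redis API.
var store = new ConcurrentDictionary<string, string>();

// Mirrors the Add semantics: insert only if the key is absent.
bool Add(string key, string value) => store.TryAdd(key, value);

// Acquire: whoever adds the lock key first owns the lock.
bool TryAcquire(string lockKey, out string token)
{
    token = Guid.NewGuid().ToString("N"); // unique owner token
    return Add(lockKey, token);
}

// Release: remove the key only if it still holds our token (requires .NET 5 or later).
bool Release(string lockKey, string token) =>
    store.TryRemove(new KeyValuePair<string, string>(lockKey, token));

var first = TryAcquire("lock:report", out var tokenA);  // true
var second = TryAcquire("lock:report", out var tokenB); // false, already held
var wrongRelease = Release("lock:report", tokenB);      // false, not the owner
var released = Release("lock:report", tokenA);          // true
var third = TryAcquire("lock:report", out _);           // true again

Console.WriteLine($"{first} {second} {wrongRelease} {released} {third}");
```

Against a real server the lock key would normally also carry an expiration, so that a holder that crashes does not block everyone else forever.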

Practical Deployment Tips

Deploy one Redis instance per CPU core on Linux, since Redis executes commands on a single thread; cap each instance's memory at no more than the host’s physical memory to avoid swapping.

Distribute billions of keys across instances using hash functions (e.g., CRC16/CRC32) for linear scaling.
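A minimal sketch of the hash-based distribution mentioned above, using CRC16/XMODEM (polynomial 0x1021, initial value 0), which is the variant Redis Cluster uses for its key slots. InstanceFor and the instance count of 8 are hypothetical choices for illustration.

```csharp
using System;
using System.Text;

// CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
ushort Crc16(byte[] data)
{
    ushort crc = 0;
    foreach (var b in data)
    {
        crc ^= (ushort)(b << 8);
        for (var bit = 0; bit < 8; bit++)
        {
            crc = (crc & 0x8000) != 0
                ? (ushort)((crc << 1) ^ 0x1021)
                : (ushort)(crc << 1);
        }
    }
    return crc;
}

// Hypothetical helper: map a key to one of instanceCount Redis instances.
int InstanceFor(string key, int instanceCount) =>
    Crc16(Encoding.UTF8.GetBytes(key)) % instanceCount;

foreach (var key in new[] { "user:1", "user:2", "order:1001" })
{
    Console.WriteLine($"{key} -> instance {InstanceFor(key, 8)}");
}
```

The same key always maps to the same instance, so reads find what writes stored. Note that plain modulo sharding remaps most keys when the instance count changes.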

Prefer binary serialization (MessagePack, protobuf, etc.) over JSON for higher throughput.
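One contributor to that difference is payload size. The sketch below compares JSON with a hand-rolled binary layout written through BinaryWriter, which is used here only as a stand-in for MessagePack or protobuf so the example needs no extra packages. The record (Name, Age) is a made-up sample.

```csharp
using System;
using System.IO;
using System.Text.Json;

var name = "Bob";
var age = 30;

// JSON: field names and punctuation travel with every record.
var json = JsonSerializer.SerializeToUtf8Bytes(new { Name = name, Age = age });

// Hand-rolled binary layout: values only, in a fixed order known to both sides.
byte[] binary;
using (var stream = new MemoryStream())
using (var writer = new BinaryWriter(stream))
{
    writer.Write(name); // length-prefixed UTF-8 string: 1 + 3 bytes
    writer.Write(age);  // 4-byte little-endian integer
    writer.Flush();
    binary = stream.ToArray();
}

Console.WriteLine($"JSON: {json.Length} bytes, binary: {binary.Length} bytes"); // JSON: 23 bytes, binary: 8 bytes
```

Smaller payloads mean less data to encode, send, and parse per operation; the actual throughput gain depends on the serializer and the shape of the data.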

Design key/value payloads so that each network packet stays around 1.4 KB, roughly what fits in one Ethernet frame at the common 1,500‑byte MTU, minimizing the number of round‑trips.

Typical Get/Set latency (including network) is 200‑600 µs.
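These latency figures put a ceiling on what one connection can do when it waits for each reply before sending the next request. The arithmetic below is a back-of-the-envelope sketch: the batch size of 100 is hypothetical, and it optimistically assumes a batched request takes no longer than a single one.

```csharp
using System;

// Upper bound on requests per second for one connection that waits for each reply.
double RequestsPerSecond(double latencyMicroseconds) => 1_000_000.0 / latencyMicroseconds;

foreach (var latency in new[] { 200.0, 600.0 })
{
    var single = RequestsPerSecond(latency);
    var batched = single * 100; // hypothetical batch of 100 keys per request
    Console.WriteLine($"{latency} µs: {single:N0} requests/s, up to {batched:N0} keys/s with batches of 100");
}
```

At 200 µs a single blocking connection tops out near 5,000 requests per second, and at 600 µs near 1,667, which is why concurrency across connections and batching within a request both matter.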

Primary bottlenecks are serialization cost, network bandwidth, and memory size; CPU becomes a limit only after these are saturated.

Cache Sibling – MemoryCache

NewLife.Redis implements the ICache interface. Its sibling MemoryCache provides in‑process caching with millions of Ops per second. Use MemoryCache for small datasets (≤ 100 k items) and switch to Redis when the data grows, without changing business code.

FAQ (Selected)

How to store one record with multiple keys? If performance is not critical, serialize the whole object as JSON and store it under a single key; a hash is unnecessary.

Difference between queue and List? Redis List supports both queue (FIFO) and stack (LIFO) semantics. Use the List directly for queue behavior.
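The two access patterns can be seen with an in-process LinkedList standing in for a Redis List (AddFirst plays the role of LPUSH, RemoveLast of RPOP, RemoveFirst of LPOP). This is an analogy for illustration, not a Redis call.

```csharp
using System;
using System.Collections.Generic;

var list = new LinkedList<string>();
foreach (var item in new[] { "a", "b", "c" })
{
    list.AddFirst(item); // push at the head; list is now c, b, a
}

// Queue (FIFO): pop from the opposite end to get the oldest item.
var oldest = list.Last.Value;
list.RemoveLast();

// Stack (LIFO): pop from the same end to get the newest item.
var newest = list.First.Value;
list.RemoveFirst();

Console.WriteLine($"queue pop: {oldest}, stack pop: {newest}"); // queue pop: a, stack pop: c
```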

Does a class with many fields affect performance? Generally no; only extremely large objects may show measurable differences.

How to handle analytics on billions of rows? Split tables/databases so each contains fewer than ten million rows.

Why does CPU usage spike? High CPU indicates the application is fully utilizing resources; if CPU is not saturated, further performance gains are unlikely.

Source Code

All source code and examples are available at:

https://github.com/NewLifeX/NewLife.Redis
