
Redis Best Practices and Common Pitfalls – Alibaba Cloud 2021 Guide

This guide outlines Redis's fundamental limitations, common performance issues such as big keys, slow queries, and connection‑pool exhaustion, and provides practical recommendations on architecture, configuration, and operational diagnostics to help developers avoid hitting Redis's boundaries.

Big Data Technology & Architecture

Apr 28, 2021


Redis – Starting from Problems

Redis processes each event in a single‑threaded run‑to‑completion model, which means a slow query (e.g., KEYS, LRANGE, HGETALL) blocks subsequent requests, leading to latency spikes and connection‑pool exhaustion.

(1) Run‑to‑Completion in a Single Thread – Redis’s Biggest Issue

After an event arrives, it is handled by the single main thread, and the next event starts only when the current one has finished. There is no dispatcher or pool of worker threads for command execution, so one slow command delays every request queued behind it and drags down overall throughput.
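The effect of run‑to‑completion can be reproduced with a tiny single‑threaded queue simulation. This is a toy model under stated assumptions, not Redis code: the `Event` type, the command names, and the service times are made up for illustration, and real command costs depend on data size.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    arrival_ms: float   # when the request reaches the server
    service_ms: float   # CPU time the command needs (hypothetical numbers)

def run_to_completion(events):
    """Process events one at a time, in arrival order, on a single 'thread'.

    Returns a dict mapping event name -> observed latency (queueing + service).
    """
    clock = 0.0
    latencies = {}
    for ev in sorted(events, key=lambda e: e.arrival_ms):
        start = max(clock, ev.arrival_ms)  # wait until the thread is free
        clock = start + ev.service_ms      # nothing else runs in the meantime
        latencies[ev.name] = clock - ev.arrival_ms
    return latencies

workload = [
    Event("GET a", 0.0, 0.05),
    Event("KEYS *", 0.1, 500.0),  # one slow full scan
    Event("GET b", 0.2, 0.05),
    Event("GET c", 0.3, 0.05),
]

for name, latency in run_to_completion(workload).items():
    print(f"{name:8s} {latency:8.2f} ms")
```

The two cheap GETs that arrive after the scan each wait about 500 ms, even though their own work takes 0.05 ms; this is the latency spike described above.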

(2) Expanding to Cluster Mode – Does It Solve the Problem?

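Cluster mode distributes keys across multiple shards, which spreads load, but it cannot help when a single shard is blocked: Redis assigns a key to a hash slot based only on the top‑level key name, never on the fields or elements inside it, so a big or hot key still lives entirely on one shard. Multi‑key commands such as MGET that span several shards (for example, when a proxy splits them) must wait for every shard to reply, so one blocked shard still stalls the whole request.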
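To make the top‑level‑key point concrete, here is a sketch of the slot calculation described in the Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384, where only the part inside the first non‑empty `{...}` hash tag is hashed if one is present. The function names are our own. Note that the function never sees a field or member, so every field of a big hash lands in the same slot and therefore on the same shard.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a top-level key name to one of the 16384 cluster hash slots."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384

# All fields of one hash share the hash's slot, however many fields there are.
print(key_slot("user:profiles"))
# Hash tags force related keys into the same slot.
print(key_slot("{order:42}:items") == key_slot("{order:42}:status"))  # True
```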

(3) Protocol Issues – Fully‑Meshed Clients and Engine

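Redis uses the RESP protocol, whose messages carry no request/response identifier, so replies on a connection can only be matched to requests by their order. Without a dispatcher that multiplexes requests, clients end up holding many connections to every node (a fully meshed topology), which scales poorly and runs into the classic C10K problem when many connections are active.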
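The sketch below shows both halves of the problem: a RESP request is just an array of bulk strings with no ID field, so a client can only pair a reply with the oldest request still waiting on that connection. `encode_command` follows the RESP array format; `OrderedConnection` is a hypothetical stand‑in for a client connection, not a real library class.

```python
from collections import deque

def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (no request ID anywhere)."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%b\r\n" % (len(data), data))
    return b"".join(out)

class OrderedConnection:
    """Hypothetical connection: replies are matched to requests purely by order."""

    def __init__(self):
        self.pending = deque()

    def send(self, request: str) -> bytes:
        self.pending.append(request)
        return encode_command(*request.split())

    def on_reply(self, reply):
        # The oldest unanswered request owns this reply; there is nothing else to go on.
        return self.pending.popleft(), reply

conn = OrderedConnection()
print(conn.send("GET foo"))  # b'*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n'
```

Because ordering is the only correlation, two callers cannot safely interleave requests on one connection without a component that tracks this queue, which is why clients fall back to many connections.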

(4) “Could not get a resource from the pool”

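Because replies can only be matched to requests by order, most blocking clients borrow a dedicated connection from a pool for each request. When the server slows down, each request holds its connection longer, the pool runs dry, and callers fail with the “Could not get a resource from the pool” error (the Jedis error message). Retries and new connections then add load to an already slow server, and latency keeps rising in a vicious cycle.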
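A minimal sketch of a fixed‑size pool shows how exhaustion arises: each in‑flight request holds a connection, and a borrower gives up after a wait timeout. The class, the `factory` callable, and the timeout are hypothetical and only mirror the error text; this is not the Jedis implementation.

```python
import queue
from contextlib import contextmanager

class PoolExhaustedError(Exception):
    """Raised when no idle connection frees up within the wait timeout."""

class ConnectionPool:
    """Hypothetical fixed-size pool: each in-flight request holds one connection."""

    def __init__(self, size, factory, wait_timeout_s):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # pre-create every connection
        self._wait_timeout_s = wait_timeout_s

    def borrow(self):
        try:
            # Block until another request returns a connection, or give up.
            return self._idle.get(timeout=self._wait_timeout_s)
        except queue.Empty:
            raise PoolExhaustedError("Could not get a resource from the pool") from None

    def give_back(self, conn):
        self._idle.put_nowait(conn)

    @contextmanager
    def lease(self):
        """Borrow a connection and always return it, even if the request fails."""
        conn = self.borrow()
        try:
            yield conn
        finally:
            self.give_back(conn)
```

With `size=2`, two slow requests holding their connections are enough to make a third caller fail after `wait_timeout_s`. The pool size and wait timeout play the role of settings such as maxTotal and maxWait in the Apache Commons Pool configuration that Jedis builds on; raising them only hides the problem while the server stays slow.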

Redis – Do Not Touch the Boundaries

Computation: Wildcard queries, concurrent Lua scripts, 1‑to‑N PUBSUB fan‑out, global configs, and hot keys consume a lot of CPU.

Storage: Slowly consumed Streams and big keys inflate memory usage.

Network: Full‑scan commands such as KEYS, large MGET/MSET batches, and big values generate heavy network traffic.

High concurrency ≠ high throughput: Large values do not improve throughput and can be dangerous.

Data and compute skew: Big keys break storage balance; hot keys break CPU balance.

Storage boundaries: Misusing Lua dramatically raises CPU cost.

Understanding latency: Tail (P99) latency spikes are inherent; given the engine's complexity, occasional high‑latency requests should be expected.

Redis – Internal Development Guidelines

Recommendations:

Identify whether the use case is cache or persistent storage.

Cache principle: “It works without it, but works better with it.”

Never rely on cache permanence; it can be evicted.

Design proper data structures and avoid big keys (they cause ~80% of problems).

Separate keyspaces across multiple Redis instances when possible.

Avoid using PUBSUB for reliable messaging.

Minimize Lua usage; prefer native data structures or modules.
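The cache principle above usually translates into the cache‑aside pattern: read from the cache, and on a miss load from the source of truth and write the value back with a TTL. In this sketch `TTLCache` is a hypothetical in‑process stand‑in for Redis (with a real client you would use GET and SET with an expiry), and `load_from_db` is a placeholder for your data source.

```python
import time

class TTLCache:
    """Hypothetical in-process stand-in for a Redis cache with expiry."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave as if it was evicted
            return None
        return value

    def set(self, key, value, ttl_s):
        self._data[key] = (value, time.monotonic() + ttl_s)

    def evict(self, key):
        self._data.pop(key, None)

def get_user(cache, load_from_db, user_id, ttl_s=300):
    """Cache-aside read: correct without the cache, faster with it."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # the source of truth must always work on its own
    cache.set(key, value, ttl_s)
    return value
```

Because every miss falls back to `load_from_db`, an evicted or expired entry costs only latency, never correctness.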

(1) BigKey – The Flood Beast

A single big key can consume ~80% of an instance's memory and become a hotspot at the same time, saturating CPU and network.
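One common way to defuse a big hash is to split it into a fixed number of smaller hashes and route each field by a stable hash of its name. This is a general technique, not something the guide prescribes; the `:<index>` suffix scheme and the bucket count are assumptions to adapt. Without hash tags, the sub‑keys also spread across cluster slots.

```python
import zlib
from collections import defaultdict

def bucket_key(base_key: str, field: str, buckets: int) -> str:
    """Map a field of a logical big hash to one of `buckets` smaller hashes."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(field.encode()) % buckets
    return f"{base_key}:{index}"

def split_mapping(base_key, mapping, buckets):
    """Group field/value pairs by sub-key, e.g. to write each group with one HSET."""
    groups = defaultdict(dict)
    for field, value in mapping.items():
        groups[bucket_key(base_key, field, buckets)][field] = value
    return dict(groups)

profiles = {f"uid{i}": f"profile-{i}" for i in range(1000)}
groups = split_mapping("user:profiles", profiles, buckets=16)
print(len(groups), max(len(g) for g in groups.values()))
# Reads use the same routing: HGET bucket_key("user:profiles", field, 16) field
```

The bucket count has to stay fixed once data is written (or be migrated deliberately), because changing it reroutes fields to different sub‑keys.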

(2) Redis Lua JIT

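Lua scripts are compiled on the server and cached under the SHA1 digest of their source text; depending on how scripts are replicated, they may also be executed again on replicas. Either way they consume significant CPU and block the thread for as long as they run. EVALSHA avoids resending and recompiling the script body, but execution itself is still expensive; prefer native data structures or modules.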
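The EVALSHA flow can be sketched as: compute the SHA1 of the script text, try EVALSHA, and fall back to EVAL (which also caches the script) when the server answers NOSCRIPT. `NoScriptError` here is a local stand‑in; with redis-py you would catch `redis.exceptions.NoScriptError` instead, and `register_script` already implements this fallback for you. The `client` is assumed to expose `evalsha`/`eval` with redis-py's argument order.

```python
import hashlib

class NoScriptError(Exception):
    """Local stand-in for the NOSCRIPT reply (redis-py: redis.exceptions.NoScriptError)."""

def script_sha(script: str) -> str:
    # The server caches scripts under the SHA1 hex digest of their source text.
    return hashlib.sha1(script.encode()).hexdigest()

def run_script(client, script, keys=(), args=()):
    """Prefer EVALSHA; send the full script body only on a cache miss."""
    sha = script_sha(script)
    try:
        return client.evalsha(sha, len(keys), *keys, *args)
    except NoScriptError:
        # Cache miss (e.g. after a restart or SCRIPT FLUSH): EVAL runs and caches it.
        return client.eval(script, len(keys), *keys, *args)
```

This only saves bandwidth and compile time; the script still occupies the single thread while it runs, so the advice to prefer native commands stands.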

(3) PubSub / Transaction / Pipeline

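PubSub suits simple signals but not reliable updates: messages are not persisted, so a disconnected subscriber misses them, and fan‑out to many subscribers can become a CPU bottleneck. Transactions (MULTI/EXEC) are pseudo‑transactions without rollback; in cluster mode all their keys must share one hash slot, which requires hash tags, and they add CPU load. Pipelines batch commands to cut network round trips but are not atomic, so the result of each command must be checked individually.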
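Because a pipeline is not atomic, a robust caller sends commands in bounded batches and inspects every reply. In this sketch `send_batch` is a hypothetical callable that sends one pipeline and returns one reply per command, with errors returned in place as exception objects (redis-py behaves this way with `pipeline(transaction=False)` and `execute(raise_on_error=False)`). The batch size of 100 is an arbitrary example.

```python
from typing import Callable, List, Sequence, Tuple

Command = Tuple[str, ...]

def run_in_batches(commands: Sequence[Command],
                   send_batch: Callable[[Sequence[Command]], List[object]],
                   batch_size: int = 100):
    """Send commands in bounded pipeline batches and collect per-command failures."""
    results, failures = [], []
    for start in range(0, len(commands), batch_size):
        batch = commands[start:start + batch_size]
        replies = send_batch(batch)  # one round trip per batch
        for offset, reply in enumerate(replies):
            if isinstance(reply, Exception):
                # Other commands in the batch still ran; record this one for retry.
                failures.append((start + offset, batch[offset], reply))
            results.append(reply)
    return results, failures
```

Bounding the batch keeps any single pipeline from producing a huge reply buffer, and the failure list makes partial success explicit instead of assuming all‑or‑nothing.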

(4) KEYS Command

KEYS always scans the entire keyspace and becomes a slow query as the dataset grows; replace it with the incremental SCAN command or redesign the data layout (for example, group related entries in a hash).
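A SCAN loop keeps calling with the returned cursor until the server hands back cursor 0. This sketch assumes a client whose `scan(cursor=..., match=..., count=...)` returns `(next_cursor, keys)`, as redis-py's does (redis-py also offers `scan_iter`). SCAN may return a key more than once, so the sketch deduplicates; COUNT is only a hint about per‑call work.

```python
def scan_keys(client, pattern, count=500):
    """Yield keys matching `pattern` incrementally instead of calling KEYS."""
    cursor = 0
    seen = set()
    while True:
        cursor, batch = client.scan(cursor=cursor, match=pattern, count=count)
        for key in batch:
            if key not in seen:  # SCAN can return duplicates
                seen.add(key)
                yield key
        if cursor == 0:  # a full iteration is complete
            break
```

Each call does a small, bounded amount of work, so other requests get the thread between calls; the total cost of walking the keyspace is unchanged, so treat large scans as batch jobs.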

(5) Other Dangerous Commands

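HGETALL, SMEMBERS, LRANGE, ZRANGE: O(N) when a large collection is fetched in full; prefer the incremental HSCAN/SSCAN/ZSCAN or bounded ranges. SETBIT with a far offset allocates a huge string, and BITOP over such strings can cause OOM. FLUSHALL, FLUSHDB: wipe data; require double confirmation before use.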
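For lists, a bounded alternative to a single `LRANGE key 0 -1` is to read fixed‑size pages. This sketch assumes a client with redis-py's `lrange(name, start, end)` signature, where `end` is inclusive. LRANGE is O(S+N), with S the offset from the nearer end, so deep pages into a very long list still cost more; the list can also change between pages.

```python
def iter_list(client, key, page_size=100):
    """Read a list in fixed-size LRANGE pages instead of one LRANGE 0 -1."""
    start = 0
    while True:
        page = client.lrange(key, start, start + page_size - 1)  # end is inclusive
        if not page:
            return
        yield from page
        if len(page) < page_size:  # short page: reached the tail
            return
        start += page_size
```

Each call now blocks the server for at most one page's worth of work, at the price of more round trips.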

Redis – Common Issue Handling

(1) Tair/Redis Memory Model

Memory is divided into three parts: dynamic link (connection) memory, such as client input/output buffers and the JIT cache; data memory, which holds user keys and values; and static management memory, such as hash tables and AOF buffers. Big keys and heavy Lua scripts can inflate dynamic memory and trigger OOM.

(2) Cache Analysis – Memory Distribution, BigKey, Key Pattern

Alibaba Cloud provides one‑click cache analysis (HotKey, BigKey, memory distribution) for both standard and cluster deployments, supporting community and enterprise versions.

(3) HotKey Analysis

Online real‑time analysis (P99 QPS > 3000) and offline analysis via imonitor + redis‑faina are available.
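The offline approach boils down to sampling commands and counting which keys appear most often. The sketch below parses lines in the format printed by Redis's MONITOR command and counts the first argument, assuming it is the key; that holds for most single‑key commands but not for MGET, EVAL, and the like. It is similar in spirit to redis-faina, not its implementation. MONITOR itself costs throughput, so capture only short samples.

```python
import shlex
from collections import Counter

def top_keys(monitor_lines, n=10):
    """Count the most frequently accessed keys in captured MONITOR output."""
    counts = Counter()
    for line in monitor_lines:
        # Format: <timestamp> [<db> <client addr>] "CMD" "arg1" ...
        _, sep, rest = line.partition("] ")
        if not sep:
            continue  # not a command line (e.g. the initial "OK")
        parts = shlex.split(rest)
        if len(parts) >= 2:
            counts[parts[1]] += 1  # assumption: first argument is the key
    return counts.most_common(n)

sample = [
    '1339518083.107412 [0 127.0.0.1:60866] "GET" "user:1"',
    '1339518083.107500 [0 127.0.0.1:60867] "GET" "user:1"',
    '1339518083.107600 [0 127.0.0.1:60868] "HGETALL" "user:2"',
]
print(top_keys(sample))  # [('user:1', 2), ('user:2', 1)]
```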

(4) Full‑Link Diagnosis

Diagnose issues across client SDK, network, VIP, proxy, and DB layers. Front‑end checks include ECS health, connection‑pool exhaustion, high RT, and DNS delays. Back‑end checks focus on slow queries, CPU, and traffic bottlenecks.

(5) Diagnostic Reports

One‑click diagnostic reports show core metric curves, top slow commands, performance watermarks, and real‑time or historical analysis.

(6) Slow Log Configuration

Set reasonable thresholds for slowlog-log-slower-than (DB, in microseconds) and rt_threshold_ms (Proxy, in milliseconds); a threshold that is too low logs so many commands that logging itself degrades performance.
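On a self‑managed instance, the DB‑side threshold can be set and the slow log read with CONFIG SET and SLOWLOG GET. This sketch assumes redis-py's `config_set(name, value)` and `slowlog_get(num)`, whose entries are dicts with `duration` (microseconds) and `command` fields; on a managed instance, parameters may have to be changed through the console instead. The 10 ms threshold and the 1024‑entry length are example values only.

```python
def configure_slowlog(client, threshold_ms, max_len=1024):
    """Set the DB-side slow log threshold (converted to microseconds) and length."""
    client.config_set("slowlog-log-slower-than", int(threshold_ms * 1000))
    client.config_set("slowlog-max-len", max_len)

def slowest(client, n=10):
    """Return the n slowest logged commands as (duration_ms, command) pairs."""
    entries = sorted(client.slowlog_get(128), key=lambda e: e["duration"], reverse=True)
    return [(e["duration"] / 1000.0, e["command"]) for e in entries[:n]]

# Example with a real redis-py client (not run here):
#   configure_slowlog(r, threshold_ms=10)
#   for ms, cmd in slowest(r, 5):
#       print(f"{ms:.1f} ms  {cmd}")
```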

Both historical (last 72 hours) and real‑time slow logs are accessible via the console under “Instance Management → Log Management → Slow Log”.


Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
