
Master Redis Cluster Ops in 5 Minutes: Fix 90% of Production Issues

This comprehensive guide walks you through Redis cluster architecture, deployment, enterprise‑grade configuration management, logging, monitoring, queue handling, performance tuning, and fault‑tolerance practices, providing step‑by‑step instructions, scripts, and best‑practice recommendations to quickly resolve the majority of production problems.

MaGe Linux Operations

Redis Cluster Architecture Principles

Redis Cluster Deployment Configuration

Enterprise‑Grade Configuration Management

Log Management and Monitoring

Queue Settings and Management

Performance Optimization and Tuning

Fault Handling and Operations Practice

Cluster Mode Overview

Redis Cluster is a distributed Redis solution that provides high availability and horizontal scalability through data sharding and master‑slave replication. The cluster splits the entire keyspace into 16,384 hash slots; each master node serves a subset of the slots, and its replicas hold copies of the same data.

Cluster node distribution example:
Master-1 (0-5460)   Master-2 (5461-10922)   Master-3 (10923-16383)
    |                     |                     |
Slave-1               Slave-2               Slave-3
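As a sanity check on the ranges in the diagram, the sketch below splits the 16,384 slots evenly across N masters. It is our own reconstruction of the even-split rounding that redis-cli --cluster create appears to use, not code taken from Redis; the function name split_slots is hypothetical.

```python
# Sketch (our reconstruction, not Redis source): split the 16384 hash slots
# evenly across N masters, rounding half up like C's lround() for positives.
from __future__ import annotations

CLUSTER_SLOTS = 16384


def split_slots(num_masters: int) -> list[tuple[int, int]]:
    """Return inclusive (first, last) slot ranges, one per master."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    per_node = CLUSTER_SLOTS / num_masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(num_masters):
        last = int(cursor + per_node - 1 + 0.5)
        # The last master always absorbs any remainder up to slot 16383
        if i == num_masters - 1 or last > CLUSTER_SLOTS - 1:
            last = CLUSTER_SLOTS - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


for n, (first, last) in enumerate(split_slots(3), start=1):
    print(f"Master-{n}: {first}-{last} ({last - first + 1} slots)")
```

For three masters this prints 0-5460, 5461-10922 and 10923-16383, matching the diagram; the authoritative layout is always what CLUSTER NODES or CLUSTER SLOTS reports.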

Data Sharding Principle

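Redis hashes each key with CRC16 and takes the result modulo 16384 to find the slot the key belongs to. If the key contains a hash tag (a non-empty substring between the first { and the next }), only that substring is hashed, so keys such as user:{1000}:profile and user:{1000}:orders land in the same slot: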

HASH_SLOT = CRC16(key) mod 16384
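A minimal sketch of the formula, including hash-tag handling, assuming the CRC16 variant described in the Redis Cluster specification (XMODEM: polynomial 0x1021, initial value 0, no reflection). The helper names crc16 and key_hash_slot are illustrative, not a library API.

```python
# Sketch of key -> slot mapping: CRC16 (XMODEM variant) of the key, mod 16384,
# hashing only the hash-tag body when the key contains a non-empty {...}.


def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # An empty tag such as "{}" is ignored and the whole key is hashed
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key) % 16384


print(key_hash_slot(b"foo"))  # 12182
print(key_hash_slot(b"user:{1000}:profile") == key_hash_slot(b"user:{1000}:orders"))  # True
```

`redis-cli CLUSTER KEYSLOT <key>` returns the server's own answer and is the reference to compare against.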

Fault Detection and Migration

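Nodes exchange state over a dedicated cluster bus (port 17000 in this guide, the client port plus 10000) using a gossip protocol. A master that stays unreachable for longer than cluster-node-timeout is flagged as failing; once a majority of masters agree, one of its replicas is automatically promoted to master, keeping its slots available.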
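To see why three masters is the practical minimum, the sketch below models only the majority requirement: the failure has to be confirmed by, and the replica elected by, a majority of all masters. It assumes every master serves slots, each failed master has a healthy replica, and the failures happen at the same time; timing details (cluster-node-timeout, epochs) are ignored. The function names are hypothetical.

```python
# Simplified model of the majority rule behind automatic failover.


def majority_needed(total_masters: int) -> int:
    return total_masters // 2 + 1


def can_fail_over(total_masters: int, failed_masters: int) -> bool:
    # Enough masters must still be reachable to confirm FAIL and vote
    return total_masters - failed_masters >= majority_needed(total_masters)


for failed in (1, 2):
    print(f"3 masters, {failed} down -> automatic failover: {can_fail_over(3, failed)}")
```

With three masters, losing one still leaves the two votes a majority requires; losing two at once does not, which is why masters should sit on separate hosts.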

Redis Cluster Deployment Configuration

Environment Preparation

System Requirements

Linux distribution: CentOS 7+, Ubuntu 18.04+

Redis version: 5.0+ (6.2+ recommended)

Minimum memory: 2 GB per node

Network latency between nodes < 1 ms

Server Planning

# 6‑node cluster plan (3 masters, 3 slaves)
192.168.1.10:7000  # Master‑1
192.168.1.11:7000  # Slave‑1
192.168.1.12:7000  # Master‑2
192.168.1.13:7000  # Slave‑2
192.168.1.14:7000  # Master‑3
192.168.1.15:7000  # Slave‑3

System Optimization Configuration

Kernel Parameter Tuning

# /etc/sysctl.conf
vm.overcommit_memory = 1
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
vm.swappiness = 0

System Limits Configuration

# /etc/security/limits.conf (applies to login sessions; services started by systemd need LimitNOFILE in the unit file instead)
redis soft nofile 65535
redis hard nofile 65535
redis soft nproc 65535
redis hard nproc 65535

Transparent Hugepage Disable

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
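# make permanent (on CentOS 7, /etc/rc.d/rc.local only runs at boot if it is executable)
echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.local
echo 'echo never > /sys/kernel/mm/transparent_hugepage/defrag' >> /etc/rc.local
chmod +x /etc/rc.d/rc.local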
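The kernel and transparent hugepage settings are easy to lose after a reboot or an image rebuild. A small read-only check, sketched here in Python under the assumption of a standard /proc and /sys layout, can be run on each node; the function and constant names are hypothetical, and the root argument only exists so the check can be pointed at a test directory.

```python
# Sketch: compare live kernel settings with the recommended values.
from __future__ import annotations

from pathlib import Path

RECOMMENDED_SYSCTL = {
    "vm.overcommit_memory": "1",
    "net.core.somaxconn": "65535",
    "net.ipv4.tcp_max_syn_backlog": "65535",
    "vm.swappiness": "0",
}
THP_FILES = ("enabled", "defrag")


def check_kernel(root: Path = Path("/")) -> list[str]:
    problems = []
    for key, expected in RECOMMENDED_SYSCTL.items():
        # sysctl key "vm.swappiness" lives at /proc/sys/vm/swappiness
        path = root / "proc/sys" / key.replace(".", "/")
        actual = path.read_text().strip() if path.exists() else None
        if actual != expected:
            problems.append(f"{key} = {actual} (expected {expected})")
    for name in THP_FILES:
        path = root / "sys/kernel/mm/transparent_hugepage" / name
        if not path.exists():
            continue
        # The active mode is shown in brackets, e.g. "always madvise [never]"
        text = path.read_text()
        active = text[text.find("[") + 1:text.find("]")]
        if active != "never":
            problems.append(f"transparent_hugepage/{name} = {active} (expected never)")
    return problems


if __name__ == "__main__":
    for line in check_kernel() or ["kernel settings match the recommendations"]:
        print(line)
```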

Redis Installation and Configuration

Compile and Install Redis

# Install dependencies
yum install -y gcc gcc-c++ make
# Download source
wget http://download.redis.io/releases/redis-6.2.7.tar.gz
tar xzf redis-6.2.7.tar.gz
cd redis-6.2.7
# Build and install
make PREFIX=/usr/local/redis install
# Create user and directories
useradd -r -s /bin/false redis
mkdir -p /usr/local/redis/{conf,data,logs}
chown -R redis:redis /usr/local/redis

Cluster Configuration File

# /usr/local/redis/conf/redis-7000.conf
bind 0.0.0.0
port 7000
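# Run in the foreground so systemd (Type=simple) can supervise the process
daemonize no
# PID file in a directory the redis user can write to
pidfile /usr/local/redis/data/redis_7000.pid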
logfile /usr/local/redis/logs/redis-7000.log
dir /usr/local/redis/data

# Cluster settings
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 15000
# Announce this node's own IP (differs on every host)
cluster-announce-ip 192.168.1.10
cluster-announce-port 7000
cluster-announce-bus-port 17000

# Memory settings
maxmemory 2gb
maxmemory-policy allkeys-lru

# Persistence
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfilename "appendonly-7000.aof"
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# Security
requirepass "your_redis_password"
masterauth "your_redis_password"

# Network
tcp-keepalive 60
timeout 300
tcp-backlog 511
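Size values such as maxmemory 2gb follow the unit rules documented at the top of the stock redis.conf: k, m and g are powers of 1000, while kb, mb and gb are powers of 1024, case-insensitive. The sketch below (redis_size_to_bytes is a hypothetical helper) converts them to bytes, which helps when comparing a config value with the maxmemory figure in INFO memory or with the redis_memory_max_bytes metric.

```python
# Sketch: convert Redis size strings to bytes using the redis.conf unit rules.
import re

UNITS = {
    "": 1,
    "k": 1000, "kb": 1024,
    "m": 1000 ** 2, "mb": 1024 ** 2,
    "g": 1000 ** 3, "gb": 1024 ** 3,
}


def redis_size_to_bytes(value: str) -> int:
    # Units are case-insensitive: 2gb, 2GB and 2Gb are equivalent
    match = re.fullmatch(r"(\d+)([kmg]b?)?", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit or ""]


print(redis_size_to_bytes("2gb"))   # 2147483648
print(redis_size_to_bytes("64mb"))  # 67108864
print(redis_size_to_bytes("2g"))    # 2000000000
```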

Systemd Service

# /etc/systemd/system/redis-7000.service
[Unit]
Description=Redis In-Memory Data Store (Port 7000)
After=network.target

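[Service]
# Type=simple expects "daemonize no" in the Redis config
Type=simple
User=redis
Group=redis
# systemd ignores /etc/security/limits.conf, so raise the fd limit here
LimitNOFILE=65535
ExecStart=/usr/local/redis/bin/redis-server /usr/local/redis/conf/redis-7000.conf
# No ExecStop: systemd's default SIGTERM makes Redis shut down cleanly
# (a plain "redis-cli shutdown" would fail with NOAUTH because requirepass is set)
Restart=always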

[Install]
WantedBy=multi-user.target

Cluster Initialization

Start All Nodes

# Start Redis on all nodes (reload systemd first so it picks up the new unit file)
systemctl daemon-reload
systemctl start redis-7000
systemctl enable redis-7000
# Verify status
systemctl status redis-7000

Create Cluster

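# Using redis-cli (Redis 5.0+). redis-cli takes the masters from the front of
# the list and makes the rest replicas, so list the planned masters first;
# confirm the resulting master/replica pairing with CLUSTER NODES afterwards.
/usr/local/redis/bin/redis-cli --cluster create \
192.168.1.10:7000 192.168.1.12:7000 192.168.1.14:7000 \
192.168.1.11:7000 192.168.1.13:7000 192.168.1.15:7000 \
--cluster-replicas 1 -a your_redis_password
# Or using redis-trib.rb (Redis 3.x/4.x only; replaced by redis-cli --cluster in 5.0)
./redis-trib.rb create --replicas 1 \
192.168.1.10:7000 192.168.1.12:7000 192.168.1.14:7000 \
192.168.1.11:7000 192.168.1.13:7000 192.168.1.15:7000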

Validate Cluster State

# Check cluster info
redis-cli -c -h 192.168.1.10 -p 7000 -a your_redis_password cluster info
redis-cli -c -h 192.168.1.10 -p 7000 -a your_redis_password cluster nodes
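For scripted checks, the CLUSTER INFO output (one field:value pair per line) can be parsed and tested for the conditions that matter. In this sketch the sample text is illustrative, the helper names are hypothetical, and the checks (state ok, all 16384 slots assigned, no PFAIL or FAIL slots) are a reasonable minimum rather than an exhaustive health check; in practice, feed it the output of the cluster info command.

```python
# Sketch: parse CLUSTER INFO output and flag conditions worth alerting on.
from __future__ import annotations


def parse_cluster_info(raw: str) -> dict[str, str]:
    info = {}
    for line in raw.splitlines():
        if ":" in line:
            key, _, value = line.strip().partition(":")
            info[key] = value
    return info


def cluster_problems(info: dict[str, str]) -> list[str]:
    problems = []
    if info.get("cluster_state") != "ok":
        problems.append(f"cluster_state is {info.get('cluster_state')}")
    if info.get("cluster_slots_assigned") != "16384":
        problems.append("not all 16384 slots are assigned")
    for field in ("cluster_slots_pfail", "cluster_slots_fail"):
        if int(info.get(field, "0")) > 0:
            problems.append(f"{field} = {info[field]}")
    return problems


sample = (
    "cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_slots_ok:16384\r\n"
    "cluster_slots_pfail:0\r\ncluster_slots_fail:0\r\ncluster_known_nodes:6\r\n"
    "cluster_size:3\r\n"
)
print(cluster_problems(parse_cluster_info(sample)) or "cluster looks healthy")
```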

Enterprise Configuration Management

Configuration Templatization

Ansible Configuration Management

# redis-cluster-playbook.yml
---
- hosts: redis_cluster
  become: yes
  vars:
    redis_port: 7000
    redis_password: "{{ vault_redis_password }}"
    redis_maxmemory: "{{ ansible_memtotal_mb // 2 }}mb"
  tasks:
    - name: Install Redis dependencies
      yum:
        name:
          - gcc
          - gcc-c++
          - make
        state: present
    - name: Create Redis user
      user:
        name: redis
        system: yes
        shell: /bin/false
    - name: Create Redis directories
      file:
        path: "{{ item }}"
        state: directory
        owner: redis
        group: redis
        mode: '0755'
      loop:
        - /usr/local/redis/conf
        - /usr/local/redis/data
        - /usr/local/redis/logs
    - name: Deploy Redis configuration
      template:
        src: redis.conf.j2
        dest: /usr/local/redis/conf/redis-{{ redis_port }}.conf
        owner: redis
        group: redis
        mode: '0640'
      notify: restart redis
    - name: Deploy systemd service
      template:
        src: redis.service.j2
        dest: /etc/systemd/system/redis-{{ redis_port }}.service
      notify: reload systemd
  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes
    - name: restart redis
      systemd:
        name: redis-{{ redis_port }}
        state: restarted

Configuration Template (redis.conf.j2)

bind 0.0.0.0
port {{ redis_port }}
# Foreground mode so systemd can supervise the process
daemonize no
pidfile /usr/local/redis/data/redis_{{ redis_port }}.pid
logfile /usr/local/redis/logs/redis-{{ redis_port }}.log
dir /usr/local/redis/data

# Cluster
cluster-enabled yes
cluster-config-file nodes-{{ redis_port }}.conf
cluster-node-timeout 15000
cluster-announce-ip {{ ansible_default_ipv4.address }}
cluster-announce-port {{ redis_port }}
cluster-announce-bus-port {{ redis_port | int + 10000 }}

# Memory
maxmemory {{ redis_maxmemory }}
maxmemory-policy allkeys-lru

# Persistence
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfilename "appendonly-{{ redis_port }}.aof"
appendfsync everysec

# Security
requirepass "{{ redis_password }}"
masterauth "{{ redis_password }}"

# Network
tcp-keepalive 60
timeout 300
tcp-backlog 511

Configuration Version Control

# Initialize config repository
mkdir /opt/redis-config
cd /opt/redis-config
git init
# Directory layout
mkdir -p environments/{dev,test,prod} templates scripts monitoring
# Example environment file (prod)
# environments/prod/group_vars/all.yml
redis_cluster_nodes:
  - host: 192.168.1.10
    port: 7000
    role: master
  - host: 192.168.1.11
    port: 7000
    role: slave

Log Management and Monitoring

Log Configuration and Classification

Log Level Configuration

# Redis log level
# debug   – verbose, for development
# verbose – many details
# notice  – moderate, suitable for production
# warning – only important messages
loglevel notice
logfile /usr/local/redis/logs/redis-7000.log
syslog-enabled yes
syslog-ident redis-7000
syslog-facility local0

Log Rotation

# /etc/logrotate.d/redis
/usr/local/redis/logs/*.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    notifempty
    create 640 redis redis
    # No postrotate signal is needed: Redis reopens its log file on every
    # write, so it starts using the file created by "create" automatically.
}

Monitoring Metrics Collection

Prometheus Monitoring Configuration

# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'redis-cluster'
    static_configs:
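      # One exporter per Redis node, masters and replicas alike
      - targets: ['192.168.1.10:9121','192.168.1.11:9121','192.168.1.12:9121',
                  '192.168.1.13:9121','192.168.1.14:9121','192.168.1.15:9121']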
    scrape_interval: 10s
    metrics_path: /metrics

Redis Exporter Deployment

# Download Redis Exporter
wget https://github.com/oliver006/redis_exporter/releases/download/v1.45.0/redis_exporter-v1.45.0.linux-amd64.tar.gz
tar xzf redis_exporter-v1.45.0.linux-amd64.tar.gz
cp redis_exporter-v1.45.0.linux-amd64/redis_exporter /usr/local/bin/
# Systemd service
cat > /etc/systemd/system/redis-exporter.service <<'EOF'
[Unit]
Description=Redis Exporter
After=network.target

[Service]
Type=simple
User=redis
ExecStart=/usr/local/bin/redis_exporter \
  -redis.addr=redis://localhost:7000 \
  -redis.password=your_redis_password
Restart=always

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl start redis-exporter
systemctl enable redis-exporter

Key Monitoring Metrics

# Memory usage
redis_memory_used_bytes
redis_memory_max_bytes
redis_memory_used_rss_bytes

# Connection count
redis_connected_clients
redis_blocked_clients
redis_rejected_connections_total

# Command stats
redis_commands_processed_total
redis_commands_duration_seconds_total

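# Cluster status (fields from CLUSTER INFO; exact metric names can vary
# between redis_exporter versions, so confirm them on the /metrics endpoint)
redis_cluster_enabled
redis_cluster_known_nodes
redis_cluster_slots_assigned
redis_cluster_slots_ok
redis_cluster_slots_pfail
redis_cluster_slots_fail

# Replication
redis_replication_backlog_bytes
redis_connected_slave_lag_seconds
redis_master_repl_offset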

Log Analysis and Alerting

ELK Stack Integration (filebeat.yml)

# filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /usr/local/redis/logs/*.log
  fields:
    service: redis
    environment: production
  fields_under_root: true
output.logstash:
  hosts: ["logstash:5044"]
processors:
- add_host_metadata:
    when.not.contains.tags: forwarded

Logstash Configuration (logstash-redis.conf)

# input
input {
  beats {
    port => 5044
  }
}

filter {
  if [service] == "redis" {
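    grok {
      # e.g. "12345:M 01 Jan 2024 10:00:00.123 * Ready to accept connections"
      match => { "message" => "%{POSINT:pid}:(?<role>[XCSM]) (?<timestamp>%{MONTHDAY} %{MONTH} %{YEAR} %{TIME}) (?<level_char>[-.*#]) %{GREEDYDATA:log_message}" }
    }
    date {
      match => ["timestamp", "dd MMM yyyy HH:mm:ss.SSS"]
    }
    # Redis marks levels with one character: . debug, - verbose, * notice, # warning
    if [level_char] == "#" {
      mutate { add_tag => ["alert"] }
    }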
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "redis-%{+YYYY.MM.dd}"
  }
}
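Redis writes each log line as pid:role, a timestamp, a one-character level and the message, e.g. "12345:M 01 Jan 2024 10:00:00.123 * Ready to accept connections" (role X sentinel, C child process, S replica, M master; level . debug, - verbose, * notice, # warning). The sketch below parses that layout in Python, which can be handy for checking sample lines before relying on a Logstash pattern; the regex and function name are our own, not part of any Redis tooling.

```python
# Sketch: parse a Redis log line into its fields.
from __future__ import annotations

import re
from datetime import datetime

LOG_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[XCSM]) "
    r"(?P<timestamp>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level_char>[-.*#]) (?P<log_message>.*)$"
)
LEVELS = {".": "debug", "-": "verbose", "*": "notice", "#": "warning"}
ROLES = {"X": "sentinel", "C": "child", "S": "replica", "M": "master"}


def parse_redis_log_line(line: str) -> dict | None:
    m = LOG_RE.match(line)
    if not m:
        return None  # e.g. the ASCII-art banner printed at startup
    d = m.groupdict()
    return {
        "pid": int(d["pid"]),
        "role": ROLES[d["role"]],
        "time": datetime.strptime(d["timestamp"], "%d %b %Y %H:%M:%S.%f"),
        "level": LEVELS[d["level_char"]],
        "message": d["log_message"],
    }


print(parse_redis_log_line("12345:M 01 Jan 2024 10:00:00.123 * Ready to accept connections"))
```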

Alert Rules (redis-rules.yml, a Prometheus rule file; Alertmanager handles notification)

# redis-rules.yml (load it through rule_files in prometheus.yml)
groups:
- name: redis.rules
  rules:
  - alert: RedisDown
    expr: redis_up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Redis instance is down"
      description: "Redis instance {{ $labels.instance }} is down"
  - alert: RedisHighMemoryUsage
    expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Redis memory usage is high"
      description: "Redis memory usage is {{ $value | humanizePercentage }}"
  - alert: RedisHighConnectionCount
    expr: redis_connected_clients > 1000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Redis connection count is high"
      description: "Redis has {{ $value }} connections"
  - alert: RedisClusterNodeDown
    # Slots in FAIL state mean a master is down and no replica has taken over yet
    expr: redis_cluster_slots_fail > 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Redis cluster has slots in FAIL state"
      description: "{{ $labels.instance }} reports {{ $value }} slots in FAIL state"

Queue Settings and Management

Redis Queue Modes

List Queue Implementation

# Simple queue using List
LPUSH myqueue "message1"
LPUSH myqueue "message2"
# Consumer
RPOP myqueue
# Blocking consumption
BRPOP myqueue 0
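The ordering works because LPUSH adds at the head (left) of the list and RPOP removes from the tail (right), so the oldest message comes out first; BRPOP with timeout 0 simply waits indefinitely for an element. The toy in-memory model below (plain Python, not a Redis client; lpush and rpop are illustrative names) shows the same behaviour. One consequence worth noting: once popped, a message exists only in the consumer, so a consumer crash before processing loses it, which is the gap Stream consumer groups address.

```python
# Toy in-memory model of LPUSH + RPOP producing FIFO order.
from collections import deque

queue = deque()


def lpush(q: deque, *values: str) -> int:
    for v in values:
        q.appendleft(v)  # LPUSH inserts each value at the head
    return len(q)        # LPUSH replies with the new list length


def rpop(q: deque):
    return q.pop() if q else None  # RPOP on an empty list returns nil


lpush(queue, "message1")
lpush(queue, "message2")
print(rpop(queue))  # message1
print(rpop(queue))  # message2
print(rpop(queue))  # None
```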

Stream Queue Implementation

# Create a stream
XADD mystream * field1 value1 field2 value2
# Consumer group
XGROUP CREATE mystream mygroup 0 MKSTREAM
# Consume messages
XREADGROUP GROUP mygroup consumer1 COUNT 1 STREAMS mystream >
# Acknowledge (replace message_id with the entry ID returned by XREADGROUP)
XACK mystream mygroup message_id
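The point of XACK is the consumer group's pending entries list (PEL): XREADGROUP with the special ID > hands out only entries never delivered to the group and records them as pending for that consumer, and only XACK clears them. Unacknowledged entries can later be inspected with XPENDING and reassigned with XCLAIM. The toy model below (plain Python, one group, simplified IDs; ToyStream is a hypothetical name, not a client library) illustrates that bookkeeping.

```python
# Toy in-memory model of consumer-group delivery and acknowledgement.
import itertools


class ToyStream:
    def __init__(self):
        self.entries = []         # (id, fields) in insertion order
        self.last_delivered = -1  # index of the last entry handed to the group
        self.pending = {}         # entry id -> consumer name (the PEL)
        self._seq = itertools.count(1)

    def xadd(self, fields: dict) -> str:
        entry_id = f"0-{next(self._seq)}"  # real IDs look like <ms>-<seq>
        self.entries.append((entry_id, fields))
        return entry_id

    def xreadgroup(self, consumer: str, count: int = 1):
        # Like "XREADGROUP ... STREAMS key >": only never-delivered entries
        start = self.last_delivered + 1
        batch = self.entries[start:start + count]
        self.last_delivered += len(batch)
        for entry_id, _ in batch:
            self.pending[entry_id] = consumer
        return batch

    def xack(self, *ids: str) -> int:
        # Returns how many IDs were actually removed from the PEL
        return sum(1 for i in ids if self.pending.pop(i, None) is not None)


s = ToyStream()
first = s.xadd({"field1": "value1", "field2": "value2"})
print(s.xreadgroup("consumer1"))
print(s.pending)
print(s.xack(first), s.pending)
```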

Enterprise Queue Configuration (redis-queue.conf)

# Basic settings
port 6379
bind 0.0.0.0
daemonize yes
pidfile /var/run/redis-queue.pid
logfile /usr/local/redis/logs/redis-queue.log
dir /usr/local/redis/data

# Memory (queues need more)
maxmemory 4gb
# noeviction: reject writes when memory is full instead of silently evicting queued messages
maxmemory-policy noeviction

# Persistence (AOF with everysec can lose up to about one second of writes on a crash)
appendonly yes
appendfilename "appendonly-queue.aof"
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# Network
timeout 0
tcp-keepalive 300
tcp-backlog 511

# Client limits
maxclients 10000
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60

# Queue‑specific settings
list-max-ziplist-size -2
list-compress-depth 0
stream-node-max-bytes 4096
stream-node-max-entries 100

Queue Monitoring Script (Python)

#!/usr/bin/env python3
import redis, json, time, logging
from datetime import datetime
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

class RedisQueueMonitor:
    def __init__(self, host='localhost', port=6379, password=None):
        self.redis_client = redis.Redis(host=host, port=port, password=password, decode_responses=True)
    def _iter_keys(self, pattern, key_type):
        # SCAN instead of KEYS: KEYS blocks the server while it walks the whole keyspace;
        # the TYPE check skips keys that match the pattern but hold another data type
        for key in self.redis_client.scan_iter(match=pattern, count=1000):
            if self.redis_client.type(key) == key_type:
                yield key
    def monitor_list_queues(self, patterns):
        stats = {}
        for pattern in patterns:
            for q in self._iter_keys(pattern, 'list'):
                length = self.redis_client.llen(q)
                stats[q] = {'type':'list','length':length,'timestamp':datetime.now().isoformat()}
                if length > 10000:
                    logging.warning(f"Queue {q} has {length} items")
        return stats
    def monitor_stream_queues(self, patterns):
        stats = {}
        for pattern in patterns:
            for s in self._iter_keys(pattern, 'stream'):
                try:
                    length = self.redis_client.xlen(s)
                    info = self.redis_client.xinfo_stream(s)
                    groups = self.redis_client.xinfo_groups(s)
                    stats[s] = {'type':'stream','length':length,'first_entry':info['first-entry'],'last_entry':info['last-entry'],'groups':len(groups),'timestamp':datetime.now().isoformat()}
                    for g in groups:
                        # 'lag' (undelivered entries) is reported only by Redis 7.0+ and may be None;
                        # fall back to 'pending' (delivered but unacknowledged) as a rough proxy
                        backlog = g.get('lag')
                        if backlog is None:
                            backlog = g['pending']
                        if backlog > 1000:
                            logging.warning(f"Stream {s} group {g['name']} backlog {backlog}")
                except Exception as e:
                    logging.error(f"Error monitoring stream {s}: {e}")
        return stats
    def get_memory_usage(self):
        info = self.redis_client.info('memory')
        return {k:info[k] for k in ('used_memory','used_memory_human','used_memory_peak','used_memory_peak_human')}
    def run_monitoring(self):
        queue_patterns = ['task:*','job:*','message:*']
        stream_patterns = ['stream:*','events:*']
        while True:
            try:
                list_stats = self.monitor_list_queues(queue_patterns)
                stream_stats = self.monitor_stream_queues(stream_patterns)
                mem = self.get_memory_usage()
                logging.info(json.dumps({'timestamp':datetime.now().isoformat(),'list_queues':list_stats,'stream_queues':stream_stats,'memory':mem}, indent=2))
                time.sleep(60)
            except Exception as e:
                logging.error(f"Monitoring error: {e}")
                time.sleep(10)

if __name__ == '__main__':
    monitor = RedisQueueMonitor(host='localhost', port=6379, password='your_redis_password')
    monitor.run_monitoring()

Queue Optimization Configuration

Memory Optimization for Queues

# Lists: cap each quicklist node at 8 KB (-2) and LZF-compress every node
# except the head and tail nodes (depth 1)
list-max-ziplist-size -2
list-compress-depth 1
# Stream optimization
stream-node-max-bytes 4096
stream-node-max-entries 100
# Eviction policy: never evict queued messages; writes fail when memory is full
maxmemory-policy noeviction

Persistence Optimization

# Disable RDB, enable AOF only
save ""
appendonly yes
appendfilename "appendonly-queue.aof"
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-rewrite-incremental-fsync yes

Performance Optimization and Tuning

Cluster Performance Optimization

Slot Distribution Optimization

# Check slot distribution
redis-cli -c -h 192.168.1.10 -p 7000 -a password cluster slots
# Reshard slots
redis-cli -a password --cluster reshard 192.168.1.10:7000 \
  --cluster-from source_node_id \
  --cluster-to target_node_id \
  --cluster-slots 1000 \
  --cluster-yes
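Choosing the --cluster-slots value is simple arithmetic: divide the total slots by the number of masters and move the difference. The sketch below computes how many slots each master should give away (negative) or receive (positive); the node names and counts are hypothetical, modelling a fourth, empty master added to the three-master layout. redis-cli also offers --cluster rebalance (with --cluster-use-empty-masters for new nodes) to do this automatically.

```python
# Sketch: per-node slot surplus/deficit for an even distribution.
from __future__ import annotations


def rebalance_plan(slot_counts: dict[str, int]) -> dict[str, int]:
    """Positive = slots to receive, negative = slots to give away."""
    total = sum(slot_counts.values())
    nodes = sorted(slot_counts)
    base, extra = divmod(total, len(nodes))
    # The first `extra` nodes (by name) take one additional slot each
    targets = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}
    return {n: targets[n] - slot_counts[n] for n in nodes}


current = {"master-1": 5461, "master-2": 5462, "master-3": 5461, "master-4": 0}
print(rebalance_plan(current))
# {'master-1': -1365, 'master-2': -1366, 'master-3': -1365, 'master-4': 4096}
```

Each old master would then run one reshard toward the new node with --cluster-slots set to its surplus (1365 or 1366 here).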

Read‑Write Separation

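# Replicas reject writes (the default)
replica-read-only yes
# In cluster mode a replica serves reads only to clients that send READONLY on
# their connection; otherwise it redirects them to the master with MOVED.
# Writes must always go to masters (handled by the application or client library).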

Fault Handling and Operations Practice

Automatic Failover Script

#!/bin/bash
# Succeeds (exit 0) when the node at $1:$2 (password $3) reports cluster_state:ok
check_cluster_health() {
  redis-cli -h "$1" -p "$2" -a "$3" cluster info 2>/dev/null | grep -q "cluster_state:ok"
}
if ! check_cluster_health "192.168.1.10" "7000" "password"; then
  echo "Cluster unhealthy, triggering failover procedures..."
  # Redis Cluster promotes replicas on its own; add alerting or manual steps here
  # (for a planned switchover, run CLUSTER FAILOVER on the replica to promote)
fi
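cluster_state alone says that something is wrong, not which node. CLUSTER NODES lists every node as id, address, flags, master id, ping/pong times, config epoch, link state and slots, and the flags fail (confirmed) and fail? (suspected, PFAIL) pinpoint the problem. The sketch below parses that output; the sample node IDs are shortened placeholders (real IDs are 40 hex characters) and the function name is hypothetical.

```python
# Sketch: parse CLUSTER NODES output and report failing nodes.
from __future__ import annotations


def parse_cluster_nodes(raw: str) -> list[dict]:
    nodes = []
    for line in raw.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue
        flags = set(parts[2].split(","))
        nodes.append({
            "id": parts[0],
            "addr": parts[1].split("@")[0],  # drop the cluster bus port
            "role": "master" if "master" in flags else "replica",
            "master_id": None if parts[3] == "-" else parts[3],
            "failing": bool(flags & {"fail", "fail?"}),
            "link": parts[7],
            "slots": parts[8:],
        })
    return nodes


sample = """\
aaa 192.168.1.10:7000@17000 myself,master - 0 0 1 connected 0-5460
bbb 192.168.1.12:7000@17000 master,fail - 1700000000000 1700000000000 2 disconnected
ccc 192.168.1.13:7000@17000 master - 0 1700000000500 7 connected 5461-10922
ddd 192.168.1.11:7000@17000 slave aaa 0 1700000000500 1 connected
"""
for node in parse_cluster_nodes(sample):
    if node["failing"]:
        print(f"{node['role']} {node['addr']} is failing (link {node['link']})")
```

In this sample the failed master's replica has already been promoted and now holds 5461-10922, so the remaining task is to repair or replace the old master and re-add it as a replica.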

Backup and Restore

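# Backup script: run on every node, since each node holds only its own slots
BACKUP_DIR="/backup/redis/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"
CLI="/usr/local/redis/bin/redis-cli -h 127.0.0.1 -p 7000 -a password"
# Trigger an RDB snapshot; BGSAVE returns immediately, so wait until it finishes
$CLI BGSAVE
while $CLI INFO persistence | grep -q "rdb_bgsave_in_progress:1"; do sleep 1; done
# Copy the RDB snapshot and AOF files
cp /usr/local/redis/data/dump.rdb /usr/local/redis/data/appendonly*.aof "$BACKUP_DIR"/
# Backup cluster node info
$CLI cluster nodes > "$BACKUP_DIR/cluster-nodes.txt"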

Summary

This article provides a complete, enterprise‑grade solution for operating Redis clusters on Linux, covering architecture design, standardized configuration management, comprehensive monitoring, automated operational scripts, and performance tuning to build a highly available and high‑performance Redis service.

Key Points

Architecture: 3‑master‑3‑slave standard cluster ensures high availability.

Configuration Management: Template‑based and version‑controlled configurations guarantee consistency.

Monitoring System: Full metric collection, log analysis, and alerting for proactive operations.

Queue Management: Choose appropriate queue models (List or Stream) for different scenarios.

Performance Tuning: Ongoing monitoring and parameter adjustments keep the system optimal.

Operational Recommendations

Regularly perform health checks and performance evaluations.

Establish robust backup and recovery mechanisms.

Define detailed fault‑handling procedures.

Continuously optimize configuration parameters and monitoring metrics.

Stay updated with new Redis features and best practices.

By following the best practices outlined in this guide, operations engineers can build and maintain a stable, efficient Redis cluster that reliably supports critical business workloads.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
