Can a Single Redis Instance Safely Use 50 GB on a 64 GB Machine? A NUMA Trap Experiment
This article investigates how large a single Redis instance can be on a 64 GB server, explains the NUMA memory‑allocation trap, and documents experiments that compare unbound versus CPU‑memory‑affinity‑bound deployments, revealing when swap and OOM occur.
Background and Question
Our infrastructure team provides a cloud Redis platform where users can request any memory size for a Redis instance. I wanted to know how large a single‑process Redis instance could safely be on a 64 GB physical machine. Could we allocate a 50 GB instance without harming performance?
Understanding the NUMA Trap
On NUMA (Non‑Uniform Memory Access) systems, a process prefers memory from the node where its CPU runs. If that node’s memory is exhausted, Linux may start swapping on the same node instead of allocating from other nodes, causing a sharp performance drop. This behavior affects memory‑intensive applications like Redis, MySQL, or MongoDB.
Checking NUMA Configuration
Running numactl --hardware shows two nodes, each with 12 CPUs and about 32 GB of RAM. The zone_reclaim_mode setting controls what happens when a node runs out of free memory: with value 1 the kernel first tries to reclaim memory on the local node, while with value 0 it simply allocates from other nodes.
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 32756 MB
node 0 free: 19642 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 32768 MB
node 1 free: 18652 MB
node distances:
node 0 1
0: 10 21
1: 21 10
Our system has zone_reclaim_mode=1, meaning the kernel tries to reclaim memory within the local node before allocating from another node. The value is a bitmask:
0 – disabled; allocations fall back to other nodes when the local node is full
1 – zone reclaim enabled; reclamation stays local
2 – local reclamation may write dirty pages out
4 – local reclamation may swap pages out
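Because the setting is a bitmask, values such as 3 or 5 combine the behaviors listed above. The following is a minimal Python sketch of decoding it. The function names are hypothetical, and the descriptions paraphrase the kernel's sysctl documentation rather than quoting it.

```python
# Bit meanings for vm.zone_reclaim_mode (paraphrased descriptions).
ZONE_RECLAIM_BITS = {
    1: "zone reclaim on (reclaim locally before using other nodes)",
    2: "zone reclaim may write dirty pages out",
    4: "zone reclaim may swap pages out",
}


def decode_zone_reclaim_mode(value):
    """Return a description for every bit set in value; an empty list means disabled."""
    return [desc for bit, desc in ZONE_RECLAIM_BITS.items() if value & bit]


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Read the current setting from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

On a Linux host, decode_zone_reclaim_mode(read_zone_reclaim_mode()) would list the active behaviors; for the machine in this article (value 1) only the first entry applies.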
Experiment 1: No Affinity Binding
We started a Redis instance with maxmemory 50G without binding it to any specific node. Initial system stats showed plenty of free memory on both nodes.
# top
Mem: 65961428k total, 26748124k used, 39213304k free
Swap: 8388600k total, 0k used
# cat /proc/zoneinfo | grep "pages free"
pages free 4651908
pages free 4773314
After loading data, Redis used about 46 GB of resident memory, but the memory was distributed across both nodes, and no swap was triggered.
# top
Mem: 65961428k total, 53140400k used, 12821028k free
Swap: 8388600k total, 0k used
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8356 root 20 0 62.8g 46g 1292 S 0.0 74.5 3:45.34 redis-server
Because the process was not bound, the kernel allocated memory from whichever node the currently scheduled CPU belonged to, resulting in an even spread.
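The "pages free" counters from /proc/zoneinfo at the start of this experiment are page counts, not bytes. As a rough sketch, they can be converted to GiB as follows. The 4 KiB page size is an assumption (it is the usual x86-64 default and can be checked with getconf PAGESIZE), and the helper name is hypothetical.

```python
# Assumption: 4 KiB pages; verify on the target host with `getconf PAGESIZE`.
PAGE_SIZE_BYTES = 4096


def pages_to_gib(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count into GiB."""
    return pages * page_size / 2**30


# "pages free" values reported before the data load in Experiment 1.
free_pages = [4651908, 4773314]

for index, pages in enumerate(free_pages):
    print(f"entry {index}: {pages_to_gib(pages):.1f} GiB free")
print(f"total: {pages_to_gib(sum(free_pages)):.1f} GiB free")
```

Under that assumption, each entry works out to roughly 18 GiB free, which is in line with the per-node free figures that numactl --hardware reported.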
Experiment 2: Binding CPU and Memory Affinity
We then killed the process, bound both CPU and memory to node 0, and restarted Redis with the same 50 GB limit:
numactl --cpunodebind=0 --membind=0 /search/odin/daemon/redis/bin/redis-server /search/odin/daemon/redis/conf/redis.conf
Monitoring with top confirmed the process stayed on node 0, and memory on that node drained rapidly.
# cat /proc/zoneinfo
Node 0, zone Normal
pages free 10697
Node 1, zone Normal
pages free 7686732
Within minutes, swap usage spiked and the Redis process was killed by the OOM killer:
# top
Mem: 65961428k total, 34530000k used, 31431428k free
Swap: 8388600k total, 6000792k used
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25934 root 20 0 37.5g 30g 1224 D 71.6 48.7 1:06.09 redis-server
Conclusion
The experiments confirm that a NUMA trap exists when a process is bound to a single node: once the node’s memory is exhausted, the kernel falls back to swapping on that node, leading to severe performance degradation and OOM. Without explicit binding, Redis can utilize the total physical memory of the machine, though this may under‑utilize other CPU cores.
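The imbalance seen in Experiment 2 (node 0 nearly empty while node 1 still had plenty of free pages) is something a monitoring script could watch for. Below is a minimal Python sketch that sums "pages free" per node from /proc/zoneinfo text. The function names and the alert threshold are hypothetical, and the sample input is the abbreviated output shown in the experiment rather than a full zoneinfo dump.

```python
import re
from collections import defaultdict

NODE_RE = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
FREE_RE = re.compile(r"^\s*pages free\s+(\d+)")


def free_pages_per_node(zoneinfo_text):
    """Sum the 'pages free' counters of all zones belonging to each node."""
    totals = defaultdict(int)
    node = None
    for line in zoneinfo_text.splitlines():
        match = NODE_RE.match(line)
        if match:
            node = int(match.group(1))
            continue
        match = FREE_RE.match(line)
        if match and node is not None:
            totals[node] += int(match.group(1))
    return dict(totals)


def nodes_below(zoneinfo_text, min_free_pages):
    """Return the nodes whose free page count is under the threshold."""
    per_node = free_pages_per_node(zoneinfo_text)
    return sorted(n for n, pages in per_node.items() if pages < min_free_pages)


# Abbreviated output from Experiment 2.
sample = """Node 0, zone Normal
pages free 10697
Node 1, zone Normal
pages free 7686732
"""

print(free_pages_per_node(sample))
# Hypothetical threshold: 262144 pages, i.e. 1 GiB if pages are 4 KiB.
print(nodes_below(sample, min_free_pages=262144))
```

On a live host the same functions could be fed the contents of /proc/zoneinfo directly. A single node dropping below the threshold while the others stay healthy is the pattern this article describes.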
Further Thoughts
Binding both CPU and memory to the same node can improve memory‑I/O latency and increase Redis QPS, but it also makes the system vulnerable to the NUMA trap. Careful testing and monitoring are required for workloads that push memory limits.
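One simple safeguard is to compare the requested maxmemory with the size of the target node before binding. The sketch below parses the numactl --hardware output shown earlier and applies a headroom factor. The function names and the 0.8 headroom value are assumptions chosen for illustration, not a tested recommendation.

```python
import re

SIZE_RE = re.compile(r"^node (\d+) size: (\d+) MB")


def node_sizes_mb(numactl_output):
    """Extract {node_id: size_in_MB} from `numactl --hardware` output."""
    sizes = {}
    for line in numactl_output.splitlines():
        match = SIZE_RE.match(line.strip())
        if match:
            sizes[int(match.group(1))] = int(match.group(2))
    return sizes


def fits_on_node(maxmemory_gb, node_size_mb, headroom=0.8):
    """True if maxmemory fits within the given fraction of the node's RAM.

    headroom is a hypothetical safety margin for the OS, page cache and
    Redis overhead beyond maxmemory (fragmentation, fork copy-on-write).
    """
    return maxmemory_gb * 1024 <= node_size_mb * headroom


NUMACTL_SAMPLE = """available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 32756 MB
node 0 free: 19642 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 32768 MB
node 1 free: 18652 MB
"""

sizes = node_sizes_mb(NUMACTL_SAMPLE)
for requested_gb in (20, 50):
    ok = fits_on_node(requested_gb, sizes[0])
    print(f"maxmemory {requested_gb} GB bound to node 0: {'ok' if ok else 'too large'}")
```

With the node sizes from this machine, a 50 GB instance fails the check for node 0, which matches what Experiment 2 showed in practice.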
ITPUB
