
Can a Single Redis Instance Safely Use 50 GB on a 64 GB Machine? A NUMA Trap Experiment

This article investigates how large a single Redis instance can be on a 64 GB server, explains the NUMA memory‑allocation trap, and documents experiments that compare unbound versus CPU‑memory‑affinity‑bound deployments, revealing when swap and OOM occur.

ITPUB

Background and Question

Our infrastructure team runs a cloud Redis platform where users can request a Redis instance of any memory size. That raised a question: how large can a single-process Redis instance safely be on a 64 GB physical machine? Could we allocate a 50 GB instance without harming performance?

Understanding the NUMA Trap

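On NUMA (Non-Uniform Memory Access) systems, each group of CPUs has its own local memory node, and a process prefers memory from the node where its CPU runs because local access is faster. If that node's memory is exhausted and the process cannot or does not fall back to other nodes (because of its memory policy or the kernel's reclaim settings), Linux may start reclaiming and swapping on the local node while other nodes still have free memory, causing a sharp performance drop. This behavior affects memory-intensive applications such as Redis, MySQL, and MongoDB.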

Checking NUMA Configuration

Running numactl --hardware shows two nodes, each with 12 CPUs and about 32 GB of RAM. The zone_reclaim_mode setting (/proc/sys/vm/zone_reclaim_mode) controls what the kernel does when a node runs out of free memory: with 0 it allocates from other nodes, and with 1 it first tries to reclaim memory on the local node.

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 32756 MB
node 0 free: 19642 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 32768 MB
node 1 free: 18652 MB
node distances:
node   0   1
 0:  10  21
 1:  21  10
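To compare nodes programmatically, output like the above can be parsed into per-node figures. The following is a minimal Python sketch and not part of the original experiment; the function name parse_numactl_hardware is hypothetical, and SAMPLE is an excerpt of the output shown above.

```python
import re

# Excerpt of the `numactl --hardware` output shown above.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 32756 MB
node 0 free: 19642 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 32768 MB
node 1 free: 18652 MB
node distances:
node   0   1
"""

def parse_numactl_hardware(text):
    """Return {node_id: {"cpus": [...], "size_mb": int, "free_mb": int}}."""
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) (cpus|size|free):\s*(.*)", line.strip())
        if not m:
            continue  # header, distance matrix, or unrelated line
        node_id, field, value = int(m.group(1)), m.group(2), m.group(3)
        info = nodes.setdefault(node_id, {})
        if field == "cpus":
            info["cpus"] = [int(c) for c in value.split()]
        else:
            # "32756 MB" -> 32756
            info[field + "_mb"] = int(value.split()[0])
    return nodes

nodes = parse_numactl_hardware(SAMPLE)
for node_id, info in sorted(nodes.items()):
    print(node_id, len(info["cpus"]), info["size_mb"], info["free_mb"])
```

On a live machine, the same function could be fed the captured output of numactl --hardware instead of SAMPLE.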

Our system has zone_reclaim_mode=1, meaning the kernel tries to reclaim memory within the local node before it allocates from another node. The setting is a bitmask built from these values:

0 – zone reclaim off; allocations fall back to other nodes

1 – zone reclaim on; the kernel reclaims on the local node first

2 – zone reclaim may write dirty pages out

4 – zone reclaim may swap pages
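A combined value such as 5 sets flags 1 and 4 together, since the setting is a bitmask. Here is a small sketch of decoding it, assuming the standard sysctl path /proc/sys/vm/zone_reclaim_mode. The helper names are hypothetical, and the flag descriptions paraphrase the kernel's sysctl documentation.

```python
# Bit flags of vm.zone_reclaim_mode (0 means zone reclaim is off).
ZONE_RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}

def decode_zone_reclaim_mode(value):
    """Return the descriptions of all flags set in value, lowest bit first."""
    return [name for bit, name in sorted(ZONE_RECLAIM_FLAGS.items()) if value & bit]

def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Read the current setting; only works on Linux systems exposing this file."""
    with open(path) as f:
        return int(f.read().strip())

print(decode_zone_reclaim_mode(1))
print(decode_zone_reclaim_mode(5))
```

An empty list corresponds to the value 0, where allocations simply fall back to other nodes.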

Experiment 1: No Affinity Binding

We started a Redis instance with maxmemory 50G without binding it to any specific node. Initial system stats showed plenty of free memory on both nodes.

# top
Mem: 65961428k total, 26748124k used, 39213304k free
Swap: 8388600k total, 0k used
# cat /proc/zoneinfo | grep "pages free"
pages free     4651908
pages free     4773314
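Those counts are in pages, not bytes. Assuming the common 4 KiB page size (an assumption; getconf PAGE_SIZE reports the actual value), they convert to roughly 17.7 GiB and 18.2 GiB free:

```python
def pages_to_gib(pages, page_size=4096):
    """Convert a page count to GiB; page_size is in bytes (4 KiB assumed)."""
    return pages * page_size / 2**30

free_pages = [4651908, 4773314]  # values from the zoneinfo output above
free_gib = [pages_to_gib(p) for p in free_pages]
print([f"{g:.1f} GiB" for g in free_gib])
```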

After loading data, Redis used about 46 GB of resident memory, but the memory was distributed across both nodes, and no swap was triggered.

# top
Mem: 65961428k total, 53140400k used, 12821028k free
Swap: 8388600k total, 0k used
PID   USER   PR  NI   VIRT   RES   SHR S %CPU %MEM TIME+ COMMAND
8356  root   20   0   62.8g  46g 1292 S 0.0 74.5 3:45.34 redis-server

Because the process was not bound, the kernel was free to allocate from either node: it prefers the node of the CPU the process is currently scheduled on and can use the other node as well, so Redis's memory ended up spread across both nodes.
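The per-node split is stated here rather than shown in the output. One way to check it, assuming the kernel exposes /proc/<pid>/numa_maps, is to sum the N<node>=<pages> counters in that file. The sketch below uses hypothetical helper names and a made-up two-line SAMPLE in the numa_maps format; the numbers are illustrative, not measurements from this experiment.

```python
import re
from collections import Counter

# Illustrative numa_maps-style lines (hypothetical values).
SAMPLE = """\
00400000 default file=/usr/bin/redis-server mapped=120 N0=120 kernelpagesize_kB=4
7f2c00000000 default anon=3000 dirty=3000 N0=1800 N1=1200 kernelpagesize_kB=4
"""

def pages_per_node(numa_maps_text):
    """Sum the N<node>=<pages> counters over all mappings."""
    totals = Counter()
    for node, pages in re.findall(r"\bN(\d+)=(\d+)", numa_maps_text):
        totals[int(node)] += int(pages)
    return dict(totals)

def read_numa_maps(pid):
    """Read /proc/<pid>/numa_maps; needs Linux and permission to read the file."""
    with open(f"/proc/{pid}/numa_maps") as f:
        return f.read()

split = pages_per_node(SAMPLE)
print(split)
```

Page counts are in units of each mapping's kernelpagesize_kB, so mappings backed by huge pages would need separate handling. The numastat tool, where installed, reports a similar per-process breakdown with numastat -p <pid>.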

Experiment 2: Binding CPU and Memory Affinity

We then killed the process, bound both CPU and memory to node 0, and restarted Redis with the same 50 GB limit:

numactl --cpunodebind=0 --membind=0 /search/odin/daemon/redis/bin/redis-server /search/odin/daemon/redis/conf/redis.conf

Monitoring with top confirmed the process stayed on node 0, and memory on that node drained rapidly.

# cat /proc/zoneinfo
Node 0, zone   Normal
  pages free     10697
Node 1, zone   Normal
  pages free     7686732
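The imbalance is easier to read in MiB. The following sketch parses zoneinfo-style text while keeping the node and zone labels. It assumes 4 KiB pages, the function name is hypothetical, and SAMPLE is the output shown above.

```python
import re

# The /proc/zoneinfo excerpt shown above.
SAMPLE = """\
Node 0, zone   Normal
  pages free     10697
Node 1, zone   Normal
  pages free     7686732
"""

def free_mib_by_zone(zoneinfo_text, page_size=4096):
    """Return {(node, zone): free MiB} from zoneinfo-style text."""
    result = {}
    current = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node (\d+), zone\s+(\w+)", line)
        if header:
            current = (int(header.group(1)), header.group(2))
            continue
        free = re.match(r"\s*pages free\s+(\d+)", line)
        if free and current is not None:
            result[current] = int(free.group(1)) * page_size / 2**20
    return result

free = free_mib_by_zone(SAMPLE)
for (node, zone), mib in sorted(free.items()):
    print(f"node {node} {zone}: {mib:.0f} MiB free")
```

Under the 4 KiB assumption, node 0 is down to about 42 MiB free while node 1 still has about 29 GiB.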

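Within minutes, swap usage spiked even though roughly 30 GB of memory was still free machine-wide. The snapshot below shows about 5.7 GB of swap in use and redis-server in uninterruptible sleep (state D); shortly afterwards the Redis process was killed by the OOM killer: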

# top
Mem: 65961428k total, 34530000k used, 31431428k free
Swap: 8388600k total, 6000792k used
PID   USER   PR  NI   VIRT   RES   SHR S %CPU %MEM TIME+ COMMAND
25934 root   20   0   37.5g  30g 1224 D 71.6 48.7 1:06.09 redis-server
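The outcome follows from simple arithmetic: a 50 GB limit cannot fit in a node of about 32 GB. The sketch below makes that check explicit using the node sizes reported by numactl --hardware. The function name is hypothetical, and 50 GB is treated as 50 × 1024 MB, which is an assumption about how the limit is interpreted.

```python
NODE_SIZE_MB = {0: 32756, 1: 32768}  # from the numactl --hardware output
MAXMEMORY_MB = 50 * 1024             # the 50 GB limit (assumed 50 * 1024 MB)

def headroom_mb(maxmemory_mb, node_size_mb, bound_nodes=None):
    """Capacity of the allowed nodes minus maxmemory; negative means it cannot fit.

    bound_nodes=None models an unbound process that may use every node.
    """
    nodes = node_size_mb.keys() if bound_nodes is None else bound_nodes
    capacity = sum(node_size_mb[n] for n in nodes)
    return capacity - maxmemory_mb

unbound = headroom_mb(MAXMEMORY_MB, NODE_SIZE_MB)
bound = headroom_mb(MAXMEMORY_MB, NODE_SIZE_MB, bound_nodes=[0])
print(unbound, bound)  # 14324 -18444
```

This is an optimistic upper bound: it ignores memory used by the kernel and other processes, as well as any Redis overhead beyond maxmemory.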

Conclusion

The experiments confirm that the NUMA trap exists when a process is bound to a single node: once that node's memory is exhausted, the kernel swaps instead of using free memory on the other node, leading to severe performance degradation and eventually an OOM kill. Without explicit binding, Redis can draw on the machine's total physical memory (in our test it reached about 46 GB resident with no swap), though a single process of this size may leave the other CPU cores under-utilized.

Further Thoughts

Binding both CPU and memory to the same node can reduce memory access latency and increase Redis QPS, but it also exposes the instance to the NUMA trap once its data set approaches that node's capacity (about 32 GB on this machine). Careful testing and monitoring are required for workloads that push memory limits.
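As one possible starting point for such monitoring, swap usage can be derived from /proc/meminfo. The sketch below is an illustration with a hypothetical function name. Its SAMPLE uses the swap total from the top output in Experiment 2, and its SwapFree value is derived from that output as total minus used.

```python
# meminfo-style excerpt; SwapFree = 8388600 - 6000792, derived from the top output.
SAMPLE = """\
MemTotal:       65961428 kB
SwapTotal:       8388600 kB
SwapFree:        2387808 kB
"""

def swap_used_kb(meminfo_text):
    """Return SwapTotal - SwapFree in kB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]

used = swap_used_kb(SAMPLE)
print(used)  # 6000792
```

On a live system, the text would come from reading /proc/meminfo. Any non-zero and growing value on a host running a large Redis instance is worth investigating.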
