Why Reducing NFS Server Memory Triggers RPC Fragment‑Too‑Large Errors and How to Fix It
After an NFS server's RAM was lowered from 8 GB to 4 GB, the kernel reduced the server's max_block_size, creating a mismatch with the client's negotiated rsize/wsize settings and triggering RPC fragment-too-large errors. This article explains the root cause and shows how to resolve it by adjusting the server's max_block_size.
Recently the memory of an NFS server was reduced from 8 GB to 4 GB while other servers remained unchanged. After a reboot everything seemed fine, but the next day the DBA reported that backups were not being written; the df -H command hung, indicating the NFS mount had failed.
Checking the NFS server logs revealed many RPC errors stating that the data fragment was too large. The issue appeared after the memory downgrade.
The NFS source defines the maximum read/write I/O size as #define NFS_MAX_FILE_IO_SIZE (1048576U), with a default of 4096 bytes and a minimum of 1024 bytes. The kernel derives the actual cap from available physical memory: each NFS kernel thread may use at most 1/4096 of RAM. Over UDP the block size is limited to about 48 KB (the kernel caps it at 32 KB); over TCP it can reach the full 1 MB, but only on machines with more than 4 GB of RAM. The current limit is recorded in /proc/fs/nfsd/max_block_size.
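The memory-based cap can be sketched numerically. The snippet below is a rough model of the per-thread 1/4096 heuristic described above (it loosely mirrors the kernel's default-block-size calculation in fs/nfsd/nfssvc.c); the reserved-memory figure is an assumption for illustration, and exact results vary by kernel version:

```shell
#!/bin/sh
# Rough model of the kernel heuristic: aim for 1/4096 of usable RAM
# per thread, halving down from the 1 MiB ceiling until under target.
ram_bytes=$(( 4 * 1024 * 1024 * 1024 ))      # nominal 4 GiB after the downgrade
usable=$(( ram_bytes - 256 * 1024 * 1024 ))  # assume ~256 MiB reserved by firmware/kernel
target=$(( usable / 4096 ))                  # per-thread budget: 983040 bytes here
blk=$(( 1024 * 1024 ))                       # 1 MiB ceiling (NFSSVC_MAXBLKSIZE)
while [ "$blk" -gt "$target" ] && [ "$blk" -ge $(( 8 * 1024 * 2 )) ]; do
    blk=$(( blk / 2 ))
done
echo "$blk"   # 524288, i.e. the 512 KB observed on the 4 GB server
```

This shows why a nominal 4 GB machine lands at 512 KB rather than 1 MB: once any memory is reserved, the per-thread target drops just below 1 MiB and the cap is halved.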
On the server the value was 512 KB. The client and server negotiate the rsize and wsize values; on the client they can be inspected with:
<code>cat /proc/mounts | grep rsize</code>
The client showed both rsize and wsize as 1048576 (1 MB), while the server advertised only 512 KB, meaning the two sides no longer agreed on a block size.
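Pulling the two values out of /proc/mounts can be scripted. The sample line below is illustrative (hypothetical server name and export path); on a real client, substitute the output of the grep command above:

```shell
#!/bin/sh
# Extract rsize/wsize from a /proc/mounts entry.
# Sample entry (hypothetical host "nfsserver" and export "/backup"):
line='nfsserver:/backup /mnt/backup nfs4 rw,vers=4.1,rsize=1048576,wsize=1048576,hard,proto=tcp 0 0'
rsize=$(echo "$line" | sed -n 's/.*rsize=\([0-9]*\).*/\1/p')
wsize=$(echo "$line" | sed -n 's/.*wsize=\([0-9]*\).*/\1/p')
echo "rsize=$rsize wsize=$wsize"
```

On the incident client both values printed as 1048576, matching the old 1 MB server limit.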
Before the memory reduction the server had more than 4 GB of RAM, so max_block_size was 1 MB and the client negotiated that value. After the downgrade to 4 GB, the kernel reduced max_block_size to 512 KB, but the client was never remounted, so it kept sending 1 MB requests, causing the RPC fragment-too-large errors.
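The failure condition reduces to a simple comparison: the client's negotiated I/O size exceeds the server's new cap. A minimal sketch of that check, hard-coding the values from this incident:

```shell
#!/bin/sh
# Flag a client whose negotiated I/O size exceeds the server's cap.
client_rsize=1048576   # from the client's /proc/mounts (rsize/wsize)
server_max=524288      # from the server's /proc/fs/nfsd/max_block_size
if [ "$client_rsize" -gt "$server_max" ]; then
    echo "mismatch: client I/O size $client_rsize > server cap $server_max"
fi
```

Any request larger than the server cap arrives as an RPC fragment the server refuses, which is exactly the error seen in the logs.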
Two remedies exist: remount the client or modify the server’s
max_block_size. The author chose the latter, stopping NFS, writing the new size to the proc file, and restarting the service:
<code>systemctl stop nfs
# raise max_block_size (the proc file is writable only while nfsd is stopped)
echo 1048576 > /proc/fs/nfsd/max_block_size</code>
After restarting NFS:
<code>systemctl start nfs</code>
the logs returned to normal, df -H displayed the mount correctly, and data transfer worked. A final packet capture showed the NFS traffic running over TCP with the expected 1 MB block size, confirming the fix.
The client-remount approach was not used because the mount was already hung; cleanly unmounting it would likely have required rebooting both machines, which the author wanted to avoid.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing original, widely-read technical articles. We focus on operations transformation and will accompany you throughout your operations career, growing together.