Unlocking Memory Secrets: Why Random IO Slows Down and How to Optimize It
This article explores the physical structure of RAM, compares random and sequential memory I/O performance, examines real‑world bandwidth versus advertised specs, delves into NUMA latency differences, and shows practical optimization techniques for PHP 7 and Redis based on deep hardware and kernel knowledge.
Part 1: Physical Aspects
The first three articles explain the hardware fundamentals of memory:
Understanding Memory Alignment Fundamentals – Uses the familiar concept of memory alignment to illustrate how memory modules are physically built and how they interact with the bus to transfer data to the CPU.
Random vs. Sequential Memory I/O Speed – Shows that, contrary to the common assumption that every RAM access costs the same, random memory accesses are 3–4 times slower than sequential accesses, based on measured latency.
From DDR to DDR4: Frequency Limits – Clarifies why newer DDR generations do not dramatically increase core frequency, describing prefetch limitations, non‑continuous data storage, and bank‑group effects that prevent reaching advertised speeds.
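The relationship between core frequency, prefetch depth, and advertised speed can be sketched as simple arithmetic. The example below is a minimal illustration, not taken from the original articles. It assumes the standard prefetch depths (2n for DDR, 4n for DDR2, 8n for DDR3 and DDR4) and a 64-bit module data bus. The function names and the sample core clocks are hypothetical, chosen only to show that the data rate grows with prefetch depth while the core clock stays in the same range.

```python
# Prefetch depth per DDR generation: bits fetched per data pin per core clock.
PREFETCH = {"DDR": 2, "DDR2": 4, "DDR3": 8, "DDR4": 8}

BUS_WIDTH_BYTES = 8  # assumption: a standard 64-bit module data bus


def transfer_rate_mts(generation, core_mhz):
    """Data rate in MT/s = core clock in MHz x prefetch depth."""
    return core_mhz * PREFETCH[generation]


def peak_bandwidth_mbs(generation, core_mhz):
    """Theoretical peak in MB/s = transfers per second x bytes per transfer."""
    return transfer_rate_mts(generation, core_mhz) * BUS_WIDTH_BYTES


if __name__ == "__main__":
    # Illustrative core clocks (assumed, not measured values from the article).
    for gen, core in [("DDR", 200), ("DDR2", 200), ("DDR3", 200), ("DDR4", 400)]:
        rate = transfer_rate_mts(gen, core)
        peak = peak_bandwidth_mbs(gen, core)
        print(f"{gen}: core {core} MHz, {rate} MT/s, peak {peak} MB/s")
```

These figures are theoretical peaks: they assume every transfer is part of a full burst from an already open row, which is the condition that scattered accesses break.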
Part 2: Hands‑On Tests
After grasping the theory, three practical experiments were performed:
Sequential vs. Random I/O Latency Test – Measured memory latency under various conditions; the fastest path is actually the CPU cache, while sequential I/O averages ~10 ns and random I/O is about four times slower.
Real Bandwidth vs. Manufacturer Claims – Tested a memory module advertised at 8.5 GB/s; under worst‑case conditions the observed bandwidth dropped to only 474 MB/s, under 6% of the advertised figure, highlighting the impact of the CL, tRCD, and tRP timing delays and of QPI bus overhead.
NUMA Architecture Latency Differences – Demonstrated that memory accesses within the same NUMA node are faster than cross‑node accesses, which must travel the QPI bus and incur additional latency.
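The shape of a sequential versus random access test can be sketched in a few lines. This is a rough illustration only, not the author's benchmark: the function names `walk` and `compare` are hypothetical, and Python's interpreter overhead hides much of the hardware gap, so the 10 ns and fourfold figures above should not be expected from it. A lower-level language would be needed to approach those numbers.

```python
import random
import time
from array import array


def walk(data, order):
    """Sum data[i] for every i in order; return (total, seconds elapsed)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - start


def compare(n=1_000_000, seed=0):
    """Time one sequential and one shuffled pass over the same array."""
    data = array("q", range(n))            # contiguous 8-byte integers
    sequential = list(range(n))            # ascending addresses
    shuffled = sequential[:]
    random.Random(seed).shuffle(shuffled)  # same indices, random order
    seq_total, seq_time = walk(data, sequential)
    rnd_total, rnd_time = walk(data, shuffled)
    assert seq_total == rnd_total          # both passes touch the same elements
    return seq_time, rnd_time


if __name__ == "__main__":
    seq_time, rnd_time = compare()
    print(f"sequential: {seq_time:.3f} s, random: {rnd_time:.3f} s")
```

Both passes read exactly the same elements, so any timing difference comes from access order alone. The array should be much larger than the CPU cache for the order to matter.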
Part 3: Practical Applications
Finally, three real‑world use cases show why deep hardware knowledge matters for developers:
PHP 7 Memory Performance Optimizations – Examines changes to the PHP engine's core data structures that save only a few bytes each yet yield large performance gains, which make sense once the CPU‑memory interaction is understood.
Turning Random I/O into Sequential I/O – Describes a project that reorganizes memory access patterns to keep the memory subsystem in its most efficient state, achieving multiple‑fold speed improvements.
Redis Single‑Instance Memory Limit and NUMA Trap – Explores how a single Redis process can hit memory limits on NUMA systems and what pitfalls developers should avoid.
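One common way to turn random reads into sequential ones is to sort a batch of lookups by target position before performing them. The summary above does not describe how the project in question did it, so the sketch below is only an assumed illustration of the general idea; `gather_sorted` is a hypothetical helper name.

```python
def gather_sorted(data, indices):
    """Return [data[i] for i in indices], reading data in ascending index order."""
    # Sort the request positions by the index each request targets.
    order = sorted(range(len(indices)), key=indices.__getitem__)
    result = [None] * len(indices)
    for pos in order:
        # Reads now sweep data from front to back; each answer is written
        # back to the slot of the request it belongs to.
        result[pos] = data[indices[pos]]
    return result


if __name__ == "__main__":
    table = list(range(100, 110))
    print(gather_sorted(table, [7, 2, 9, 2, 0]))
```

The caller gets results in the original request order, while the large table is read in one ascending pass. Sorting has its own cost and the writes into `result` are still scattered, so this tends to pay off only when the table is far larger than the batch of requests.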
The author concludes the memory series and promises upcoming articles on storage and networking, wishing readers a happy 2020.
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
