Tagged articles

numa

76 articles · Page 1 of 1

May 26, 2026 · Fundamentals

Can't Master the Linux Kernel Without Understanding NUMA?

This article explains the core principles of NUMA architecture, how it is deeply integrated into Linux kernel memory management, process scheduling, and system calls, and provides practical commands and real‑world examples to diagnose and optimize NUMA‑related performance issues.

Linux kernel · Memory Management · Performance Optimization

0 likes · 24 min read

Tech Stroll Journey

May 24, 2026 · Operations

Practical Strategies for CPU Performance Optimization on Linux

The article walks through six concrete, reproducible methods for diagnosing and improving Linux CPU performance—including using perf for profiling, binding processes to specific cores, adjusting scheduling priorities, setting the CPU governor, leveraging NUMA awareness, and fine‑tuning kernel scheduler parameters—while showing real command examples and measured impact.

CPU · Scheduler · numa

0 likes · 11 min read

Deepin Linux

May 6, 2026 · Fundamentals

Master Linux Memory Performance: From Theory to Real‑World Optimization

This article systematically breaks down Linux's core memory mechanisms, identifies common performance bottlenecks, and demonstrates how to use tools like numastat, perf, and Valgrind together with kernel parameters such as swappiness and min_free_kbytes to achieve practical memory optimizations.

HugePages · Linux · OOM

0 likes · 55 min read

Deepin Linux

Apr 27, 2026 · Fundamentals

Understanding the SLUB Memory Allocator: A Deep Dive into Linux Kernel Object Management

SLUB, the default Linux kernel memory allocator, reduces fragmentation and improves allocation speed for frequently created objects like task_struct and inode by using per‑CPU caches, object slabs, and NUMA‑aware node caches. The article covers its data structures, allocation/free paths, tuning parameters, and real‑world case studies.

Linux kernel · Performance Tuning · SLUB

0 likes · 46 min read

Deepin Linux

Mar 28, 2026 · Fundamentals

Unlocking Linux Performance: A Deep Dive into NUMA Architecture

This article explains the core principles of NUMA, its deep integration with the Linux kernel, practical memory‑node and scheduling mechanisms, real‑world database and virtualization use cases, and step‑by‑step commands for inspecting and tuning NUMA on modern servers.

Linux kernel · Memory Management · Performance Optimization

0 likes · 23 min read

TonyBai

Mar 12, 2026 · Backend Development

Why My Go Service Slowed Down on a 128‑Core Server

A 128‑core, 256‑thread server should boost Go microservice performance, but the author explains how NUMA architecture, Go's scheduler affinity loss during GC pauses, and non‑NUMA‑aware memory allocation cause cache misses, remote memory penalties, and higher latency, preventing linear scaling.

Garbage Collection · Go · High‑core performance

0 likes · 9 min read

dbaplus Community

Feb 24, 2026 · Cloud Native

How CPU Architecture Bottlenecks Cripple Netflix’s Container Scaling

Netflix discovered that scaling hundreds of containers on modern CPUs hit severe lock‑contention due to mount‑related kernel locks, with performance varying across AWS instance types, NUMA designs, and hyper‑threading, leading them to redesign containerd mounting and choose hardware‑aware scheduling to restore efficient scaling.

AWS · CPU architecture · Hyper-threading

0 likes · 16 min read

TonyBai

Jan 31, 2026 · Backend Development

Will Go’s Performance Diagnostics Undergo a Revolution? Race Detection in Production and Instant Trace Opening

The article analyzes recent Go runtime meeting notes that reveal upcoming changes such as lightweight race detection via software or hardware, a new instant‑open Trace UI with on‑demand slicing, read/write Trace APIs, pprof modernization removing global variables, NUMA‑aware GC optimizations and sharded counters, all pointing to a more usable and high‑performance Go 1.27.

Execution Trace · Go · Race Detection

0 likes · 8 min read

FunTester

Jan 20, 2026 · Fundamentals

Why Data Movement, Not CPU Speed, Is the Real Performance Bottleneck

Most engineers blame slow CPUs for performance issues, but the true bottleneck is often data latency—from registers and caches to DRAM, NUMA nodes, disks, and networks—so understanding and minimizing data movement is key to reducing tail latency and improving system performance.

Latency · Systems · data locality

0 likes · 11 min read

360 Zhihui Cloud Developer

Dec 30, 2025 · Cloud Native

How HBox Boosts GPU Utilization with Multi‑Pool and NUMA‑Aware Scheduling

The HBox scheduling platform tackles large‑scale AI cluster challenges by introducing a three‑pool resource model, priority‑based preemptive scheduling, network‑topology and NUMA‑aware dispatch, and GPU virtualization techniques like MIG and vGPU, dramatically improving GPU utilization, SLA guarantees, and overall cluster efficiency.

AI clusters · GPU scheduling · GPU virtualization

0 likes · 24 min read

Ops Community

Nov 3, 2025 · Operations

Master Linux Memory Management: Core Commands & Tuning in 10 Minutes

This comprehensive guide walks you through Linux memory management fundamentals, from prerequisite environments and a quick checklist to step‑by‑step installation of monitoring tools, memory diagnostics, kernel parameter adjustments, THP and swap optimization, NUMA affinity tuning, validation, Prometheus alerts, security hardening, troubleshooting, rollback procedures, best‑practice recommendations, and ready‑to‑use scripts and configuration snippets.

Linux · Memory Management · Performance Tuning

0 likes · 24 min read

MaGe Linux Operations

Aug 25, 2025 · Operations

Why Your 128‑Core Server Underperforms: Unlock 300% Gains with CPU Affinity

This article explains why a newly purchased 128‑core AMD EPYC server may perform worse than a 32‑core machine, demonstrates how improper CPU affinity and NUMA configuration cause severe performance loss, and provides step‑by‑step practical methods—including system topology analysis, taskset, numactl, kernel scheduler tweaks, and container settings—to achieve up to 300% improvement.

CPU affinity · numa

0 likes · 15 min read

Deepin Linux

Jul 8, 2025 · Operations

Unlock Linux NUMA Performance: A Practical Multithreaded Tuning Guide

This article explains the fundamentals of NUMA architecture, why it matters for multithreaded Linux applications, and provides step‑by‑step practical guidance—including kernel internals, memory allocation policies, useful commands, and performance‑monitoring tools—to help developers optimize memory locality and boost overall program efficiency.

Linux · Performance Tuning · multithreading

0 likes · 37 min read

IT Services Circle

Jul 8, 2025 · Fundamentals

Unlocking Linux NUMA: How the Kernel Detects and Manages Non‑Uniform Memory Access

This article explains the hardware basis of NUMA, how Linux reads ACPI SRAT/SLIT tables to discover CPU‑memory topology, the kernel functions that initialize NUMA structures, and how tools like numactl can be used to optimize application performance on multi‑node servers.

ACPI · Linux kernel · Memory Architecture

0 likes · 14 min read

Bilibili Tech

Jul 4, 2025 · Operations

Solving CPU Performance Layering in Heterogeneous Data Centers: A Practical Guide

This article explains why heterogeneous servers cause CPU performance layering, describes how to detect the issue using metrics such as NUMA hit/miss rates, cache miss ratios and frequency states, and provides step‑by‑step remediation techniques—including NUMA binding, cache isolation, recompilation and frequency locking—to improve resource pooling efficiency in modern data centers.

CPU performance · Cache Optimization · Data Center

0 likes · 24 min read

Refining Core Development Skills

May 27, 2025 · Fundamentals

Understanding NUMA in Linux: Hardware Principles, ACPI Tables, and Kernel Initialization

This article explains the hardware basis of NUMA, how Linux reads ACPI SRAT and SLIT tables to discover CPU‑memory topology, the kernel functions that initialize NUMA structures, and how the memblock allocator incorporates this information to enable performance‑optimizing tools like numactl.

ACPI · Linux kernel · Memory Management

0 likes · 13 min read

Linux Kernel Journey

Mar 8, 2025 · Backend Development

Optimizing MPTCP Flow Selection and Exploring a User‑Space MPTCP Stack – ByteDance STE at Netdev 0x19

At Netdev 0x19, ByteDance's STE team presented two technical talks: a NUMA‑aware MPTCP flow‑selection strategy that boosts Redis benchmark throughput by up to 30% and cuts tail latency by 6%, and a DPDK‑based user‑space MPTCP stack that halves latency and doubles throughput in data‑center tests.

DPDK · Linux networking · MPTCP

0 likes · 8 min read

ByteDance SYS Tech

Mar 7, 2025 · Fundamentals

How NUMA‑Aware MPTCP Flow Selection Boosts Throughput and Cuts Latency

At Netdev 0x19, ByteDance's STE team presented two talks—one on a NUMA‑locality‑aware MPTCP flow‑selection strategy that can raise throughput by up to 30% and lower tail latency by 6%, and another on a DPDK‑based user‑space MPTCP stack that reduces latency by nearly 10% and more than doubles throughput—showcasing practical performance gains for data‑center networking.

DPDK · Data Center Networking · MPTCP

0 likes · 8 min read

Linux Kernel Journey

Feb 16, 2025 · Fundamentals

Understanding Multi‑Core Hardware Topology and Linux sched_domain

The article explains how Linux kernel scheduling uses a hierarchical topology—balancing load and preserving cache affinity—by mapping real‑world multi‑core hardware structures such as sockets, dies, clusters, and NUMA nodes to sched_domain and sched_group, and shows how to inspect and tune this layout with CONFIG_SCHED_DEBUG and QEMU simulation.

Linux · hardware topology · kernel

0 likes · 9 min read

Linux Code Review Hub

Feb 12, 2025 · Fundamentals

Understanding Multi‑core Hardware Architecture and Linux sched_domain

The article explains how Linux builds sched_domain and sched_group hierarchies based on physical CPU topology—sockets, dies, clusters, and NUMA nodes—illustrating load‑balancing (BALANCE) versus affinity (AFFINE) with concrete examples, kernel code references, and QEMU‑based experiments.

CPU topology · Kernel Scheduling · numa

0 likes · 9 min read

Deepin Linux

Dec 30, 2024 · Fundamentals

Understanding NUMA Node Detection and Memory Management in the Linux Kernel

This article explains the fundamentals of NUMA architecture, how Linux detects and represents NUMA nodes, the memory zone hierarchy, allocation policies, and practical techniques such as using numactl and taskset to bind processes for optimal performance on multi‑socket servers.

Linux kernel · Memory Management · Performance Optimization

0 likes · 22 min read

360 Zhihui Cloud Developer

Nov 18, 2024 · Cloud Computing

How Dynamic Resource Scheduling Boosts OpenStack Efficiency and Cuts Costs

Virtualization resource scheduling algorithms, especially in OpenStack, address fragmented CPU allocation and uneven node utilization by dynamically consolidating VMs, employing NUMA-aware placement, and using resource scoring to trigger migrations, ultimately improving utilization, reducing costs, and enhancing performance in cloud environments.

Cloud Computing · OpenStack · numa

0 likes · 12 min read

Architects' Tech Alliance

Sep 20, 2024 · Operations

Unlocking Kunpeng CPU Performance: Real-World Optimization Techniques and Benchmarks

This article provides a comprehensive, step‑by‑step guide to tuning Kunpeng‑based servers, covering hardware characteristics, matrix‑multiplication benchmarks, NUMA‑aware scheduling, compiler and JDK optimizations, acceleration libraries, disk and NIC tuning, and a practical MariaDB performance‑tuning workflow.

CPU optimization · Kunpeng · Linux

0 likes · 17 min read

Tencent Cloud Developer

Aug 15, 2024 · Databases

Architecture Upgrade Challenges and Atomic Write Solutions for Cloud-native Databases

Drawing on collaboration between the TencentOS and database kernel teams, the article details how architecture upgrades (moving to TKE HouseKeeper, switching to AMD CPUs, and adding a portable 16 KB atomic‑write feature) combine with kernel optimizations such as huge‑page support, NUMA‑aware qspinlocks, speculative page‑fault handling, and ORC unwinding to deliver gains of up to 30% on mixed workloads and over 100% on write‑only workloads while reducing memory usage.

Kernel Optimization · ORC unwinder · atomic write

0 likes · 16 min read

Architects' Tech Alliance

Jul 13, 2024 · Operations

How to Supercharge Kunpeng CPUs: Real‑World Performance Tuning Techniques

This article provides a comprehensive guide to optimizing Kunpeng‑based servers, covering hardware characteristics, matrix multiplication benchmarks, Von Neumann architecture insights, soft and hard acceleration, compiler and JDK tweaks, NUMA tuning, Nginx and OpenSSL acceleration, disk and network optimizations, application‑level tuning, and a step‑by‑step MariaDB performance‑tuning checklist.

CPU performance · Database Tuning · Kunpeng

0 likes · 16 min read

Alibaba Cloud Infrastructure

Jul 5, 2024 · Cloud Native

Koordinator v1.5.0 Release: New Features and Enhancements

Koordinator v1.5.0, the 13th major release since its open‑source debut, introduces pod‑level NUMA alignment, Terway network QoS, core scheduling, and numerous performance and stability improvements, while also being accepted as a CNCF Sandbox project and outlining future roadmap plans.

Cloud Native · Core Scheduling · Scheduling

0 likes · 14 min read

Open Source Linux

Apr 29, 2024 · Fundamentals

Unlocking DPDK Memory Management: How Hugepages Boost Performance

This article consolidates DPDK 17.11 source‑code notes to explain the library’s memory‑management subsystem, covering hugepage concepts, shared configuration mapping, NUMA‑aware allocation, and the custom allocator that enables high‑throughput packet processing on Linux.

DMA · DPDK · HugePages

0 likes · 40 min read

OPPO Kernel Craftsman

Apr 19, 2024 · Fundamentals

Large Folios in the Linux Kernel: Benefits, Implementations, and Future Directions

Large folios in the Linux kernel combine multiple pages to reduce TLB misses, page faults, and reclamation cost while enabling more efficient compression. Filesystems such as XFS and bcachefs support them, and recent patches add multi‑size THP, swap‑in/out handling, TAO allocation, NUMA balancing, and debug tools. OPPO’s production deployment shows performance gains, motivating broader adoption and fragmentation mitigation.

TLB · large folios · mTHP

0 likes · 17 min read

MaGe Linux Operations

Mar 1, 2024 · Operations

Master Linux Virtualization: Tuning KVM Performance with Tuned and KSM

This guide walks through Linux virtualization management and performance tuning, covering tuned profiles for guests and hosts, kernel parameters, NUMA awareness, CPU pinning, memory limits, KSM configuration, qcow2 image creation, disk cache modes, I/O throttling, and monitoring commands to optimize KVM workloads.

KSM · KVM · Linux

0 likes · 21 min read

Baidu Geek Talk

Feb 19, 2024 · Operations

Boost Cloud Application Speed by 36% Using Baidu’s Btune Performance Diagnostic Tool

After migrating workloads to a new CPU platform, unexpected performance regressions can occur, but Baidu Cloud's Btune tool provides automated, multi‑dimensional analysis and actionable optimization suggestions that helped a test program improve its execution time by 36.8% through memory and NUMA tuning.

Btune · CPU · Cloud Computing

0 likes · 9 min read

ByteDance Cloud Native

Jan 24, 2024 · Cloud Native

How Katalyst v0.4.0 Brings Tidal Colocation and Resource Overcommit to Native Kubernetes

Katalyst v0.4.0 introduces tidal colocation for mixed workloads, online resource overcommit, fine‑grained NUMA memory management, OOM priority enhancements, and topology‑aware scheduling, providing a comprehensive cost‑optimization solution for cloud‑native Kubernetes clusters.

OOM Priority · Overcommit · Tidal Colocation

0 likes · 12 min read

Baidu Intelligent Cloud Tech Hub

Jan 24, 2024 · Operations

Boost Cloud App Performance by 36% with Baidu’s Btune Diagnostic Tool

This article explains how Baidu Cloud’s Btune performance‑diagnostic tool helps identify CPU, memory and NUMA bottlenecks, provides automatic optimization suggestions, and demonstrates a real‑world test that improves a memory‑intensive program’s runtime by up to 36.8% after applying the recommended changes.

Btune · Cloud Computing · diagnostic tool

0 likes · 10 min read

Deepin Linux

Jul 22, 2023 · Fundamentals

DPDK Memory Management: Architecture, Hugepage Initialization, and Allocation Mechanisms

This article explains DPDK's memory management architecture, covering the hierarchical memory layout, hugepage discovery and mapping, shared configuration structures, NUMA‑aware allocation, custom malloc‑heap implementation, memzone and mempool creation, and the mbuf buffer model, with detailed code examples.

HugePages · Memory Management · malloc heap

0 likes · 41 min read

AI Cyberspace

May 19, 2023 · Cloud Computing

Mastering OpenStack Neutron SR‑IOV: Boost Network Performance with VLAN & NUMA

This guide explains the performance limitations of Neutron OVS networking, introduces SR‑IOV as a high‑performance I/O virtualization solution, and provides step‑by‑step configuration for enabling SR‑IOV agents, mapping physical networks, creating VLAN and flat networks, handling NUMA affinity, security groups, and bonding, with detailed command examples and XML snippets.

Network Performance · Neutron · OVS

0 likes · 27 min read

Bin's Tech Cabin

May 4, 2023 · Fundamentals

How Linux’s Slab Allocator Manages Memory: Deep Dive into Fast and Slow Paths

This article dissects the Linux kernel’s slab allocator, explaining its complete architecture, the fast‑path allocation from per‑CPU caches, the slow‑path mechanisms involving partial lists, NUMA node caches, and fallback to the buddy system, while detailing object initialization and freelist construction.

Linux · Memory Management · numa

0 likes · 41 min read

AI Cyberspace

Mar 28, 2023 · Fundamentals

Why NUMA Slows Multithreaded Apps and How to Optimize It

This article explains NUMA architecture, its multithreaded performance overheads such as remote memory access, cache synchronization, context and mode switches, interrupt handling, TLB misses, and memory copies, and then presents optimization techniques like NUMA and CPU affinity, IRQ tuning, and large‑page usage.

CPU affinity · Linux · multithreading

0 likes · 20 min read

ByteDance SYS Tech

Feb 10, 2023 · Fundamentals

Mastering Linux Memory: Reclaim, Huge Pages, and NUMA Optimization

This article explains common Linux memory‑related performance bottlenecks—such as memory reclamation, page‑cache pressure, huge‑page usage, and cross‑NUMA access—and provides practical tuning methods to improve latency and throughput on servers and applications.

Huge Pages · numa

0 likes · 16 min read

Bin's Tech Cabin

Dec 28, 2022 · Fundamentals

How Linux Allocates Physical Memory: Inside the Kernel’s Buddy Allocator

This article walks through Linux kernel physical memory allocation, explaining the hierarchy of allocation interfaces, the role of gfp_mask and ALLOC flags, the fast and slow allocation paths, memory watermarks, NUMA zone handling, and the complex fallback mechanisms including compaction, direct reclaim, and OOM, all illustrated with code snippets and diagrams.

Linux · Memory Management · allocation

0 likes · 68 min read

Bin's Tech Cabin

Nov 21, 2022 · Fundamentals

Inside Linux Physical Memory Management: From FLATMEM to NUMA, Watermarks, and Page Structures

This article provides an in‑depth, step‑by‑step explanation of how the Linux kernel organizes and manages physical memory, covering memory models (FLATMEM, DISCONTIGMEM, SPARSEMEM), NUMA vs. UMA architectures, zone partitioning, watermarks, reserved pages, hot‑cold page handling, and the detailed struct page layout used for both anonymous and file‑backed pages.

Linux · Memory Management · numa

0 likes · 99 min read

Architects' Tech Alliance

Sep 21, 2022 · Backend Development

DPDK Technical Overview, Architecture, and Performance Optimization Guide

This article provides a comprehensive technical overview of DPDK, covering its architecture, core libraries, platform modules, polling and CPU‑affinity techniques, huge‑page memory management, NUMA considerations, OS tuning steps, and integration with OVS for high‑performance packet processing.

CPU affinity · DPDK · HugePages

0 likes · 18 min read

Architects' Tech Alliance

Aug 22, 2022 · Fundamentals

DPDK Performance Tuning: Influencing Factors and Optimization Techniques

This article explains how hardware architecture, Linux OS version, kernel configuration, OVS integration, memory management, NUMA awareness, and CPU micro‑architecture affect DPDK application performance and provides concrete tuning steps such as CPU isolation, service disabling, huge‑page setup, and optimized memory allocation.

CPU optimization · DPDK · Linux

0 likes · 11 min read

MaGe Linux Operations

Jul 3, 2022 · Fundamentals

Understanding SMP, NUMA, and MPP: Which Server Architecture Fits Your Needs?

This article explains the three main commercial server architectures—SMP, NUMA, and MPP—detailing their structures, performance characteristics, scalability limits, and suitability for OLTP versus data‑warehouse workloads, while also covering practical considerations such as virtualization and real‑world examples.

MPP · SMP · Server Architecture

0 likes · 16 min read

Liangxu Linux

May 29, 2022 · Operations

Why Linux Triggers OOM Killer and How to Manage Memory Reclamation

This article explains Linux virtual memory, the page‑fault allocation process, the two memory‑reclaim paths (kswapd and direct reclaim), OOM killer scoring, swappiness tuning, NUMA‑aware reclamation, and practical steps to protect critical processes from being killed.

Linux · OOM killer · Page Fault

0 likes · 19 min read

IT Services Circle

May 24, 2022 · Fundamentals

Understanding Linux Memory Management, Page Reclamation, and OOM Killer

This article explains Linux virtual memory concepts, the process of memory allocation, page fault handling, background and direct memory reclamation methods, LRU-based page types, NUMA considerations, tuning parameters like swappiness and min_free_kbytes, and strategies to prevent OOM killer termination.

Linux · Memory Management · OOM killer

0 likes · 18 min read

NetEase Cloud Music Tech Team

May 19, 2022 · Artificial Intelligence

Performance Evaluation of Cloud Music Online Estimation System on NUMA Architecture

Evaluating the Cloud Music online estimation system on NUMA‑based servers revealed that CPU pinning across both memory nodes dramatically boosts throughput on high‑end 96‑core machines—up to 75% for complex models—while low‑end servers gain only modestly, confirming NUMA‑aware scheduling’s critical role for CPU‑intensive inference workloads.

CPU architecture · numa · online inference

0 likes · 8 min read

Alibaba Cloud Native

Feb 14, 2022 · Cloud Native

How to Overcome CPU Throttling and NUMA Bottlenecks in Cloud‑Native Containers

This article explains why container workloads suffer from CPU throttling and NUMA‑related performance loss in cloud‑native environments, examines Kubelet's CPU allocation policies, demonstrates the impact of CPU bursts and topology‑aware scheduling, and shows how Alibaba Cloud ACK mitigates these issues with concrete data.

Alibaba Cloud ACK · CPU Burst · CPU throttling

0 likes · 11 min read

360 Tech Engineering

Sep 2, 2021 · Cloud Computing

Performance Comparison and CPU Pinning Techniques for Enterprise‑Level Virtual Machine Instances

The article analyzes the instability of shared‑type virtual machines, introduces enterprise‑level instances with fixed CPU scheduling and NUMA topology, details the applied technologies such as CPU pinning, PCI‑Passthrough and multi‑queue NICs, and presents extensive sysbench and STREAM benchmark results that demonstrate superior isolation, stability and performance of enterprise instances over shared ones.

CPU pinning · Cloud Computing · Sysbench

0 likes · 12 min read

Ops Development Stories

Jul 15, 2021 · Operations

Mastering NUMA and Hyper-Threading: Boost CPU Cache Hits and Reduce Latency

This article explains NUMA architecture with hyper‑threading, details CPU cache hierarchies and access latencies, and provides Linux tools and practical optimization techniques to improve cache‑hit rates and minimize cross‑NUMA memory delays.

CPU cache · Hyper-threading · Linux

0 likes · 9 min read

Ops Development Stories

Jul 14, 2021 · Operations

Mastering NUMA on Linux: Optimize Memory Allocation with numactl

This guide explains NUMA memory hierarchy, shows how to install and use the numactl command, interprets hardware and NUMA statistics, and presents memory allocation strategies to improve performance on multi‑node Linux systems.

Linux · numa · numactl

0 likes · 9 min read

Ops Development Stories

Jul 13, 2021 · Fundamentals

Understanding Multi-Core Processor Architectures: SMP, UMA, NUMA & Cache Hierarchies

This article outlines the main server hardware architectures—SMP, UMA, and NUMA—explains shared-storage models, details multi-core cache structures from private L1 to shared L3, compares access latencies, and discusses inter-core communication mechanisms and cache coherency protocols.

Cache · SMP · architecture

0 likes · 14 min read

Liangxu Linux

Jun 5, 2021 · Fundamentals

Inside Linux 5.4 Scheduler: How the Kernel Initializes CFS and Multi‑Core Scheduling

This article explains the Linux kernel scheduler's core concepts, walks through the initialization code of the 5.4 kernel (including run queues, scheduling classes, domains, and groups), and details how multi‑core and NUMA topologies are handled to achieve balanced CPU usage.

CFS · Linux · Scheduler

0 likes · 23 min read

Liangxu Linux

Jun 2, 2021 · Operations

Mastering Linux Multi‑Core Scheduling: Strategies, Algorithms, and Performance Optimizations

This article explains Linux's sophisticated scheduling system for multi‑core, SMP, and NUMA architectures, describes global, clustered, partitioned, and arbitrary schedulers, details scheduling domains and load‑balancing mechanisms, and provides practical performance‑tuning techniques using tools like perf, flame graphs, and various kernel optimizations.

BFS · CFS · CPU optimization

0 likes · 31 min read

360 Smart Cloud

Jun 1, 2021 · Fundamentals

Physical Address Space Management and Memory Allocation in Linux (NUMA, Nodes, Zones, Pages, Slab, and Page Fault Handling)

This article explains how Linux manages physical address space using SMP and NUMA architectures, describes the node, zone, and page data structures, details page allocation via the buddy system and slab allocator, and outlines user‑ and kernel‑mode page‑fault handling, swapping, and address translation mechanisms.

Linux · Memory Management · Page Fault

0 likes · 17 min read

Aikesheng Open Source Community

Apr 22, 2021 · Databases

Understanding NUMA and Its Impact on MySQL Performance

This article explains NUMA architecture, how its memory allocation policies can cause swap‑related performance issues for MySQL, provides step‑by‑step methods to disable NUMA at BIOS, kernel or MySQL levels, and discusses the innodb_numa_interleave parameter and best‑practice recommendations.

Linux · database · innodb_numa_interleave

0 likes · 7 min read

Architects' Tech Alliance

Apr 11, 2021 · Industry Insights

How to Supercharge Ceph on Huawei Kunpeng ARM: Deep Performance Tuning Guide

This article examines Ceph’s architecture, identifies performance bottlenecks on Huawei’s Kunpeng ARM platform, and presents practical tuning methods—including NUMA placement, cache tagging, vector acceleration, thread scaling, and monitoring tools—to improve storage efficiency, reduce latency, and lower power consumption.

Arm · Cache Optimization · Ceph

0 likes · 17 min read

ITPUB

Jan 25, 2021 · Fundamentals

Understanding Linux Kernel Memory: Nodes, Zones, Buddy System, and SLAB Allocator

This article explains how Linux 3.10 organizes memory using NUMA nodes, zones, the buddy system, and the SLAB allocator, providing commands, code examples, and visual diagrams to illustrate each layer of the kernel's efficient memory management.

Linux · Memory Management · Slab Allocator

0 likes · 11 min read

Refining Core Development Skills

Jan 25, 2021 · Fundamentals

Understanding Linux Memory Management: Nodes, Zones, Buddy System, and SLAB Allocator

This article explains the Linux kernel memory management hierarchy—including NUMA nodes, memory zones, the buddy system for free pages, and the SLAB allocator—providing command‑line examples, code snippets, and visual diagrams to illustrate how the kernel efficiently allocates and reclaims memory.

Linux · Memory Management · Slab Allocator

0 likes · 11 min read

ITPUB

Dec 16, 2020 · Databases

Can a Single Redis Instance Safely Use 50 GB on a 64 GB Machine? A NUMA Trap Experiment

This article investigates how large a single Redis instance can be on a 64 GB server, explains the NUMA memory‑allocation trap, and documents experiments that compare unbound versus CPU‑memory‑affinity‑bound deployments, revealing when swap and OOM occur.

Linux · Memory Management · numa

0 likes · 11 min read

Architects' Tech Alliance

Nov 11, 2020 · Fundamentals

Understanding DPDK Memory Management: Large Pages, NUMA, DMA, and IOMMU

This article explains the core principles of DPDK memory management, covering standard huge pages, NUMA node binding, direct memory access, IOMMU and IOVA addressing, custom allocators, and memory pools, and how these mechanisms together enable high‑performance packet processing on Linux systems.

DMA · DPDK · Huge Pages

0 likes · 14 min read

Full-Stack Internet Architecture

Sep 9, 2020 · Fundamentals

Server Hardware Basics: Servers, CPUs, Memory, Disk, and Network Cards

This article provides a comprehensive overview of server hardware fundamentals, covering server form factors, motherboard architecture, selection criteria, major vendors, CPU concepts and performance, NUMA, memory specifications, disk throughput and IOPS, as well as network card types and bonding modes.

CPU · Hardware Selection · Network Card

0 likes · 13 min read

ITPUB

May 10, 2020 · Databases

How We Migrated MySQL to Tencent Cloud CDB and Boosted Performance Up to 10×

This case study details the migration of Weimeng's MySQL databases to Tencent Cloud CDB, describing the testing methodology, performance bottlenecks discovered (NUMA, network parameters, low‑concurrency issues, and version bugs), the step‑by‑step optimizations applied, and the resulting QPS improvements across various workloads.

Performance Tuning · Tencent Cloud CDB · database migration

0 likes · 20 min read

Refining Core Development Skills

Dec 30, 2019 · Fundamentals

Unlocking Memory Secrets: Why Random IO Slows Down and How to Optimize It

This article explores the physical structure of RAM, compares random and sequential memory I/O performance, examines real‑world bandwidth versus advertised specs, delves into NUMA latency differences, and shows practical optimization techniques for PHP7 and Redis based on deep hardware and kernel knowledge.

IO performance · Linux kernel · hardware fundamentals

0 likes · 8 min read

Refining Core Development Skills

Dec 17, 2019 · Databases

Investigating the NUMA Trap with a Large Redis Instance on a Dual‑Node Server

This article documents a hands‑on experiment that allocates a 50 GB Redis instance on a 64 GB dual‑node machine, explores NUMA behavior, demonstrates how memory affinity can trigger swap, and concludes with practical recommendations for Redis memory sizing and NUMA binding.

Linux · Memory Management · Redis

0 likes · 9 min read

Ctrip Technology

Nov 21, 2019 · Cloud Native

Case Study: Intermittent Container Timeout Issues – Analysis and Resolution

This article presents a detailed case study of intermittent container timeout problems in a Kubernetes environment, examining kernel upgrades, NUMA configurations, CPU affinity bindings, kubelet behavior, cadvisor overhead, and hardware faults, and outlines the investigative steps and solutions applied.

CPU affinity · Hardware Fault · Operations

0 likes · 8 min read

Huawei Cloud Developer Alliance

Nov 8, 2019 · Operations

Boost Network Performance on Kunpeng CPUs: Tuning Tips & Tools

This guide explains how to improve network subsystem performance on Kunpeng processors by using tools such as ethtool and strace, adjusting PCIe payload size, binding NIC interrupts to NUMA‑local cores, tweaking interrupt coalescing, enabling TSO, and replacing select with epoll for high‑concurrency workloads.

Kunpeng · TSO · epoll

0 likes · 12 min read

Refining Core Development Skills

Nov 7, 2019 · Fundamentals

Understanding CPU‑Memory Interconnects: From FSB to NUMA and Practical Linux Tests

The article explains how modern servers with multiple CPUs and memory modules connect via legacy Front Side Bus and modern NUMA architectures, demonstrates Linux commands to inspect memory topology, and presents benchmark results showing latency differences between intra‑node and inter‑node memory accesses.

CPU · Linux · memory

0 likes · 7 min read

Huawei Cloud Developer Alliance

Oct 30, 2019 · Operations

Master CPU & Memory Subsystem Tuning on Kunpeng Processors: Tools & Strategies

This article introduces practical CPU and memory subsystem performance tuning for Kunpeng processors, covering optimization concepts, key parameters, common monitoring tools such as top, perf and numactl, and detailed methods like NUMA binding, prefetch control, timer tuning, TLB page size adjustment, and thread concurrency optimization.

CPU tuning · Kunpeng · Linux performance

0 likes · 15 min read

Huawei Cloud Developer Alliance

Oct 24, 2019 · Fundamentals

Boost Kunpeng CPUs: NUMA Basics and a 5‑Step Performance Tuning Guide

This article introduces the Kunpeng processor’s NUMA architecture, contrasts it with traditional SMP designs, and presents a practical five‑step methodology for performance optimization, helping developers on Kunpeng platforms achieve better scalability and efficiency through targeted memory‑access tuning.

Kunpeng · Performance Tuning · multi-core

0 likes · 5 min read

dbaplus Community

May 8, 2019 · Databases

How False Sharing in CPU Cache Slowed Down Our Database Table Scans—and How We Fixed It

During a client’s upgrade test, a database’s compressed tables exhibited severe slowdown under concurrent full‑table scans, which we traced to CPU cache line false sharing in the decompression code; using Linux perf tools we identified the hotspot, aligned memory, and restored performance.

CPU cache · Database Performance · Linux perf

0 likes · 13 min read

Programmer DD

Feb 12, 2019 · Fundamentals

How ZGC Achieves Sub‑10 ms Pauses: A Deep Dive into Java’s Low‑Latency GC

ZGC is a scalable, low‑latency Java garbage collector designed to keep pause times under 10 ms regardless of heap size, with support for heaps of up to 4 TB. The article covers its concurrent, region‑based, compacting, NUMA‑aware design, its use of colored pointers and load barriers, and includes detailed compilation and tuning guidance.

Garbage Collection · JDK · Java

0 likes · 8 min read

JD Tech

Dec 11, 2018 · Big Data

Introduction to Graph Computing and the JoyGraph System

This article introduces graph computing, compares it with graph databases, surveys notable graph processing systems, and details the architecture, NUMA‑aware design, execution model, push/pull dual mode, and load‑balancing strategies of the JoyGraph framework while outlining its future development directions.

Big Data · JoyGraph · numa

0 likes · 9 min read

dbaplus Community

May 15, 2018 · Operations

Why High‑Throughput Redis Still Drops Packets: Deep Dive into Linux Network Stack and Interrupt Optimization

The article investigates massive packet loss in Meituan‑Dianping's Redis service despite 10 Gbps NIC upgrades, traces the issue to kernel receive‑buffer drops and single‑CPU interrupt handling, and presents a step‑by‑step optimization using backlog tuning, CPU and Redis affinity, and NUMA‑aware placement to eliminate drops and improve latency.

Interrupts · Linux · Network

0 likes · 30 min read

Architects' Tech Alliance

Apr 12, 2018 · Fundamentals

Understanding MPI, OpenMPI, OpenMP and the Differences Between SMP, NUMA, and MPP Architectures

This article explains the concepts of MPI, OpenMPI, and OpenMP, compares three major server architectures—SMP, NUMA, and MPP—and discusses their performance characteristics, scalability limits, and typical application scenarios in high‑performance computing.

HPC · MPI · MPP

0 likes · 13 min read

Qunar Tech Salon

Mar 21, 2018 · Operations

Root Cause Analysis and Optimization of Network Packet Loss in High‑Traffic Redis Services

The article investigates why massive Redis deployments experience network packet loss despite using 10 Gbps NICs, explains how Linux kernel counters such as net.if.in.dropped are derived from /proc/net/dev, walks through the driver‑to‑kernel processing path, and proposes CPU‑affinity, interrupt‑affinity and NUMA‑aware tuning to eliminate the drops.

CPU affinity · Linux kernel · Packet loss

0 likes · 28 min read

dbaplus Community

Jul 6, 2016 · Fundamentals

When Huge Pages Hurt Performance: Risks and Best Practices on NUMA Systems

This article explains the origins and mechanics of Huge Pages, why they are not a universal solution, how they can degrade performance on NUMA architectures, and provides practical testing methods and mitigation strategies for developers and system administrators.

Huge Pages · Memory Management · Performance Optimization

0 likes · 12 min read

Alibaba Cloud Infrastructure

Jul 8, 2015 · Operations

Design and Implementation of High‑Concurrency (C10M) Load Balancing in Alibaba's AGW Middlebox

The article analyzes the challenges of scaling network devices to handle ten million concurrent connections (C10M) and describes Alibaba's AGW solution, which uses lock‑free data planes, hugepages, NUMA‑aware memory placement, and user‑space NIC drivers to achieve high‑performance Layer 4 load balancing.

C10M · hugepage · load balancing

0 likes · 9 min read