How to Optimize AMD Milan Server Performance: BIOS, Memory, and Power Tuning
This article provides a detailed, data‑driven guide to evaluating and tuning AMD Milan‑based servers, covering SPEC CPU benchmarking, BIOS options such as SMT and Boost, memory channel and NUMA configurations, interleaving, IOMMU, and power‑state settings to achieve up to 30% performance gains.
Background
Bilibili's system team evaluates server hardware performance using a single‑socket AMD Milan CPU platform, focusing on hardware‑level optimizations and benchmark testing to guide iterative performance improvements.
Benchmark Tools
The team uses SPEC CPU 2017, a widely accepted CPU‑intensive benchmark suite, to measure integer (SPECrate®2017 Integer, SPECspeed®2017 Integer) and floating‑point (SPECrate®2017 Floating Point, SPECspeed®2017 Floating Point) performance. SPEC CPU stresses the processor, memory subsystem, and compiler while keeping I/O and operating‑system influence small, so results stay comparable across hardware configurations as long as the compiler and flags are held constant.
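As a rough illustration of how a rate run is typically launched, the sketch below only prints candidate runcpu command lines; it does not execute the benchmark. The config file name milan-gcc.cfg is hypothetical, and using one copy per logical CPU is an assumption about the run setup, not something the source states.

```shell
# Dry-run helper: prints SPECrate command lines without running them.
# Assumptions: runcpu is on PATH after sourcing SPEC's shrc, and
# "milan-gcc.cfg" is a hypothetical config file name.

runcpu_cmd() {
    local copies="$1" suite="$2" config="${3:-milan-gcc.cfg}"
    echo "runcpu --config=$config --tune=base --copies=$copies $suite"
}

# One copy per online logical CPU is a common starting point for rate runs.
copies="$(getconf _NPROCESSORS_ONLN)"
runcpu_cmd "$copies" intrate
runcpu_cmd "$copies" fprate
```

With SMT enabled on a 64‑core part this would print --copies=128; with SMT disabled, --copies=64.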
BIOS Tuning
CPU‑Related Settings
Key BIOS options include:
SMT (Simultaneous Multithreading) – enables two hardware threads per core, increasing logical cores from 64 to 128.
Core Performance Boost (CPB) – lets cores run above their base frequency when power and thermal headroom allow.
Testing on an AMD 64‑Core processor with 256 GB DDR4‑3200 memory showed:
Enabling SMT improves integer throughput by ~10% with little impact on floating‑point.
Enabling Boost adds ~15% performance; combining SMT and Boost yields ~30% overall gain.
Specific BIOS Settings
SMT Control = Auto (or Enabled/Disabled as needed).
Core Performance Boost = Auto (Enabled for most workloads).
Note: Milan CPUs offer high performance per watt, but they cannot sustain a stable overclock across all cores at once, which limits how much Boost can add under full load.
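After changing these BIOS options it helps to confirm what the operating system actually sees. The following is a minimal sketch that reads two sysfs flags on Linux. It assumes the kernel exposes smt/active and that the cpufreq driver exposes cpufreq/boost (acpi-cpufreq does; other drivers may not), so a result of "unknown" only means the file was not found.

```shell
# Report whether SMT and Core Performance Boost appear active on a Linux host.
# Assumption: the sysfs paths below exist; they depend on kernel and driver.

read_flag() {
    # Print the first line of a sysfs file, or "unknown" if it is not readable.
    local path="$1" value=""
    if [ -r "$path" ] && { read -r value < "$path" || [ -n "$value" ]; }; then
        echo "$value"
    else
        echo "unknown"
    fi
}

report_smt_boost() {
    local root="${1:-/sys/devices/system/cpu}"   # overridable for testing
    echo "smt_active=$(read_flag "$root/smt/active")"
    echo "boost=$(read_flag "$root/cpufreq/boost")"
}

report_smt_boost "$@"
```

A value of 1 means the feature is active and 0 means it is off.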
Memory and I/O Optimization
Memory Channels and Capacity
AMD Milan supports up to 8 memory channels. Using 8‑channel configurations (e.g., 16 GB × 8) provides significantly higher bandwidth than 4‑channel setups. Tests with the STREAM benchmark showed minimal impact from BIOS CPU settings but clear differences across channel counts.
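The channel‑count effect follows from simple arithmetic: a DDR4‑3200 channel moves 3200 mega‑transfers per second over a 64‑bit (8‑byte) bus, or 25.6 GB/s, and channels add up linearly. The sketch below computes these theoretical peaks only; measured STREAM figures will be lower, and the function name is illustrative.

```shell
# Theoretical peak DRAM bandwidth in MB/s (decimal):
#   channels * transfer rate (MT/s) * 8 bytes per transfer (64-bit channel).

peak_bw_mbs() {
    local channels="$1" mts="$2"
    echo $(( channels * mts * 8 ))
}

for ch in 2 4 6 8; do
    printf '%d channels: %d MB/s\n' "$ch" "$(peak_bw_mbs "$ch" 3200)"
done
```

For DDR4‑3200 this gives 102400 MB/s with 4 channels and 204800 MB/s with 8, which matches the "up to double" bandwidth difference between those configurations.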
NUMA (Non‑Uniform Memory Access)
Milan CPUs expose up to 4 NUMA nodes per socket (NPS4). While more NUMA nodes can improve locality, insufficient memory per node may degrade performance. SPEC CPU tests across NPS1, NPS2, and NPS4 showed modest gains, emphasizing workload‑specific tuning.
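To check whether an NPS setting leaves enough memory on each node, the node count and per‑node memory can be read from sysfs. This is a minimal sketch assuming the standard Linux layout under /sys/devices/system/node; the function name is illustrative.

```shell
# List NUMA nodes and their total memory as seen by Linux.
# Assumption: each nodeN/meminfo has a line like
#   "Node 0 MemTotal:       65536000 kB"

report_numa() {
    local root="${1:-/sys/devices/system/node}" d kb count=0
    for d in "$root"/node[0-9]*; do
        [ -d "$d" ] || continue
        kb="unknown"
        if [ -r "$d/meminfo" ]; then
            kb="$(awk '/MemTotal:/ {print $4}' "$d/meminfo")"
        fi
        echo "${d##*/} MemTotal_kB=$kb"
        count=$((count + 1))
    done
    echo "nodes=$count"
}

report_numa "$@"
```

On a single‑socket system set to NPS4, this would be expected to report four nodes, each with roughly a quarter of installed memory.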
Memory Interleaving
Enabling memory interleaving distributes consecutive memory blocks across channels, increasing bandwidth and reducing latency. Tests demonstrated that disabling interleaving roughly halves memory performance and reduces overall compute throughput by ~30% in NPS1 scenarios.
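The bandwidth effect is easier to see with a toy model of the address mapping. The sketch below assigns each block to a channel by simple round‑robin; the 256‑byte granule is a hypothetical value chosen for illustration, and real memory controllers use more elaborate hashing than this.

```shell
# Simplified interleaving model: channel = (address / granule) mod channels.
# Assumption: the 256-byte granule and the round-robin mapping are
# illustrative only and do not describe the actual Milan address hash.

channel_for_addr() {
    local addr="$1" granule="$2" channels="$3"
    echo $(( (addr / granule) % channels ))
}

for i in 0 1 2 3 4 5 6 7; do
    addr=$(( i * 256 ))
    printf 'addr=%d -> channel %d\n' "$addr" "$(channel_for_addr "$addr" 256 8)"
done
```

In this model eight consecutive blocks land on eight different channels, so a sequential stream keeps every channel busy. Without interleaving, the same stream would stay on one channel until that channel's address range was exhausted.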
IOMMU
The IOMMU translates device (DMA) addresses and restricts which memory each device can reach, which improves security. Enabling it can slightly reduce raw compute performance due to translation overhead, but it is essential for virtualization and high‑PPS network workloads.
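When the IOMMU is enabled, the kernel command line decides how host devices are mapped; iommu=pt requests passthrough (identity) mapping for host‑owned devices. A small sketch for checking which mode was requested at boot, with an illustrative function name:

```shell
# Print the value of the last "iommu=" token on a kernel command line,
# or "unset" if none is present. Tokens such as "amd_iommu=on" are ignored.

iommu_mode() {
    local cmdline="$1" tok mode="unset"
    for tok in $cmdline; do
        case "$tok" in
            iommu=*) mode="${tok#iommu=}" ;;
        esac
    done
    echo "$mode"
}

if [ -r /proc/cmdline ]; then
    iommu_mode "$(cat /proc/cmdline)"
fi
```

An output of pt indicates passthrough mode was requested; unset means the kernel default applies.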
Power and Power‑Management Tuning
C‑states and P‑states
C‑states define idle power levels (C0 active, CC1 light idle, CC6 deep sleep). Keeping CPUs in C0/C1 minimizes wake‑up latency at the cost of higher idle power. P‑states (P0‑P2) control active frequency and voltage; P0 offers maximum performance.
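Which idle states the operating system can actually enter is visible through the cpuidle sysfs interface. The sketch below lists them for one CPU. It assumes the standard Linux cpuidle layout, and the state names reported depend on the idle driver, so they may not match the CC1/CC6 labels used in the BIOS.

```shell
# List cpuidle states for one CPU: name, exit latency, and disable flag.
# Assumption: states live under cpuN/cpuidle/stateM/{name,latency,disable}.

list_idle_states() {
    local dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}" s name latency off
    for s in "$dir"/state[0-9]*; do
        [ -d "$s" ] || continue
        name="$(cat "$s/name" 2>/dev/null || echo unknown)"
        latency="$(cat "$s/latency" 2>/dev/null || echo unknown)"
        off="$(cat "$s/disable" 2>/dev/null || echo unknown)"
        echo "${s##*/} name=$name latency_us=$latency disabled=$off"
    done
}

list_idle_states "$@"
```

If a deep state still shows disabled=0 after the BIOS change, cpupower idle-set -d with that state's index is one way to turn it off at runtime.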
cTDP and Package Power Limit (PPL)
Configurable TDP (cTDP) lets administrators raise or lower the thermal design power envelope. Raising cTDP and PPL can unlock additional performance at the cost of higher power draw.
Determinism Slider
This BIOS option selects between a Performance mode (stable, recommended) and a Power mode (potentially higher peak performance). The slider itself does not auto‑adjust based on workload.
cpupower Utility
Linux cpupower commands can query and set CPU frequency policies:
cpupower -c all frequency-info – display per‑core frequency details.
cpupower frequency-set -g performance – force performance governor.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor – verify current governor.
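On a 128‑thread system, reading the governor of every CPU by eye is error‑prone. A minimal sketch that counts CPUs whose governor is not performance, assuming the usual cpufreq sysfs layout (the function name is illustrative):

```shell
# Count CPUs whose scaling governor is anything other than "performance".
# Prints 0 when every CPU with a cpufreq directory uses the performance governor.

count_non_performance() {
    local root="${1:-/sys/devices/system/cpu}" f gov n=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        gov=""
        read -r gov < "$f" || true
        if [ "$gov" != "performance" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_non_performance "$@"
```

A non‑zero result after running cpupower frequency-set -g performance suggests that some CPUs were skipped or that another service reset the governor.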
Test Data and Conclusions
Across all sections, SPEC CPU and STREAM benchmarks consistently showed:
Higher memory channel counts (8 > 6 > 4 > 2) deliver 30%+ CPU gains and up to double memory bandwidth.
Larger DIMMs at the same total capacity (32 GB × 8, one DIMM per channel, beats 16 GB × 16, two DIMMs per channel) yield modest improvements.
NUMA benefits are workload‑dependent; even a 1% overall compute gain can be valuable at scale.
Disabling memory interleaving cuts memory performance roughly in half and reduces compute throughput by ~30%.
Power‑related settings (P‑states, cTDP, Determinism Slider, cpupower) have noticeable impact; keeping CPUs in Performance mode and enabling Boost and SMT provides the best baseline.
Specific Configuration Recommendations
SMT Control = Auto (or Enabled/Disabled per workload).
Core Performance Boost = Auto (Enabled).
NUMA nodes per socket = NPS4 (or NPS1/2 as needed).
ACPI SRAT L3 Cache As NUMA Domain = Enable.
Memory Interleaving = Auto (or Disabled to test impact).
IOMMU = Auto (kernel parameter iommu=pt).
C‑states: disable CC6, keep C0/C1.
P‑states: set to P0 for maximum performance.
cTDP and PPL: raise to CPU‑supported maximum.
Determinism Slider: set to Performance (or Power for extreme cases).
Summary
Server performance tuning on AMD Milan platforms involves a systematic approach: select appropriate benchmark tools, adjust BIOS settings (SMT, Boost, C‑/P‑states, cTDP), optimize memory layout (channel count, NUMA, interleaving), and configure power management options. Data‑driven testing shows that modest BIOS tweaks can yield up to 30% compute improvement, while memory and power settings further influence overall efficiency.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.
