
How to Optimize AMD Milan Server Performance: BIOS, Memory, and Power Tuning

This article provides a detailed, data‑driven guide to evaluating and tuning AMD Milan‑based servers, covering SPEC CPU benchmarking, BIOS options such as SMT and Boost, memory channel and NUMA configurations, interleaving, IOMMU, and power‑state settings to achieve up to 30% performance gains.

Bilibili Tech

Background

Bilibili's system team evaluates server hardware performance using a single‑socket AMD Milan CPU platform, focusing on hardware‑level optimizations and benchmark testing to guide iterative performance improvements.

Benchmark Tools

The team uses SPEC CPU 2017, a widely accepted CPU‑intensive benchmark suite, to measure integer (SPECrate®2017 Integer, SPECspeed®2017 Integer) and floating‑point (SPECrate®2017 Floating Point, SPECspeed®2017 Floating Point) performance. SPEC CPU stresses the processor, memory subsystem, and compiler while leaving disk and network I/O largely out of the picture, which makes results comparable across configurations.

BIOS Tuning

CPU‑Related Settings

Key BIOS options include:

SMT (Simultaneous Multithreading) – enables two hardware threads per core, increasing logical cores from 64 to 128.

Core Performance Boost (CPB) – lets cores run above their base frequency under load when power and thermal headroom allow.

Testing on an AMD 64‑Core processor with 256 GB DDR4‑3200 memory showed:

Enabling SMT improves integer throughput by ~10% with little impact on floating‑point.

Enabling Boost adds ~15% performance; combining SMT and Boost yields ~30% overall gain.
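As a rough sanity check on the combined figure, the sketch below assumes the two gains are independent and compose multiplicatively. That is an assumption for illustration, not something the source states; the measured total comes from the benchmark runs, not from this formula.

```python
# Sketch: compose fractional speedups multiplicatively.
# Assumption: the gains are independent, which real workloads may violate.
from math import prod


def combined_gain(gains):
    """Return the overall fractional gain from a list of fractional gains."""
    return prod(1.0 + g for g in gains) - 1.0


if __name__ == "__main__":
    smt, boost = 0.10, 0.15  # approximate figures quoted above
    print(f"combined: {combined_gain([smt, boost]):.1%}")
```

Under that assumption the two settings together give about 26.5%, which is in line with the roughly 30% the tests reported.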

Specific BIOS Settings

SMT Control = Auto (or Enabled/Disabled as needed).

Core Performance Boost = Auto (Enabled for most workloads).
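After changing these BIOS options, it can help to confirm what the operating system actually sees. The following is a minimal sketch for Linux, assuming the kernel exposes `/sys/devices/system/cpu/smt/active` and, with the acpi-cpufreq driver, `/sys/devices/system/cpu/cpufreq/boost`. Either file may be absent on other kernels or drivers, in which case the helper returns `None`.

```python
# Sketch: read SMT and boost state from Linux sysfs.
# Assumption: the paths below exist on the system under test.
from pathlib import Path

SMT_ACTIVE = Path("/sys/devices/system/cpu/smt/active")
BOOST = Path("/sys/devices/system/cpu/cpufreq/boost")


def parse_flag(text):
    """Map a sysfs '0'/'1' string to False/True; None if unrecognized."""
    return {"0": False, "1": True}.get(text.strip())


def read_flag(path):
    """Return the flag stored at path, or None if the file is unavailable."""
    try:
        return parse_flag(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("SMT active:", read_flag(SMT_ACTIVE))
    print("Boost enabled:", read_flag(BOOST))
```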

Note: Milan CPUs deliver higher performance per watt, but they do not support a stable all‑core overclock, which caps the gain Boost can deliver when every core is busy.

Memory and I/O Optimization

Memory Channels and Capacity

AMD Milan supports up to 8 memory channels. Using 8‑channel configurations (e.g., 16 GB × 8) provides significantly higher bandwidth than 4‑channel setups. Tests with the STREAM benchmark showed minimal impact from BIOS CPU settings but clear differences across channel counts.
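The effect of channel count can be estimated before running STREAM. A DDR4 channel has a 64‑bit (8‑byte) data bus, so theoretical peak bandwidth is the transfer rate times 8 bytes times the number of populated channels. The sketch below computes that upper bound; measured STREAM numbers will be lower, and the function name is only illustrative.

```python
# Sketch: theoretical peak DRAM bandwidth from transfer rate and channel count.
def peak_bandwidth_gbs(mt_per_s, channels, bus_bytes=8):
    """Return peak bandwidth in GB/s (decimal) for the given configuration."""
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    return mt_per_s * bus_bytes * channels / 1000


if __name__ == "__main__":
    for ch in (2, 4, 6, 8):
        print(f"{ch} channels: {peak_bandwidth_gbs(3200, ch):.1f} GB/s")
```

For DDR4‑3200 this gives 25.6 GB/s per channel, so an 8‑channel configuration has twice the theoretical ceiling of a 4‑channel one.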

NUMA (Non‑Uniform Memory Access)

Milan CPUs expose up to 4 NUMA nodes per socket (NPS4). While more NUMA nodes can improve locality, insufficient memory per node may degrade performance. SPEC CPU tests across NPS1, NPS2, and NPS4 showed modest gains, emphasizing workload‑specific tuning.
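To see whether a given NPS setting leaves enough memory on each node, the per‑node totals can be read from Linux sysfs. This is a minimal sketch that assumes the usual `/sys/devices/system/node/nodeN/meminfo` layout and its `Node N MemTotal: ... kB` line; it returns an empty result where that layout is missing.

```python
# Sketch: report total memory per NUMA node from Linux sysfs.
# Assumption: the standard node*/meminfo files are present.
import re
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")
_MEMTOTAL = re.compile(r"Node\s+(\d+)\s+MemTotal:\s+(\d+)\s+kB")


def parse_memtotal(text):
    """Return (node_id, mem_total_kb) from meminfo text, or None."""
    match = _MEMTOTAL.search(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def node_memory_kb(root=NODE_ROOT):
    """Return {node_id: mem_total_kb} for every node found under root."""
    result = {}
    root = Path(root)
    if not root.is_dir():
        return result
    for meminfo in root.glob("node[0-9]*/meminfo"):
        try:
            parsed = parse_memtotal(meminfo.read_text())
        except OSError:
            continue
        if parsed:
            result[parsed[0]] = parsed[1]
    return result


if __name__ == "__main__":
    for node, kb in sorted(node_memory_kb().items()):
        print(f"node {node}: {kb / 1024 / 1024:.1f} GiB")
```

With NPS4 on a single socket, four nodes of roughly equal size would be expected; a node with much less memory than the others suggests uneven DIMM population.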

Memory Interleaving

Enabling memory interleaving distributes consecutive memory blocks across channels, increasing bandwidth and reducing latency. Tests demonstrated that disabling interleaving roughly halves memory performance and reduces overall compute throughput by ~30% in NPS1 scenarios.
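The bandwidth effect can be illustrated with a toy address‑to‑channel model. The sketch below is not how Milan actually hashes addresses; the 256‑byte granule and the 32 GiB per‑channel size are illustrative assumptions. It only shows why a sequential access pattern spreads across all channels with interleaving and lands on a single channel without it.

```python
# Toy model of channel interleaving (illustrative, not Milan's real mapping).
# Assumptions: 256-byte interleave granule, 32 GiB of memory per channel.
def channel_for(addr, channels=8, granule=256,
                channel_bytes=32 * 2**30, interleave=True):
    """Return the channel index serving a physical address."""
    if interleave:
        # Consecutive granules rotate across channels.
        return (addr // granule) % channels
    # Without interleaving, each channel owns one contiguous region.
    return (addr // channel_bytes) % channels


def channels_touched(start, length, channels=8, granule=256,
                     channel_bytes=32 * 2**30, interleave=True):
    """Return the set of channels hit by a sequential access."""
    return {
        channel_for(a, channels, granule, channel_bytes, interleave)
        for a in range(start, start + length, granule)
    }


if __name__ == "__main__":
    print("interleaved:", sorted(channels_touched(0, 4096)))
    print("not interleaved:", sorted(channels_touched(0, 4096, interleave=False)))
```

In this model a 4 KiB sequential read uses all eight channels when interleaved and only one otherwise, which is consistent with the large bandwidth drop seen when interleaving is disabled.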

IOMMU

The IOMMU translates device DMA addresses and keeps devices from reaching memory they should not access, which improves security. Enabling it can slightly reduce raw compute performance due to translation overhead, but it is essential for virtualization and high‑PPS (packets per second) network workloads.
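On Linux, IOMMU behavior is also shaped by kernel boot parameters such as `iommu=pt`, which the recommendations in this article use. A minimal sketch for checking which IOMMU‑related parameters are active, assuming `/proc/cmdline` is readable:

```python
# Sketch: list IOMMU-related kernel parameters from the boot command line.
from pathlib import Path


def iommu_params(cmdline):
    """Return tokens that start with 'iommu=' or 'amd_iommu='."""
    prefixes = ("iommu=", "amd_iommu=")
    return [tok for tok in cmdline.split() if tok.startswith(prefixes)]


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line, or an empty string if unavailable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    print(iommu_params(read_cmdline()) or "no IOMMU parameters set")
```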

Power and Power‑Management Tuning

C‑states and P‑states

C‑states define idle power levels: C0 is the active state, CC1 is a shallow core idle state, and CC6 is a deep sleep state with a longer wake‑up time. Keeping CPUs in C0/C1 minimizes wake‑up latency. P‑states (P0‑P2) control active frequency and voltage; P0 offers maximum performance.
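Whether deep idle states are still enabled can be checked from the Linux cpuidle interface. The sketch below assumes the standard `cpuidle/stateN/{name,latency,disable}` files under `/sys/devices/system/cpu/cpu0`. The state names the kernel reports may not match the BIOS labels CC1 and CC6, so the check here is based on exit latency, and the 10 µs budget is only an example value.

```python
# Sketch: find enabled idle states whose exit latency exceeds a budget.
# Assumption: Linux cpuidle sysfs files are present for the chosen CPU.
from pathlib import Path


def read_idle_states(cpu=0, root="/sys/devices/system/cpu"):
    """Return [(name, exit_latency_us, disabled)]; [] if unavailable."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    states = []
    if not base.is_dir():
        return states
    for state_dir in sorted(base.glob("state[0-9]*")):
        try:
            name = (state_dir / "name").read_text().strip()
            latency = int((state_dir / "latency").read_text())
            disabled = (state_dir / "disable").read_text().strip() == "1"
        except (OSError, ValueError):
            continue
        states.append((name, latency, disabled))
    return states


def states_over(states, max_latency_us):
    """Return names of enabled states slower to exit than the budget."""
    return [n for n, lat, off in states if not off and lat > max_latency_us]


if __name__ == "__main__":
    print("deep states still enabled:", states_over(read_idle_states(), 10))
```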

cTDP and Package Power Limit (PPL)

Configurable TDP (cTDP) lets administrators raise or lower the thermal design power envelope. Raising cTDP and PPL can unlock additional performance at the cost of higher power draw.

Determinism Slider

This BIOS option selects between a Performance mode (stable, recommended) and a Power mode (potentially higher peak performance). The slider itself does not auto‑adjust based on workload.

cpupower Utility

Linux cpupower commands can query and set CPU frequency policies:

cpupower -c all frequency-info – display per‑core frequency details.

cpupower frequency-set -g performance – force the performance governor.

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor – verify the current governor.
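On a 128‑thread machine, scanning the governor of every CPU by eye is error‑prone. The following is a minimal sketch that summarizes the same `scaling_governor` files, assuming the cpufreq sysfs interface is available; the helper names are illustrative.

```python
# Sketch: summarize the cpufreq governor across all CPUs.
# Assumption: cpuN/cpufreq/scaling_governor files exist under the root.
from collections import Counter
from pathlib import Path


def governor_counts(root="/sys/devices/system/cpu"):
    """Return a Counter mapping governor name to number of CPUs using it."""
    counts = Counter()
    base = Path(root)
    if not base.is_dir():
        return counts
    for gov_file in base.glob("cpu[0-9]*/cpufreq/scaling_governor"):
        try:
            counts[gov_file.read_text().strip()] += 1
        except OSError:
            continue
    return counts


def all_performance(counts):
    """True only if at least one CPU was seen and all use 'performance'."""
    return bool(counts) and set(counts) == {"performance"}


if __name__ == "__main__":
    counts = governor_counts()
    print(dict(counts), "all performance:", all_performance(counts))
```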

Test Data and Conclusions

Across all sections, SPEC CPU and STREAM benchmarks consistently showed:

Higher memory channel counts (8 > 6 > 4 > 2) deliver 30%+ CPU gains and up to double memory bandwidth.

Fewer, larger DIMMs at the same total capacity (32 GB × 8, one DIMM per channel, beats 16 GB × 16, two DIMMs per channel) yield modest improvements.

NUMA benefits are workload‑dependent; even a 1% overall compute gain can be valuable at scale.

Disabling memory interleaving cuts memory performance roughly in half and reduces compute throughput by ~30%.

Power‑related settings (P‑states, cTDP, Determinism Slider, cpupower) have noticeable impact; keeping CPUs in Performance mode and enabling Boost and SMT provides the best baseline.

Specific Configuration Recommendations

SMT Control = Auto (or Enabled/Disabled per workload).

Core Performance Boost = Auto (Enabled).

NUMA nodes per socket = NPS4 (or NPS1/2 as needed).

ACPI SRAT L3 Cache As NUMA Domain = Enable.

Memory Interleaving = Auto (or Disabled to test impact).

IOMMU = Auto (with the kernel parameter iommu=pt for passthrough mode).

C‑states: disable CC6, keep C0/C1.

P‑states: set to P0 for maximum performance.

cTDP and PPL: raise to CPU‑supported maximum.

Determinism Slider: set to Performance (or Power for extreme cases).

Summary

Server performance tuning on AMD Milan platforms involves a systematic approach: select appropriate benchmark tools, adjust BIOS settings (SMT, Boost, C‑/P‑states, cTDP), optimize memory layout (channel count, NUMA, interleaving), and configure power management options. Data‑driven testing shows that modest BIOS tweaks can yield up to 30% compute improvement, while memory and power settings further influence overall efficiency.

Figure: BIOS overview
Figure: SMT vs Boost results
Figure: Memory channel performance
Figure: NUMA and interleaving impact
Figure: C‑states diagram
Figure: Power configuration performance