
Performance Tuning of JED Database on Huawei Kunpeng ARM vs Intel X86 Platforms

This technical report details the background, hardware configuration, database setup, tuning results, and step‑by‑step optimization procedures—including BIOS, OS, network, container NUMA binding, MySQL CRC32 patching, and Go PGO tuning—performed to improve JED performance on ARM compared with Intel.

JD Retail Technology

Project Background

In response to national initiatives promoting independent technology, the project replaces foreign components with domestic ones, starting with databases. JED is deployed on a Huawei Kunpeng ARM server and compared with an Intel X86 server to evaluate performance after tuning.

Physical Machine Configuration

| Processor Vendor | Architecture | CPU Model | CPU | Turbo | Memory Frequency | OS |
| --- | --- | --- | --- | --- | --- | --- |
| Huawei | ARM | Kunpeng 920-7262C | 128C | None | 3200 MT/s | EulerOS |
| Intel | X86 | Xeon Platinum 8338C (3rd Gen) | 128C | Enabled | 3200 MT/s | CentOS 8 |

Database Configuration

| Item | Value |
| --- | --- |
| Deployment Site | Langfang |
| Deployment Method | Container |
| Gateway Config | 16C/12G, Disk: /export: 30G |
| DB Architecture | 1 cluster, primary-secondary |
| DB Resources | 8C/24G, Disk: /export: 512G |

Optimization Results

Before tuning, under 50% background load, JED on Kunpeng achieved 58% of Intel's read performance and 68% of its write performance. After tuning, read performance reached 99% of Intel, write performance 121%, and mixed read/write (7:3) hit 113%, with TP99 and response times improved while CPU usage stayed at 100%.

Specific Tuning Steps

BIOS Optimization

Requires data‑center modification and host reboot.

Expected changes: disable CPU prefetching, set Power Policy to Performance, keep SMMU enabled.
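After the reboot, the effect of these BIOS changes can be partially confirmed from the OS. The sketch below is illustrative; exact sysfs paths and dmesg wording vary by platform and kernel:

```shell
# Confirm the SMMU initialized on the ARM host (look for SMMUv3 probe messages)
dmesg | grep -i smmu

# Inspect the CPU frequency governor in effect after the Power Policy change
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```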

Host OS Optimization

Disable firewall (already disabled in production):

systemctl status firewalld.service
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl status firewalld.service

Network kernel parameters (no noticeable gain, left unchanged):

echo 1024 >/proc/sys/net/core/somaxconn
echo 16777216 >/proc/sys/net/core/rmem_max
echo 16777216 >/proc/sys/net/core/wmem_max
echo "4096 87380 16777216" >/proc/sys/net/ipv4/tcp_rmem
echo "4096 65536 16777216" >/proc/sys/net/ipv4/tcp_wmem
echo 360000 >/proc/sys/net/ipv4/tcp_max_syn_backlog
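Writes to `/proc/sys` do not survive a reboot. If these values are ever adopted, an equivalent persistent form would be a drop-in sysctl file (a sketch; the file name `99-jed-net.conf` is illustrative):

```shell
# Persist the same network parameters via sysctl
cat > /etc/sysctl.d/99-jed-net.conf <<'EOF'
net.core.somaxconn = 1024
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_max_syn_backlog = 360000
EOF
# Reload all sysctl configuration
sysctl --system
```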

IO Scheduler Optimization

echo deadline > /sys/block/nvme0n1/queue/scheduler
echo deadline > /sys/block/nvme1n1/queue/scheduler
echo deadline > /sys/block/nvme2n1/queue/scheduler
echo deadline > /sys/block/nvme3n1/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler
echo 2048 > /sys/block/nvme0n1/queue/nr_requests
echo 2048 > /sys/block/nvme1n1/queue/nr_requests
echo 2048 > /sys/block/nvme2n1/queue/nr_requests
echo 2048 > /sys/block/nvme3n1/queue/nr_requests
echo 2048 > /sys/block/sda/queue/nr_requests
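The active scheduler can be verified afterwards; the kernel prints it in brackets. Note that on blk-mq kernels the equivalent scheduler is named `mq-deadline`, so the echo above may need adjusting:

```shell
# The active scheduler appears in brackets, e.g. "[deadline] none"
for dev in nvme0n1 nvme1n1 nvme2n1 nvme3n1 sda; do
  printf '%s: ' "$dev"
  cat /sys/block/$dev/queue/scheduler
done
```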

Cache Parameter Optimization

echo 5 >/proc/sys/vm/dirty_ratio
echo 1 >/proc/sys/vm/swappiness

Network Card IRQ Binding

Adjust ethX queue count and bind IRQs to CPU cores (example for eth0):

ethtool -l eth0
ethtool -L eth0 combined 8
systemctl stop irqbalance
systemctl disable irqbalance
# Bind every IRQ belonging to eth0's PCI device to CPU 31
BUS_INFO=$(ethtool -i eth0 | awk -F ': ' '/bus-info/ {print $2}')
for i in $(grep "$BUS_INFO" /proc/interrupts | awk -F ':' '{print $1}'); do
  echo 31 > /proc/irq/$i/smp_affinity_list
done
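A quick way to confirm the binding took effect is to read the affinity back for each of the device's IRQs (a sketch, assuming the same eth0 device):

```shell
# Print each IRQ for eth0 and the CPU list it is currently bound to
BUS_INFO=$(ethtool -i eth0 | awk -F ': ' '/bus-info/ {print $2}')
for i in $(grep "$BUS_INFO" /proc/interrupts | awk -F ':' '{print $1}'); do
  printf 'IRQ %s -> CPUs %s\n' "$i" "$(cat /proc/irq/$i/smp_affinity_list)"
done
```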

Business Container NUMA Binding

Modify container cgroup to bind CPUs and memory to a specific NUMA node before deployment:

# Enter the container's cpuset cgroup directory
cd /sys/fs/cgroup/cpuset/kubepods/burstable/podXXXXXXXX/7b40a68aXXXXXXXX
# Stop Docker first (restarting Docker resets the cgroup)
systemctl stop docker
# Bind CPUs 16-23 and memory node 0 (cpuset cgroup-v1 file names)
echo 16-23 > cpuset.cpus
echo 0 > cpuset.mems
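Once the container is running, the binding can be verified from the host. This sketch assumes the database process is named `mysqld`; substitute the actual process name:

```shell
# Check the allowed CPUs and memory nodes for the DB process
PID=$(pidof mysqld | awk '{print $1}')
taskset -cp "$PID"                                        # expect 16-23
grep -E 'Cpus_allowed_list|Mems_allowed_list' /proc/$PID/status
# Show which NUMA node those CPUs belong to
numactl --hardware
```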

MySQL CRC32 Software-to-Hardware Recompilation for ARM

cd /mysql-5.7.26
git apply crc32-mysql5.7.26.patch
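Before applying the patch, it is worth confirming that the CPU actually exposes the ARMv8 CRC32 instructions; the patch only helps when the hardware feature is present (a minimal check):

```shell
# On ARMv8, /proc/cpuinfo lists a "crc32" feature flag when the
# CRC32 instructions are available
grep -m1 -o 'crc32' /proc/cpuinfo && echo "hardware CRC32 available"
```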

Go Version Upgrade and PGO Optimization

Upgrade Go to 1.21 and enable PGO profiling:

# In the application source, import the pprof handler:
#   import _ "net/http/pprof"
# Run the program under representative load, then collect a CPU profile
curl -o cpu.pprof "http://localhost:8080/debug/pprof/profile?seconds=304"
# go build -pgo=auto looks for default.pgo in the main package directory
mv cpu.pprof default.pgo
go build -pgo=auto
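To confirm the profile was actually used, the build settings embedded in the resulting binary can be inspected (the binary name below is illustrative):

```shell
# "go version -m" prints the embedded build settings; a PGO build
# records a "build -pgo=..." line pointing at the profile used
go version -m ./your-binary | grep -- '-pgo'
```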

Benchmark results from the Go team show a 2‑7% performance gain for representative programs when built with PGO.

Conclusion

The comprehensive tuning—covering BIOS, OS, network, container, MySQL compilation, and Go application optimization—significantly narrows the performance gap between ARM‑based Kunpeng servers and Intel X86 servers for the JED database workload.

Tags: Performance optimization · Database · Kernel · MySQL · ARM · PGO