
Performance Tuning of JED Database on Huawei Kunpeng ARM vs Intel X86 Platforms

This technical report details the background, hardware configuration, database setup, tuning results, and step‑by‑step optimization procedures—including BIOS, OS, network, container NUMA binding, MySQL CRC32 patching, and Go PGO tuning—performed to improve JED performance on ARM compared with Intel.

JD Retail Technology

Project Background

In response to national initiatives promoting independent technology, the project replaces foreign components with domestic ones, starting with databases. JED is deployed on a Huawei Kunpeng ARM server and compared with an Intel X86 server to evaluate performance after tuning.

Physical Machine Configuration

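| Processor Vendor | Architecture | CPU Model | CPU | Turbo | Memory Frequency | OS |
| --- | --- | --- | --- | --- | --- | --- |
| Huawei | ARM | Kunpeng 920-7262C | 128C | None | 3200 MT/s | Euler |
| Intel | X86 | Platinum 8338C (3rd Gen) | 128C | Enabled | 3200 MT/s | CentOS 8 |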

Database Configuration

| Item | Value |
| --- | --- |
| Deployment Site | Langfang |
| Deployment Method | Container |
| Gateway Config | 16C/12G, disk /export: 30G |
| DB Architecture | 1 cluster, primary-secondary |
| DB Resources | 8C/24G, disk /export: 512G |

Optimization Results

Before tuning, under a 50% background load, JED on Kunpeng delivered 58% of the Intel platform's read performance and 68% of its write performance. After tuning, read performance reached 99% of Intel's, write performance 121%, and mixed read/write (7:3) 113%. TP99 and response times also improved, while CPU usage stayed at 100%.

Specific Tuning Steps

BIOS Optimization

Requires data‑center modification and host reboot.

Expected changes: disable CPU prefetching, set Power Policy to Performance, keep SMMU enabled.

Host OS Optimization

Disable firewall (already disabled in production):

    systemctl status firewalld.service
    systemctl stop firewalld.service
    systemctl disable firewalld.service
    systemctl status firewalld.service

Network kernel parameters (no noticeable gain, left unchanged):

    echo 1024 >/proc/sys/net/core/somaxconn
    echo 16777216 >/proc/sys/net/core/rmem_max
    echo 16777216 >/proc/sys/net/core/wmem_max
    echo "4096 87380 16777216" >/proc/sys/net/ipv4/tcp_rmem
    echo "4096 65536 16777216" >/proc/sys/net/ipv4/tcp_wmem
    echo 360000 >/proc/sys/net/ipv4/tcp_max_syn_backlog

IO Scheduler Optimization

    echo deadline > /sys/block/nvme0n1/queue/scheduler
    echo deadline > /sys/block/nvme1n1/queue/scheduler
    echo deadline > /sys/block/nvme2n1/queue/scheduler
    echo deadline > /sys/block/nvme3n1/queue/scheduler
    echo deadline > /sys/block/sda/queue/scheduler
    echo 2048 > /sys/block/nvme0n1/queue/nr_requests
    echo 2048 > /sys/block/nvme1n1/queue/nr_requests
    echo 2048 > /sys/block/nvme2n1/queue/nr_requests
    echo 2048 > /sys/block/nvme3n1/queue/nr_requests
    echo 2048 > /sys/block/sda/queue/nr_requests
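The repeated per-device writes above can be folded into a loop. The following is a minimal sketch, not the procedure used in the original tuning: the function name `set_queue` and its first argument (the sysfs block root, parameterized so the function can be tried against a scratch directory) are assumptions made for illustration.

```shell
# set_queue SYSFS_BLOCK_ROOT SCHEDULER NR_REQUESTS DEV...
# Hypothetical helper: writes the scheduler name and queue depth for each
# listed block device, skipping devices that do not exist on this host.
set_queue() {
    root=$1
    sched=$2
    depth=$3
    shift 3
    for dev in "$@"; do
        q="$root/$dev/queue"
        if [ ! -d "$q" ]; then
            echo "skip $dev: $q not found" >&2
            continue
        fi
        echo "$sched" > "$q/scheduler"
        echo "$depth" > "$q/nr_requests"
    done
}
```

On a real host this would be called as root, for example `set_queue /sys/block deadline 2048 nvme0n1 nvme1n1 nvme2n1 nvme3n1 sda`. Reading a device's `queue/scheduler` file first lists the scheduler names that kernel accepts, with the active one in brackets, which is worth checking because the available names differ between kernel versions.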

Cache Parameter Optimization

    echo 5 >/proc/sys/vm/dirty_ratio
    echo 1 >/proc/sys/vm/swappiness
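Values written under /proc/sys do not survive a reboot. One way to keep them, sketched here as an assumption rather than something the original tuning describes, is to translate each path into its sysctl key and store the result in a sysctl configuration file. The helper name `proc_to_sysctl` is hypothetical.

```shell
# proc_to_sysctl PROC_PATH VALUE
# Hypothetical helper: prints a "key = value" line in sysctl.conf syntax
# for a path under /proc/sys (slashes in the path become dots in the key).
proc_to_sysctl() {
    key=$(printf '%s' "${1#/proc/sys/}" | tr '/' '.')
    printf '%s = %s\n' "$key" "$2"
}
```

For example, `proc_to_sysctl /proc/sys/vm/dirty_ratio 5` prints `vm.dirty_ratio = 5`. Lines like this can be collected in a file under /etc/sysctl.d/ (the file name is up to the operator) and loaded with `sysctl --system`.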

Network Card IRQ Binding

Adjust ethX queue count and bind IRQs to CPU cores (example for eth0):

    ethtool -l eth0
    ethtool -L eth0 combined 8
    systemctl stop irqbalance
    systemctl disable irqbalance
    for i in $(cat /proc/interrupts | grep $(ethtool -i eth0 | grep -i bus-info | awk -F ': ' '{print $2}') | awk -F ':' '{print $1}'); do
        echo 31 > /proc/irq/$i/smp_affinity_list
    done
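The IRQ lookup in the loop above can be isolated into a small function, which makes it easy to inspect the IRQ list before changing any affinity. This is a sketch under assumptions: the function name `irqs_for_device` is hypothetical, and it relies, like the one-liner above, on the NIC driver embedding the PCI bus address in its interrupt names.

```shell
# irqs_for_device BUS_INFO [INTERRUPTS_FILE]
# Hypothetical helper: prints the IRQ numbers whose /proc/interrupts line
# contains BUS_INFO. index() matches the string literally, so the dots in
# a PCI address are not treated as regex wildcards (as they are by grep).
irqs_for_device() {
    awk -v dev="$1" 'index($0, dev) { sub(/:$/, "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}
```

A dry run might look like `irqs_for_device "$(ethtool -i eth0 | awk -F ': ' '/bus-info/ {print $2}')"`, after which each printed IRQ number can be written to `/proc/irq/<irq>/smp_affinity_list` as shown earlier.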

Business Container NUMA Binding

Modify container cgroup to bind CPUs and memory to a specific NUMA node before deployment:

    # Enter container cgroup directory
    cd /sys/fs/cgroup/cpuset/kubepods/burstable/podXXXXXXXX/7b40a68aXXXXXXXX
    # Stop Docker (restart resets cgroup)
    systemctl stop docker
    # Set the CPU list and memory node (cgroup v1 cpuset control files)
    echo 16-23 > cpuset.cpus
    echo 0 > cpuset.mems
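Because a restart resets the cgroup, it helps to confirm that the binding took effect. The sketch below assumes the cgroup v1 cpuset controller, whose control files are `cpuset.cpus` and `cpuset.mems`. The function name `apply_cpuset` is hypothetical, and the read-back comparison assumes the CPU list is already in the kernel's normalized form (a range such as `16-23`).

```shell
# apply_cpuset CGROUP_DIR CPUS MEMS
# Hypothetical helper: writes the CPU list and memory node list into a
# cpuset cgroup directory, then reads both back to confirm they match.
apply_cpuset() {
    dir=$1
    cpus=$2
    mems=$3
    if [ ! -d "$dir" ]; then
        echo "cgroup dir not found: $dir" >&2
        return 1
    fi
    echo "$cpus" > "$dir/cpuset.cpus" || return 1
    echo "$mems" > "$dir/cpuset.mems" || return 1
    [ "$(cat "$dir/cpuset.cpus")" = "$cpus" ] &&
        [ "$(cat "$dir/cpuset.mems")" = "$mems" ]
}
```

With the container's cgroup directory as the first argument, `apply_cpuset <dir> 16-23 0` reproduces the binding described above and returns non-zero if either value did not stick. Which CPUs belong to which NUMA node can be checked beforehand with `lscpu` or `numactl --hardware`.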

MySQL CRC32 Soft‑to‑Hard Compilation for ARM

    cd /mysql-5.7.26
    git apply crc32-mysql5.7.26.patch
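A hardware CRC32 build only pays off if the CPU exposes the ARMv8 CRC32 instructions, which Linux on arm64 reports as the `crc32` flag on the `Features` line of /proc/cpuinfo. The check below is a minimal sketch and not part of the original procedure. The function name `has_crc32` is hypothetical, and the optional file argument exists only so the check can be tried against a saved copy of cpuinfo.

```shell
# has_crc32 [CPUINFO_FILE]
# Hypothetical helper: succeeds if any "Features" line lists the crc32 flag.
has_crc32() {
    awk '/^Features/ { for (i = 1; i <= NF; i++) if ($i == "crc32") found = 1 }
         END { exit !found }' "${1:-/proc/cpuinfo}"
}
```

A build script could guard the patch step with `has_crc32 || echo "no hardware CRC32 on this host" >&2`. Running `git apply --check crc32-mysql5.7.26.patch` first verifies that the patch applies cleanly without modifying the source tree.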

Go Version Upgrade and PGO Optimization

Upgrade Go to 1.21 and enable PGO profiling:

    // In the Go program: expose the pprof HTTP endpoints
    import _ "net/http/pprof"

    # Run the program under load and collect a CPU profile
    curl -o cpu.pprof "http://localhost:8080/debug/pprof/profile?seconds=304"
    # Place the profile in the main package directory so -pgo=auto picks it up
    mv cpu.pprof default.pgo
    go build -pgo=auto

Benchmark results from the Go team show a 2‑7% performance gain for representative programs when built with PGO.
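With `-pgo=auto`, the Go toolchain uses a file named `default.pgo` in the main package's directory and silently builds without PGO when that file is absent. A build script can make the missing-profile case explicit. The following is a sketch, and the function name `check_pgo_profile` is hypothetical.

```shell
# check_pgo_profile [MAIN_PKG_DIR]
# Hypothetical helper: prints the profile path if a non-empty default.pgo
# exists in the given directory, otherwise reports the problem and fails.
check_pgo_profile() {
    profile="${1:-.}/default.pgo"
    if [ ! -s "$profile" ]; then
        echo "no non-empty profile at $profile; the build would not use PGO" >&2
        return 1
    fi
    echo "$profile"
}
```

Used as `check_pgo_profile . && go build -pgo=auto`, the build stops early instead of producing an unoptimized binary when the profile was not collected or was placed in the wrong directory.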

Conclusion

The combined tuning of BIOS, OS, network, container placement, MySQL compilation, and the Go application closes most of the read-performance gap between the ARM-based Kunpeng server and the Intel X86 server for the JED workload (99% of Intel) and puts Kunpeng ahead on write (121%) and mixed 7:3 read/write (113%) workloads.
