Analyzing and Optimizing High Memory and Disk I/O Consumption of InfluxDB 1.8 on a Production Server
This article investigates why an InfluxDB 1.8 instance on a 32‑core, 64 GB server consumes over 58 GB of resident memory and generates heavy disk I/O, examines Go runtime memory accounting, uses system tools such as top, pmap, pprof and iostat for diagnosis, and presents configuration and runtime tweaks that reduce memory pressure and I/O load.
Background
The production server (32 cores, 64 GB RAM) runs InfluxDB 1.8, which is written in Go, and ingests about 100 GB of data per day. After about a week of uptime, the process's memory usage exceeds 95 % of RAM and swap alerts fire occasionally.
Problem Symptoms
top shows the InfluxDB process (PID 32309) with 58 GB RES (95.3 % memory usage) and high I/O wait (wa 43.1 %). Two questions follow: why is the process's memory footprint so large, and why is disk I/O saturated?
Memory High‑Consumption Analysis
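Querying show stats from the InfluxDB client returns the Go runtime metrics: HeapIdle ≈ 51 GB, HeapReleased ≈ 44 GB, HeapInUse ≈ 16 GB. The heap the runtime still holds is therefore HeapInUse + (HeapIdle − HeapReleased) = 16 + (51 − 44) = 23 GB. However, the system reports 58 GB RES, which suggests that ~35 GB is memory the runtime has marked as unused but that still counts toward the process's resident set because the kernel has not reclaimed it.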
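The arithmetic behind the 23 GB and 35 GB figures can be written out as a small sketch. The function name and the dictionary layout are illustrative assumptions; only the field names and values come from the show stats output quoted above, and "GB" is treated as 1024³ bytes for simplicity.

```python
GB = 1024 ** 3  # assumption: the article's "GB" is treated as GiB


def retained_heap_bytes(heap_in_use, heap_idle, heap_released):
    """Heap the Go runtime still holds: in-use spans plus idle spans
    that have not been released back to the OS."""
    return heap_in_use + (heap_idle - heap_released)


# Values reported by `show stats` in the article.
stats = {"HeapInUse": 16 * GB, "HeapIdle": 51 * GB, "HeapReleased": 44 * GB}
rss = 58 * GB  # RES column from top

retained = retained_heap_bytes(
    stats["HeapInUse"], stats["HeapIdle"], stats["HeapReleased"]
)
gap = rss - retained

print(f"heap retained by runtime: {retained // GB} GB")  # 23 GB
print(f"RSS not explained by it:  {gap // GB} GB")       # 35 GB
```

A gap this large, sitting inside the 44 GB that the runtime already reports as released, is what points toward lazily reclaimed pages rather than a leak in live objects.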
Further inspection with pmap -x 32309 and cat /proc/32309/smaps shows a huge anonymous heap region (≈ 66 GB) with a large amount of Private_Dirty memory. A GDB backtrace (bt) does not reveal a leak.
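Running go tool pprof -alloc_space shows that index/inmem.(*Index).DropSeriesGlobal accounts for ~42 TB of cumulative allocation during series deletion. Because -alloc_space counts every byte allocated since the process started rather than memory that is currently live, this points to massive temporary allocations in the in‑memory index, not to a leak.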
Before Go 1.12, the runtime returned memory to the kernel with MADV_DONTNEED; Go 1.12 through 1.15 default to the cheaper MADV_FREE on Linux. With MADV_FREE the kernel reclaims the pages lazily, so RSS does not drop immediately; it only decreases under memory pressure, unless GODEBUG=madvdontneed=1 forces the older MADV_DONTNEED behavior.
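To see how much of the resident set is dirty anonymous memory versus lazily freed pages, the per-mapping counters in /proc/<pid>/smaps can be totalled. The following is a minimal sketch: the function name and the sample excerpt are hypothetical, and it assumes a kernel that exposes the LazyFree counter in smaps (the field reports pages marked with MADV_FREE that have not been reclaimed yet).

```python
def sum_smaps_fields(smaps_text, fields=("Rss", "Private_Dirty", "LazyFree")):
    """Sum selected per-mapping counters (in kB) across all mappings."""
    totals = {name: 0 for name in fields}
    for line in smaps_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or key not in totals:
            continue  # mapping header, VmFlags, or a counter we do not track
        parts = rest.split()
        # Counter lines look like "Private_Dirty:      1536 kB".
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            totals[key] += int(parts[0])
    return totals


# Hypothetical two-mapping excerpt with made-up numbers; real input would
# come from open("/proc/32309/smaps").read().
SAMPLE = """\
00c000000000-00c004000000 rw-p 00000000 00:00 0
Rss:                2048 kB
Private_Dirty:      1536 kB
LazyFree:            512 kB
VmFlags: rd wr mr mw me ac
7f3a10000000-7f3a10400000 rw-p 00000000 00:00 0
Rss:                1024 kB
Private_Dirty:       512 kB
LazyFree:              0 kB
VmFlags: rd wr mr mw me ac
"""

totals = sum_smaps_fields(SAMPLE)
for name, kb in totals.items():
    print(f"{name}: {kb / 1024:.1f} MiB")
```

If the LazyFree total is a large share of Rss, the process is holding memory that the kernel is free to take back under pressure, which fits the behavior described above.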
Disk I/O Consumption Analysis
Running iostat -x 1 3 shows that device dm‑4, which InfluxDB writes to, sustains ≈ 5,361 IOPS with ≈ 27 MB/s of reads and ≈ 33 MB/s of writes, avgqu‑sz ≈ 3.48, await ≈ 0.47 ms, and %util ≈ 97.5 %, indicating that the I/O subsystem is saturated.
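One quick derived figure from these numbers is the average request size, which hints at why a modest ~60 MB/s can saturate the device. This is a rough sketch under two assumptions that the article does not state: that the IOPS figure combines reads and writes, and that "MB" means MiB. The function name is illustrative.

```python
def avg_request_kib(read_mb_s, write_mb_s, iops):
    """Average size of one I/O request in KiB, from throughput and IOPS."""
    return (read_mb_s + write_mb_s) * 1024 / iops


# Figures quoted from the iostat output in the article.
size = avg_request_kib(read_mb_s=27, write_mb_s=33, iops=5361)
print(f"average request size: {size:.1f} KiB")  # about 11.5 KiB
```

Requests of roughly 11 KiB suggest many small operations rather than large sequential transfers, so the device is limited by operation count well before it is limited by bandwidth.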
Performance Optimizations
1. Set GODEBUG=madvdontneed=1 so that the runtime returns freed memory to the kernel immediately (MADV_DONTNEED) instead of lazily (MADV_FREE).
2. Change the InfluxDB configuration (influxdb.conf) to reduce I/O pressure and memory usage:
[data]
# Reduce WAL sync frequency to 1s (asynchronous flush)
wal-fsync-delay = "1s"
# Switch index from in‑memory to TSI1 to avoid high memory during retention deletions
index-version = "tsi1"
# Increase compaction throughput to 64 MB
compact-throughput = "64m"
Restart the service with the environment variable:
env GODEBUG=madvdontneed=1 /usr/bin/influxd -config /usr/bin/influxdb.conf
Online Verification
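One way to confirm that the restarted process really inherited the variable is to read its environment from /proc/<pid>/environ, where entries are separated by NUL bytes. The sketch below parses such a blob; the function name and the sample bytes are hypothetical, and reading the real file typically requires running as the same user as influxd or as root.

```python
def godebug_settings(environ_bytes):
    """Return the GODEBUG key/value pairs found in a /proc/<pid>/environ blob."""
    for entry in environ_bytes.split(b"\0"):
        name, sep, value = entry.partition(b"=")
        if sep and name == b"GODEBUG":
            # GODEBUG is a comma-separated list of name=value pairs.
            items = [item for item in value.decode().split(",") if item]
            return {k: v for k, _, v in (item.partition("=") for item in items)}
    return {}


# Hypothetical environment blob; real input would come from
# open(f"/proc/{pid}/environ", "rb").read() for the new influxd PID.
sample = b"PATH=/usr/bin\0GODEBUG=madvdontneed=1,gctrace=0\0HOME=/root\0"

settings = godebug_settings(sample)
print(settings.get("madvdontneed") == "1")  # True when the flag is active
```

An empty result means the process was started without GODEBUG, for example because a service manager launched it with its own environment instead of the shell's.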
After a week of running the tuned instance, memory usage dropped to ~55 % of RAM and disk I/O fell to ~200 IOPS with only 6 % utilization, confirming the problem was mitigated.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.
