
Analyzing and Optimizing High Memory and Disk I/O Consumption of InfluxDB 1.8 on a Production Server

This article investigates why an InfluxDB 1.8 instance on a 32‑core, 64 GB server consumes over 58 GB of resident memory and generates heavy disk I/O, examines Go runtime memory accounting, uses system tools such as top, pmap, pprof and iostat for diagnosis, and presents configuration and runtime tweaks that reduce memory pressure and I/O load.

360 Tech Engineering

Background

The production server (32‑core, 64 GB) runs InfluxDB 1.8, which is written in Go, and ingests about 100 GB of data per day. After a week of uptime the process uses more than 95 % of RAM and swap alerts fire occasionally.

Problem Symptoms

top shows the InfluxDB process (PID 32309) at 58 GB RES, or 95.3 % of memory, together with a high I/O wait (wa 43.1 %). Two questions follow: why is the process memory so high, and why is disk I/O saturated?

Memory High‑Consumption Analysis

Running SHOW STATS in the InfluxDB client returns the Go runtime metrics: HeapIdle ≈ 51 GB, HeapReleased ≈ 44 GB, HeapInUse ≈ 16 GB. The heap the runtime still holds is HeapInUse + (HeapIdle − HeapReleased) = 16 + (51 − 44) = 23 GB. The system nevertheless reports 58 GB RES, which suggests that about 35 GB is memory the runtime no longer uses but the kernel has not yet reclaimed.
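The arithmetic above can be written out as a small Go sketch. The function name retainedHeapGB is hypothetical, and the inputs are the rounded GB figures quoted in this section. In a live Go process the same quantities are exposed as the HeapInuse, HeapIdle, and HeapReleased fields of runtime.MemStats.

```go
package main

import "fmt"

// retainedHeapGB estimates how much heap the Go runtime still holds:
// spans in use plus idle spans that have not been released to the OS.
// (Hypothetical helper name, used only for this illustration.)
func retainedHeapGB(heapInUse, heapIdle, heapReleased float64) float64 {
	return heapInUse + (heapIdle - heapReleased)
}

func main() {
	// Rounded figures from SHOW STATS and top, in GB.
	const heapInUse, heapIdle, heapReleased, rss = 16.0, 51.0, 44.0, 58.0

	retained := retainedHeapGB(heapInUse, heapIdle, heapReleased)
	fmt.Printf("retained heap: %.0f GB\n", retained)         // 23
	fmt.Printf("RSS not explained: %.0f GB\n", rss-retained) // 35
}
```

The gap between the two numbers is what the rest of the analysis sets out to explain.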

Further inspection with pmap -x 32309 and cat /proc/32309/smaps shows a huge anonymous heap region (≈ 66 GB) with a large amount of Private_Dirty memory. A GDB backtrace (bt) does not reveal a leak.

Running go tool pprof -alloc_space, which counts every byte ever allocated, including memory that has since been freed, shows that index/inmem.(*Index).DropSeriesGlobal accounts for ~42 TB of cumulative allocation during series deletion. This points to the in‑memory index as the source of massive temporary allocations rather than to a leak.
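The difference between cumulative allocation and live heap is easy to reproduce. The sketch below is a toy illustration, not InfluxDB code: the churn function and the sink variable are hypothetical. It allocates many short-lived buffers and compares the growth of runtime.MemStats.TotalAlloc (cumulative, like alloc_space) with HeapAlloc after a garbage collection (live, like inuse_space).

```go
package main

import (
	"fmt"
	"runtime"
)

// sink is package-level so every buffer escapes to the heap.
var sink []byte

// churn allocates n short-lived buffers of size bytes each and returns
// how much the cumulative allocation counter (TotalAlloc) grew.
func churn(n, size int) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		sink = make([]byte, size) // the previous buffer becomes garbage
	}
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	grew := churn(256, 1<<20) // 256 buffers of 1 MiB each
	runtime.GC()

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("cumulative allocation: %d MiB, live heap: %d MiB\n",
		grew>>20, m.HeapAlloc>>20)
}
```

The cumulative figure grows with every allocation while the live heap stays small, which is the same pattern a deletion path with heavy temporary allocations would show in a profile.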

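How the Go runtime returns memory to the kernel depends on the Go version. Before Go 1.12 it used MADV_DONTNEED, which lowers RSS immediately. Go 1.12 through 1.15 default to the cheaper MADV_FREE on Linux 4.5 and later, where the kernel reclaims the pages only under memory pressure, so RSS stays high even though the memory is reclaimable. Setting GODEBUG=madvdontneed=1 restores the older MADV_DONTNEED behavior (Go 1.16 made it the default again).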

Disk I/O Consumption Analysis

iostat -x 1 3 shows that dm-4, the device InfluxDB writes to, handles ≈ 5,361 IOPS with reads ≈ 27 MB/s, writes ≈ 33 MB/s, avgqu-sz ≈ 3.48, await ≈ 0.47 ms, and %util ≈ 97.5 %, which indicates that the I/O subsystem is saturated.
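One way to read these figures is to derive the average request size. The sketch below assumes that the IOPS figure covers reads and writes combined and that iostat's MB is 1024 KB. Neither is stated in the original output, and avgRequestKB is a hypothetical helper name.

```go
package main

import "fmt"

// avgRequestKB returns the mean size of one I/O request in KB, given
// read and write throughput in MB/s and the combined request rate.
// Assumes 1 MB = 1024 KB.
func avgRequestKB(readMBps, writeMBps, iops float64) float64 {
	return (readMBps + writeMBps) * 1024 / iops
}

func main() {
	// Figures quoted from iostat above.
	fmt.Printf("average request: %.1f KB\n", avgRequestKB(27, 33, 5361)) // about 11.5
}
```

Under those assumptions each request moves only about 11 KB, which suggests the device is limited by the number of small requests rather than by raw bandwidth.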

Performance Optimizations

1. Set GODEBUG=madvdontneed=1 so the runtime releases freed memory with MADV_DONTNEED and RSS drops promptly instead of waiting for memory pressure.

2. Change the InfluxDB configuration (influxdb.conf) to reduce I/O pressure and memory usage:

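[data]
  # Batch WAL fsync calls: wait up to 1s before syncing
  # (the default "0s" syncs on every write)
  wal-fsync-delay = "1s"

  # Switch index from in‑memory to TSI1 to avoid high memory during retention deletions.
  # Applies to new shards only; existing shards must be converted with influx_inspect buildtsi.
  index-version = "tsi1"

  # Raise the compaction write rate limit to 64 MB/s (default "48m")
  compact-throughput = "64m"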

Restart the service with the environment variable:

env GODEBUG=madvdontneed=1 /usr/bin/influxd -config /usr/bin/influxdb.conf
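After the restart it is worth confirming that the running process actually received the variable, since service managers sometimes drop environment settings. The following is a minimal sketch, assuming the NUL-separated layout of /proc/<pid>/environ. The helper name hasMadvDontNeed and the sample string are illustrative. On the server the input would come from reading /proc/<pid>/environ for the new influxd PID.

```go
package main

import (
	"fmt"
	"strings"
)

// hasMadvDontNeed reports whether a NUL-separated environment block, as
// found in /proc/<pid>/environ, sets GODEBUG with madvdontneed=1.
func hasMadvDontNeed(environ string) bool {
	for _, kv := range strings.Split(environ, "\x00") {
		if !strings.HasPrefix(kv, "GODEBUG=") {
			continue
		}
		// GODEBUG holds a comma-separated list of name=value settings.
		for _, opt := range strings.Split(strings.TrimPrefix(kv, "GODEBUG="), ",") {
			if strings.TrimSpace(opt) == "madvdontneed=1" {
				return true
			}
		}
	}
	return false
}

func main() {
	// Made-up sample; replace with the contents of /proc/<pid>/environ.
	sample := "PATH=/usr/bin\x00GODEBUG=gctrace=0,madvdontneed=1\x00"
	fmt.Println("madvdontneed enabled:", hasMadvDontNeed(sample))
}
```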

Online Verification

After a week of running the tuned instance, memory usage dropped to ~55 % of RAM and disk I/O fell to ~200 IOPS with only 6 % utilization, confirming the problem was mitigated.

