Root Cause Analysis and Performance Optimization of InfluxDB 1.8 Memory and Disk I/O on a Production Server
The article investigates why an InfluxDB 1.8 instance on a 32‑core, 64 GB production server consumes over 95% memory and generates heavy disk I/O, analyzes runtime statistics, pprof data, and Go memory‑release behavior, and presents configuration and runtime tweaks that reduce memory usage to ~55% and I/O load to acceptable levels.
After a week of continuous writes (~100 GB per day) to an InfluxDB 1.8 instance on a 32‑core, 64 GB server, the process showed >95% memory consumption and frequent swap alerts. The top output revealed the influxd process using 58 GB RES, with a high I/O wait (wa ≈ 43%).
Memory-consumption analysis showed that the Go runtime reported HeapIdle ≈ 51 GB, HeapInUse ≈ 16 GB, and HeapReleased ≈ 44 GB. The effective heap (HeapInUse plus the idle-but-unreleased portion, i.e. 16 + (51 − 44) GB) is therefore only about 23 GB, far lower than the 58 GB RES reported by the OS. Detailed inspection with pmap, /proc/…/smaps, and gdb confirmed no obvious memory leak.
Profiling with go tool pprof -alloc_space showed that the DropSeriesGlobal routine in the InfluxDB index accounted for >42 TB of cumulative allocations during series deletion, explaining the memory spikes. Since Go 1.12, the runtime returns memory to the kernel with MADV_FREE, so RSS only drops once the system comes under memory pressure; setting GODEBUG=madvdontneed=1 restores the older MADV_DONTNEED behavior, which shrinks RSS immediately.
Disk I/O analysis with iostat showed the device dm-4 handling ~5,361 IOPS, ~27 MiB/s reads and ~33 MiB/s writes, with an average queue length of 3.48 and %util near 98%, indicating the storage subsystem was saturated.
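These iostat figures can be sanity-checked with Little's law (queue length L = arrival rate λ × average time in system W). A small calculation over the reported numbers, as a sketch:

```go
package main

import "fmt"

func main() {
	// Figures reported by iostat for dm-4 in the article.
	iops := 5361.0   // r/s + w/s: requests per second
	avgQueue := 3.48 // avgqu-sz: average in-flight requests

	// Little's law: L = lambda * W  =>  W = L / lambda.
	// Average time each request spends queued plus in service.
	waitMs := avgQueue / iops * 1000.0
	fmt.Printf("avg time per request ≈ %.2f ms\n", waitMs)

	// Aggregate throughput: ~27 MiB/s reads + ~33 MiB/s writes,
	// giving the average request size.
	mibPerSec := 27.0 + 33.0
	fmt.Printf("avg request size ≈ %.1f KiB\n", mibPerSec*1024/iops)
}
```

That works out to roughly 0.65 ms per request and ~11.5 KiB per request: individual requests are small and fast, so the saturation comes from sheer request volume rather than large transfers.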
To mitigate the issues, the configuration influxdb.conf was adjusted:
[data]
wal-fsync-delay = "1s" # reduce WAL sync frequency
index-version = "tsi1" # switch from in‑memory to TSI index
compact-throughput = "64m" # increase TSM compaction throughput

The server was then started with the environment variable GODEBUG=madvdontneed=1:
env GODEBUG=madvdontneed=1 /usr/bin/influxd -config /usr/bin/influxdb.conf

After a week of operation, monitoring showed memory usage stabilised around 55% of total RAM, and disk I/O dropped to ~200 IOPS with %util ≈ 6%, confirming that the optimisations resolved the high-consumption symptoms.
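The same startup can be scripted from a small Go supervisor, setting GODEBUG only for the child process rather than the whole shell. A sketch; the binary and config paths are placeholders for illustration:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// startInfluxd builds a command that launches influxd with
// GODEBUG=madvdontneed=1, so the child's freed heap pages are
// returned via MADV_DONTNEED and its RSS drops promptly.
func startInfluxd(binary, config string) *exec.Cmd {
	cmd := exec.Command(binary, "-config", config)
	// Inherit the current environment and append the GODEBUG flag;
	// entries later in Env override earlier ones for the same key.
	cmd.Env = append(os.Environ(), "GODEBUG=madvdontneed=1")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Placeholder paths; call cmd.Start() to actually launch.
	cmd := startInfluxd("/usr/bin/influxd", "/etc/influxdb/influxdb.conf")
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```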
References include Go runtime memory‑release documentation, InfluxDB index tuning guides, and Linux performance analysis resources.
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.