
Root Cause Analysis and Performance Optimization of InfluxDB 1.8 Memory and Disk I/O on a Production Server

The article investigates why an InfluxDB 1.8 instance on a 32‑core, 64 GB production server consumes over 95% memory and generates heavy disk I/O, analyzes runtime statistics, pprof data, and Go memory‑release behavior, and presents configuration and runtime tweaks that reduce memory usage to ~55% and I/O load to acceptable levels.

360 Smart Cloud

After a week of continuous writes (~100 GB per day) to an InfluxDB 1.8 instance on a 32‑core, 64 GB server, the process showed >95% memory consumption and frequent swap alerts. The top output revealed the influxd process using 58 GB RES, with a high I/O wait (wa ≈ 43%).

Memory consumption analysis showed that the Go runtime reported HeapIdle ≈ 51 GB, HeapInuse ≈ 16 GB and HeapReleased ≈ 44 GB. The heap actually backed by physical memory is HeapInuse plus the idle-but-not-yet-released portion, i.e. 16 GB + (51 GB − 44 GB) ≈ 23 GB, far lower than the 58 GB RES reported by the OS. Detailed inspection with pmap, /proc/…/smaps and gdb confirmed no obvious memory leak.

go tool pprof -alloc_space showed that the DropSeriesGlobal routine in the InfluxDB index had accumulated >42 TB of allocations during series deletion, explaining the memory spikes. On Linux, the Go runtime in versions 1.12 through 1.15 uses MADV_FREE to return memory to the kernel, so RSS only drops once the system comes under memory pressure; setting GODEBUG=madvdontneed=1 restores the older MADV_DONTNEED behavior, which makes RSS fall immediately.

Disk I/O analysis with iostat showed the device dm-4 handling ~5,361 IOPS, ~27 MiB/s of reads and ~33 MiB/s of writes, with an average queue length of 3.48 and %util near 98%, indicating the storage subsystem was saturated.

To mitigate the issues, influxdb.conf was adjusted:

[data]
wal-fsync-delay = "1s"      # batch WAL fsyncs instead of syncing on every write
index-version = "tsi1"      # switch from the in-memory index to the disk-based TSI index
compact-throughput = "64m"  # raise the TSM compaction write rate (default 48m)

Note that index-version = "tsi1" only applies to newly created shards; existing shards can be converted offline with influx_inspect buildtsi.

The server was then started with GODEBUG=madvdontneed=1 in the environment:

env GODEBUG=madvdontneed=1 /usr/bin/influxd -config /usr/bin/influxdb.conf

After a week of operation, monitoring showed memory usage stabilised around 55% of total RAM and disk I/O dropped to ~200 IOPS with %util ≈ 6%, confirming that the optimisations resolved the high‑consumption symptoms.

References include Go runtime memory‑release documentation, InfluxDB index tuning guides, and Linux performance analysis resources.

Tags: Performance Tuning, memory-leak, Database Optimization, InfluxDB, Disk I/O, Go Runtime
Written by 360 Smart Cloud, the official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one-stop cloud service platform.