
Analyzing and Optimizing High Memory and Disk I/O Consumption of InfluxDB 1.8 on a Production Server

This article investigates why an InfluxDB 1.8 instance on a 32‑core, 64 GB server consumes over 58 GB of resident memory and generates heavy disk I/O, examines Go runtime memory accounting, uses system tools such as top, pmap, pprof and iostat for diagnosis, and presents configuration and runtime tweaks that reduce memory pressure and I/O load.

360 Tech Engineering

Background

The production server (32 cores, 64 GB RAM) runs InfluxDB 1.8, which is written in Go, and ingests about 100 GB of data per day. After roughly a week, the process's memory usage exceeds 95% of RAM and occasional swap alerts appear.

Problem Symptoms

top shows the InfluxDB process (PID 32309) at 58 GB RES, 95.3% memory usage, and a high I/O wait (wa = 43.1%). Two questions follow: why is the process's resident memory so high, and why is disk I/O saturated?

Memory High‑Consumption Analysis

Querying show stats through the InfluxDB client reveals the Go runtime metrics HeapIdle ≈ 51 GB, HeapReleased ≈ 44 GB, and HeapInUse ≈ 16 GB, giving an effective heap of about 23 GB (HeapInUse + HeapIdle − HeapReleased). The system nevertheless reports 58 GB RES, which suggests that roughly 35 GB is marked unused by the runtime but has not been returned to the kernel.
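The gap can be checked with simple arithmetic over the reported runtime fields; a minimal sketch, with the figures from the show stats output above hard-coded in GB:

```go
package main

import "fmt"

func main() {
	// Runtime figures reported by show stats, in GB (from the text above).
	heapIdle := 51.0     // spans held by the runtime but not in use
	heapReleased := 44.0 // portion of HeapIdle already returned to the OS
	heapInUse := 16.0    // spans holding live objects
	res := 58.0          // resident set size reported by top

	// Effective heap: pages the runtime still holds, from the kernel's view.
	effective := heapInUse + (heapIdle - heapReleased)
	fmt.Printf("effective heap: %.0f GB\n", effective) // 23 GB

	// What RES shows beyond the effective heap: freed but unreturned pages.
	fmt.Printf("unreturned gap: %.0f GB\n", res-effective) // 35 GB
}
```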

Further inspection with pmap -x 32309 and cat /proc/32309/smaps shows a huge anonymous heap region (≈ 66 GB) dominated by Private_Dirty memory. A GDB backtrace ( bt ) does not point to a leak.

Running go tool pprof -alloc_space shows that index/inmem.(*Index).DropSeriesGlobal accounts for ~42 TB of cumulative allocations during series deletion, confirming that the in‑memory (inmem) index causes massive temporary allocations.

Before Go 1.12 the runtime returned freed memory to the kernel with MADV_DONTNEED, which drops RSS immediately. Go 1.12+ switched to the cheaper MADV_FREE, under which the kernel reclaims the pages lazily, so RSS only decreases under memory pressure. Setting GODEBUG=madvdontneed=1 restores the older, eager behavior.

Disk I/O Consumption Analysis

iostat -x 1 3 shows the InfluxDB process writing to device dm‑4 at ≈ 5,361 IOPS, with ≈ 27 MB/s read, ≈ 33 MB/s write, avgqu‑sz ≈ 3.48, await ≈ 0.47 ms, and %util ≈ 97.5%, indicating that the I/O subsystem is saturated.

Performance Optimizations

1. Set GODEBUG=madvdontneed=1 so the runtime returns freed memory to the kernel eagerly (MADV_DONTNEED) instead of lazily (MADV_FREE).

2. Change the InfluxDB configuration ( influxdb.conf ) to reduce I/O pressure and memory usage. Note that index-version only applies to new shards; existing shards must be converted with influx_inspect buildtsi.

[data]
  # Batch WAL fsyncs over a 1 s window instead of syncing on every write
  wal-fsync-delay = "1s"

  # Switch the index from in-memory (inmem) to TSI1 to avoid high memory
  # usage during retention-policy series deletions
  index-version = "tsi1"

  # Raise the compaction write rate limit to 64 MB/s
  compact-throughput = "64m"

Restart the service with the environment variable:

env GODEBUG=madvdontneed=1 /usr/bin/influxd -config /usr/bin/influxdb.conf

Online Verification

After a week of running the tuned instance, memory usage had dropped to ~55% of RAM and disk I/O had fallen to ~200 IOPS at ~6% utilization, confirming that the problem was mitigated.

