InfluxDB Storage Engine Architecture and Hardware Recommendations
This article explains InfluxDB's storage engine workflow, including the WAL, Cache, TSM files, compaction and compression components, and file management. It then provides hardware sizing guidance based on write load, query load, and series cardinality, and recommends SSD storage with sample configuration settings.
The article first references three related InfluxDB series articles and then dives into the storage engine architecture of InfluxDB, a time‑series database.
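Data ingestion starts with a Write‑Ahead Log (WAL): each write is appended to the WAL for durability and is also written to an in‑memory Cache, which makes recent data queryable. When the Cache reaches a size threshold, or after a period without new writes, its contents are flushed to immutable TSM files.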
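The write path described above can be illustrated with a toy model. This is a minimal sketch, not InfluxDB's implementation: the class `ToyEngine`, the `max_cache_points` threshold, the JSON file layout, and the `.toytsm` extension are all hypothetical names chosen for illustration, and the real engine uses binary formats and a byte-size threshold.

```python
import json
import os


class ToyEngine:
    """Toy model of the WAL -> Cache -> immutable file flow (illustrative only)."""

    def __init__(self, directory, max_cache_points=3):
        self.directory = directory
        self.max_cache_points = max_cache_points  # hypothetical flush threshold
        self.wal_path = os.path.join(directory, "toy.wal")
        self.cache = {}        # series key -> list of (timestamp, value)
        self.cached_points = 0
        self.snapshots = []    # paths of flushed, never-modified files

    def write(self, series, timestamp, value):
        # 1. Append to the WAL and fsync so the write survives a crash.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps([series, timestamp, value]) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Add the point to the in-memory cache so it is queryable.
        self.cache.setdefault(series, []).append((timestamp, value))
        self.cached_points += 1
        # 3. Flush once the cache crosses the threshold.
        if self.cached_points >= self.max_cache_points:
            self.flush()

    def flush(self):
        # Write the cache as one sorted snapshot file, then reset state.
        name = f"{len(self.snapshots):06d}.toytsm"
        path = os.path.join(self.directory, name)
        ordered = {
            series: sorted(points, key=lambda p: p[0])
            for series, points in sorted(self.cache.items())
        }
        with open(path, "w") as out:
            json.dump(ordered, out)
        self.snapshots.append(path)
        self.cache.clear()
        self.cached_points = 0
        # The WAL is no longer needed once its data is persisted.
        if os.path.exists(self.wal_path):
            os.remove(self.wal_path)
```

The ordering matters in this sketch: the WAL append happens before the cache update, so a crash between the two steps loses nothing that was acknowledged, and the WAL is deleted only after the snapshot file has been written.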
To store large volumes efficiently, the engine compresses data within TSM files. FileStore mediates access to all TSM files on disk, while the Compaction Planner determines which TSM files are ready for compaction and the Compactor performs the actual compaction, merging them into fewer, larger files.
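The division of labor between a planner and a compactor can be sketched as follows. This is a simplified illustration under stated assumptions: the dictionary-based file representation, the `level` and `generation` fields, and the `min_files` threshold are hypothetical, and InfluxDB's actual planning rules are more involved.

```python
from collections import defaultdict


def plan(files, min_files=4):
    """Planner: group files by level and return groups large enough to merge.

    `min_files` is a hypothetical threshold chosen for this sketch.
    """
    by_level = defaultdict(list)
    for f in files:
        by_level[f["level"]].append(f)
    return [
        group
        for _level, group in sorted(by_level.items())
        if len(group) >= min_files
    ]


def compact(group):
    """Compactor: merge one group of files into a single higher-level file.

    Each file's "data" maps series key -> {timestamp: value}. Files are
    applied oldest generation first, so a newer file wins when two files
    hold the same timestamp for the same series.
    """
    merged = {}
    for f in sorted(group, key=lambda f: f["generation"]):
        for series, points in f["data"].items():
            merged.setdefault(series, {}).update(points)
    return {
        "level": group[0]["level"] + 1,
        "generation": max(f["generation"] for f in group),
        "data": {
            series: dict(sorted(points.items()))
            for series, points in sorted(merged.items())
        },
    }
```

Keeping the selection step separate from the merge step means the selection policy can change without touching the merge logic, which mirrors the separate Compaction Planner and Compactor components listed in the article.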
The storage engine consists of the following components: In‑Memory Index, WAL, Cache, TSM Files, FileStore, Compactor, Compaction Planner, Compression, and Writers/Readers for file I/O.
In the hardware guide section, the article defines load by three metrics—writes per second, queries per second, and series cardinality—and classifies query complexity into simple, medium, and complex categories based on functions, regex usage, GROUP BY clauses, time range, and execution time.
Recommended hardware focuses on CPU core count, RAM size, and IOPS performance. SSDs are strongly advised. A sample configuration shows the settings that control where the data and wal directories live; to place them on separate storage devices, point each setting at a path mounted on a different device:
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"

Metadata such as database names, measurements, tag keys/values, and field keys is stored once, while field values and timestamps are stored per point; non‑string values need about three bytes each, and the space used by strings varies with compression.
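As a worked example of the three-bytes figure, the following sketch estimates the space needed for non-string field values. The function names and the workload numbers (1,000 series, one point every 10 seconds, 2 numeric fields) are assumptions made for illustration; the estimate ignores string values, metadata, and index overhead.

```python
BYTES_PER_NON_STRING_VALUE = 3  # approximate figure quoted in the article
SECONDS_PER_DAY = 86_400


def points_per_day(series_count, interval_seconds):
    """Points written per day if every series reports once per interval."""
    return series_count * (SECONDS_PER_DAY // interval_seconds)


def estimate_value_bytes(points, non_string_fields_per_point):
    """Rough size of non-string field values only (no strings, no metadata)."""
    return points * non_string_fields_per_point * BYTES_PER_NON_STRING_VALUE


# Hypothetical workload: 1,000 series, one point every 10 s, 2 numeric fields.
daily_points = points_per_day(1_000, 10)             # 8,640,000 points
daily_bytes = estimate_value_bytes(daily_points, 2)  # 51,840,000 bytes
print(f"{daily_bytes / 1_000_000:.1f} MB per day")
```

Under these assumptions the values alone come to roughly 52 MB per day, which gives a lower bound to multiply by the retention period when sizing disks.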
Large series cardinalities (tens of millions) can cause memory pressure with the default in‑memory index, so careful data‑structure design is required. Separating WAL and data directories onto different disks reduces contention and improves write throughput.
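To see how quickly cardinality grows, it can help to count series from a sample of points and to compute a worst-case bound from the number of distinct values per tag. The sketch below assumes a series is identified by its measurement plus its tag set; the function names and the sample tags (`host`, `region`, `container_id`) are hypothetical.

```python
from math import prod


def series_cardinality(points):
    """Count unique (measurement, tag set) combinations in a sample.

    `points` is an iterable of (measurement, tags_dict) pairs. Tag order
    does not matter, so the tag set is compared as a frozenset.
    """
    return len({(m, frozenset(tags.items())) for m, tags in points})


def cardinality_upper_bound(distinct_values_per_tag):
    """Worst case for one measurement: every tag value combination occurs."""
    return prod(distinct_values_per_tag.values())


sample = [
    ("cpu", {"host": "a", "region": "us"}),
    ("cpu", {"region": "us", "host": "a"}),  # same series, different tag order
    ("cpu", {"host": "b", "region": "us"}),
    ("mem", {"host": "a"}),
]
observed = series_cardinality(sample)

# A high-cardinality tag such as a container ID multiplies the bound.
bound = cardinality_upper_bound({"host": 1_000, "container_id": 50_000})
```

In this hypothetical case the bound is 50,000,000 series for a single measurement, which is the range the article warns about. Moving such an identifier from a tag to a field is one data-design choice that keeps cardinality down, at the cost of not being able to index on it.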
System Architect Go