
InfluxDB Storage Engine Architecture and Hardware Recommendations

This article explains the workflow of InfluxDB's storage engine, covering the WAL, Cache, TSM files, compression and compaction components, and file management. It then gives hardware sizing guidance based on write load, query load, and series cardinality, recommends SSD storage, and shows sample configuration settings.

System Architect Go

Oct 28, 2019


The original article opens by referencing three related articles in its InfluxDB series, then examines the storage engine architecture of InfluxDB, a time-series database.

Each incoming write is appended to the Write-Ahead Log (WAL) and added to an in-memory Cache. When the Cache reaches a size threshold or a time threshold, its contents are flushed to disk as immutable TSM files.
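The write path described above can be sketched as a toy model. This is not InfluxDB code: the class name `ToyEngine`, the parameters `max_cache_points` and `max_cache_age_s`, and the use of in-memory lists in place of on-disk files are all assumptions made for illustration, and the time threshold is simplified to "time since the last flush".

```python
import time


class ToyEngine:
    """Toy model of the WAL -> Cache -> TSM write path (hypothetical, not InfluxDB code)."""

    def __init__(self, max_cache_points=3, max_cache_age_s=600.0):
        self.wal = []        # append-only log, stands in for WAL segments on disk
        self.cache = {}      # series key -> list of (timestamp, value)
        self.tsm_files = []  # each flush produces one immutable snapshot
        self.max_cache_points = max_cache_points
        self.max_cache_age_s = max_cache_age_s
        self._last_flush = time.monotonic()

    def write(self, series_key, timestamp, value):
        # 1. Record the point in the WAL first so it could be replayed after a crash.
        self.wal.append((series_key, timestamp, value))
        # 2. Add the point to the in-memory cache.
        self.cache.setdefault(series_key, []).append((timestamp, value))
        # 3. Flush when either the size or the time threshold is crossed.
        if self._should_flush():
            self.flush()

    def _should_flush(self):
        cached = sum(len(points) for points in self.cache.values())
        age = time.monotonic() - self._last_flush
        return cached >= self.max_cache_points or age >= self.max_cache_age_s

    def flush(self):
        # Sort each series by time and freeze it; tuples model immutability.
        snapshot = tuple(
            (key, tuple(sorted(points))) for key, points in sorted(self.cache.items())
        )
        self.tsm_files.append(snapshot)
        self.cache.clear()
        self.wal.clear()  # entries covered by the snapshot are no longer needed
        self._last_flush = time.monotonic()
```

With `max_cache_points=3`, the third call to `write` triggers a flush: the cache and WAL are emptied and one sorted, immutable snapshot is appended to `tsm_files`.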

To store large volumes efficiently, the engine compresses data inside TSM files. The FileStore mediates access to all TSM files on disk, the Compaction Planner determines which TSM files are ready for compaction, and the Compactor performs the compaction itself, merging smaller files into larger, more read-optimized ones.
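The division of labour between planner and compactor can be illustrated with a minimal sketch. The selection rule used here (compact the oldest `min_files` files once that many exist) is an assumption chosen for brevity and is not how the real planner decides. Each "file" is modelled as a dict mapping a series key to a list of `(timestamp, value)` pairs.

```python
def plan_compaction(files, min_files=4):
    """Planner: return the oldest files once enough have accumulated, else nothing."""
    if len(files) < min_files:
        return []
    return files[:min_files]


def compact(files):
    """Compactor: merge files into one, sorted by time per series.

    When two files hold the same timestamp for a series, the value from the
    later file in the list wins.
    """
    merged = {}
    for f in files:
        for series_key, points in f.items():
            merged.setdefault(series_key, {}).update(dict(points))
    return {key: sorted(by_time.items()) for key, by_time in merged.items()}
```

Keeping the two functions separate mirrors the component split in the text: the planner only decides what to compact, and the compactor only does the merging.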

The storage engine consists of the following components: In‑Memory Index, WAL, Cache, TSM Files, FileStore, Compactor, Compaction Planner, Compression, and Writers/Readers for file I/O.

In the hardware guide section, the article defines load by three metrics: writes per second, queries per second, and series cardinality. It classifies queries as simple, medium, or complex according to the functions they use, regex usage, GROUP BY clauses, the time range covered, and execution time.
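One way to make the three load metrics concrete is to bundle them into a profile and map the profile to a tier. In the sketch below the names `LoadProfile`, `TIERS`, and `classify` are hypothetical, and the threshold numbers are placeholders that do not come from this article; substitute the values from whichever sizing guide you follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadProfile:
    writes_per_sec: int
    queries_per_sec: int
    series_cardinality: int


# Placeholder upper bounds per tier: (writes/s, queries/s, series).
# These numbers are illustrative assumptions, not values from the article.
TIERS = [
    ("low", (5_000, 5, 100_000)),
    ("moderate", (250_000, 25, 1_000_000)),
]


def classify(profile, tiers=TIERS):
    """Return the first tier whose bounds all three metrics stay under."""
    metrics = (
        profile.writes_per_sec,
        profile.queries_per_sec,
        profile.series_cardinality,
    )
    for name, bounds in tiers:
        if all(value < bound for value, bound in zip(metrics, bounds)):
            return name
    return "high"
```

Because every metric must stay under its bound, a single outlier (for example a modest write rate with very high series cardinality) is enough to push a workload into a higher tier.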

Hardware recommendations are expressed in terms of CPU core count, RAM size, and disk IOPS. SSDs are strongly advised. The data and wal directories are configured independently in the [data] section, so each path can point at a mount on a separate storage device; the sample below shows the two settings:

[data]
    dir = "/var/lib/influxdb/data"
    wal-dir = "/var/lib/influxdb/wal"
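Whether the two configured directories really sit on different devices can be checked from the operating system. The helper below is a small sketch; the function name `on_separate_devices` is hypothetical. It compares the `st_dev` field returned by `os.stat`, which identifies the filesystem holding a path, so two partitions of the same physical disk would also report as different.

```python
import os


def on_separate_devices(data_dir, wal_dir):
    """Return True when the two directories live on different filesystems."""
    # st_dev identifies the device (filesystem) that contains the path.
    return os.stat(data_dir).st_dev != os.stat(wal_dir).st_dev
```

For the sample configuration this would be called as `on_separate_devices("/var/lib/influxdb/data", "/var/lib/influxdb/wal")` on the database host.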

Database names, measurements, tag keys, tag values, and field keys are stored only once, while field values and timestamps are stored for every point. A non-string value takes roughly three bytes; the space a string value takes varies with how well it compresses.
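The three-bytes-per-value figure allows a rough back-of-the-envelope estimate. The function below is a sketch under stated assumptions: it counts only non-string field values and ignores metadata, string fields, the index, and the WAL. The name `estimate_value_bytes` and the example workload are hypothetical.

```python
def estimate_value_bytes(points, numeric_fields_per_point, bytes_per_value=3):
    """Rough on-disk size of non-string field values, in bytes."""
    return points * numeric_fields_per_point * bytes_per_value


# Example workload (assumed): 1,000 series, one point per second each,
# two numeric fields per point, retained for one day (86,400 seconds).
one_day = estimate_value_bytes(points=86_400 * 1_000, numeric_fields_per_point=2)
```

For this assumed workload the estimate is 518,400,000 bytes, roughly half a gigabyte per day before metadata and string values are added.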

Series cardinality in the tens of millions can cause memory pressure with the default in-memory index, so the schema, in particular the choice of which values become tags, needs careful design. Placing the WAL and data directories on different disks reduces I/O contention and improves write throughput.
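To see how schema choices drive cardinality, the sketch below counts unique series in a batch of points, assuming a series is identified by its measurement plus its tag set. The helper names `series_key` and `series_cardinality` are hypothetical, and each point is modelled as a `(measurement, tags)` pair.

```python
def series_key(measurement, tags):
    """Build a canonical key from a measurement name and a dict of tags."""
    # Sort tags so the same tag set always yields the same key.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}" if tag_part else measurement


def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations."""
    return len({series_key(measurement, tags) for measurement, tags in points})
```

Every distinct tag value multiplies the number of possible series, which is why a high-variety value such as a request ID is better stored as a field than as a tag.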


