
How to Build a High‑Performance InfluxDB Cluster for Massive Time‑Series Data

This article explores InfluxDB’s time‑series strengths, compares TSDB with traditional databases, explains its TSM storage engine and shard concepts, and details the design, architecture, performance benchmarks, integration steps, and future enhancements of a high‑availability InfluxDB‑HA solution used at 360.

360 Zhihui Cloud Developer

Basic Concepts

TSDB vs Traditional DB

Traditional databases typically record the current value of each item.

Time-series databases (TSDBs) record a sequence of values over time, each with its timestamp.

TSDB Application Scenarios

TSDBs suit time-series data that is analyzed for historical trends, periodic patterns, anomaly detection, and forecasting, such as device monitoring, medical vital signs, and financial transaction logs.

Why Choose InfluxDB

Active community and proven performance.

SQL‑like query language reduces learning cost.

Native HTTP API supports multiple languages.

Pluggable storage solution.

InfluxDB TSM Storage Engine Overview

Components: cache (an in-memory map, 1 GB by default), WAL (write-ahead log that makes cached writes durable), TSM files (on-disk data storage), and the compactor (turns the cache into a snapshot, writes the snapshot to TSM files, and merges small TSM files).
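The interaction of these four components can be illustrated with a toy model. The sketch below is not InfluxDB's implementation: `ToyEngine`, its JSON file format, and the point-count threshold are all assumptions made for illustration (the real engine uses a binary format and size- and time-based thresholds). It only shows the order of operations: append to the WAL, update the cache, snapshot the cache to a sorted file, then merge small files.

```python
import json
import os
import tempfile


class ToyEngine:
    """Toy write path: WAL append -> cache -> snapshot file -> compaction."""

    def __init__(self, directory, cache_limit=4):
        self.directory = directory
        self.cache_limit = cache_limit  # hypothetical threshold, counted in points
        self.cache = {}                 # series key -> list of (timestamp, value)
        self.cached_points = 0
        self.wal_path = os.path.join(directory, "toy.wal")
        self.data_files = []            # stand-ins for TSM files
        self.next_file_id = 0

    def _new_data_file(self, rows):
        path = os.path.join(self.directory, f"{self.next_file_id:06d}.toy")
        self.next_file_id += 1
        with open(path, "w", encoding="utf-8") as f:
            json.dump(rows, f)
        return path

    def write(self, series_key, timestamp, value):
        # 1. Append to the WAL first so the point survives a crash.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([series_key, timestamp, value]) + "\n")
        # 2. Keep the point in memory, where it is immediately queryable.
        self.cache.setdefault(series_key, []).append((timestamp, value))
        self.cached_points += 1
        # 3. Snapshot once the cache reaches its limit.
        if self.cached_points >= self.cache_limit:
            self.snapshot()

    def snapshot(self):
        # Write cached points sorted by series key, then by timestamp.
        rows = [[key, ts, value]
                for key in sorted(self.cache)
                for ts, value in sorted(self.cache[key])]
        self.data_files.append(self._new_data_file(rows))
        self.cache.clear()
        self.cached_points = 0
        open(self.wal_path, "w").close()  # the WAL entries are now in a data file

    def compact(self):
        # Merge all small files into a single sorted file.
        merged = []
        for path in self.data_files:
            with open(path, encoding="utf-8") as f:
                merged.extend(json.load(f))
            os.remove(path)
        merged.sort()
        self.data_files = [self._new_data_file(merged)]
        return len(merged)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        engine = ToyEngine(directory, cache_limit=2)
        for ts in range(4):
            engine.write("cpu,host=a", ts, float(ts))
        files_before = len(engine.data_files)
        rows = engine.compact()
        return files_before, len(engine.data_files), rows
```

Calling `demo()` writes four points with a cache limit of two, which produces two snapshot files that the compaction step then merges into one.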

Shard – Concept Above TSM Engine

Each shard covers a distinct timestamp range, so a time-bounded query only has to read the shards that overlap it, and expired data can be removed in bulk by dropping whole shards.
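A minimal sketch of this idea follows, assuming a fixed one-day shard duration. `ShardedStore`, `shard_start`, and the in-memory dictionaries are hypothetical names used for illustration only; in InfluxDB the shard duration depends on the retention policy.

```python
from datetime import datetime, timedelta, timezone

SHARD_DURATION = timedelta(days=1)  # assumed value for this sketch
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_start(ts, duration=SHARD_DURATION):
    """Return the start of the time bucket that contains ts."""
    buckets = (ts - EPOCH) // duration
    return EPOCH + buckets * duration


class ShardedStore:
    def __init__(self, duration=SHARD_DURATION):
        self.duration = duration
        self.shards = {}  # shard start -> list of (timestamp, value)

    def write(self, ts, value):
        self.shards.setdefault(shard_start(ts, self.duration), []).append((ts, value))

    def query(self, start, end):
        """Return points with start <= ts < end, scanning only overlapping shards."""
        hits = []
        for begin, points in self.shards.items():
            if begin < end and begin + self.duration > start:
                hits.extend(p for p in points if start <= p[0] < end)
        return sorted(hits)

    def drop_before(self, cutoff):
        """Delete whole shards that end at or before cutoff; return the count."""
        expired = [b for b in self.shards if b + self.duration <= cutoff]
        for b in expired:
            del self.shards[b]
        return len(expired)


def day(n, hour=0):
    return datetime(2024, 1, n, hour, tzinfo=timezone.utc)


store = ShardedStore()
for n in (1, 2, 3):
    store.write(day(n, 6), float(n))
```

Here `store.query(day(2), day(3))` only inspects the shard for 2 January, and `store.drop_before(day(3))` removes the first two shards without touching individual points.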

Project Origin

InfluxDB community edition lacks clustering.

The official influxdb-relay only supports dual-write and provides no load balancing.

Eleme’s influx‑proxy solution is complex to deploy and maintain.

360 needed real‑time monitoring for 100 k hosts and 200 metrics.

Thus the InfluxDB‑HA project was created.

Architecture

Official InfluxDB‑Relay Solution

Unresolved issues:

Dual-write only duplicates data for backup; it does not improve read or write performance.

Queries still have to be sent to InfluxDB directly, which increases configuration complexity.

No retry mechanism for failed writes.

Eleme InfluxDB High‑Availability Solution

Advantages:

Influx‑proxy rebuilt to meet performance and maintenance needs.

Dynamic scaling of InfluxDB nodes.

Robust retry mechanism for failed requests.

Disadvantages:

Many components increase learning and maintenance cost.

Retry can add load when machines are at capacity.

Not aligned with simple monitoring storage needs.

360 Internal InfluxDB‑HA Solution

Advantages:

Uses measurement as the smallest split unit, ensuring efficient time‑series queries.

Supports dynamic sharding and table splitting at the business layer.
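The source does not describe how InfluxDB-HA maps measurements to nodes, so the following is only a sketch of what "measurement as the smallest split unit" could look like. The function names and the hash-based placement are assumptions; a static measurement-to-node table in configuration would work just as well. The point is that all lines of one measurement end up on the same backend, so a query on that measurement needs a single node.

```python
import zlib


def measurement_of(line):
    """Extract the measurement name from one line-protocol line."""
    name = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in ", ":
            name.append(line[i + 1])  # escaped comma or space belongs to the name
            i += 2
            continue
        if ch in ", ":
            break                     # first unescaped comma or space ends the name
        name.append(ch)
        i += 1
    return "".join(name)


def backend_for(measurement, backends):
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return backends[zlib.crc32(measurement.encode("utf-8")) % len(backends)]


def route(lines, backends):
    """Group line-protocol lines into one batch per backend."""
    batches = {backend: [] for backend in backends}
    for line in lines:
        batches[backend_for(measurement_of(line), backends)].append(line)
    return batches


# Hypothetical backend names and sample points.
BACKENDS = ["influx-a:8086", "influx-b:8086", "influx-c:8086"]
LINES = [
    "cpu,host=web01 usage=0.5 1700000000000000000",
    "mem,host=web01 used=1024i 1700000000000000000",
    "cpu,host=web02 usage=0.7 1700000000000000000",
]
BATCHES = route(LINES, BACKENDS)
```

Simple modulo hashing reshuffles most measurements when the backend list changes, so a real deployment that needs dynamic scaling would likely prefer an explicit mapping or consistent hashing.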

Performance Comparison

Disk I/O comparison with a single‑node InfluxDB.

CPU usage comparison with a single‑node InfluxDB.

Business Integration Guide

InfluxDB‑HA manages InfluxDB instance configurations.

Grafana integration instructions.

Third-party programs write data through the standard /write API, so an InfluxDB SDK in any language can be used.

Future Iteration Plan

Integrate Kafka or RabbitMQ as a buffer before writes to reduce data loss.

Hot-reload configuration files (currently implemented with Go's fsnotify; etcd is planned for centralized configuration).

Support business‑side partitioning to handle larger data scales while keeping measurement as the minimal split unit.

InfluxDB Usage Tips

Continuous Queries/Select: For queries that touch more than 100k samples, filtering on tags (which InfluxDB indexes) rather than on fields greatly reduces memory consumption and helps avoid OOM.

Prefer tags in queries.

Test continuous queries in a simulated production environment before deployment.

Retention Policy: A retention policy controls how long data is preserved, but applying or changing one on a large data volume can increase CPU usage; do it during periods of low read/write load and test on a slave instance first.

Change retention policies during periods of low concurrency.

Iterate on retention policy settings on a slave before rolling them out to production.
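One way to follow this advice is to build the InfluxQL statement once and send the identical text first to the slave and later to production. The sketch below only constructs the statement and the request URL; the function names, database name, policy name, and host are hypothetical, and it assumes InfluxDB 1.x, where management statements are sent to the /query endpoint with POST.

```python
import urllib.parse


def quote_ident(name):
    """Double-quote an InfluxQL identifier; reject names this sketch cannot quote safely."""
    if '"' in name or "\\" in name or "\n" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f'"{name}"'


def alter_retention_policy(database, policy, duration, replication=1, default=False):
    """Build an ALTER RETENTION POLICY statement, e.g. with duration '30d'."""
    statement = (
        f"ALTER RETENTION POLICY {quote_ident(policy)} ON {quote_ident(database)} "
        f"DURATION {duration} REPLICATION {replication}"
    )
    if default:
        statement += " DEFAULT"
    return statement


def query_url(base_url, statement):
    """Return the /query URL that carries the statement in the q parameter."""
    return f"{base_url}/query?" + urllib.parse.urlencode({"q": statement})


STATEMENT = alter_retention_policy("monitoring", "thirty_days", "30d")
SLAVE_URL = query_url("http://influx-slave.example:8086", STATEMENT)
```

Keeping the statement in one variable makes it easy to apply exactly the same change to production once it has been observed on the slave.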
