
Choosing the Right Time‑Series Database: Types, Queries, and Performance Trade‑offs

Time‑series data, defined by the presence of a timestamp field, appears everywhere. This article explains how to choose a time‑series database by comparing two schema models, their query patterns, and their performance trade‑offs, and explains why solutions such as Elasticsearch, columnar stores, and Druid excel at real‑time aggregation over massive data sets.

ITPUB

Dec 3, 2015


What Is Time‑Series Data?

Time‑series data is any dataset that includes a timestamp field, such as stock prices, environmental temperatures, or server CPU usage. Queries almost always filter by a time range and return the timestamp together with the requested values.

Choosing a Time‑Series Database

Although almost any database can store time‑series data, the supported query capabilities differ. Based on these capabilities, time‑series databases fall into two broad categories.

Schema Type 1: Metric‑Name‑Centric

[metric_name] [timestamp] [value]

Typical query pattern:

SELECT timestamp, value FROM metric WHERE metric_name = 'A' AND timestamp >= B AND timestamp < C
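A minimal in-memory sketch of this access pattern, assuming points for each metric arrive in timestamp order. `MetricStore` and its methods are hypothetical names for illustration, not the API of RRDTool, Whisper, OpenTSDB, or any other engine listed below. The range query uses the same half-open interval [B, C) as the SQL above.

```python
from bisect import bisect_left
from collections import defaultdict


class MetricStore:
    """Hypothetical metric-name-centric store: metric_name -> time-ordered points."""

    def __init__(self) -> None:
        # Parallel lists per metric keep timestamps sorted for binary search.
        self._ts: dict[str, list[int]] = defaultdict(list)
        self._vals: dict[str, list[float]] = defaultdict(list)

    def append(self, metric_name: str, timestamp: int, value: float) -> None:
        # Assumes points for a given metric arrive in increasing timestamp order.
        ts = self._ts[metric_name]
        if ts and timestamp < ts[-1]:
            raise ValueError("out-of-order point")
        ts.append(timestamp)
        self._vals[metric_name].append(value)

    def query(self, metric_name: str, start: int, end: int) -> list[tuple[int, float]]:
        # Equivalent of: WHERE metric_name = ? AND timestamp >= start AND timestamp < end
        ts = self._ts.get(metric_name, [])
        vals = self._vals.get(metric_name, [])
        lo = bisect_left(ts, start)
        hi = bisect_left(ts, end)
        return list(zip(ts[lo:hi], vals[lo:hi]))


store = MetricStore()
for t, v in [(100, 0.5), (160, 0.7), (220, 0.9)]:
    store.append("A", t, v)
print(store.query("A", 100, 220))  # [(100, 0.5), (160, 0.7)]
```

Because each metric's points are already laid out in query order, a read costs two binary searches and one contiguous slice, which is what makes this model easy to optimize.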

This model stores data exactly as it will be retrieved, making it fast and easy to optimize. However, it has two major drawbacks:

Slow to adapt to changes: Any change to what a chart needs (for example, a new breakdown) forces the data to be re‑processed and re‑ingested from the source.

Storage bloat: Supporting many query dimensions (e.g., region, carrier, user preference) requires pre‑creating a table or series for every combination of dimension values, a combinatorial explosion that wastes space.
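The storage cost can be counted directly. In this sketch each dimension is either pinned to one value or rolled up (marked `*`), so a metric needs one pre-aggregated series per combination, that is, the product of (cardinality + 1) over all dimensions. The helper names, the dotted naming scheme, and the cardinalities in the example (30 regions, 3 carriers, 5 preference values) are hypothetical.

```python
from itertools import product
from math import prod

ALL = "*"  # marker for "rolled up over this dimension"


def rollup_series_names(metric: str, dims: dict[str, list[str]]) -> list[str]:
    # Every dimension is either pinned to one value or rolled up ("*"),
    # so the store needs one pre-aggregated series per combination.
    choices = [[ALL, *values] for values in dims.values()]
    names = []
    for combo in product(*choices):
        parts = [f"{d}={v}" for d, v in zip(dims, combo)]
        names.append(".".join([metric, *parts]))
    return names


def rollup_series_count(cardinalities: list[int]) -> int:
    # Closed form: each dimension contributes (cardinality + 1) choices.
    return prod(c + 1 for c in cardinalities)


dims = {"region": ["east", "west"], "carrier": ["a", "b", "c"]}
names = rollup_series_names("requests", dims)
print(len(names))                       # 12
print(names[0])                         # requests.region=*.carrier=*
print(rollup_series_count([30, 3, 5]))  # 744
```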

Common implementations include:

File‑based simple storage: RRDTool, Graphite Whisper.

K/V‑based stores: OpenTSDB (on HBase), Blueflood, KairosDB (on Cassandra), InfluxDB, Prometheus (on LevelDB).

Relational databases: MySQL, PostgreSQL.

Schema Type 2: Multi‑Dimension Table

[timestamp] [d1] [d2] ... [dn] [v1] [v2] ... [vm]

This layout enables richer queries, including aggregations:

SELECT d2, sum(v1) / sum(v2) FROM metric WHERE d1 = 'A' AND timestamp >= B AND timestamp < C GROUP BY d2
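For reference, here is a naive Python evaluation of the query above over a handful of hypothetical rows. It scans every row, which is exactly what the capabilities listed next are meant to avoid once a table holds billions of rows. Returning `None` for a group whose sum(v2) is zero is an assumption of this sketch.

```python
from collections import defaultdict


def ratio_by_d2(rows, d1_value, start, end):
    sums = defaultdict(lambda: [0.0, 0.0])  # d2 -> [sum(v1), sum(v2)]
    for r in rows:
        # WHERE d1 = d1_value AND timestamp >= start AND timestamp < end
        if r["d1"] == d1_value and start <= r["timestamp"] < end:
            acc = sums[r["d2"]]
            acc[0] += r["v1"]
            acc[1] += r["v2"]
    # SELECT d2, sum(v1) / sum(v2) ... GROUP BY d2
    return {d2: (s1 / s2 if s2 else None) for d2, (s1, s2) in sums.items()}


rows = [
    {"timestamp": 100, "d1": "A", "d2": "x", "v1": 10.0, "v2": 20.0},
    {"timestamp": 110, "d1": "A", "d2": "y", "v1": 5.0, "v2": 10.0},
    {"timestamp": 120, "d1": "A", "d2": "x", "v1": 30.0, "v2": 40.0},
    {"timestamp": 130, "d1": "B", "d2": "x", "v1": 99.0, "v2": 1.0},  # filtered out by d1
    {"timestamp": 500, "d1": "A", "d2": "y", "v1": 7.0, "v2": 7.0},   # outside the time range
]
print(ratio_by_d2(rows, "A", 100, 200))  # x -> 40/60, y -> 5/10
```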

To support fast real‑time aggregation, a database must provide three capabilities:

Indexed row retrieval: Quickly filter billions of rows down to a few million.

Efficient loading: Load the filtered rows from storage into memory rapidly.

Distributed computation: Perform GROUP BY and SELECT calculations across a cluster.
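The first two capabilities can be sketched on a single node: an inverted index maps each (dimension, value) pair to a sorted list of row ids (a posting list), AND and OR conditions become intersection and union of those lists without any composite index, and a column-oriented layout lets the loader touch only the columns the query uses. This assumes an in-memory layout (one Python list per column) with rows stored in timestamp order; all names are hypothetical, and real engines such as Lucene keep compressed, disk-resident structures instead.

```python
from bisect import bisect_left
from collections import defaultdict


def build_inverted_index(columns, dim_names):
    # Map each (dimension, value) pair to the ascending row ids that contain it.
    index = defaultdict(list)
    for dim in dim_names:
        for row_id, value in enumerate(columns[dim]):
            index[(dim, value)].append(row_id)
    return index


def intersect(a, b):
    # AND: linear merge of two sorted posting lists.
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    # OR: combine two posting lists into one sorted, de-duplicated list.
    return sorted(set(a) | set(b))


def time_range_ids(ts_column, start, end):
    # Rows are assumed to be stored in timestamp order, so [start, end) is a contiguous id range.
    return list(range(bisect_left(ts_column, start), bisect_left(ts_column, end)))


def load(columns, names, row_ids):
    # Column-oriented load: read only the requested columns, only at the matched rows.
    return {name: [columns[name][i] for i in row_ids] for name in names}


columns = {
    "timestamp": [100, 110, 120, 130, 500],
    "d1": ["A", "A", "A", "B", "A"],
    "d2": ["x", "y", "x", "x", "y"],
    "v1": [10.0, 5.0, 30.0, 99.0, 7.0],
    "v2": [20.0, 10.0, 40.0, 1.0, 7.0],
}
index = build_inverted_index(columns, ["d1", "d2"])
ids = intersect(index[("d1", "A")], time_range_ids(columns["timestamp"], 100, 200))
print(ids)                                     # [0, 1, 2]
print(load(columns, ["d2", "v1", "v2"], ids))  # only 3 of the 5 columns are read
```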

Technical Building Blocks

Retrieval: Search‑engine technology (e.g., Lucene) using inverted indexes for fast filtering.

Loading: Column‑oriented analytical databases (e.g., C‑Store, MonetDB) store each column separately on disk, so a query loads only the columns it needs into memory.

Distributed Computation: Big‑data engines (e.g., Hadoop, Spark) that shard data and apply map/reduce style processing.
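A sketch of the distributed step, with a thread pool standing in for cluster nodes (an assumption for illustration; Hadoop, Spark, and Elasticsearch each have their own execution models). Each shard returns partial sums per group, and the reducer divides only after merging. Averaging per-shard ratios would be wrong: with shard sums of (10, 20) and (30, 40) for group x, the mean of the two ratios is 0.625, while the correct sum(v1) / sum(v2) is 40 / 60 ≈ 0.667.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_shard(shard_rows):
    # Map step on one shard: partial [sum(v1), sum(v2)] per d2 group.
    partial = defaultdict(lambda: [0.0, 0.0])
    for d2, v1, v2 in shard_rows:
        partial[d2][0] += v1
        partial[d2][1] += v2
    return dict(partial)


def reduce_partials(partials):
    # Reduce step: add the per-shard sums, then divide once at the end.
    total = defaultdict(lambda: [0.0, 0.0])
    for partial in partials:
        for d2, (s1, s2) in partial.items():
            total[d2][0] += s1
            total[d2][1] += s2
    return {d2: (s1 / s2 if s2 else None) for d2, (s1, s2) in total.items()}


def distributed_ratio(shards):
    # Threads play the role of cluster nodes, each mapping its own shard.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_shard, shards))
    return reduce_partials(partials)


shards = [
    [("x", 10.0, 20.0), ("y", 5.0, 10.0)],  # shard 1: (d2, v1, v2) rows
    [("x", 30.0, 40.0)],                    # shard 2
]
print(distributed_ratio(shards))  # x -> 40/60, y -> 5/10
```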

Real‑World Implementations for Massive Real‑Time Aggregation

When data volume reaches billions of rows, many traditional TSDBs (e.g., OpenTSDB) struggle with ad‑hoc, multi‑dimensional aggregation. The following systems are known to handle such workloads:

Search‑engine databases: Elasticsearch, Crate.io (Elasticsearch‑based), Solr.

Columnar stores: Vertica (C‑store descendant), Actian (MonetDB descendant).

Specialized TSDB: Druid.io.

Why Elasticsearch Excels

Elasticsearch combines the three technical domains:

Lucene’s inverted index can outperform MySQL’s B‑tree indexes for range and full‑text searches.

It supports compound AND/OR queries without needing custom composite indexes.

Nested documents let many data points be stored in a single document block, reducing index size.

Since Lucene 4.0, DocValues, a column‑oriented on‑disk structure, lower memory usage and speed up loading values for aggregations.

Lucene’s segment architecture and Elasticsearch’s index sharding enable parallel query execution.

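Aggregations have been supported since Elasticsearch 1.0, and Elasticsearch 2.0 added pipeline aggregations, which compute over the output of other aggregations (for example, a derivative or moving average over histogram buckets) rather than over raw documents. Together these offer SQL‑like analytical capabilities well beyond those of Crate.io or Solr.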
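To clarify what pipeline aggregations add, here is a local Python emulation of the idea, not Elasticsearch's API: a date‑histogram‑style sum per fixed‑width bucket, followed by a derivative that runs on those bucket results rather than on raw documents. Elasticsearch 2.0 ships a pipeline aggregation named `derivative`; the function names here are hypothetical, and this sketch does not create empty buckets for intervals with no data.

```python
from collections import defaultdict


def date_histogram_sum(points, interval):
    # Bucket step: sum values per fixed-width bucket, keyed by bucket start time.
    buckets = defaultdict(float)
    for ts, value in points:
        buckets[ts - ts % interval] += value
    return dict(sorted(buckets.items()))


def derivative(buckets):
    # Pipeline step: consumes the bucket results above, not raw documents.
    # The first bucket has no predecessor, so its derivative is None.
    keys = list(buckets)
    result = {keys[0]: None} if keys else {}
    for prev, cur in zip(keys, keys[1:]):
        result[cur] = buckets[cur] - buckets[prev]
    return result


points = [(0, 1.0), (30, 2.0), (60, 4.0), (130, 5.0)]  # (timestamp in seconds, value)
per_minute = date_histogram_sum(points, 60)
print(per_minute)              # {0: 3.0, 60: 4.0, 120: 5.0}
print(derivative(per_minute))  # {0: None, 60: 1.0, 120: 1.0}
```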

Additional observations:

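OpenTSDB lacks secondary indexes and relies on HBase row‑key scans. Because tags follow the metric and the timestamp in the row key, filtering by tag means scanning every row for that metric in the time range, which slows tag‑based queries.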

MySQL’s best practice for time‑series data is likewise to rely on a clustered primary key (InnoDB stores rows in primary‑key order) and avoid secondary indexes, which is essentially OpenTSDB’s approach.
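The OpenTSDB limitation follows from key order. This sketch models a row key loosely on OpenTSDB's layout (metric, hour‑aligned base timestamp, then sorted tag pairs), using plain strings where OpenTSDB uses numeric UIDs; `make_key` and `scan` are hypothetical helpers. A sorted key list stands in for HBase: the metric and time range become a seek plus a contiguous scan, but a tag filter can only be checked row by row. A MySQL table clustered on a primary key such as (metric, timestamp) behaves the same way.

```python
from bisect import bisect_left

HOUR = 3600


def make_key(metric, ts, tags):
    # Hypothetical key modeled on OpenTSDB's layout: metric, hour-aligned base time, sorted tags.
    return (metric, ts - ts % HOUR, tuple(sorted(tags.items())))


def scan(sorted_keys, metric, start, end, tag_filter):
    # Seek to the first key with prefix (metric, hour of start); keys are ordered by this prefix.
    lo = bisect_left(sorted_keys, (metric, start - start % HOUR))
    matches, examined = [], 0
    for key in sorted_keys[lo:]:
        key_metric, base, tags = key
        if key_metric != metric or base >= end:
            break  # left the metric's contiguous time range
        examined += 1
        # Tags come after the time in the key, so no seek can skip non-matching
        # tags: every row in the metric/time range has to be checked.
        if all(pair in tags for pair in tag_filter.items()):
            matches.append(key)
    return matches, examined


keys = sorted([
    make_key("cpu", 0, {"host": "a", "dc": "east"}),
    make_key("cpu", 10, {"host": "b", "dc": "west"}),
    make_key("cpu", 3600, {"host": "a", "dc": "east"}),
    make_key("cpu", 3700, {"host": "b", "dc": "west"}),
    make_key("mem", 0, {"host": "a", "dc": "east"}),
])
matches, examined = scan(keys, "cpu", 0, 2 * HOUR, {"host": "a"})
print(len(matches), examined)  # 2 4
```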


