How JD’s HoraeDB Tackles Massive Time‑Series Data at Scale

This article introduces HoraeDB, the time-series database JD Cloud built in-house. It covers core concepts, typical use cases, the layered architecture, high-performance features, down-sampling strategies, compression techniques, and the stability measures that let it handle massive volumes of round-the-clock monitoring data.

JD Cloud Developers

Sep 15, 2020

With the rapid growth of the Internet, much of the data around us is tied to time: weather changes, stock prices, server metrics. JD Cloud (Jingdong Zhili Cloud) needs to store and analyze massive volumes of time-stamped monitoring metrics. Existing open-source solutions could not fully meet its needs, so JD Cloud built its own distributed time-series database, HoraeDB.

A time-series database (TSDB) is a high-performance, low-cost, reliable online service for time-stamped data, offering efficient reads and writes, high compression ratios, interpolation, and aggregation. TSDBs are widely used in service monitoring, IoT device monitoring, production safety, power monitoring, and more.

A time series consists of a metric name, a set of tags that describe the series, and a sequence of data points, each a timestamp plus a value. Typical examples include stock trading prices, temperature changes, server monitoring metrics, IoT sensor readings, and website or service monitoring data.
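To make the data model concrete, here is a minimal sketch that builds one data point in the JSON shape used by OpenTSDB's `/api/put` endpoint (`metric`, `timestamp`, `value`, `tags`). The metric name, tag names, and the `make_point` and `series_key` helpers are illustrative assumptions, not part of HoraeDB.

```python
import json
import time


def make_point(metric, tags, value, timestamp=None):
    """Build one data point in the OpenTSDB /api/put JSON shape."""
    if not tags:
        # OpenTSDB requires at least one tag pair per data point.
        raise ValueError("at least one tag is required")
    return {
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "value": value,
        "tags": dict(tags),
    }


def series_key(point):
    """Metric name plus sorted tags identify the series a point belongs to."""
    tags = ",".join(f"{k}={v}" for k, v in sorted(point["tags"].items()))
    return f'{point["metric"]}{{{tags}}}'


# Hypothetical metric and tags, chosen only for illustration.
point = make_point("cpu.usage", {"host": "web-01", "az": "az1"}, 42.5,
                   timestamp=1600128000)
print(json.dumps(point))
print(series_key(point))
```

Every point that shares the same metric and tag set lands in the same series, so only the timestamp and value change from point to point.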

The key challenges for a TSDB are sustained write throughput (millions of points per second, around the clock), low-latency reads for large queries, and the storage cost of long-term retention.

HoraeDB is compatible with the OpenTSDB write protocol and supports both the OpenTSDB RESTful API and PromQL for queries. Its main characteristics are:

High performance: batch asynchronous writes, high‑concurrency queries, powerful aggregation.

High availability: distributed storage with adjustable replica count and consistency, multi‑AZ writes and HA queries.

Low usage cost: rich data types, JSON REST interface, full compatibility with OpenTSDB and PromQL.

Open‑source ecosystem compatibility: works with OpenTSDB, PromQL, Kibana, etc.

The architecture is layered from top to bottom:

Http-Server layer: listens on HTTP ports and handles requests.

Protocol processing layer: parses the supported protocols, namely OpenTSDB and, through adapters, Prometheus Remote-Read/Write.

HoraeDB protocol layer: defines the internal write (HoraeDB-PutReq) and query (HoraeDB-QueryReq) structures.

Processing engine layer: the write-engine and the query-engine.

Data queue layer: buffers incoming data before it is persisted.

Storage client layer: adapters for Elasticsearch (metadata), Cassandra (data points), the cache, and local storage.

Data storage layer: Cassandra stores data points, Elasticsearch stores metadata, and TS-Cache stores recent hot data.

In Cassandra, each row is identified by uuid + partitionKey. Each column is keyed by a timestamp and holds the compressed value together with a createTime (timeUUID) used for versioning.
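The row layout can be pictured with an in-memory stand-in. This is only a sketch under two loud assumptions that the article does not confirm: that partitionKey is a time bucket (one day here), and that "versioning" means the write with the newest createTime wins on read. The `write` and `read` helpers and the integer `create_time` are hypothetical; a real timeUUID would take its place.

```python
# rows[(series_uuid, partition_key)][timestamp] -> list of (create_time, value)
rows = {}

PARTITION_SECONDS = 86400  # assumption: one partition per series per day


def partition_key(timestamp):
    """Assumed mapping from a timestamp to its partition bucket."""
    return timestamp // PARTITION_SECONDS


def write(series_uuid, timestamp, compressed_value, create_time):
    """Append a new version of the column instead of overwriting it."""
    row = rows.setdefault((series_uuid, partition_key(timestamp)), {})
    row.setdefault(timestamp, []).append((create_time, compressed_value))


def read(series_uuid, timestamp):
    """Return the version with the newest create_time, or None if absent."""
    row = rows.get((series_uuid, partition_key(timestamp)), {})
    versions = row.get(timestamp)
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


write("series-1", 1600128000, b"\x01", create_time=1)
write("series-1", 1600128000, b"\x02", create_time=2)  # later correction
print(read("series-1", 1600128000))
```

Bucketing by time keeps any single row from growing without bound, which is the usual reason for a composite row key in wide-column stores.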

To support various query spans, HoraeDB implements multi‑level down‑sampling, aggregating raw points into 10‑minute and 1‑hour buckets.
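A minimal sketch of the bucketing step follows, assuming points are `(timestamp, value)` pairs with timestamps in seconds and that the aggregate is a mean. The `downsample` function and the choice of aggregate are illustrative; the article does not say which functions HoraeDB precomputes.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def downsample(points, bucket_seconds, agg=mean):
    """Group (timestamp, value) pairs into fixed-width buckets and aggregate."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each point to the start of its bucket.
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, agg(values)) for start, values in sorted(buckets.items())]


raw = [(0, 1.0), (300, 3.0), (600, 5.0), (3599, 7.0), (3600, 9.0)]
ten_minute = downsample(raw, 600)
one_hour = downsample(raw, 3600)
print(ten_minute)  # [(0, 2.0), (600, 5.0), (3000, 7.0), (3600, 9.0)]
print(one_hour)    # [(0, 4.0), (3600, 9.0)]
```

Both levels are computed from the raw points here. Deriving the 1-hour mean from the 10-minute means would give a different answer whenever buckets hold different numbers of points, so a real pipeline would carry sums and counts instead.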

For large‑scale analysis, HoraeDB provides two aggregation modes:

Streaming aggregation: data flows through Kafka, Flink jobs aggregate it according to user-defined rules, and the results are written back to storage.

Batch aggregation: scheduled tasks pull data from storage, compute the aggregates, and write the results back.
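Both modes apply the same kind of rule to the data; they differ in when and where it runs. The sketch below shows only the rule itself, assuming a rule means "merge series that share certain tag values". The `aggregate_by_tags` function and the sample metric are hypothetical, and Kafka, Flink, and the scheduler are left out.

```python
from collections import defaultdict


def aggregate_by_tags(points, group_by, agg=sum):
    """Merge points from many series into one value per timestamp and group."""
    groups = defaultdict(list)
    for p in points:
        key = (p["timestamp"], tuple(p["tags"].get(t) for t in group_by))
        groups[key].append(p["value"])
    return {key: agg(values) for key, values in groups.items()}


# Hypothetical per-host request rates, rolled up per availability zone.
points = [
    {"metric": "qps", "timestamp": 60, "value": 10, "tags": {"host": "a", "az": "az1"}},
    {"metric": "qps", "timestamp": 60, "value": 20, "tags": {"host": "b", "az": "az1"}},
    {"metric": "qps", "timestamp": 60, "value": 5, "tags": {"host": "c", "az": "az2"}},
]
per_az = aggregate_by_tags(points, group_by=["az"])
print(per_az)
```

In a batch job this function would run over a time range pulled from storage; in a streaming job the same grouping would run per window as points arrive.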

The new query engine adopts a pipeline execution model in which operators form a DAG. This gives top-down, demand-driven execution, clear interfaces between operators, and easy extensibility (for example, adding a ValueFilter operator).
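Demand-driven execution can be sketched with Python iterators: the top operator pulls from its child, which pulls from its own child, so nothing is computed until the consumer asks for it. Only the ValueFilter name comes from the article; the `Scan` and `Downsample` classes and their interfaces are assumptions, and the plan is a simple chain rather than a general DAG.

```python
class Scan:
    """Leaf operator: yields raw (timestamp, value) points in time order."""

    def __init__(self, points):
        self.points = points

    def __iter__(self):
        yield from self.points


class ValueFilter:
    """Passes through only the points whose value satisfies the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def __iter__(self):
        return (p for p in self.child if self.predicate(p[1]))


class Downsample:
    """Averages points per bucket; assumes the child yields in time order."""

    def __init__(self, child, bucket_seconds):
        self.child = child
        self.bucket_seconds = bucket_seconds

    def __iter__(self):
        current, total, count = None, 0.0, 0
        for ts, value in self.child:
            start = ts - ts % self.bucket_seconds
            if current is not None and start != current:
                yield current, total / count
                total, count = 0.0, 0
            current = start
            total += value
            count += 1
        if count:
            yield current, total / count


points = [(0, 2.0), (100, -1.0), (200, 4.0), (700, 6.0)]
plan = Downsample(ValueFilter(Scan(points), lambda v: v >= 0), 600)
result = list(plan)
print(result)  # [(0, 3.0), (600, 6.0)]
```

Because every operator exposes the same iteration interface, adding a new one means writing one class and placing it in the plan; the other operators do not change.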

Hot data from the last three hours is cached in TS-Cache, a custom distributed cache that uses a shared map of series maps backed by skip-list structures. Data is compressed with the Gorilla algorithm: timestamps are stored as delta-of-delta, and each value is stored as the XOR with the previous value.
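The two Gorilla transforms can be shown without the bit packing. The sketch below computes the delta-of-delta sequence for timestamps and the XOR sequence for 64-bit float values, then reverses both. It illustrates the idea only and is not TS-Cache's code; the function names are made up, and the first delta is taken against an assumed previous delta of zero.

```python
import struct


def delta_of_delta(timestamps):
    """Return the first timestamp and the delta-of-delta of the rest."""
    out, prev_ts, prev_delta = [], timestamps[0], 0
    for ts in timestamps[1:]:
        delta = ts - prev_ts
        out.append(delta - prev_delta)
        prev_ts, prev_delta = ts, delta
    return timestamps[0], out


def restore_timestamps(first, dods):
    ts, delta, out = first, 0, [first]
    for dod in dods:
        delta += dod
        ts += delta
        out.append(ts)
    return out


def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 integer pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_float(b):
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_values(values):
    """Return the first value's bits and the XOR of each value with the previous."""
    bits = [float_bits(v) for v in values]
    return bits[0], [a ^ b for a, b in zip(bits, bits[1:])]


def restore_values(first_bits, xors):
    cur, out = first_bits, [first_bits]
    for x in xors:
        cur ^= x
        out.append(cur)
    return [bits_float(b) for b in out]


timestamps = [1000, 1060, 1120, 1180, 1241]
values = [12.0, 12.0, 12.5]
first_ts, dods = delta_of_delta(timestamps)
first_bits, xors = xor_values(values)
print(dods)     # [60, 0, 0, 1]
print(xors[0])  # 0
```

Regularly spaced samples give delta-of-delta values of zero, and repeated values give XOR results of zero. Gorilla encodes each of those zeros in a single bit, which is where most of the compression comes from.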

To ensure stability under traffic spikes and heavy queries, HoraeDB employs service isolation, read/write separation, compute‑storage separation, multi‑AZ deployment, rate limiting at the HTTP layer, write‑queue back‑pressure, query‑load management (fast vs slow queues), and comprehensive monitoring.
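Two of these measures are easy to sketch. The token bucket below is one common way to rate-limit requests, and `pick_queue` routes a query to a fast or slow queue. The article names neither the limiting algorithm nor the routing criterion, so both are assumptions, as are the class name, the parameters, and the one-day threshold.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


SLOW_SPAN_SECONDS = 24 * 3600  # hypothetical threshold


def pick_queue(start, end):
    """Assumed rule: queries spanning more than a day go to the slow queue."""
    return "slow" if end - start > SLOW_SPAN_SECONDS else "fast"


# A fake clock keeps the demonstration deterministic.
now = [0.0]
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
now[0] = 1.0
after_refill = bucket.allow()
print(burst, after_refill)  # [True, True, False] True
```

Keeping slow queries in their own queue means a handful of expensive scans cannot starve the short dashboard queries that make up most of the traffic.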
