
Apache HoraeDB (CeresDB): An Open‑Source Distributed Time‑Series Database

Apache HoraeDB (CeresDB) is an open‑source, distributed, high‑availability time‑series database developed by Ant Group. It supports multi‑dimensional queries, is compatible with Prometheus and OpenTSDB, and offers SQL and OLAP capabilities for use cases such as application performance monitoring (APM), IoT monitoring, financial analytics, and AI‑infrastructure observability.

AntData

Apache HoraeDB (CeresDB) is a distributed, high‑availability, high‑reliability time‑series database open‑sourced by Ant Group. Proven over years of Double 11 (Singles' Day) shopping‑festival traffic, it handles tens of trillions of data points per day and provides multi‑dimensional query capabilities.

Project URL: https://github.com/apache/horaedb. The project was officially announced in June 2022, and on December 11, 2023, its core source code was donated to the Apache Software Foundation under the name HoraeDB.

Business value: Beyond conventional time‑series workloads, Apache HoraeDB also handles complex analytical scenarios. It is compatible with Prometheus, OpenTSDB, and other established time‑series protocols and ecosystems, and it supports SQL queries along with OLAP‑style analytics comparable to those of Kdb+ and InfluxDB IOx.

Application scenarios:

Application Performance Monitoring (APM): Stores and analyzes real‑time performance metrics such as response times and system resource utilization, helping teams monitor system health and quickly identify bottlenecks.

Monitoring and Internet of Things (IoT): Persists data from sensors and devices and provides real‑time insight into device status and environmental metrics to support decision‑making and predictive maintenance.

Financial Market Analysis: Handles financial time‑series data (stock prices, trading volume, etc.), offering accurate historical records and real‑time analysis to aid risk management and quantitative trading strategies.

AI Infra Cloud‑Native Observability: Compatible with Prometheus and other cloud‑native monitoring standards, it can process metric data generated during large‑scale machine‑learning model training and inference, enabling rapid fault detection and improving the robustness of distributed training systems.

Technical details:

Storage Engine Exploration: CeresDB organizes time‑series data using columnar storage combined with a hybrid storage format, partition scanning with pruning, and efficient filtering. This design avoids the inverted‑index bloat caused by high series cardinality, that is, a very large number of unique tag combinations.
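
The general idea of pruning plus filtering can be illustrated with a minimal sketch. It assumes a simplified model in which each partition keeps its columns as plain Python lists and records min/max timestamps when it is written; `Partition` and `scan` are hypothetical names, not HoraeDB APIs. A query first skips every partition whose time range cannot overlap the requested window, then filters rows only inside the partitions that remain, touching just the columns it needs, with no per‑series inverted index involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Partition:
    """Hypothetical columnar partition: one list per column plus time-range stats."""
    columns: Dict[str, list]
    min_ts: int = field(init=False)
    max_ts: int = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once at write time so scans can skip the partition.
        self.min_ts = min(self.columns["ts"])
        self.max_ts = max(self.columns["ts"])


def scan(partitions: List[Partition], start: int, end: int,
         tag: str, tag_value: str) -> Tuple[List[float], int]:
    """Return values matching start <= ts < end and tag == tag_value,
    plus the number of partitions that actually had to be read."""
    values: List[float] = []
    partitions_read = 0
    for p in partitions:
        # Pruning: skip partitions whose [min_ts, max_ts] cannot overlap [start, end).
        if p.max_ts < start or p.min_ts >= end:
            continue
        partitions_read += 1
        # Columnar access: only the columns the query needs are touched.
        ts_col, tag_col, val_col = p.columns["ts"], p.columns[tag], p.columns["value"]
        for i, ts in enumerate(ts_col):
            # Filtering: evaluate the predicates row by row within a kept partition.
            if start <= ts < end and tag_col[i] == tag_value:
                values.append(val_col[i])
    return values, partitions_read


parts = [
    Partition({"ts": [0, 10, 20], "host": ["a", "b", "a"], "value": [1.0, 2.0, 3.0]}),
    Partition({"ts": [100, 110], "host": ["a", "a"], "value": [4.0, 5.0]}),
    Partition({"ts": [200, 210], "host": ["b", "a"], "value": [6.0, 7.0]}),
]
values, read = scan(parts, 5, 150, "host", "a")
print(values, read)  # [3.0, 4.0, 5.0] 2
```

Here the third partition is never read because its minimum timestamp (200) lies past the end of the query window; real engines keep such statistics per file or row group rather than recomputing them.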

Distributed Solution: CeresDB adopts a storage‑compute separation architecture, achieving elastic scaling of compute and storage, high availability, and load balancing across the cluster.
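
Why storage‑compute separation makes elastic scaling cheap can be sketched as follows, assuming shard data lives entirely in shared storage so that a compute node only "owns" a shard through metadata. `ShardScheduler` and its least‑loaded placement rule are hypothetical illustrations, not the actual scheduling algorithm used by HoraeDB.

```python
from typing import Dict, List


class ShardScheduler:
    """Hypothetical sketch of a metadata service assigning shards to compute nodes.

    Shard data is assumed to live in shared storage (WAL service + object
    storage), so changing a shard's owner moves metadata only, not data.
    """

    def __init__(self, nodes: List[str]) -> None:
        self.nodes: List[str] = list(nodes)
        self.assignment: Dict[int, str] = {}  # shard id -> owning node

    def load(self) -> Dict[str, int]:
        """Number of shards owned by each live node."""
        counts = {n: 0 for n in self.nodes}
        for node in self.assignment.values():
            counts[node] += 1
        return counts

    def assign(self, shard: int) -> str:
        # Load balancing: place the shard on the least-loaded node (ties by name).
        counts = self.load()
        node = min(self.nodes, key=lambda n: (counts[n], n))
        self.assignment[shard] = node
        return node

    def add_node(self, node: str) -> None:
        # Elastic scale-out: a new node joins, then shards migrate to it.
        self.nodes.append(node)
        self.rebalance()

    def remove_node(self, node: str) -> None:
        # Failover: orphaned shards are simply reassigned to surviving nodes.
        self.nodes.remove(node)
        orphaned = [s for s, n in self.assignment.items() if n == node]
        for s in orphaned:
            del self.assignment[s]
        for s in orphaned:
            self.assign(s)

    def rebalance(self) -> None:
        # Move one shard at a time from the busiest to the idlest node
        # until shard counts differ by at most one.
        while True:
            counts = self.load()
            busiest = max(self.nodes, key=lambda n: (counts[n], n))
            idlest = min(self.nodes, key=lambda n: (counts[n], n))
            if counts[busiest] - counts[idlest] <= 1:
                return
            shard = min(s for s, n in self.assignment.items() if n == busiest)
            self.assignment[shard] = idlest


sched = ShardScheduler(["node-1", "node-2"])
for shard in range(6):
    sched.assign(shard)
sched.add_node("node-3")     # scale out: two shards move to node-3
sched.remove_node("node-1")  # failover: node-1's shards are reassigned
print(sched.load())  # {'node-2': 3, 'node-3': 3}
```

Neither scaling out nor failing over copies any time‑series data in this model; only the shard-to-node map changes, which is the property a storage‑compute separated design relies on.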

Main components include:

CeresMeta Cluster – the metadata center responsible for overall cluster scheduling.

CeresDB – the instance that organizes and stores time‑series data.

WAL Service – a write‑ahead log service for persisting real‑time writes.

Object Storage – stores SST files generated from the memtable.
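
How the WAL Service, memtable, and Object Storage fit together on the write path can be sketched with in‑memory stand‑ins. This is a minimal illustration under stated assumptions: `WriteAheadLog` and `Table` are hypothetical names, a Python dict plays the object store, the flush threshold is arbitrary, and SSTs are encoded as JSON purely for readability rather than in HoraeDB's real file format.

```python
import json
from typing import Dict, List, Tuple


class WriteAheadLog:
    """Hypothetical WAL stand-in: an append-only list of serialized entries."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.flushed_upto = 0  # entries up to this sequence number are in SSTs

    def append(self, record: dict) -> int:
        self.entries.append(json.dumps(record))
        return len(self.entries)  # 1-based sequence number of this entry


class Table:
    def __init__(self, wal: WriteAheadLog, object_store: Dict[str, bytes],
                 memtable_limit: int = 3) -> None:
        self.wal = wal
        self.object_store = object_store  # hypothetical stand-in for S3/OSS
        self.memtable: Dict[Tuple[str, int], float] = {}
        self.memtable_limit = memtable_limit
        self.last_seq = 0

    def write(self, series: str, ts: int, value: float) -> None:
        # 1. Persist to the WAL first so the write survives a crash.
        self.last_seq = self.wal.append({"series": series, "ts": ts, "value": value})
        # 2. Apply the write to the in-memory memtable.
        self.memtable[(series, ts)] = value
        # 3. Flush to an immutable SST once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        rows = sorted(self.memtable.items())  # SST rows sorted by (series, ts)
        name = f"sst-{self.last_seq:08d}"     # unique: named after last included seq
        payload = [[s, t, v] for (s, t), v in rows]
        self.object_store[name] = json.dumps(payload).encode()
        self.memtable.clear()
        # The WAL prefix up to last_seq is now durable in object storage.
        self.wal.flushed_upto = self.last_seq

    def replay(self) -> None:
        # Crash recovery: rebuild the memtable from WAL entries not yet flushed.
        start = self.wal.flushed_upto
        for seq, raw in enumerate(self.wal.entries[start:], start=start + 1):
            rec = json.loads(raw)
            self.memtable[(rec["series"], rec["ts"])] = rec["value"]
            self.last_seq = seq


wal, store = WriteAheadLog(), {}
table = Table(wal, store)
for ts, v in [(1, 0.5), (2, 0.7), (3, 0.9), (4, 1.1)]:
    table.write("cpu{host=a}", ts, v)
# Simulate a crash: a fresh Table replays only the unflushed WAL tail.
restarted = Table(wal, store)
restarted.replay()
print(list(store), restarted.memtable)
```

The first three writes fill the memtable and are flushed as one SST; the fourth lives only in the WAL and memtable, so after the simulated restart it is recovered by replaying the WAL rather than by reading object storage.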

Written by

AntData

Ant Data leverages Ant Group's leading technological innovation in big data, databases, and multimedia, with years of industry practice. Through long-term technology planning and continuous innovation, we strive to build world-class data technology and products.
