How to Build a Scalable Elasticsearch-Powered Data Reporting System
This article explains how to design and implement a flexible, accurate data reporting platform using Elasticsearch aggregations, covering architecture, data synchronization, metric definition, SDK integration, and future scaling considerations for enterprise analytics.
Introduction
Data reporting is crucial for enterprise decision‑making; flexibility and accuracy are essential. This article introduces an Elasticsearch‑based data metric system.
Background
Data reports are indispensable in B2B applications. Enterprises need detailed, multi-dimensional reports to assess the state of their business.
Initial Reporting Solution
Early projects used a simple architecture: metrics were computed in code, run by in-house or open-source job schedulers, and their results were stored in intermediate tables.
Problems with the Initial Approach
Inconsistent metric definitions across business lines.
Low development efficiency: adding or modifying a metric requires code changes and database schema updates.
Separate logic for different dimensions leads to duplicated work.
Elasticsearch‑Based Reporting Solution
As the number of metrics grows, the limitations of the scheduled-task architecture become evident. To address them, this article proposes a lightweight data-metric platform inspired by data-warehouse concepts.
Key goals:
Standardize metric development to ensure consistent definitions.
Provide a metric platform allowing most metrics to be defined via SQL, reducing development effort.
Define a clear data‑report production workflow.
Offer an SDK for easy business‑side consumption.
Support multiple data sources (MySQL, Hive, Elasticsearch, etc.).
Enable timely alerts for metric anomalies.
Terminology
Atomic metric: the smallest indivisible metric.
Composite metric: derived from one or more atomic metrics.
Dimension: the perspective of data query (e.g., date, team).
Time bucket: the time range for metric calculation.
Report: a collection of composite/atomic metrics.
Business domain: a logical grouping of related metrics.
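The terminology above maps naturally onto a small data model. The Python sketch below is illustrative only: the class and field names, and the example metrics `leads` and `closed_deals`, are hypothetical and not part of the platform described in this article. It shows how a composite metric can be evaluated from the values of the atomic metrics it depends on.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AtomicMetric:
    """Smallest indivisible metric, defined by a single SQL statement."""
    name: str
    domain: str  # business domain the metric belongs to
    sql: str


@dataclass(frozen=True)
class TimeBucket:
    """Time range over which a metric is calculated (end is exclusive)."""
    start: date
    end: date


@dataclass
class CompositeMetric:
    """Metric derived from one or more atomic metrics via a formula."""
    name: str
    inputs: List[str]              # names of the atomic metrics it uses
    formula: Callable[..., float]  # receives input values in `inputs` order

    def evaluate(self, values: Dict[str, float]) -> float:
        return self.formula(*(values[name] for name in self.inputs))


@dataclass
class Report:
    """A collection of atomic and/or composite metrics, queried by dimensions."""
    name: str
    metrics: List[str]
    dimensions: Tuple[str, ...]    # e.g. ("date", "team")


# Hypothetical example: a conversion rate built from two atomic metrics.
leads = AtomicMetric("leads", "sales", "SELECT COUNT(*) FROM leads GROUP BY team")
closed_deals = AtomicMetric(
    "closed_deals", "sales", "SELECT COUNT(*) FROM deals WHERE status = 'won' GROUP BY team"
)
conversion_rate = CompositeMetric(
    "conversion_rate",
    inputs=["closed_deals", "leads"],
    formula=lambda won, total: won / total if total else 0.0,
)
week = TimeBucket(date(2024, 5, 6), date(2024, 5, 13))
weekly = Report("sales_weekly", ["leads", "closed_deals", "conversion_rate"], ("date", "team"))

print(conversion_rate.evaluate({"closed_deals": 3, "leads": 12}))  # 0.25
```

Keeping composite metrics as formulas over named atomic metrics, rather than as separate queries, is one way to make sure every report that uses "conversion rate" shares the same definition.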
Architecture
The platform consists of five layers:
Raw data layer – relational database tables.
Detail layer – data synchronized to Elasticsearch via binlog.
Business layer – metric calculation and configuration.
Gateway – unified SDK for business calls.
Business console – UI for configuring dimensions, metrics, and retrieving reports.
Data is synchronized in three modes: binlog, full load, and custom topics. Detail data is stored in Elasticsearch in one index per month, which keeps data evenly distributed across indices and makes older, rarely queried (cold) data easier to handle.
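Monthly sharding means each document is written to an index named after the month of its timestamp, and a query over a time bucket only needs to touch the months it overlaps. A minimal sketch, assuming a hypothetical `<prefix>-YYYY.MM` naming scheme (the article does not specify one):

```python
from datetime import date
from typing import List


def monthly_index(prefix: str, day: date) -> str:
    """Name of the monthly index that holds documents dated `day`."""
    return f"{prefix}-{day.year:04d}.{day.month:02d}"


def indices_for_range(prefix: str, start: date, end: date) -> List[str]:
    """All monthly indices touched by the inclusive date range [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(monthly_index(prefix, date(year, month, 1)))
        month += 1
        if month == 13:  # roll over into the next year
            year, month = year + 1, 1
    return names


print(indices_for_range("table_1", date(2023, 11, 20), date(2024, 2, 3)))
```

Elasticsearch accepts several comma-separated index names in a single search request, so the list above can be passed directly; indices for old months can then be managed as cold data without affecting the current month's index.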
Aggregation Layer
Metrics are computed using Elasticsearch aggregations. With Elasticsearch‑SQL, business users can define atomic metrics via SQL, e.g.:
SELECT COUNT(*) FROM table_1 WHERE condition_1 <> 2 AND condition_2 = 2 GROUP BY A, B
The WHERE clause defines the metric's filters; the GROUP BY clause defines its dimensions.
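Elasticsearch-SQL turns a statement like this into query DSL: the WHERE clause becomes a `bool` filter, and GROUP BY becomes a `composite` aggregation whose per-bucket `doc_count` is the COUNT(*). The Python sketch below builds a roughly equivalent request body by hand. It is an approximation, not the exact translation (the `_sql/translate` API shows what Elasticsearch actually generates), and it assumes `A`, `B`, and the condition fields are mapped as keyword or numeric fields.

```python
from typing import Any, Dict, List, Tuple


def build_metric_query(dimensions: List[str], page_size: int = 1000) -> Dict[str, Any]:
    """Query DSL approximating:
    SELECT COUNT(*) FROM table_1
    WHERE condition_1 <> 2 AND condition_2 = 2
    GROUP BY <dimensions>
    """
    return {
        "size": 0,  # only aggregation results are needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"condition_2": 2}},
                    # In SQL, NULL <> 2 is not true, so require the field to exist.
                    {"exists": {"field": "condition_1"}},
                ],
                "must_not": [{"term": {"condition_1": 2}}],
            }
        },
        "aggs": {
            "by_dims": {
                "composite": {
                    "size": page_size,
                    # One terms source per GROUP BY column, in order.
                    "sources": [{d: {"terms": {"field": d}}} for d in dimensions],
                }
            }
        },
    }


def parse_buckets(response: Dict[str, Any]) -> Dict[Tuple[Any, ...], int]:
    """Map each dimension combination to its COUNT(*), i.e. the bucket's doc_count."""
    buckets = response["aggregations"]["by_dims"]["buckets"]
    return {tuple(b["key"].values()): b["doc_count"] for b in buckets}


query = build_metric_query(["A", "B"])
```

A real client would also page through all buckets by passing the response's `after_key` back as the composite aggregation's `after` parameter until no buckets are returned.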
To avoid high CPU load on the Elasticsearch cluster, metrics are computed in batches: each task covers multiple enterprises at once, which reduces the number of requests and keeps the cluster stable.
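The batching idea can be sketched as follows: instead of sending one aggregation request per enterprise, each request filters on a batch of enterprise IDs and uses the enterprise as the leading grouping key, so the results can still be split per enterprise. The field name `enterprise_id`, the batch size, and the page size are assumptions for illustration.

```python
from typing import Any, Dict, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


def batch_query(enterprise_ids: List[str], dimensions: List[str]) -> Dict[str, Any]:
    """One aggregation request covering a whole batch of enterprises."""
    # Enterprise is the leading key so results can be split back per enterprise.
    sources = [{"enterprise_id": {"terms": {"field": "enterprise_id"}}}]
    sources += [{d: {"terms": {"field": d}}} for d in dimensions]
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"terms": {"enterprise_id": enterprise_ids}}]}},
        "aggs": {"by_dims": {"composite": {"size": 1000, "sources": sources}}},
    }


def plan_requests(enterprise_ids: List[str], dimensions: List[str],
                  batch_size: int = 100) -> List[Dict[str, Any]]:
    """Build one request per batch instead of one per enterprise."""
    return [batch_query(chunk, dimensions) for chunk in batches(enterprise_ids, batch_size)]


ids = [f"ent-{i}" for i in range(250)]
planned = plan_requests(ids, ["date"], batch_size=100)
print(len(planned))  # 3
```

With 250 enterprises and a batch size of 100, this issues 3 requests instead of 250. Larger batches mean fewer requests but heavier individual aggregations, so the batch size acts as a tuning knob for cluster load.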
Environment and Versioning
Metrics can have multiple environments (e.g., gray and full release) and multiple versions. A new definition is first released to the gray environment for selected customers; if the results are unsatisfactory, the metric can be rolled back to its previous version.
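One way to model gray release and rollback is to keep every published definition and track which version is live for all customers and which is being trialed in gray. This is a hypothetical sketch: `MetricVersions` and its methods are illustrative names, and it assumes the first version goes straight to full release while every later version starts in gray.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class MetricVersions:
    """Hypothetical registry of one metric's versioned definitions."""
    gray_customers: Set[str] = field(default_factory=set)
    definitions: Dict[int, str] = field(default_factory=dict)  # version -> SQL
    full_version: Optional[int] = None  # served to all non-gray customers
    gray_version: Optional[int] = None  # candidate served only to gray customers

    def publish(self, sql: str) -> int:
        """Store a new definition; versions after the first start in gray."""
        version = max(self.definitions, default=0) + 1
        self.definitions[version] = sql
        if self.full_version is None:
            self.full_version = version
        else:
            self.gray_version = version
        return version

    def promote(self) -> None:
        """The gray version looks right: release it to everyone."""
        if self.gray_version is None:
            raise RuntimeError("no gray version to promote")
        self.full_version, self.gray_version = self.gray_version, None

    def rollback(self) -> None:
        """The gray version is unsatisfactory: gray customers return to the full version."""
        self.gray_version = None

    def resolve(self, customer: str) -> str:
        """Definition that applies to `customer` right now."""
        if self.full_version is None:
            raise LookupError("metric has no published version")
        if self.gray_version is not None and customer in self.gray_customers:
            return self.definitions[self.gray_version]
        return self.definitions[self.full_version]


m = MetricVersions(gray_customers={"acme"})
m.publish("SELECT COUNT(*) FROM table_1 GROUP BY A")                        # V1, full
m.publish("SELECT COUNT(*) FROM table_1 WHERE condition_2 = 2 GROUP BY A")  # V2, gray
print(m.resolve("acme") == m.definitions[2], m.resolve("other") == m.definitions[1])  # True True
m.rollback()
print(m.resolve("acme") == m.definitions[1])  # True
```

Because old definitions are never overwritten, rollback is just a pointer change, which keeps it cheap and safe to perform.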
Business‑Side Invocation
When a business line adds a metric, it is stored in the atomic metric pool with initial version V1. If needed, a task is created to sync historical data; once that is done, the metric becomes available for modeling and for retrieval via the SDK.
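The lifecycle described above, from registration at V1 through an optional historical backfill to SDK availability, can be sketched as a small state machine. `AtomicMetricPool`, `Status`, and `get_for_sdk` are hypothetical names; the article does not describe the actual SDK interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Status(Enum):
    PENDING_BACKFILL = "pending_backfill"  # waiting for the historical data sync
    AVAILABLE = "available"                # usable for modeling and SDK queries


@dataclass
class PoolEntry:
    name: str
    version: int
    status: Status


class AtomicMetricPool:
    """Hypothetical in-memory stand-in for the atomic metric pool."""

    def __init__(self) -> None:
        self._entries: Dict[str, PoolEntry] = {}

    def add(self, name: str, needs_backfill: bool) -> PoolEntry:
        if name in self._entries:
            raise ValueError(f"metric {name!r} already exists")
        status = Status.PENDING_BACKFILL if needs_backfill else Status.AVAILABLE
        entry = PoolEntry(name, version=1, status=status)  # new metrics start at V1
        self._entries[name] = entry
        return entry

    def mark_backfilled(self, name: str) -> None:
        """Called once the historical data sync task has finished."""
        self._entries[name].status = Status.AVAILABLE

    def get_for_sdk(self, name: str) -> PoolEntry:
        """Return the metric only if it can be used for modeling and queries."""
        entry = self._entries.get(name)
        if entry is None or entry.status is not Status.AVAILABLE:
            raise LookupError(f"metric {name!r} is not available yet")
        return entry


pool = AtomicMetricPool()
pool.add("leads", needs_backfill=True)
pool.mark_backfilled("leads")
print(pool.get_for_sdk("leads").version)  # 1
```

Gating SDK access on the backfill status prevents callers from seeing a metric whose history is still incomplete.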
Conclusion and Outlook
The proposed architecture unifies raw and detail data, shortens metric development cycles, and prevents inconsistent definitions. As data volume and business lines grow, further upgrades toward an internal data warehouse may be required to support custom reporting needs.
NetEase Smart Enterprise Tech+
Get cutting-edge insights from NetEase's CTO, access the most valuable tech knowledge, and learn NetEase's latest best practices. NetEase Smart Enterprise Tech+ helps you grow from a thinker into a tech expert.
