
How to Build a Scalable Elasticsearch-Powered Data Reporting System

This article explains how to design and implement a flexible, accurate data reporting platform using Elasticsearch aggregation, covering architecture, data synchronization, metric definition, SDK integration, and future scaling considerations for enterprise analytics.

NetEase Smart Enterprise Tech+

Introduction

Data reporting is crucial for enterprise decision‑making; flexibility and accuracy are essential. This article introduces an Elasticsearch‑based data metric system.

Background

Data reports are indispensable in B2B applications. Enterprises need multi‑dimensional detailed reports to assess business status.

Initial Reporting Solution

Early projects used a simple architecture where metrics were defined in code, scheduled by custom or open‑source job schedulers, and results stored in intermediate tables.

Problems with the Initial Approach

Inconsistent metric definitions across business lines.

Low development efficiency; adding or modifying metrics requires code changes and database schema updates.

Separate logic for different dimensions leads to duplicated work.

Elasticsearch‑Based Reporting Solution

As the number of metrics grows, the limitations of the scheduled‑task architecture become evident. To address them, we propose a lightweight data‑metric platform inspired by data‑warehouse concepts.

Key goals:

Standardize metric development to ensure consistent definitions.

Provide a metric platform allowing most metrics to be defined via SQL, reducing development effort.

Define a clear data‑report production workflow.

Offer an SDK for easy business‑side consumption.

Support multiple data sources (MySQL, Hive, Elasticsearch, etc.).

Enable timely alerts for metric anomalies.

Terminology

Atomic metric: the smallest indivisible metric.

Composite metric: derived from one or more atomic metrics.

Dimension: the perspective of data query (e.g., date, team).

Time bucket: the time range for metric calculation.

Report: a collection of composite/atomic metrics.

Business domain: a logical grouping of related metrics.
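
The relationship between atomic and composite metrics can be sketched as a small data model. This is only an illustration under assumptions: the class names, the `resolution_rate` metric, and its two input metrics are hypothetical and do not come from the platform itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class AtomicMetric:
    name: str
    sql: str  # the metric definition, e.g. an Elasticsearch-SQL statement


@dataclass(frozen=True)
class CompositeMetric:
    name: str
    inputs: Tuple[str, ...]        # names of the atomic metrics it is derived from
    formula: Callable[..., float]  # combines the atomic values, in `inputs` order

    def evaluate(self, atomic_values: Dict[str, float]) -> float:
        return self.formula(*(atomic_values[n] for n in self.inputs))


# Hypothetical atomic metrics (names and SQL are illustrative only).
total_sessions = AtomicMetric("total_sessions", "SELECT COUNT(*) FROM table_1")
resolved_sessions = AtomicMetric(
    "resolved_sessions", "SELECT COUNT(*) FROM table_1 WHERE condition_2 = 2"
)

# Hypothetical composite metric derived from the two atomic metrics above.
resolution_rate = CompositeMetric(
    name="resolution_rate",
    inputs=(resolved_sessions.name, total_sessions.name),
    formula=lambda resolved, total: resolved / total if total else 0.0,
)

# Atomic values for one dimension combination within one time bucket.
values = {"resolved_sessions": 45, "total_sessions": 60}
rate = resolution_rate.evaluate(values)
```

Keeping the formula separate from the atomic definitions means a composite metric never re-implements a filter, which is one way to keep definitions consistent across reports.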

Architecture

The platform consists of five layers:

Raw data layer – relational database tables.

Detail layer – data synchronized to Elasticsearch via binlog.

Business layer – metric calculation and configuration.

Gateway – unified SDK for business calls.

Business console – UI for configuring dimensions, metrics, and retrieving reports.

Data synchronization supports three modes: binlog, full load, and custom topics. Detail data is stored in Elasticsearch in monthly indices (one index per month), which keeps data evenly distributed and makes cold data easier to handle.
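
A minimal sketch of how monthly indices might be addressed is shown here. The `prefix-YYYY.MM` naming pattern and the `session_detail` prefix are assumptions for illustration; the article does not state the actual naming scheme.

```python
from datetime import date
from typing import List


def monthly_index(prefix: str, day: date) -> str:
    """Name of the (assumed) monthly index that holds data for `day`."""
    return f"{prefix}-{day.year:04d}.{day.month:02d}"


def indices_for_range(prefix: str, start: date, end: date) -> List[str]:
    """All monthly indices that a query over [start, end] has to touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return names


write_target = monthly_index("session_detail", date(2024, 1, 3))
read_targets = indices_for_range(
    "session_detail", date(2023, 11, 15), date(2024, 1, 3)
)
```

Restricting a query to the indices returned by `indices_for_range` keeps older months out of the request entirely, which is what makes separate handling of cold data practical.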

Aggregation Layer

Metrics are computed using Elasticsearch aggregations. With Elasticsearch‑SQL, business users can define atomic metrics via SQL, e.g.:

SELECT COUNT(*) FROM table_1 WHERE condition_1 <> 2 AND condition_2 = 2 GROUP BY A, B

The WHERE clause defines the metric's filters; the GROUP BY clause defines its dimensions.
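
To make the mapping concrete, the sketch below hand-builds an Elasticsearch search body that is roughly equivalent to the SQL above: equality and inequality filters go into a `bool` query, and the dimensions become the sources of a `composite` aggregation whose `doc_count` plays the role of `COUNT(*)`. The function name and its parameters are hypothetical, and the request that Elasticsearch‑SQL actually generates may differ.

```python
import json
from typing import Any, Dict, List


def build_count_query(
    equals: Dict[str, Any],
    not_equals: Dict[str, Any],
    dimensions: List[str],
    page_size: int = 1000,
) -> Dict[str, Any]:
    """Build a search body that counts documents per dimension combination."""
    return {
        "size": 0,  # only aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                # WHERE field = value
                "filter": [{"term": {f: v}} for f, v in equals.items()],
                # WHERE field <> value (note: documents that lack the field
                # also match here, unlike SQL NULL semantics)
                "must_not": [{"term": {f: v}} for f, v in not_equals.items()],
            }
        },
        "aggs": {
            "by_dimensions": {
                "composite": {
                    "size": page_size,
                    # GROUP BY A, B: one bucket per combination of values
                    "sources": [{d: {"terms": {"field": d}}} for d in dimensions],
                }
            }
        },
    }


request_body = build_count_query(
    equals={"condition_2": 2},
    not_equals={"condition_1": 2},
    dimensions=["A", "B"],
)
print(json.dumps(request_body, indent=2))
```

Each returned bucket carries a `key` with the values of `A` and `B` plus a `doc_count`, so one response yields the metric for every dimension combination at once.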

To avoid high CPU load on the cluster, metric calculation is batched: each task covers multiple enterprises, which reduces the number of aggregation requests and keeps the cluster stable.
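
The batching idea can be sketched as follows: enterprise IDs are split into fixed-size groups, and each group is turned into a single `terms` filter so that one aggregation request serves the whole batch. The `enterprise_id` field name and the batch size are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List


def batch_enterprises(ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield enterprise IDs in groups of at most `batch_size`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[int] = []
    for enterprise_id in ids:
        batch.append(enterprise_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # the last, possibly smaller, group
        yield batch


def batch_filter(batch: List[int]) -> Dict[str, Any]:
    """A terms filter that restricts one request to the enterprises in a batch."""
    # "enterprise_id" is a hypothetical field name.
    return {"terms": {"enterprise_id": batch}}


batches = list(batch_enterprises(range(1, 8), batch_size=3))
filters = [batch_filter(b) for b in batches]
```

With seven enterprises and a batch size of three, the cluster receives three requests instead of seven. Adding `enterprise_id` as a grouping dimension in the same request would let the results be split per enterprise afterwards.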

Environment and Versioning

Metrics can have multiple environments (e.g., gray, full) and versions. A new definition is first released to the gray environment for selected customers; if the result is unsatisfactory, it can be rolled back to the previous version.
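
One way to model gray release and rollback is sketched below. It is not the platform's implementation: the `MetricRelease` class, its methods, the customer names, and the rule that the first version goes straight to the full environment are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class MetricRelease:
    """Tracks which version of one metric each customer should see."""
    definitions: Dict[int, str] = field(default_factory=dict)  # version -> definition
    full_version: int = 0   # version served to everyone by default
    gray_version: int = 0   # 0 means no gray release is active
    gray_customers: Set[str] = field(default_factory=set)

    def add_version(self, definition: str) -> int:
        version = len(self.definitions) + 1
        self.definitions[version] = definition
        if self.full_version == 0:
            self.full_version = version  # assumption: V1 is released in full
        return version

    def release_gray(self, version: int, customers: Iterable[str]) -> None:
        if version not in self.definitions:
            raise KeyError(version)
        self.gray_version = version
        self.gray_customers = set(customers)

    def _clear_gray(self) -> None:
        self.gray_version = 0
        self.gray_customers = set()

    def promote(self) -> None:
        """Make the gray version the full version for all customers."""
        if self.gray_version:
            self.full_version = self.gray_version
            self._clear_gray()

    def rollback(self) -> None:
        """Abandon the gray release; everyone sees the full version again."""
        self._clear_gray()

    def resolve(self, customer: str) -> str:
        if self.gray_version and customer in self.gray_customers:
            return self.definitions[self.gray_version]
        return self.definitions[self.full_version]


release = MetricRelease()
v1 = release.add_version("SELECT COUNT(*) FROM table_1 WHERE condition_2 = 2")
v2 = release.add_version(
    "SELECT COUNT(*) FROM table_1 WHERE condition_1 <> 2 AND condition_2 = 2"
)
release.release_gray(v2, ["customer_a"])
seen_by_gray = release.resolve("customer_a")
seen_by_others = release.resolve("customer_b")
release.rollback()
after_rollback = release.resolve("customer_a")
```

Because old definitions are never overwritten, rolling back only changes which version number is resolved, not the stored definitions.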

Business‑Side Invocation

When a business team adds a metric, it is stored in the atomic metric pool with an initial version, V1. Historical data sync tasks may then be created to load existing data, after which the metric becomes available for modeling and for retrieval via the SDK.

Conclusion and Outlook

The proposed architecture unifies raw and detail data, shortens metric development cycles, and prevents inconsistent definitions. As data volume and business lines grow, further upgrades toward an internal data warehouse may be required to support custom reporting needs.
