
Design and Implementation of a Seller Log System Using Kafka, Storm, Elasticsearch, and HBase

This article describes the design and implementation of a seller log system, detailing the use of Kafka for high‑throughput messaging, Storm for real‑time stream processing, Elasticsearch for hot‑data search, and HBase for cold‑data storage, along with challenges faced and optimization solutions.

Architecture Digest

Introduction: The article explains how a comprehensive seller log system was built: the technologies chosen and why, the problems encountered, and the optimizations applied, so that it can serve as a practical reference for similar projects.

Business Scenario: Multiple business systems (orders, products, etc.) previously logged data independently, each in its own format, which made it difficult for merchants and operations staff to query logs. A unified log platform was created so that every team can ingest its logs into one place, access is granted based on permissions, and merchants and operators can troubleshoot problems on their own.

Overall Design: The pipeline consists of a log client, a Kafka cluster, a Storm consumer, Elasticsearch for hot data, and HBase for cold data.

Technical Points:

Kafka – a high‑throughput distributed publish‑subscribe messaging system.

Storm – an open‑source distributed real‑time stream processing framework.

Elasticsearch – a Lucene‑based distributed search server providing fast, multi‑condition queries.

HBase – a column‑oriented, highly reliable, scalable storage system built on Hadoop HDFS, suitable for massive structured data.
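The "fast, multi-condition queries" that Elasticsearch provides are usually expressed as a `bool` query in the Query DSL. The sketch below builds such a request body for a merchant looking up their own logs; the field names (`seller_id`, `business`, `timestamp`, `content`) are hypothetical, since the article does not describe its index mapping. Exact-match conditions go in `filter` context (no scoring, cacheable), and the optional free-text keyword goes in `must`.

```python
import json
from datetime import datetime, timedelta, timezone


def build_seller_log_query(seller_id, business, start, end, keyword=None, size=50):
    """Build a hypothetical Elasticsearch Query DSL body for a seller log search.

    All field names are assumptions for illustration, not the article's schema.
    """
    filters = [
        {"term": {"seller_id": seller_id}},  # only this seller's logs
        {"term": {"business": business}},  # e.g. "order" or "product"
        {"range": {"timestamp": {"gte": start.isoformat(), "lt": end.isoformat()}}},
    ]
    bool_clause = {"filter": filters}
    if keyword:
        # Full-text match on the log message; contributes to relevance scoring.
        bool_clause["must"] = [{"match": {"content": keyword}}]
    return {
        "size": size,
        "sort": [{"timestamp": {"order": "desc"}}],  # newest logs first
        "query": {"bool": bool_clause},
    }


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
body = build_seller_log_query(1001, "order", start, start + timedelta(days=1), keyword="refund")
print(json.dumps(body, indent=2))
```

The resulting dict could be sent as the JSON body of a request to an index's `_search` endpoint.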

Log Client: Provides a unified API similar to Log4j, which simplifies integration. The client first writes each log entry to a local file through NIO memory-mapped files and then pushes it to Kafka asynchronously, so logging adds minimal latency to business requests and the local copy keeps logs from being lost before they are delivered.
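The write-locally-then-push-asynchronously pattern can be sketched as follows. The article's client is Java (NIO `MappedByteBuffer`); this Python version uses the standard `mmap` module instead, and the class name, JSON line format, fixed capacity, and `send` callback (a stand-in for a real Kafka producer) are all hypothetical. The calling thread only copies bytes into the mapped region and enqueues them; a background thread does the network I/O.

```python
import json
import mmap
import os
import queue
import tempfile
import threading


class LocalLogClient:
    """Hypothetical sketch: append log lines to a memory-mapped file, then ship
    them on a background thread. `send` stands in for a Kafka producer."""

    _STOP = object()

    def __init__(self, path, send, capacity=1 << 20):
        self._file = open(path, "w+b")
        self._file.truncate(capacity)  # pre-size the file so it can be mapped
        self._mm = mmap.mmap(self._file.fileno(), capacity)
        self._offset = 0
        self._lock = threading.Lock()
        self._queue = queue.Queue()
        self._send = send
        self._sender = threading.Thread(target=self._drain, daemon=True)
        self._sender.start()

    def log(self, business, level, message):
        line = (json.dumps({"business": business, "level": level, "msg": message}) + "\n").encode()
        with self._lock:
            end = self._offset + len(line)
            if end > len(self._mm):
                raise BufferError("local log file full; a real client would roll to a new file")
            self._mm[self._offset:end] = line  # memory copy, no write() syscall
            self._offset = end
        self._queue.put(line)  # hand off to the sender; the caller returns immediately

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._send(item)  # in practice: a Kafka producer send to a log topic

    def close(self):
        self._queue.put(self._STOP)  # FIFO: everything queued earlier is sent first
        self._sender.join()
        self._mm.flush()  # force dirty pages of the mapping to disk
        self._mm.close()
        self._file.close()


sent = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seller.log")
    client = LocalLogClient(path, sent.append)
    client.log("order", "INFO", "order 42 paid")
    client.close()
    with open(path, "rb") as f:
        on_disk = f.read().rstrip(b"\0")  # strip the unused, zero-filled tail
print(sent)
```

A production client would also need file rolling and a record of which offsets Kafka has acknowledged, so that unsent entries can be replayed after a restart.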

Why Kafka: Kafka offers high-throughput, fault-tolerant distributed messaging, supports multiple client languages, delivers messages in real time, and decouples producers from consumers, which makes it well suited to the bursty, uneven traffic of log data.

Kafka Application Scenarios and Advantages:

Message persistence through O(1) disk structures, which keeps performance constant even with terabytes of stored messages.

Throughput on the order of millions of messages per second on commodity hardware.

Message partitioning across Kafka brokers with distributed consumption; ordering is guaranteed within each partition, not across a whole topic.

Multi‑language client support.

Immediate consumption after production, enabling event‑driven architectures.
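Because ordering holds only within a partition, a log producer typically chooses a message key so that related records land in the same partition. The sketch below simulates key-based partitioning; keying by seller ID and the 4-partition topic are assumptions, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner actually uses.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical topic size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Kafka's default partitioner hashes the key bytes with murmur2; crc32 is a
    # stand-in here so the sketch needs only the standard library.
    return zlib.crc32(key.encode()) % num_partitions


def produce(messages):
    """Append (key, value) pairs to per-partition logs, as a broker would."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key)].append((key, value))
    return partitions


msgs = [("seller-1", "created"), ("seller-2", "created"),
        ("seller-1", "paid"), ("seller-1", "shipped")]
parts = produce(msgs)
print([v for k, v in parts[partition_for("seller-1")] if k == "seller-1"])
# ['created', 'paid', 'shipped']
```

A consumer reading that one partition therefore sees all of seller-1's events in production order, while different sellers can be processed in parallel on different partitions.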

Storm Application: Logs are streamed from Kafka to Storm, where they are validated, transformed, and finally persisted. The topology includes validation bolts and an insert bolt that writes to storage.
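In Storm itself, the stages would be Java bolts (for example, subclasses of `BaseRichBolt`) wired together with `TopologyBuilder` behind a Kafka spout. The plain-Python sketch below only simulates the validate, transform, insert flow described above; the required fields and the epoch-millisecond-to-ISO conversion are hypothetical examples of "validation" and "transformation", not the article's actual rules.

```python
import json
from datetime import datetime, timezone

REQUIRED = ("business", "seller_id", "timestamp", "msg")  # hypothetical schema


def validate_bolt(raw: str):
    """Parse and validate one Kafka message; return None to drop it."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or any(f not in record for f in REQUIRED):
        return None
    if not isinstance(record["timestamp"], (int, float)) or not isinstance(record["business"], str):
        return None
    return record


def transform_bolt(record: dict) -> dict:
    """Normalize fields before storage (epoch millis -> ISO-8601 UTC)."""
    ts = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)
    return {**record, "timestamp": ts.isoformat(), "business": record["business"].lower()}


def insert_bolt(record: dict, store: list) -> None:
    """Stand-in for the bolt that writes to Elasticsearch or HBase."""
    store.append(record)


def run_topology(messages, store):
    dropped = 0
    for raw in messages:  # in Storm, a Kafka spout would emit these as tuples
        record = validate_bolt(raw)
        if record is None:
            dropped += 1  # a real bolt would ack or fail the tuple here
            continue
        insert_bolt(transform_bolt(record), store)
    return dropped


store = []
dropped = run_topology([
    '{"business": "ORDER", "seller_id": 7, "timestamp": 0, "msg": "paid"}',
    "not json",
    '{"business": "order"}',
], store)
print(dropped, store[0]["timestamp"])  # 2 1970-01-01T00:00:00+00:00
```

Keeping validation in its own stage means malformed logs are rejected before they reach the comparatively expensive storage writes.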

Data Storage Handling: Hot (recent) logs are stored in Elasticsearch for rich query capabilities, while older (cold) logs are archived in HBase. This reduces the load on Elasticsearch and keeps queries fast with billions of records arriving every day.
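The hot/cold split comes down to an age threshold, plus an HBase row key that supports the seller-centric lookups merchants make. Both pieces below are hypothetical: the article does not state its hot window or row key design. The 7-day window is a placeholder, and the key (zero-padded seller ID followed by a reversed timestamp) is a common HBase pattern that makes a prefix scan for one seller return the newest logs first.

```python
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)  # hypothetical Elasticsearch retention
MAX_MILLIS = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps


def storage_tier(event_time: datetime, now: datetime) -> str:
    """Return 'elasticsearch' for recent (hot) logs, 'hbase' for older (cold) ones."""
    return "elasticsearch" if now - event_time <= HOT_WINDOW else "hbase"


def hbase_row_key(seller_id: int, event_time: datetime) -> bytes:
    """Hypothetical row key: fixed-width seller prefix + reversed timestamp.

    Fixed-width decimal strings sort lexicographically in numeric order, and
    reversing the timestamp makes newer events sort before older ones.
    """
    millis = int(event_time.timestamp() * 1000)
    return f"{seller_id:010d}{MAX_MILLIS - millis:019d}".encode()


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(storage_tier(now - timedelta(days=1), now))   # elasticsearch
print(storage_tier(now - timedelta(days=30), now))  # hbase
```

A periodic job could use `storage_tier` to select documents that have aged out of Elasticsearch and write them to HBase under `hbase_row_key`.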

Problems Encountered: Growing data volume strained query performance and insertion speed; the shared Kafka cluster could not sustain the high write rate from many clients.

Solution: Business‑level separation – high‑traffic services like orders and products received dedicated Kafka, Elasticsearch, and HBase clusters, isolating their load and improving overall performance. Future plans include feeding HBase data into a data‑mart for broader analytics.
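Business-level separation can be expressed as a routing table that sends each business line's logs to its own set of clusters and falls back to a shared set for everything else. The endpoints below are hypothetical placeholders; only the idea of dedicated Kafka, Elasticsearch, and HBase clusters for orders and products comes from the article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClusterSet:
    kafka_bootstrap: str
    es_hosts: Tuple[str, ...]
    hbase_zk_quorum: str


# Hypothetical endpoints: high-traffic business lines get dedicated clusters,
# all other business lines share one set.
SHARED = ClusterSet("kafka-shared:9092", ("es-shared:9200",), "zk-shared:2181")
DEDICATED = {
    "order": ClusterSet("kafka-order:9092", ("es-order:9200",), "zk-order:2181"),
    "product": ClusterSet("kafka-product:9092", ("es-product:9200",), "zk-product:2181"),
}


def clusters_for(business: str) -> ClusterSet:
    """Pick the clusters a business line's logs should be written to."""
    return DEDICATED.get(business, SHARED)


print(clusters_for("order").kafka_bootstrap)  # kafka-order:9092
```

Because the log client and Storm topology look up their targets through one function, moving another high-traffic business onto dedicated clusters becomes a configuration change rather than a code change.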

Conclusion: The article outlines the end‑to‑end architecture of the seller log system, acknowledges omitted details (monitoring, authentication, permission management), and emphasizes that system design evolves through continuous iteration, problem discovery, and optimization.

Written by

Architecture Digest