How We Built a Scalable Seller Log System with Kafka, Storm, ES & HBase

This article explains the design and implementation of a unified seller‑operation logging platform that uses Kafka for ingestion, Storm for real‑time processing, Elasticsearch for hot‑data search, and HBase for cold‑data storage, detailing the challenges faced and the optimizations applied.

21CTO

Nov 11, 2017

Introduction

This article describes how we built a logging system: the technologies we used, why we chose them, the problems we ran into, and the optimizations we applied. We hope it serves as a practical reference for similar projects.

Business Scenario

We maintain a project that collects, processes, stores, and queries logs of JD sellers' actions, called the "Seller Log". Unlike traditional code logs, this system records merchants' operations (e.g., price changes) so that sellers and operations staff can view their activity and developers can troubleshoot issues.

Overall Log Design

The overall flow is: Log client → Kafka cluster → Storm consumer → Elasticsearch → HBase. The client provides a unified API for all services, Kafka and Storm handle the streaming, Elasticsearch offers rich search over hot logs, and HBase stores large volumes of cold logs.

Technical Points

Kafka: high‑throughput distributed publish‑subscribe messaging system.

Storm: open‑source distributed real‑time stream processing framework.

Elasticsearch: Lucene‑based distributed search engine for efficient multi‑condition queries.

HBase: column‑oriented scalable storage built on Hadoop HDFS, suitable for massive structured data.

Log Client

The log client offers a unified API similar to Log4j, making integration simple. It writes logs to local files using NIO memory‑mapped files, then asynchronously pushes them to Kafka, ensuring minimal impact on business performance and providing durability.
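The local-file step can be sketched with the JDK's NIO classes. This is a minimal illustration, not the actual client: the class name `MappedLogFile`, the length-prefixed record layout, and the fixed capacity are all assumptions, and the background thread that would read records back and push them to Kafka is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: appends length-prefixed records to a memory-mapped file.
final class MappedLogFile implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;

    MappedLogFile(Path path, int capacityBytes) {
        try {
            channel = FileChannel.open(path, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
            // Mapping in READ_WRITE mode grows the file to capacityBytes if needed.
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacityBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a 4-byte length followed by the UTF-8 bytes.
    // Returns false when the record does not fit, so the caller can roll to a new file.
    synchronized boolean append(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        if (buffer.remaining() < Integer.BYTES + bytes.length) {
            return false;
        }
        buffer.putInt(bytes.length).put(bytes);
        return true;
    }

    // Reads the record that starts at the given byte offset; a sender thread
    // (not shown) would use this to pick up records for asynchronous delivery.
    synchronized String readAt(int offset) {
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        byte[] bytes = new byte[view.getInt()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Forces mapped pages to disk and closes the underlying channel.
    @Override
    public synchronized void close() {
        try {
            buffer.force();
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the business thread only copies bytes into the mapped region, the cost on the caller's side stays small, and the records remain on local disk if the process restarts before they are sent.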

Why Use Kafka

Kafka provides high throughput, message persistence built on O(1) disk structures (performance stays constant as the volume of stored messages grows), partitioned messaging, clients for multiple languages, and low latency. These properties make it well suited to absorbing bursty, uneven log traffic and to decoupling producers from consumers.

Storm Application

Storm consumes the steady stream from Kafka, validates logs, packages them, and uses an insertBolt to persist data, providing a clear real‑time processing pipeline.
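The validate, package, and insert stages can be illustrated in plain Java. This sketch deliberately does not use the Storm API; `LogPipeline`, the blank-line validation rule, and the batch size are assumptions made for illustration, and the insert stage is just a callback standing in for the insertBolt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java stand-in for the validate -> package -> insert stages (not Storm code).
final class LogPipeline {
    private final int batchSize;
    private final Consumer<List<String>> insertStage;
    private final List<String> pending = new ArrayList<>();

    LogPipeline(int batchSize, Consumer<List<String>> insertStage) {
        this.batchSize = batchSize;
        this.insertStage = insertStage;
    }

    // Assumed validation rule: drop null or blank log lines.
    static boolean isValid(String log) {
        return log != null && !log.trim().isEmpty();
    }

    // Validates one log line, buffers it, and hands off a full batch.
    void accept(String log) {
        if (!isValid(log)) {
            return;
        }
        pending.add(log);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Passes any buffered logs to the insert stage as one batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        insertStage.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

In a real topology each stage would typically be its own bolt so that the stages can be scaled independently; batching before the insert stage reduces the number of write requests sent to the stores.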

Data Storage Handling

Hot logs are stored in Elasticsearch for fast multi‑condition search, while cold logs are archived in HBase. With 600 to 700 million records arriving per day, the volume that accumulates over a year is too large to keep in Elasticsearch without degrading query performance.
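To make the scale and the split concrete, here is a small sketch. The daily figures come from the text above; the age-based rule and the length of the hot window are assumptions, since the article does not say how a log is classified as hot or cold.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for the hot/cold split; names and rule are illustrative.
final class StorageTiers {
    enum Tier { ELASTICSEARCH_HOT, HBASE_COLD }

    // Assumed rule: a log is "hot" while its age is within the hot window.
    static Tier tierFor(Instant loggedAt, Instant now, Duration hotWindow) {
        Duration age = Duration.between(loggedAt, now);
        return age.compareTo(hotWindow) <= 0 ? Tier.ELASTICSEARCH_HOT : Tier.HBASE_COLD;
    }

    // Rough yearly volume for a given daily volume, using a 365-day year.
    static long recordsPerYear(long recordsPerDay) {
        return recordsPerDay * 365L;
    }
}
```

At 600 to 700 million records per day, `recordsPerYear` gives roughly 219 to 255.5 billion records per year, which is the volume the cold store has to absorb.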

Problems Encountered

As data grew, query latency increased and insert performance degraded. The Kafka cluster also struggled with the high ingestion rate from many clients.

Solution

We split the high‑volume business logs (orders, products) out into their own dedicated Kafka, Elasticsearch, and HBase clusters, isolating them from the logs of other services. This improved performance and simplified management. Cold data will later be moved to a data mart for analytics.
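One way to picture this isolation is a lookup from business type to a cluster group, with everything else falling back to a shared group. The sketch below is an assumption about how such routing could look; `ClusterRouter` and the group names are hypothetical and do not come from the original system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical routing table: high-volume business types get dedicated clusters.
final class ClusterRouter {
    private final Map<String, String> dedicated = new HashMap<>();
    private final String sharedGroup;

    ClusterRouter(String sharedGroup) {
        this.sharedGroup = sharedGroup;
    }

    // Registers a dedicated Kafka/Elasticsearch/HBase cluster group for a business type.
    ClusterRouter dedicate(String businessType, String clusterGroup) {
        dedicated.put(businessType, clusterGroup);
        return this;
    }

    // Returns the dedicated group if one exists, otherwise the shared group.
    String groupFor(String businessType) {
        return dedicated.getOrDefault(businessType, sharedGroup);
    }
}
```

With `new ClusterRouter("shared").dedicate("order", "order-cluster").dedicate("product", "product-cluster")`, order and product logs go to their own groups while every other business type stays on the shared one.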

Conclusion

This article outlined the end‑to‑end log processing architecture. The design is not optimal, but it reached a robust state through iterative problem solving and optimization.
