
How JD Built a Scalable Seller Log Platform with Kafka, Storm, ES & HBase

This article details JD's end‑to‑end seller log system architecture, explaining why Kafka, Storm, Elasticsearch and HBase were chosen, the challenges faced during scaling, and the practical solutions implemented to achieve a unified, high‑throughput logging platform for merchants and operations.

dbaplus Community

Introduction

The author shares the design and implementation of a unified seller‑log platform at JD, describing the technologies used, the reasons behind each choice, and the problems encountered along with optimization methods.

Business Scenario

Multiple business systems (orders, products, etc.) previously generated logs in disparate formats, making it difficult for merchants and operations to query them. A single platform was needed to collect, store, and query all logs, allowing users to self‑service without repeatedly contacting development teams.

Overall Design

The data flow is: Log client → Kafka cluster → Storm consumer → Elasticsearch (hot data) → HBase (cold data). Kafka provides high‑throughput messaging, Storm handles real‑time stream processing, Elasticsearch offers fast search for recent logs, and HBase stores large volumes of historical logs.

Key Technologies

Kafka: Distributed publish‑subscribe system with high throughput, used as the message queue for log ingestion.

Storm: Open‑source real‑time stream processing framework that consumes Kafka streams and performs validation, enrichment, and persistence.

Elasticsearch: Distributed search engine built on Lucene, used for indexing and querying hot log data.

HBase: Column‑oriented, scalable storage built on HDFS, used for long‑term storage of cold logs.

Log Client

The log client exposes a unified, Log4j‑style API, so every service integrates with it the same way. It first writes each log entry to a local file through NIO memory‑mapped I/O, which is fast, and then pushes the entries to Kafka asynchronously. Business threads therefore do not wait on the network, which keeps the impact on business latency minimal, and the local copy provides durability.
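
The write‑locally‑then‑send‑asynchronously pattern can be sketched as follows. This is a minimal illustration in Python rather than the Java NIO client the article describes; the class name `MappedLogClient`, the fixed file capacity, and the `send` callback (a stand‑in for a Kafka producer call) are all assumptions made for the example.

```python
import mmap
import os
import queue
import tempfile
import threading


class MappedLogClient:
    """Hypothetical client: write to a local memory-mapped file, send later."""

    def __init__(self, path, capacity, send):
        # Pre-size the file so that it can be memory-mapped.
        with open(path, "wb") as f:
            f.truncate(capacity)
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), capacity)
        self._capacity = capacity
        self._offset = 0
        self._lock = threading.Lock()
        self._pending = queue.Queue()  # (offset, length) of records not yet sent
        self._send = send              # stand-in for a Kafka producer call
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, message):
        """Runs on the business thread: a memory copy, no network I/O."""
        payload = message.encode("utf-8")
        with self._lock:
            start = self._offset
            end = start + len(payload)
            if end > self._capacity:
                # A real client would roll over to a new file instead.
                raise BufferError("local log file is full")
            self._map[start:end] = payload
            self._offset = end
        self._pending.put((start, len(payload)))

    def _drain(self):
        """Background thread: read records back from the mapping and ship them."""
        while True:
            item = self._pending.get()
            if item is None:  # stop signal
                break
            start, length = item
            self._send(self._map[start:start + length].decode("utf-8"))

    def close(self):
        self._pending.put(None)  # queued after every pending record
        self._worker.join()
        self._map.flush()
        self._map.close()
        self._file.close()


def demo():
    sent = []  # stands in for a Kafka topic
    with tempfile.TemporaryDirectory() as tmp:
        client = MappedLogClient(os.path.join(tmp, "seller.log"), 4096, sent.append)
        client.log("order created")
        client.log("product updated")
        client.close()
    return sent
```

Calling `demo()` returns the two messages in the order they were logged. The sketch leaves out what a production client would need for crash recovery, such as record framing in the file and a persisted send position.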

Why Kafka?

Kafka's high throughput, fault‑tolerant partitioning, multi‑language support, and real‑time delivery make it a good fit for log data, which arrives in bursts rather than at a steady rate. Kafka absorbs the spikes so that Storm can process a steady stream, and it supports additional consumers without any change to producer code.
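
The two properties mentioned above, buffering a burst for a slower consumer and serving several consumers from the same stream, can be illustrated with a toy in‑memory log. This is not the Kafka client API; `TopicLog`, its methods, and the group names `"storm"` and `"audit"` are hypothetical and exist only to show the idea of one read offset per consumer group.

```python
class TopicLog:
    """Toy single-partition log with one read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> index of next record to read

    def append(self, record):
        """Producer side: appending never depends on who is reading."""
        self._records.append(record)

    def poll(self, group, max_records):
        """Consumer side: return at most max_records and advance the offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        """Number of records the group has not read yet."""
        return len(self._records) - self._offsets.get(group, 0)


def demo():
    topic = TopicLog()
    for i in range(10):  # a burst of ten log messages arrives at once
        topic.append(f"log-{i}")

    batch_sizes = []
    while topic.lag("storm") > 0:  # the stream processor drains at its own pace
        batch_sizes.append(len(topic.poll("storm", max_records=4)))

    # A consumer group added later reads the same stream from the start,
    # and the producer code above did not have to change.
    audit_first = topic.poll("audit", max_records=2)
    return batch_sizes, audit_first
```

In `demo()`, the burst of ten records is consumed in batches of at most four, and the second group starts from the first record independently of the first group's position.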

Storm Application

Storm consumes the Kafka stream, validates each log entry, transforms it into a domain object, and forwards it to an InsertBolt that persists the data. This two‑stage processing (validation → persistence) provides clear separation of concerns and fault tolerance.
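
The validation → persistence split can be sketched in plain Python as below. It does not use the Storm API; the JSON message format, the `SellerLog` fields, and the `InsertStage` class (named after the article's InsertBolt) are assumptions for illustration, since the article does not give the actual log schema.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed schema: the article does not list the real fields.
REQUIRED_FIELDS = ("seller_id", "business", "timestamp", "message")


@dataclass(frozen=True)
class SellerLog:
    seller_id: str
    business: str
    timestamp: int
    message: str


def validate(raw: str) -> Optional[SellerLog]:
    """Stage 1: parse and validate; return None for entries to drop."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(field not in data for field in REQUIRED_FIELDS):
        return None
    if not isinstance(data["timestamp"], int):
        return None
    return SellerLog(
        seller_id=str(data["seller_id"]),
        business=str(data["business"]),
        timestamp=data["timestamp"],
        message=str(data["message"]),
    )


class InsertStage:
    """Stage 2: persist validated objects; `store` is any list-like sink."""

    def __init__(self, store):
        self._store = store

    def process(self, entry: SellerLog) -> None:
        self._store.append(entry)


def run_pipeline(raw_messages, store) -> int:
    """Feed raw messages through both stages; return the number rejected."""
    insert = InsertStage(store)
    rejected = 0
    for raw in raw_messages:
        entry = validate(raw)
        if entry is None:
            rejected += 1
        else:
            insert.process(entry)
    return rejected


def demo():
    store = []
    raw = [
        '{"seller_id": "s1", "business": "order", '
        '"timestamp": 1700000000, "message": "created"}',
        "not json",
        '{"seller_id": "s2", "business": "product"}',
    ]
    rejected = run_pipeline(raw, store)
    return rejected, [entry.seller_id for entry in store]
```

Because the insert stage only ever receives well‑formed `SellerLog` objects, storage code does not need to repeat any parsing or field checks.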

Data Storage Strategy

Hot logs (last two months) are indexed in Elasticsearch for rich, multi‑condition queries. Older logs are archived in HBase, which handles massive data volumes efficiently but offers only simple retrieval, suitable for occasional access.
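
A query layer in front of the two stores has to decide which one to consult for a given time range. The sketch below shows one way to do that; the 60‑day window is an approximation of the article's "two months", and the function names and store labels are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: "two months" approximated as a fixed 60-day window.
HOT_WINDOW = timedelta(days=60)


def choose_store(log_time: datetime, now: datetime) -> str:
    """Return the store that holds a log entry with the given timestamp."""
    return "elasticsearch" if now - log_time <= HOT_WINDOW else "hbase"


def stores_for_query(start: datetime, end: datetime, now: datetime) -> list:
    """Return the stores a query over [start, end] has to consult."""
    cutoff = now - HOT_WINDOW
    stores = []
    if end >= cutoff:  # part of the range is still hot
        stores.append("elasticsearch")
    if start < cutoff:  # part of the range has gone cold
        stores.append("hbase")
    return stores


def demo():
    now = datetime(2024, 6, 30)
    return (
        choose_store(datetime(2024, 6, 1), now),
        choose_store(datetime(2024, 1, 1), now),
        stores_for_query(datetime(2024, 4, 1), datetime(2024, 6, 10), now),
    )
```

A range that straddles the cutoff touches both stores, so a caller would need to merge the two result sets; how JD's platform handles that case is not described in the article.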

Challenges Faced

As log volume grew to billions of entries per day, insertion latency increased and the shared Kafka cluster became a bottleneck, causing overall system slowdown despite hot‑cold data separation.

Solution: Business‑Level Separation

The team moved high‑traffic services (e.g., orders, products) onto dedicated Kafka, Elasticsearch, and HBase clusters, while lower‑volume services continued to share the original infrastructure. This isolation improved throughput, reduced contention between businesses, and simplified management.
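
Business‑level separation amounts to a routing table from business line to cluster set, with the shared clusters as the fallback. The following is a minimal sketch; the cluster names and the `ClusterSet` type are invented for the example and do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSet:
    kafka: str
    elasticsearch: str
    hbase: str


# Hypothetical cluster names, for illustration only.
SHARED = ClusterSet("kafka-shared", "es-shared", "hbase-shared")

DEDICATED = {
    "order": ClusterSet("kafka-order", "es-order", "hbase-order"),
    "product": ClusterSet("kafka-product", "es-product", "hbase-product"),
}


def clusters_for(business: str) -> ClusterSet:
    """High-traffic businesses get dedicated clusters; the rest share one set."""
    return DEDICATED.get(business, SHARED)


def demo():
    return clusters_for("order").kafka, clusters_for("coupon").kafka
```

With a table like this, promoting another business to dedicated clusters is a configuration change in one place rather than a change to each producer or consumer.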

Conclusion

The presented architecture demonstrates a practical, scalable approach to building a unified logging platform using open‑source big‑data components. While some details such as monitoring, authentication, and permission management are omitted, the core design illustrates how to balance real‑time processing, fast search, and long‑term storage for massive log data.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
