How JD Built a Scalable Seller Log Platform with Kafka, Storm, ES & HBase
This article details JD's end‑to‑end seller log system architecture, explaining why Kafka, Storm, Elasticsearch and HBase were chosen, the challenges faced during scaling, and the practical solutions implemented to achieve a unified, high‑throughput logging platform for merchants and operations.
Introduction
The author shares the design and implementation of a unified seller‑log platform at JD, describing the technologies used, the reasons behind each choice, and the problems encountered along with optimization methods.
Business Scenario
Multiple business systems (orders, products, etc.) previously generated logs in disparate formats, making it difficult for merchants and operations to query them. A single platform was needed to collect, store, and query all logs, allowing users to self‑service without repeatedly contacting development teams.
Overall Design
The data flow is: Log client → Kafka cluster → Storm consumer → Elasticsearch (hot data) → HBase (cold data). Kafka provides high‑throughput messaging, Storm handles real‑time stream processing, Elasticsearch offers fast search for recent logs, and HBase stores large volumes of historical logs.
Key Technologies
Kafka: Distributed publish‑subscribe system with high throughput, used as the message queue for log ingestion.
Storm: Open‑source real‑time stream processing framework that consumes Kafka streams and performs validation, enrichment, and persistence.
Elasticsearch: Distributed search engine built on Lucene, used for indexing and querying hot log data.
HBase: Column‑oriented, scalable storage built on HDFS, used for long‑term storage of cold logs.
Log Client
The log client exposes a unified, Log4j‑like API, which simplifies integration for the various services. It first writes each log to a local file through NIO memory‑mapped I/O for speed, then pushes it to Kafka asynchronously. This keeps the impact on business latency minimal, and the local write is what provides durability.
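The write-locally-then-push-asynchronously pattern can be sketched as follows. This is a minimal illustration in Python rather than JD's Java client: `LocalLogBuffer`, `SellerLogClient`, and the `send` callable (a stand-in for a Kafka producer) are hypothetical names, and the fixed buffer capacity is an assumption.

```python
import mmap
import os
import queue
import tempfile
import threading


class LocalLogBuffer:
    """Append-only buffer backed by a memory-mapped file (hypothetical helper)."""

    def __init__(self, path, capacity=1 << 20):
        self._file = open(path, "w+b")
        self._file.truncate(capacity)          # pre-size the file so it can be mapped
        self._map = mmap.mmap(self._file.fileno(), capacity)
        self._capacity = capacity
        self._offset = 0

    def append(self, line):
        data = line.encode("utf-8") + b"\n"
        if self._offset + len(data) > self._capacity:
            raise BufferError("local log buffer is full")
        self._map[self._offset:self._offset + len(data)] = data
        self._offset += len(data)

    def close(self):
        self._map.flush()                      # push mapped pages to the file
        self._map.close()
        self._file.close()


class SellerLogClient:
    """Writes locally on the caller's thread, sends in a background thread."""

    def __init__(self, buffer, send):
        self._buffer = buffer
        self._send = send                      # assumed stand-in for a Kafka producer
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def info(self, message):
        self._buffer.append(message)           # fast local write
        self._pending.put(message)             # hand off; caller does not wait for the send

    def _drain(self):
        while True:
            message = self._pending.get()
            if message is None:                # sentinel: stop the worker
                break
            self._send(message)

    def close(self):
        self._pending.put(None)
        self._worker.join()
        self._buffer.close()


def demo():
    sent = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "seller.log")
        client = SellerLogClient(LocalLogBuffer(path, capacity=4096), sent.append)
        client.info("order 1001 created")
        client.info("product 2002 updated")
        client.close()
        with open(path, "rb") as f:
            local = f.read().rstrip(b"\x00").decode("utf-8").splitlines()
    return local, sent
```

The business thread only pays for the memory-mapped write and a queue hand-off; the network send happens on the worker thread. A production client would also need to handle send failures and replay unsent entries from the local file, which this sketch leaves out.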
Why Kafka?
Kafka's high throughput, fault‑tolerant partitioning, multi‑language client support, and real‑time delivery suit log traffic, which arrives in bursts rather than at a steady rate. Kafka absorbs the spikes and feeds Storm a steady stream, and it supports additional consumers without changes to producer code.
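The smoothing effect can be illustrated with a small simulation. It does not use Kafka at all: the backlog counter stands in for a queue, and the per-tick arrival counts and consumer capacity are made-up numbers chosen for the example.

```python
def simulate(arrivals, capacity_per_tick):
    """Return how many messages the consumer processes in each tick.

    arrivals[i] is the number of messages produced during tick i.
    The consumer handles at most capacity_per_tick messages per tick;
    anything beyond that waits in the backlog (the role the queue plays).
    """
    backlog = 0
    processed = []
    tick = 0
    while tick < len(arrivals) or backlog > 0:
        if tick < len(arrivals):
            backlog += arrivals[tick]
        batch = min(capacity_per_tick, backlog)
        backlog -= batch
        processed.append(batch)
        tick += 1
    return processed


# A burst of 10 messages followed by a quiet period is consumed
# at no more than 4 messages per tick.
print(simulate([10, 0, 0, 2], capacity_per_tick=4))  # [4, 4, 2, 2]
```

The producer side sees no back-pressure during the burst; the consumer works at its own rate and catches up once traffic drops.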
Storm Application
Storm consumes the Kafka stream, validates each log entry, transforms it into a domain object, and forwards it to an InsertBolt that persists the data. This two‑stage processing (validation → persistence) provides clear separation of concerns and fault tolerance.
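A rough sketch of the validation → persistence split is shown below. It uses plain Python functions rather than the Storm API, and the log schema (`seller_id`, `service`, `timestamp`, `message`) and the in-memory `InsertStage` are assumptions made for illustration; the article does not describe the actual fields or the internals of the InsertBolt.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed schema; the article does not list the real fields.
REQUIRED_FIELDS = ("seller_id", "service", "timestamp", "message")


@dataclass(frozen=True)
class SellerLog:
    seller_id: str
    service: str
    timestamp: int
    message: str


def validate(raw: str) -> Optional[SellerLog]:
    """Stage 1: parse a raw message and turn it into a domain object, or reject it."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    if any(field not in record for field in REQUIRED_FIELDS):
        return None
    try:
        return SellerLog(
            seller_id=str(record["seller_id"]),
            service=str(record["service"]),
            timestamp=int(record["timestamp"]),
            message=str(record["message"]),
        )
    except (TypeError, ValueError):
        return None


class InsertStage:
    """Stage 2: persistence. A list stands in for the real storage backend."""

    def __init__(self):
        self.stored = []

    def insert(self, log: SellerLog) -> None:
        self.stored.append(log)


def run_pipeline(raw_messages, sink: InsertStage) -> int:
    """Feed messages through both stages; return the number rejected by validation."""
    rejected = 0
    for raw in raw_messages:
        log = validate(raw)
        if log is None:
            rejected += 1
            continue
        sink.insert(log)
    return rejected
```

Because invalid entries are dropped in the first stage, the persistence stage only ever sees well-formed domain objects, which is the separation of concerns the design relies on.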
Data Storage Strategy
Hot logs (last two months) are indexed in Elasticsearch for rich, multi‑condition queries. Older logs are archived in HBase, which handles massive data volumes efficiently but offers only simple retrieval, suitable for occasional access.
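The hot/cold split implies a simple age-based rule for deciding which store holds a log and which stores a query must consult. The sketch below approximates "two months" as 60 days, which is an assumption; the function names and the store labels are also hypothetical.

```python
from datetime import datetime, timedelta, timezone

# "Last two months" approximated as 60 days (assumption).
HOT_RETENTION = timedelta(days=60)


def storage_tier(log_time: datetime, now: datetime) -> str:
    """Return the store expected to hold a log written at log_time."""
    return "elasticsearch" if now - log_time <= HOT_RETENTION else "hbase"


def stores_for_query(start: datetime, end: datetime, now: datetime) -> list:
    """Return the stores a query over [start, end] needs to consult."""
    cutoff = now - HOT_RETENTION
    stores = []
    if end >= cutoff:
        stores.append("elasticsearch")   # part of the range is still hot
    if start < cutoff:
        stores.append("hbase")           # part of the range has gone cold
    return stores


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
print(storage_tier(datetime(2024, 6, 1, tzinfo=timezone.utc), now))   # elasticsearch
print(storage_tier(datetime(2024, 1, 1, tzinfo=timezone.utc), now))   # hbase
```

A query that spans the cutoff would touch both stores, so a query layer built this way has to merge results from the two.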
Challenges Faced
As log volume grew to billions of entries per day, insertion latency increased and the shared Kafka cluster became a bottleneck, causing overall system slowdown despite hot‑cold data separation.
Solution: Business‑Level Separation
The team partitioned high‑traffic services (e.g., orders, products) into dedicated Kafka, Elasticsearch, and HBase clusters, while less intensive services continued using the original infrastructure. This isolation improved throughput, reduced contention, and simplified management.
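One way to express this separation is a routing table that maps each service to its set of clusters and falls back to the shared infrastructure. The cluster names below are invented placeholders, and the article does not say how JD actually configures the routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSet:
    kafka: str
    elasticsearch: str
    hbase: str


# Hypothetical cluster names, for illustration only.
SHARED = ClusterSet("kafka-shared", "es-shared", "hbase-shared")

DEDICATED = {
    "orders": ClusterSet("kafka-orders", "es-orders", "hbase-orders"),
    "products": ClusterSet("kafka-products", "es-products", "hbase-products"),
}


def clusters_for(service: str) -> ClusterSet:
    """High-traffic services get dedicated clusters; the rest share the original ones."""
    return DEDICATED.get(service, SHARED)


print(clusters_for("orders").kafka)    # kafka-orders
print(clusters_for("coupons").kafka)   # kafka-shared
```

Keeping the mapping in one place means a service can be moved to dedicated clusters by adding an entry, without touching the services that stay on the shared set.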
Conclusion
The presented architecture demonstrates a practical, scalable approach to building a unified logging platform using open‑source big‑data components. While some details such as monitoring, authentication, and permission management are omitted, the core design illustrates how to balance real‑time processing, fast search, and long‑term storage for massive log data.
dbaplus Community
