Real-Time Order Analytics System Architecture Using Flume, Kafka, Storm, and Redis
This article introduces a beginner-friendly architecture for real-time order analytics in a big-data environment. It details how Flume collects logs, Kafka buffers them, Storm processes the stream, and Redis stores the results, and also covers configuration, code snippets, deployment steps, and troubleshooting tips.
The author presents a simple real‑time order analytics system to illustrate the basic architecture of a big‑data streaming solution. The pipeline consists of an e‑commerce order server generating logs, Flume ingesting those logs, Kafka acting as a durable buffer, Storm performing stream processing, and Redis storing the aggregated results for a web UI.
System Overview
An image (omitted) shows the flow: order logs → Flume → Kafka → Storm Topology → Redis → Web application.
Business Background
During high‑traffic promotional events, operators need instant visibility of order metrics to adjust strategies and stimulate purchases. Traditional batch processing cannot meet the low‑latency requirements, prompting the use of real‑time frameworks such as Storm, Heron, or Spark Streaming.
Order Log Format
Each log entry follows a structured pattern, e.g., orderNumber: XX | orderDate: XX | paymentNumber: XX | paymentDate: XX | merchantName: XX | sku: [ skuName: XX skuNum: XX skuCode: XX skuPrice: XX totalSkuPrice: XX; ... ] | price: [ totalPrice: XX discount: XX paymentPrice: XX ]
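Since the format is a flat "key: value" list with two bracketed sections (sku and price), a small parser can split it without a full grammar. The sketch below is my own illustration, not the article's code; it extracts the top-level fields and treats each bracketed section as one value:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal parser for the order-log line format described above. */
public class OrderLogParser {

    /** Splits the top-level "key: value | key: value" fields into a map. */
    public static Map<String, String> parseHeader(String line) {
        Map<String, String> fields = new HashMap<>();
        int depth = 0, start = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '[') depth++;               // entering sku/price section
            else if (c == ']') depth--;          // leaving it
            else if (depth == 0 && c == '|') {   // field separator, top level only
                putField(fields, line.substring(start, i));
                start = i + 1;
            }
        }
        putField(fields, line.substring(start)); // trailing field (price)
        return fields;
    }

    private static void putField(Map<String, String> fields, String chunk) {
        int colon = chunk.indexOf(':');
        if (colon > 0) {
            fields.put(chunk.substring(0, colon).trim(),
                       chunk.substring(colon + 1).trim());
        }
    }
}
```

Keeping the bracketed sections as opaque values lets a downstream bolt decide whether it needs to parse individual SKUs at all.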
Log Generation
The author provides a Log4j2‑based simulator (code available on CSDN) that periodically writes order logs to rotating files.
Log Collection with Flume
Flume agents run on each log‑producing server. The Exec source tails the log file, and a Kafka sink forwards events to a Kafka cluster. An interceptor adds a topic header so that events are routed to the appropriate Kafka topic.
Example Flume configuration (excerpt, using standard agent.component property syntax; channel definitions omitted — note the sink property is requiredAcks, not requireAcks):

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /path/to/order.log
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
a1.sources.r1.interceptors.i1.hostHeader = topic

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = orders
a1.sinks.k1.brokerList = kafka1:9092,kafka2:9092
a1.sinks.k1.batchSize = 1000
a1.sinks.k1.requiredAcks = 1
Kafka Message System
Kafka provides durable, high‑throughput buffering. Each message is written to a log file, partitioned, and replicated. Consumers track their position via an offset stored in Zookeeper. Configuration examples:
config/server-1.properties: broker.id=1 listeners=PLAINTEXT://:9093 log.dir=/export/data/kafka-1 zookeeper.connect=localhost:2181
config/server-2.properties: broker.id=2 listeners=PLAINTEXT://:9094 log.dir=/export/data/kafka-2 zookeeper.connect=localhost:2181
(The two brokers must use distinct ports and log directories when running on the same host.)
Start the brokers with: bin/kafka-server-start.sh config/server-1.properties & bin/kafka-server-start.sh config/server-2.properties &
Storm Real‑Time Computation
Storm processes streams of tuples using a topology composed of Spouts (data sources) and Bolts (processing units). The author explains concepts such as streams, tuples, grouping, tasks, and workers, and notes that Nimbus and Supervisors store their state in Zookeeper for fault tolerance.
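As an illustration of how spouts and bolts are wired together, a minimal topology might look like the following sketch. Class and component names here are hypothetical (the article's own code is omitted), and the imports assume Storm 1.x with the storm-kafka module; older releases use the backtype.storm packages instead:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;

public class OrderTopology {
    public static void main(String[] args) throws Exception {
        // Read the "orders" topic; offsets are tracked in Zookeeper
        // under /kafka-orders for the consumer id "order-spout".
        SpoutConfig spoutConf = new SpoutConfig(
                new ZkHosts("localhost:2181"), "orders", "/kafka-orders", "order-spout");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConf), 2);
        // OrderAnalysisBolt (hypothetical) parses each line and updates Redis.
        builder.setBolt("order-bolt", new OrderAnalysisBolt(), 4)
               .shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("order-analytics", conf, builder.createTopology());
    }
}
```

If per-merchant state were kept inside the bolt instead of in Redis, a fieldsGrouping on the merchant name would be needed so that all of a merchant's orders land on the same task.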
Key Storm configuration (storm.yaml) is placed under conf/ and includes Zookeeper servers, local directories, and Nimbus host settings.
Redis as Result Store
Redis is used to keep aggregated sales figures. The topology writes to a Sorted Set where the score represents total sales and the member is the merchant name, using the atomic command ZINCRBY via Jedis.
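With the Jedis client, the update described above is a single atomic call. The key and method names below are illustrative assumptions, not the article's code:

```java
import redis.clients.jedis.Jedis;

public class SalesUpdater {
    /** Atomically adds an order's payment amount to the merchant's running total. */
    public static double addSale(Jedis jedis, String merchantName, double paymentPrice) {
        // ZINCRBY returns the member's new score, i.e. the merchant's total sales.
        return jedis.zincrby("merchant:sales", paymentPrice, merchantName);
    }
}
```

Because the increment is atomic on the Redis side, multiple bolt tasks can update the same merchant concurrently without a read-modify-write race.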
Integration Flow
Data flow: Flume → Kafka → Storm KafkaSpout → Bolt (parses order log, updates Redis) → Redis. Sample code snippets for the KafkaSpout and Bolt are provided (omitted for brevity).
Deployment Steps
Start Zookeeper.
Start Kafka brokers.
Deploy Flume agents to push logs into Kafka.
Start the Storm cluster (Nimbus and Supervisors).
Launch Redis server.
Package the topology with Maven (mvn assembly:assembly) and submit it via storm jar or the Storm UI.
Verify results using redis-cli ZRANGE key 0 -1 WITHSCORES.
Troubleshooting
Resolve Maven dependency conflicts by excluding slf4j‑log4j12 from Kafka dependencies.
Ensure Kafka offsets are created in Zookeeper before the first topology run; otherwise Storm will start reading from the end of the log.
Initialize non-serializable objects (e.g., a JedisPool) inside the Bolt's prepare() method rather than in the constructor, since Storm serializes Bolt instances when shipping the topology to workers.
Disable Redis protected‑mode or set a password when Storm connects remotely.
Use Storm UI to inspect topology errors such as “Kill … No Such process”.
Avoid port conflicts when running Kafka producer and consumer on the same host.
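The prepare()-initialization tip above can be sketched as a Bolt that parses each order line and updates the sorted set. Class names, the Redis host, the key name, and the regex-based field extraction are all my own illustrative choices, not the author's original code:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class OrderAnalysisBolt extends BaseRichBolt {
    // JedisPool is not serializable, so it must NOT be created in the
    // constructor: Storm serializes the Bolt when distributing the topology.
    private transient JedisPool pool;
    private transient OutputCollector collector;

    private static final Pattern MERCHANT = Pattern.compile("merchantName:\\s*([^|]+?)\\s*\\|");
    private static final Pattern PAYMENT  = Pattern.compile("paymentPrice:\\s*([\\d.]+)");

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        // Runs once per task on the worker host -- the safe place for connections.
        this.pool = new JedisPool("redis-host", 6379);
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String line = input.getString(0);
        Matcher merchant = MERCHANT.matcher(line);
        Matcher payment  = PAYMENT.matcher(line);
        if (merchant.find() && payment.find()) {
            try (Jedis jedis = pool.getResource()) {  // returned to pool on close
                jedis.zincrby("merchant:sales",
                        Double.parseDouble(payment.group(1)), merchant.group(1));
            }
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing is emitted downstream.
    }
}
```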
The article concludes with a thank‑you note and an invitation for readers to share feedback.