Real-Time Order Analytics System Architecture Using Flume, Kafka, Storm, and Redis
This article introduces a beginner-friendly architecture for real-time order analytics in a big-data environment. It details how Flume collects logs, Kafka buffers them, Storm processes the stream, and Redis stores the results, and also covers configuration, code snippets, deployment steps, and troubleshooting tips.
The author presents a simple real‑time order analytics system to illustrate the basic architecture of a big‑data streaming solution. The pipeline consists of an e‑commerce order server generating logs, Flume ingesting those logs, Kafka acting as a durable buffer, Storm performing stream processing, and Redis storing the aggregated results for a web UI.
System Overview
An image (omitted) shows the flow: order logs → Flume → Kafka → Storm Topology → Redis → Web application.
Business Background
During high‑traffic promotional events, operators need instant visibility of order metrics to adjust strategies and stimulate purchases. Traditional batch processing cannot meet the low‑latency requirements, prompting the use of real‑time frameworks such as Storm, Heron, or Spark Streaming.
Order Log Format
Each log entry follows a structured pattern, e.g., orderNumber: XX | orderDate: XX | paymentNumber: XX | paymentDate: XX | merchantName: XX | sku: [ skuName: XX skuNum: XX skuCode: XX skuPrice: XX totalSkuPrice: XX; ... ] | price: [ totalPrice: XX discount: XX paymentPrice: XX ]
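Since the format is a flat "key: value" list with two bracketed sections (sku and price), a small parser can split it without a full grammar. The sketch below is my own illustration, not the article's code; it extracts the top-level fields and treats each bracketed section as one value:

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal parser for the order-log line format described above. */
public class OrderLogParser {

    /** Splits the top-level "key: value | key: value" fields into a map. */
    public static Map<String, String> parseHeader(String line) {
        Map<String, String> fields = new HashMap<>();
        int depth = 0, start = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '[') depth++;               // entering sku/price section
            else if (c == ']') depth--;          // leaving it
            else if (depth == 0 && c == '|') {   // field separator, top level only
                putField(fields, line.substring(start, i));
                start = i + 1;
            }
        }
        putField(fields, line.substring(start)); // trailing field (price)
        return fields;
    }

    private static void putField(Map<String, String> fields, String chunk) {
        int colon = chunk.indexOf(':');
        if (colon > 0) {
            fields.put(chunk.substring(0, colon).trim(),
                       chunk.substring(colon + 1).trim());
        }
    }
}
```

Keeping the bracketed sections as opaque values lets a downstream bolt decide whether it needs to parse individual SKUs at all.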
Log Generation
The author provides a Log4j2‑based simulator (code available on CSDN) that periodically writes order logs to rotating files.
Log Collection with Flume
Flume agents run on each log‑producing server. The Exec source tails the log file, and a Kafka sink forwards events to a Kafka cluster. An interceptor adds a topic header so that events are routed to the appropriate Kafka topic.
Example Flume configuration (excerpt, using standard agent.component property syntax; channel definitions omitted — note the sink property is requiredAcks, not requireAcks):

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /path/to/order.log
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.HostInterceptor$Builder
a1.sources.r1.interceptors.i1.hostHeader = topic

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = orders
a1.sinks.k1.brokerList = kafka1:9092,kafka2:9092
a1.sinks.k1.batchSize = 1000
a1.sinks.k1.requiredAcks = 1
Kafka Message System
Kafka provides durable, high‑throughput buffering. Each message is written to a log file, partitioned, and replicated. Consumers track their position via an offset stored in Zookeeper. Configuration examples:
config/server-1.properties: broker.id=1 listeners=PLAINTEXT://:9093 log.dir=/export/data/kafka-1 zookeeper.connect=localhost:2181
config/server-2.properties: broker.id=2 listeners=PLAINTEXT://:9094 log.dir=/export/data/kafka-2 zookeeper.connect=localhost:2181
(The two brokers must use distinct ports and log directories when running on the same host.)
Start the brokers with: bin/kafka-server-start.sh config/server-1.properties & bin/kafka-server-start.sh config/server-2.properties &
Storm Real‑Time Computation
Storm processes streams of tuples using a topology composed of Spouts (data sources) and Bolts (processing units). The author explains concepts such as streams, tuples, grouping, tasks, and workers, and notes that Nimbus and Supervisors store their state in Zookeeper for fault tolerance.
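As an illustration of how spouts and bolts are wired together, a minimal topology might look like the following sketch. Class and component names here are hypothetical (the article's own code is omitted), and the imports assume Storm 1.x with the storm-kafka module; older releases use the backtype.storm packages instead:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;

public class OrderTopology {
    public static void main(String[] args) throws Exception {
        // Read the "orders" topic; offsets are tracked in Zookeeper
        // under /kafka-orders for the consumer id "order-spout".
        SpoutConfig spoutConf = new SpoutConfig(
                new ZkHosts("localhost:2181"), "orders", "/kafka-orders", "order-spout");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConf), 2);
        // OrderAnalysisBolt (hypothetical) parses each line and updates Redis.
        builder.setBolt("order-bolt", new OrderAnalysisBolt(), 4)
               .shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("order-analytics", conf, builder.createTopology());
    }
}
```

If per-merchant state were kept inside the bolt instead of in Redis, a fieldsGrouping on the merchant name would be needed so that all of a merchant's orders land on the same task.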
Key Storm configuration (storm.yaml) is placed under conf/ and includes Zookeeper servers, local directories, and Nimbus host settings.
Redis as Result Store
Redis is used to keep aggregated sales figures. The topology writes to a Sorted Set where the score represents total sales and the member is the merchant name, using the atomic command ZINCRBY via Jedis.
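With the Jedis client, the update described above is a single atomic call. The key and method names below are illustrative assumptions, not the article's code:

```java
import redis.clients.jedis.Jedis;

public class SalesUpdater {
    /** Atomically adds an order's payment amount to the merchant's running total. */
    public static double addSale(Jedis jedis, String merchantName, double paymentPrice) {
        // ZINCRBY returns the member's new score, i.e. the merchant's total sales.
        return jedis.zincrby("merchant:sales", paymentPrice, merchantName);
    }
}
```

Because the increment is atomic on the Redis side, multiple bolt tasks can update the same merchant concurrently without a read-modify-write race.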
Integration Flow
Data flow: Flume → Kafka → Storm KafkaSpout → Bolt (parses order log, updates Redis) → Redis. Sample code snippets for the KafkaSpout and Bolt are provided (omitted for brevity).
Deployment Steps
Start Zookeeper.
Start Kafka brokers.
Deploy Flume agents to push logs into Kafka.
Start the Storm cluster (Nimbus and Supervisors).
Launch Redis server.
Package the topology with Maven (mvn assembly:assembly) and submit it via storm jar or the Storm UI.
Verify results using redis-cli ZRANGE key 0 -1 WITHSCORES.
Troubleshooting
Resolve Maven dependency conflicts by excluding slf4j‑log4j12 from Kafka dependencies.
Ensure Kafka offsets are created in Zookeeper before the first topology run; otherwise Storm will start reading from the end of the log.
Initialize non-serializable objects (e.g., a JedisPool) inside the Bolt's prepare() method rather than in the constructor, since Storm serializes Bolt instances when shipping the topology to workers.
Disable Redis protected‑mode or set a password when Storm connects remotely.
Use Storm UI to inspect topology errors such as “Kill … No Such process”.
Avoid port conflicts when running Kafka producer and consumer on the same host.
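The prepare()-initialization tip above can be sketched as a Bolt that parses each order line and updates the sorted set. Class names, the Redis host, the key name, and the regex-based field extraction are all my own illustrative choices, not the author's original code:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class OrderAnalysisBolt extends BaseRichBolt {
    // JedisPool is not serializable, so it must NOT be created in the
    // constructor: Storm serializes the Bolt when distributing the topology.
    private transient JedisPool pool;
    private transient OutputCollector collector;

    private static final Pattern MERCHANT = Pattern.compile("merchantName:\\s*([^|]+?)\\s*\\|");
    private static final Pattern PAYMENT  = Pattern.compile("paymentPrice:\\s*([\\d.]+)");

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        // Runs once per task on the worker host -- the safe place for connections.
        this.pool = new JedisPool("redis-host", 6379);
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String line = input.getString(0);
        Matcher merchant = MERCHANT.matcher(line);
        Matcher payment  = PAYMENT.matcher(line);
        if (merchant.find() && payment.find()) {
            try (Jedis jedis = pool.getResource()) {  // returned to pool on close
                jedis.zincrby("merchant:sales",
                        Double.parseDouble(payment.group(1)), merchant.group(1));
            }
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing is emitted downstream.
    }
}
```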
The article concludes with a thank‑you note and an invitation for readers to share feedback.