
Step-by-Step Guide to Installing and Configuring Apache Flume on a Cluster

This guide walks through downloading Apache Flume, setting up a master‑slave cluster, and configuring NetCat, Exec, and Avro sources with corresponding sinks and memory channels, including verification commands to ensure the agents run correctly.

Practical DevOps Architecture

1. Software download

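wget http://mirror.bit.edu.cn/apache/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
tar zxvf apache-flume-1.6.0-bin.tar.gz
cd apache-flume-1.6.0-bin

Mirrors usually carry only current releases. If the mirror above no longer has 1.6.0, the same tarball is available under https://archive.apache.org/dist/flume/1.6.0/. All later commands use paths relative to the extracted directory.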

2. Cluster environment

Master: 172.16.11.97
Slave1: 172.16.11.98
Slave2: 172.16.11.99
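
The verification commands and the HDFS path later in this guide refer to the host name master, so that name has to resolve on every node. A minimal sketch of matching /etc/hosts entries follows. The names slave1 and slave2 are assumptions (only master appears in this guide), and the function name flume_hosts is purely illustrative.

```shell
# Print /etc/hosts entries for the three nodes.
# slave1 and slave2 are assumed host names; adjust them to your environment.
flume_hosts() {
  cat <<'EOF'
172.16.11.97 master
172.16.11.98 slave1
172.16.11.99 slave2
EOF
}

# Review the output, then append it as root on each node, for example:
#   flume_hosts | sudo tee -a /etc/hosts
flume_hosts
```

Printing the entries first, instead of writing straight to /etc/hosts, makes it easy to spot conflicts with names that are already defined.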

3. NetCat source configuration (conf/flume-netcat.conf)

vim conf/flume-netcat.conf
# Name the components on this agent
agent.sources = r1
agent.sinks = k1
agent.channels = c1

# Source configuration
agent.sources.r1.type = netcat
agent.sources.r1.bind = 127.0.0.1
agent.sources.r1.port = 44444

# Sink configuration
agent.sinks.k1.type = logger

# Channel configuration
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
agent.channels.c1.transactionCapacity = 100

# Bind source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1

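Verification: start the agent, then connect from a second terminal on the same host. The source binds to 127.0.0.1, so it is reachable only locally; to connect from another node with telnet master 44444, set agent.sources.r1.bind to 0.0.0.0 or to the host's own IP. Each line you type should be acknowledged with OK and appear in the agent's console output.

bin/flume-ng agent --conf conf --conf-file conf/flume-netcat.conf --name=agent -Dflume.root.logger=INFO,console
telnet 127.0.0.1 44444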
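
telnet is interactive, which makes it awkward in scripts. The sketch below sends one line to the NetCat source and prints the reply, which should be OK when the source runs with its default acknowledgement setting. It assumes bash with /dev/tcp support (not plain sh); the function name send_event is hypothetical.

```shell
# Send one line to a Flume NetCat source and print its reply.
# Requires bash: /dev/tcp/HOST/PORT is a bash feature, not a real file.
send_event() {
  local host=$1 port=$2 msg=$3 reply
  {
    # fd 3 is the TCP connection: write the event, then read the ack line
    printf '%s\n' "$msg" >&3
    IFS= read -r -t 5 reply <&3
  } 3<>"/dev/tcp/$host/$port" || return 1
  printf '%s\n' "$reply"
}

# Example (run on the agent's host while the agent is up):
#   send_event 127.0.0.1 44444 "hello flume"
```

The function returns a non-zero status when the connection fails or no reply arrives within five seconds, so it can be used directly in a health check.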

4. Exec source configuration (conf/flume-exec.conf)

vim conf/flume-exec.conf
# Name the components on this agent
agent.sources = r1
agent.sinks = k1
agent.channels = c1

# Source configuration
# tail -F (capital F) keeps following the file if it is rotated or re-created
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /data/hadoop/flume/test.txt

# Sink configuration
agent.sinks.k1.type = logger

# Channel configuration
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
agent.channels.c1.transactionCapacity = 100

# Bind source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1

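Verification: start the agent in one terminal, then append a line per second to the tailed file from another (the directory /data/hadoop/flume must already exist). Each appended line should show up in the agent's console output.

bin/flume-ng agent --conf conf --conf-file conf/flume-exec.conf --name=agent -Dflume.root.logger=INFO,console
while true; do date >> /data/hadoop/flume/test.txt; sleep 1; done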
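
An endless loop is fine for a quick look, but a fixed number of numbered lines makes it easier to confirm that every event arrived. A minimal sketch, with a hypothetical helper named generate_lines:

```shell
# Append COUNT timestamped, numbered lines to FILE, creating its directory if needed.
generate_lines() {
  local file=$1 count=$2 i=1
  mkdir -p "$(dirname "$file")"
  while [ "$i" -le "$count" ]; do
    printf '%s line %d\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$i" >> "$file"
    i=$((i + 1))
  done
}

# Example: write 100 lines to the file the Exec source is tailing
#   generate_lines /data/hadoop/flume/test.txt 100
```

Comparing the highest line number in the agent's output with the count you requested shows whether anything was dropped.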

5. Avro source configuration (conf/flume-avro.conf)

vim conf/flume-avro.conf
# Name the components on this agent
agent.sources = r1
agent.sinks = k1
agent.channels = c1

# Define a memory channel
agent.channels.c1.type = memory

# Define Avro source
agent.sources.r1.type = avro
agent.sources.r1.bind = 127.0.0.1
agent.sources.r1.port = 44444
agent.sources.r1.channels = c1

# Define HDFS sink
agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = hdfs://master:9000/flume_data_pool
agent.sinks.k1.hdfs.filePrefix = events-
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.writeFormat = Text
# Roll files after 600000 events or 600 seconds; 0 disables size-based rolling
agent.sinks.k1.hdfs.rollSize = 0
agent.sinks.k1.hdfs.rollCount = 600000
agent.sinks.k1.hdfs.rollInterval = 600

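Verification: an Avro source speaks Avro RPC, so telnet can only confirm that the port is open; it cannot deliver events. To send real events, start the agent and then use the avro-client that ships with Flume. Run it on the same host, because the source binds to 127.0.0.1. Afterwards, list the HDFS path to confirm that files were written.

bin/flume-ng agent --conf conf --conf-file conf/flume-avro.conf --name=agent -Dflume.root.logger=DEBUG,console
bin/flume-ng avro-client --conf conf -H 127.0.0.1 -p 44444 -F /data/hadoop/flume/test.txt
hdfs dfs -ls hdfs://master:9000/flume_data_pool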
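
In a multi-node setup, the usual way to feed an Avro source is an Avro sink on another agent. The sketch below writes a hypothetical conf/flume-slave.conf for Slave1 or Slave2 that tails a local file and forwards events to the master. The file name is an assumption, and the setup only works if the master's Avro source is bound to 0.0.0.0 or 172.16.11.97 instead of 127.0.0.1.

```shell
# Write an agent config that forwards tailed lines to the master's Avro source.
# conf/flume-slave.conf is an illustrative file name, not part of Flume.
mkdir -p conf
cat > conf/flume-slave.conf <<'EOF'
# Name the components on this agent
agent.sources = r1
agent.sinks = k1
agent.channels = c1

# Tail a local file with an Exec source
agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /data/hadoop/flume/test.txt
agent.sources.r1.channels = c1

# Forward events to the Avro source on the master
agent.sinks.k1.type = avro
agent.sinks.k1.hostname = 172.16.11.97
agent.sinks.k1.port = 44444
agent.sinks.k1.channel = c1

# Buffer events in memory between source and sink
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
agent.channels.c1.transactionCapacity = 100
EOF

# Start it on the slave the same way as the other agents:
#   bin/flume-ng agent --conf conf --conf-file conf/flume-slave.conf --name=agent -Dflume.root.logger=INFO,console
```

Start the master agent first so the Avro sink has something to connect to; lines appended to the tailed file on the slave should then end up under the HDFS path configured on the master.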