How to Centralize Logs from Dockerized Services Using Flume and Kafka
This article explains a practical architecture for aggregating logs from distributed Docker containers by employing Flume NG as a lightweight log collector, Kafka as a high‑throughput message bus, and custom sinks to store logs per service, module and day with low latency and minimal resource impact.
Design Constraints and Requirements
Application Scenario
In a distributed environment that may host hundreds of servers, each log entry is smaller than 1 KB, never exceeding 50 KB, and the total daily log volume is under 500 GB.
Functional Requirements
Collect all service logs centrally.
Distinguish source and split logs by service, module and day granularity.
Non‑functional Requirements
Non‑intrusive: the collector runs as an independent process and its resource consumption is controllable.
Real‑time: end‑to‑end latency from log generation to centralized storage must be less than 4 seconds.
Persistence: retain logs for the most recent N days.
Loss tolerance: occasional loss is acceptable as long as the loss ratio stays below a defined threshold (e.g., 0.01%).
Ordering: strict ordering is not required.
Availability: the collector is an offline function with a target of three‑nines (99.9%) yearly uptime.
Implementation Architecture
The solution consists of three layers—Producer, Broker, and Consumer—connected through Apache Flume NG and Kafka.
Producer Layer
Each Docker container runs an independent Flume NG agent that tails the application log file (using tail -F) and forwards the log events to a Kafka sink. This design satisfies the non‑intrusive requirement because no code changes are needed inside the application.
FROM ${BASE_IMAGE}
MAINTAINER ${MAINTAINER}
ENV REFRESH_AT ${REFRESH_AT}
RUN mkdir -p /opt/${MODULE_NAME}
ADD ${PACKAGE_NAME} /opt/${MODULE_NAME}/
COPY service.supervisord.conf /etc/supervisord.conf.d/service.supervisord.conf
COPY supervisor-msoa-wrapper.sh /opt/${MODULE_NAME}/supervisor-msoa-wrapper.sh
RUN chmod +x /opt/${MODULE_NAME}/supervisor-msoa-wrapper.sh
RUN chmod +x /opt/${MODULE_NAME}/*.sh
EXPOSE
ENTRYPOINT ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"] #!/bin/bash
function shutdown() {
date
echo "Shutting down Service"
unset SERVICE_PID
cd /opt/${MODULE_NAME}
source stop.sh
}
cd /opt/${MODULE_NAME}
source stop.sh
echo "Starting Service"
source start.sh
export SERVICE_PID=$!
sleep 4
nohup /opt/apache-flume-1.6.0-bin/bin/flume-ng agent \
--conf /opt/apache-flume-1.6.0-bin/conf \
--conf-file /opt/apache-flume-1.6.0-bin/conf/logback-to-kafka.conf \
--name a1 -Dflume.root.logger=INFO,console &
trap shutdown HUP INT QUIT ABRT KILL ALRM TERM TSTP
echo "Waiting for $SERVICE_PID"
wait $SERVICE_PIDBroker Layer
Multiple Flume NG clients publish logs to a Kafka cluster. Kafka provides high throughput, partitioned storage, and configurable replication (set to 2 in this design) to meet durability and performance goals.
# Create Kafka topic
bin/kafka-topics.sh --create \
--zookeeper localhost:2181 \
--replication-factor 2 \
--partitions 4 \
--topic keplerlogConsumer Layer
A second Flume NG agent consumes the Kafka topic, buffers events in an in‑memory channel, and writes them to local files using a custom RollingByTypeAndDayFileSink. The sink extracts a module name from the event header (or from a prefix added by the source) and creates files named module.YYYYMMDD.
# Flume agent configuration for the consumer
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.zookeeperConnect = localhost:2181
a1.sources.r1.topic = keplerlog
a1.sources.r1.batchSize = 5
a1.sources.r1.groupId = flume-collector
a1.sources.r1.kafka.consumer.timeout.ms = 800
# Custom sink that rolls files by type and day
a1.sinks.k1.type = com.baidu.unbiz.flume.sink.RollingByTypeAndDayFileSink
a1.sinks.k1.channel = c1
a1.sinks.k1.sink.directory = /home/work/data/kepler-log
# In‑memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100
# Bindings
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1Practical Method
Container Configuration
The Docker image includes Flume binaries and a supervisord configuration to keep both the application and the Flume agent alive.
Flume Configuration Details
The producer uses a custom StaticLinePrefixExecSource that prefixes each log line with serviceName##$$##module so the consumer sink can split the line and route it to the appropriate file.
a1.sources.r1.type = com.baidu.unbiz.flume.sink.StaticLinePrefixExecSource
a1.sources.r1.command = tail -F /opt/MODULE_NAME/log/logback.log
a1.sources.r1.prefix = service1
a1.sources.r1.separator = ##$$##
a1.sources.r1.suffix = m1-ocean-1004.cpKafka sink configuration (producer side):
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = keplerlog
a1.sinks.k1.brokerList = gzns-cm-201508c02n01.gzns:9092,gzns-cm-201508c02n02.gzns:9092
a1.sinks.k1.requiredAcks = 0
a1.sinks.k1.batchSize = 5Key pitfalls discovered during implementation:
When the exec source wraps tail -F inside a script that adds extra processing (e.g., piping to awk), recent log lines may be dropped because the source discards output that arrives before the script finishes.
Flume’s default KafkaSource does not forward custom headers; the custom source must explicitly add them to the event.
Conclusion
By combining open‑source components such as Flume NG and Kafka, engineers can build a low‑latency, scalable log aggregation pipeline for containerized services. The same pipeline can be extended to downstream analytics (Spark Streaming, ELK stack, etc.) to enable real‑time monitoring, alerting, and data‑driven insights.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
