Tagged articles

Flume

26 articles

Sep 8, 2025 · Operations

Mastering Distributed Log Architecture: From Flume to ELK and Beyond

This comprehensive guide walks you through the challenges of large‑scale log collection, real‑time processing, storage optimization, and visualization, detailing practical configurations for Flume, Logstash, Elasticsearch, Kibana, Filebeat, Kafka, Kubernetes, and future AIOps integrations to build a reliable, cost‑effective distributed logging system.

ELK · Flume · Monitoring

24 min read

Architecture Digest

Oct 11, 2021 · Big Data

Core Technologies and Architecture of a Big Data Platform

This article explains the typical architecture of a big‑data platform, detailing its four core layers—data collection, storage & analysis, data sharing, and application—and describing the key technologies such as Flume, DataX, HDFS, Hive, Spark, Spark Streaming, and task scheduling components.

Big Data · Data Architecture · DataX

8 min read

Programmer DD

Mar 28, 2021 · Big Data

Mastering Apache Flume: Architecture, Components, and Key Features

This article provides a comprehensive overview of Apache Flume, detailing its purpose as a distributed log aggregation system, explaining its core components such as sources, channels, and sinks, and illustrating its architecture, multi‑agent setups, and key features like reliability, scalability, compression, and monitoring.

Flume · data ingestion · log-aggregation

9 min read
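
The source, channel, and sink roles this article covers can be illustrated with a small model. This is a hypothetical Python sketch, not Flume's Java API: the class and function names (`BoundedChannel`, `drain_once`) are invented for illustration. It shows the key idea behind Flume's reliability claim: the sink takes a batch inside a channel transaction and commits only after a successful write, otherwise it rolls back so the events stay in the channel.

```python
from collections import deque
from typing import Callable, Deque, List


class BoundedChannel:
    """Toy model of a Flume-style channel (hypothetical; not Flume's Java API)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._queue: Deque[str] = deque()
        self._in_flight: List[str] = []  # taken by the sink but not yet committed

    def put(self, event: str) -> bool:
        # Count in-flight events too, so a rollback can never overflow the channel.
        if len(self._queue) + len(self._in_flight) >= self.capacity:
            return False  # a real source would back off and retry
        self._queue.append(event)
        return True

    def take_batch(self, batch_size: int) -> List[str]:
        while self._queue and len(self._in_flight) < batch_size:
            self._in_flight.append(self._queue.popleft())
        return list(self._in_flight)

    def commit(self) -> None:
        self._in_flight.clear()  # delivered: forget the events

    def rollback(self) -> None:
        # Put uncommitted events back at the head of the queue, in original order.
        for event in reversed(self._in_flight):
            self._queue.appendleft(event)
        self._in_flight.clear()

    def __len__(self) -> int:
        return len(self._queue)


def drain_once(channel: BoundedChannel, write: Callable[[List[str]], None],
               batch_size: int) -> int:
    """One sink iteration: commit the batch if `write` succeeds, roll back if it raises."""
    batch = channel.take_batch(batch_size)
    if not batch:
        return 0
    try:
        write(batch)
    except OSError:
        channel.rollback()  # events stay in the channel and will be retried
        return 0
    channel.commit()
    return len(batch)
```

Because a failed write leaves the whole batch in the channel, a retry can redeliver events the sink had already partly written, which is why this style of pipeline gives at-least-once rather than exactly-once delivery.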

Architect

Dec 23, 2020 · Operations

Design and Evaluation of Log Collection Agents: Flume vs Filebeat

This article analyzes the shortcomings of traditional log‑collection agents, compares Flume and Filebeat against four criteria (low cost, stability, efficiency, and light weight), and presents practical solutions for file discovery, offset tracking, multi‑line handling, and performance tuning in modern logging pipelines.

Agent Design · Flume · Observability

13 min read
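
Offset tracking and multi‑line handling, two of the problems this article addresses, can be sketched briefly. The Python below is a hypothetical illustration, not Filebeat's or Flume's implementation, and the helper names (`read_new_records`, `load_offset`, `save_offset`) are invented. It assumes that continuation lines begin with whitespace, as Java stack‑trace frames usually do; real agents make that rule configurable. It persists the consumed byte offset atomically so a restart resumes where it left off, and it leaves a trailing partial line unread until its newline arrives.

```python
import json
import os
from typing import List, Tuple


def read_new_records(log_path: str, offset: int) -> Tuple[List[str], int]:
    """Read complete lines after `offset`, merging continuation lines into records.

    Returns (records, new_offset). A trailing partial line (no newline yet) is
    left unread so it is picked up whole on the next call.
    """
    records: List[str] = []
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # consume only up to the last complete line
    for raw in data[:end].splitlines():
        line = raw.decode("utf-8", errors="replace")
        if records and line[:1] in (" ", "\t"):
            records[-1] += "\n" + line  # assumed rule: indented line continues a record
        else:
            records.append(line)
    return records, offset + end


def load_offset(state_path: str) -> int:
    # No state file means this log has never been read: start at byte 0.
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f)["offset"]


def save_offset(state_path: str, offset: int) -> None:
    # Write a temp file, then rename over the old one, so a crash never
    # leaves a half-written state file behind.
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, state_path)
```

This sketch ignores file rotation; production agents typically key their saved offsets by file identity (for example inode and device) and add a flush timeout so a record whose continuation lines arrive in a later read is not split.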

Big Data Technology & Architecture

Nov 8, 2020 · Big Data

Flume Tuning Guide for High‑Throughput Data Ingestion

This article explains how to identify and resolve performance bottlenecks in Apache Flume by configuring Taildir sources, optimizing channel capacities, tuning Kafka sinks, adjusting JVM options, and using simple monitoring scripts, enabling a single Flume‑NG agent to sustain over 50,000 RPS in production.

Big Data · Configuration · Flume

10 min read
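
The reasoning behind channel tuning reduces to simple arithmetic: a channel's `capacity` (the maximum number of events it stores) must absorb the incoming rate for as long as the sink may stall, and its `transactionCapacity` must be at least as large as the batch size used by the sources and sinks attached to it. The helper below is a hypothetical back‑of‑the‑envelope calculator; `capacity` and `transactionCapacity` are real Flume channel properties, but the 60‑second stall window and 1.5× safety factor are illustrative assumptions, not values taken from the article.

```python
import math
from typing import Dict


def size_channel(events_per_sec: int, stall_seconds: int, batch_size: int,
                 safety_factor: float = 1.5) -> Dict[str, int]:
    """Rough channel sizing (illustrative heuristic, not an official Flume formula).

    capacity: enough slots to buffer `stall_seconds` of traffic, padded by `safety_factor`.
    transactionCapacity: must be >= the batch size of the source/sink using the channel.
    """
    if min(events_per_sec, stall_seconds, batch_size) <= 0:
        raise ValueError("rate, stall window and batch size must be positive")
    capacity = math.ceil(events_per_sec * stall_seconds * safety_factor)
    return {"capacity": capacity, "transactionCapacity": batch_size}


# 50,000 events/s for an assumed 60 s sink stall with batches of 1,000 events:
print(size_channel(50_000, 60, 1_000))
```

Here 50,000 × 60 × 1.5 gives a capacity of 4,500,000 events. At an assumed 500‑byte average event that is roughly 2.25 GB, which is one reason high‑rate pipelines often prefer a file or Kafka channel over a memory channel.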

Big Data Technology & Architecture

Nov 2, 2020 · Big Data

Log Collection and Processing Architecture with Flume and Kafka for Big Data Platforms

This article explains how to design a scalable log collection system for big‑data platforms by combining Flume for data ingestion, Kafka for buffering and high‑throughput transport, and downstream processing components, providing configuration examples and best‑practice recommendations.

Big Data · Flume · Real-time Processing

9 min read

21CTO

Oct 30, 2020 · Big Data

Which Log Collection System Wins? Scribe, Chukwa, Kafka, Flume & ELK Compared

This article reviews the background, requirements, and architectural designs of major open‑source log collection systems—including Facebook’s Scribe, Apache’s Chukwa, LinkedIn’s Kafka, Cloudera’s Flume—and evaluates mature monitoring tools such as ELK, highlighting their features, use cases, advantages, and drawbacks for large‑scale log processing.

Big Data · ELK · Flume

18 min read

Java Architect Essentials

Aug 21, 2020 · Big Data

Design and Integration of Flume, Kafka, Storm, Drools, and Redis for Real‑Time ETL Log Analysis

This article presents a modular architecture for real‑time ETL log analysis that combines Flume for log collection, Kafka as a buffering layer, Storm for stream processing, Drools for rule‑based data transformation, and Redis for fast storage, detailing installation, configuration, and code integration steps.

Big Data · Drools · Flume

23 min read

Big Data Technology & Architecture

Aug 18, 2020 · Big Data

End-to-End Real-Time Web Log Processing with Flume, Kafka, Spark Streaming, HBase, and Spring Boot

This tutorial demonstrates how to generate simulated web access logs in Python, run the generator on a schedule with crontab, collect the logs in real time using Flume, forward them to Kafka, process the streams with Spark Streaming, store the results in HBase, and visualize the data via a Spring Boot application with ECharts.

Big Data · ECharts · Flume

36 min read

Big Data Technology & Architecture

Aug 12, 2020 · Big Data

Real‑time User Behavior Collection Using Flume, Kafka, and Spark Streaming on Hadoop

This guide explains how to continuously collect web‑service user behavior logs, route them through Flume agents to Kafka, and finally ingest them with Spark Streaming into HDFS, covering environment preparation, configuration files, deployment steps, and verification procedures.

Big Data · Flume · Hadoop

9 min read

Top Architect

Mar 6, 2020 · Big Data

Design and Integration of a Real-Time Log Analysis System Using Flume, Kafka, Storm, Drools, and Redis

This article details the design, installation, and modular integration of Flume, Kafka, Storm, Drools, and Redis to build a real‑time log analysis pipeline for ETL systems, discussing architecture, configuration, code examples, and practical considerations for scalability and fault tolerance.

Big Data · Drools · Flume

24 min read

Architecture Digest

Mar 5, 2020 · Big Data

Design and Integration of Flume, Kafka, Storm, Drools, and Redis for Real-Time ETL Log Analysis

This article presents a modular architecture for real‑time ETL log collection and analysis, detailing the installation and configuration of Flume, Kafka, Storm, Drools, and Redis, and explains how their integration improves fault tolerance, scalability, and processing speed.

Drools · Flume · Real-time Processing

22 min read

MaGe Linux Operations

Mar 30, 2019 · Information Security

How to Build a Real-Time Security Log Collection and Alert System with ELK, Kafka, and Flume

This guide walks through setting up a comprehensive security log collection pipeline—covering WAF, firewall, and Nginx logs—using ELK, Logstash, Kafka, and Flume, and then configuring real‑time alerts with Sentinl or ElastAlert integrated with DingTalk and email notifications.

Alerting · ELK · Flume

16 min read

Youzan Coder

Mar 1, 2019 · Big Data

Flume Practice at YouZan: Data Collection and Pipeline Construction in Big Data Scenarios

YouZan’s experience with Flume shows how an at‑least‑once delivery model, combined with FileChannel storage and custom extensions (an NsqSource, an hourly HdfsEventSink, a metrics reporting server, and a timestamp interceptor), can reliably move MySQL binlog data to HDFS. Tuning the transaction batch size and channel capacity further improves throughput and stability, and the work paves the way for a unified management platform.

At-Least-Once · Flume · HDFS

11 min read
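
Two of the extensions mentioned here lend themselves to a short sketch: a timestamp interceptor that stamps each event with its own event time, and an hourly bucketing rule that picks the HDFS directory from that header rather than from arrival time, so late events still land in the correct hour. The Python below is hypothetical (real Flume interceptors and sinks are Java classes) and the function names are invented. The `timestamp` header (epoch milliseconds) mirrors the one Flume's HDFS sink reads for `%Y%m%d/%H`‑style path escapes; the `/logs/binlog` base path and the use of UTC are assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def timestamp_interceptor(headers: Dict[str, str], event_time_ms: Optional[int],
                          now_ms: int) -> Dict[str, str]:
    """Return a copy of `headers` with a 'timestamp' header in epoch millis.

    Prefers the event's own time (e.g. a binlog commit time) and falls back to
    `now_ms` only when the event carries none. An existing header is preserved.
    """
    out = dict(headers)
    if "timestamp" not in out:
        out["timestamp"] = str(event_time_ms if event_time_ms is not None else now_ms)
    return out


def hourly_path(base: str, headers: Dict[str, str]) -> str:
    """Map an event to an hourly directory such as base/20190301/09 (UTC assumed)."""
    ts = datetime.fromtimestamp(int(headers["timestamp"]) / 1000, tz=timezone.utc)
    return f"{base}/{ts:%Y%m%d}/{ts:%H}"


h = timestamp_interceptor({"db": "orders"}, 1551432600000, now_ms=1551440000000)
print(hourly_path("/logs/binlog", h))  # prints /logs/binlog/20190301/09
```

Because at‑least‑once delivery can write the same binlog event twice, a downstream job would still need to deduplicate, for example on a binlog position or event ID.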

Zhuanzhuan Tech

Feb 26, 2019 · Cloud Native

Automated Business Log Collection on Zhuanzhuan's Container Cloud Platform Using Log‑Pilot

This article describes how Zhuanzhuan built an automated log‑collection solution for its container cloud platform that is transparent to business services: it evaluated several approaches, adopted Alibaba Cloud's open‑source log‑pilot, customized its deployment, and addressed practical issues such as time‑zone bugs, latency, and duplicate collection.

Cloud Native · Fluentd · Flume

13 min read

Architecture Digest

May 28, 2018 · Big Data

Building a Real-Time Stream Processing Platform with Hadoop Ecosystem (Kafka, Spark Streaming, HBase)

This guide details how to construct a real-time data processing platform on CentOS 7 using the Hadoop ecosystem (installing and configuring ZooKeeper, Maven, Hadoop, Kafka, HBase, Spark, and Flume), followed by a Spark Streaming job that consumes Kafka messages and writes them into HBase.

Big Data · Flume · HBase

14 min read

iQIYI Technical Product Team

Jan 31, 2018 · Big Data

Evolution of iQIYI Real-Time Big Data Collection System

iQIYI’s big‑data collection system has evolved from simple HTTP log uploads to a Flume‑Kafka pipeline and finally to a custom Venus‑Agent architecture with centralized configuration, persistent offsets, dual Kafka streams, and Flink processing. It now handles tens of millions of queries per second and more than 300 billion records per day, powering the company’s AI‑driven services.

Big Data · Flink · Flume

15 min read

Architecture Digest

Sep 7, 2017 · Big Data

Design and Implementation of Bilibili's Lancer Log Collection System

The article presents the architecture, component design, optimizations, and reliability guarantees of Bilibili's Lancer log collection system, a Flume‑based distributed pipeline that handles both real‑time and offline data streams for billions of events daily.

Big Data · Flume · data pipeline

13 min read

Tongcheng Travel Technology Center

Mar 24, 2017 · Operations

Evolution of Tongcheng Log System Architecture

The article chronicles the development of Tongcheng's centralized log system from early file‑based logging through a MongoDB‑based solution to the current multi‑layer architecture using Flume, Elasticsearch, and Hadoop, highlighting design decisions, challenges, and future improvement plans.

Big Data · Flume · log system

7 min read

dbaplus Community

Aug 18, 2016 · Big Data

How Zhejiang Mobile Scaled Billion‑Level Real‑Time Stream Processing with Storm

This article details Zhejiang Mobile's architecture and practical experience in building a billion‑scale real‑time stream computing platform using Storm, Kafka, Flume, and Redis, covering use cases, system design, performance bottlenecks, optimization techniques, and monitoring strategies.

Apache Storm · Big Data Architecture · Flume

20 min read

Architecture Digest

Jul 26, 2016 · Big Data

Real-Time Order Analytics System Architecture Using Flume, Kafka, Storm, and Redis

This article introduces a beginner-friendly architecture for real-time order analytics in a big‑data environment, detailing how Flume collects logs, Kafka buffers them, Storm processes streams, and Redis stores results, while also covering configuration, code snippets, deployment steps, and troubleshooting tips.

Flume · Storm · kafka

26 min read

Architecture Digest

May 22, 2016 · Big Data

Design and Architecture of Youzan Unified Log Platform

The article details the design, components, and operational challenges of Youzan's unified log platform, describing its multi‑layer architecture, ingestion methods using rsyslog/logstash and Flume‑NG, Kafka‑based log center, processing pipelines with Storm/Spark, and storage in HDFS and Elasticsearch.

Flume · Monitoring · distributed systems

10 min read

21CTO

May 16, 2016 · Operations

How to Centralize Logs from Dockerized Services Using Flume and Kafka

This article explains a practical architecture for aggregating logs from distributed Docker containers by employing Flume NG as a lightweight log collector, Kafka as a high‑throughput message bus, and custom sinks to store logs per service, module and day with low latency and minimal resource impact.

Docker · Flume · Operations

17 min read

Architect

Feb 18, 2016 · Cloud Native

Collecting Docker Container Logs with Flume: Strategies and Implementation

This article explains how to capture Docker container logs, discusses the challenges of multi‑line log correlation, and presents two approaches—client‑side parsing and server‑side parsing—along with a concrete Flume customization using a DockerLog Java bean.

Docker · Flume · Java

7 min read
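
The server‑side parsing approach can be sketched as follows. Docker's default `json-file` logging driver writes one JSON object per line with `log`, `stream`, and `time` fields, so a multi‑line Java stack trace arrives as several separate objects. The hypothetical Python below (not the article's DockerLog bean; the function names are invented) regroups them into logical records, keeping stdout and stderr apart because their lines can interleave. The continuation rule, an indented line or one starting with `Caused by:`, is an assumption.

```python
import json
from typing import Dict, Iterable, List


def is_continuation(text: str) -> bool:
    # Assumed heuristic: stack-trace frames are indented; chained causes start with "Caused by:".
    return text[:1] in (" ", "\t") or text.startswith("Caused by:")


def merge_docker_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Merge json-file log entries into logical records, per output stream."""
    records: List[Dict[str, str]] = []
    last_by_stream: Dict[str, Dict[str, str]] = {}  # open record for each stream
    for raw in lines:
        entry = json.loads(raw)
        text = entry["log"].rstrip("\n")
        stream = entry.get("stream", "stdout")
        prev = last_by_stream.get(stream)
        if prev is not None and is_continuation(text):
            prev["log"] += "\n" + text  # append to the open record on the same stream
        else:
            record = {"stream": stream, "time": entry.get("time", ""), "log": text}
            records.append(record)
            last_by_stream[stream] = record
    return records
```

Each merged record keeps the `time` of its first line, so the whole stack trace is indexed at the moment the error was logged.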

21CTO

Sep 27, 2015 · Big Data

How Weidian Built a Scalable Big Data Platform for Mobile Commerce

This article outlines the design and implementation of Weidian’s end‑to‑end big data processing platform, covering dataset definition, data collection via Flume‑based DataAgent, transmission through Databus, storage options such as HDFS, Kafka and Elasticsearch, and the monitoring and resource‑integration strategies that support massive mobile commerce logs.

Elasticsearch · Flume · Hadoop

18 min read

Nightwalker Tech

Mar 14, 2015 · Big Data

Log Collection and Analysis: Architectures Using Flume, Kafka, Storm, Elasticsearch, and MongoDB

This article discusses various log collection and analysis architectures, comparing solutions such as Flume‑Kafka‑Storm pipelines, Sentry, MongoDB, ELK stack, and Hadoop, and shares practical experiences, advantages, drawbacks, and deployment tips from multiple engineers.

Big Data · Flume · Storm

7 min read