Tagged articles

1273 articles

Feb 5, 2021 · Big Data

Design and Implementation of a Real‑Time OLAP Engine Using ClickHouse in JD Energy Management Platform

This article describes how JD's Energy Management Platform leverages ClickHouse as a high‑performance, MPP‑based OLAP engine to provide real‑time, multi‑dimensional analytics on IoT energy data, covering business background, technology selection, system architecture, data ingestion, storage, replication, and a generic query interface with code examples.

Kafka · OLAP · clickhouse

11 min read

dbaplus Community

Feb 3, 2021 · Operations

Why Does a Kafka Partition Lose Its Leader? A Deep Dive into Index Corruption and Recovery

This article examines a Kafka cluster failure where partition 34 could not elect a leader due to index file corruption, explains the underlying sanity‑check logic, reproduces the fault, and offers practical recovery steps and configuration recommendations to prevent data loss.

Cluster Troubleshooting · Kafka · Log Index

16 min read

Zhongtong Tech

Feb 3, 2021 · Operations

How to Resolve Kafka ISR Fluctuations and High RT by Tuning Network Threads

This article walks through a real‑world Kafka incident where a sudden surge in client connections caused ISR churn, frequent connection drops, and high response time, and explains how monitoring thread idle rates and increasing network and I/O thread counts restored stability.

ISR · Kafka · Operations

7 min read

Aikesheng Open Source Community

Jan 28, 2021 · Databases

DTLE 3.21.01.0 Release Notes – New Features and Fixes for MySQL Data Transfer Component

The DTLE 3.21.01.0 release introduces Kafka batch‑send parameters, support for rename‑table DDL, empty transactions for skipped DDL/DCL, and several bug fixes, while providing repository links, documentation, and installation tips for this MySQL‑focused data‑transfer component.

DTLE · Data Transfer · Kafka

4 min read

Practical DevOps Architecture

Jan 28, 2021 · Operations

Step-by-Step Guide to Installing Zookeeper and Kafka on a Kubernetes Cluster

This tutorial walks through preparing three Kubernetes nodes, extracting and distributing Zookeeper, configuring its zoo.cfg and myid files, starting and verifying the Zookeeper ensemble, then installing Kafka, adjusting its server.properties, and finally launching Kafka across the cluster.

Big Data · Installation · Kafka

6 min read

Architect

Jan 22, 2021 · Big Data

Understanding Kafka Topic Partitions, Producer Partitioning Strategies, and Consumer Assignment

This article explains how Kafka producers decide which partition to send messages to, how topic partition counts are configured, and how consumer groups assign partitions to consumer instances using the default range strategy and the round‑robin strategy, with code examples for illustration.

Big Data · Consumer · Kafka

17 min read
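
To make the range strategy concrete, here is a minimal Python sketch of range-style assignment. It assumes every consumer subscribes to every topic; the function name `range_assign` and its input shapes are hypothetical, not Kafka's API. For each topic, consumers are sorted and each receives a contiguous block of partitions, with the first `n_partitions % n_consumers` consumers taking one extra.

```python
from typing import Dict, List, Tuple


def range_assign(consumers: List[str],
                 partitions_per_topic: Dict[str, int]) -> Dict[str, List[Tuple[str, int]]]:
    """Per topic, give each sorted consumer a contiguous block of partitions."""
    members = sorted(consumers)
    assignment: Dict[str, List[Tuple[str, int]]] = {m: [] for m in members}
    for topic, n_partitions in sorted(partitions_per_topic.items()):
        per_consumer, extra = divmod(n_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` consumers each take one additional partition.
            count = per_consumer + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment
```

With two topics of three partitions each and consumers `c0` and `c1`, `c0` ends up with four partitions and `c1` with two, which illustrates the skew range assignment can produce when the same consumers subscribe to several topics.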

Code Ape Tech Column

Jan 21, 2021 · Interview Experience

Master Distributed System Interview Questions: CAP, Redis, Zookeeper, Kafka and More

This article compiles essential interview‑style questions and detailed answers on distributed system fundamentals—including CAP and BASE theories, consistency models, distributed transactions, Redis features and persistence, Zookeeper coordination, Kafka architecture, and common design patterns for high‑concurrency scenarios.

Distributed Systems · Kafka · Message Queue

38 min read

Code Ape Tech Column

Jan 19, 2021 · Operations

Scaling Kafka Clusters to Support Millions of Partitions: Challenges and Solutions

This article examines the technical challenges of scaling Kafka clusters to handle millions of partitions—including Zookeeper node explosion, replication overhead, controller recovery latency, and broker restart delays—and proposes solutions such as parallel ZK fetching, metadata synchronization via internal topics, logical cluster composition, and physical cluster splitting.

Distributed Systems · Kafka · cluster operations

13 min read

New Oriental Technology

Jan 18, 2021 · Information Security

Kafka Security Authentication and Authorization Configuration Guide (SASL/PLAIN and SASL/SCRAM)

This guide explains Kafka's authentication and authorization mechanisms, covering SASL/PLAIN and SASL/SCRAM setups, JAAS file creation, server property configuration, ACL management, and provides complete Java producer and consumer examples for secure communication.

ACL · Authentication · Authorization

19 min read

21CTO

Jan 16, 2021 · Backend Development

How to Build a Go‑Based Log Collection System with etcd, Context, and Kafka

This article walks through designing and implementing a Go log‑collection agent that uses etcd for configuration storage, context for timeout and metadata handling, and Kafka as the message transport, complete with code examples, setup instructions, and a rate‑limiting utility.

Go · Kafka · context

16 min read

DataFunTalk

Jan 16, 2021 · Big Data

Practical Application of Flink + Kafka at NetEase Cloud Music: Architecture, Platform Design, and Lessons Learned

This article presents a detailed case study of NetEase Cloud Music’s real‑time analytics platform built on Kafka and Flink, covering background, architectural choices, platform‑level design, operational challenges, solutions such as the Magina framework, and a Q&A on reliability and monitoring.

Big Data · Flink · Kafka

11 min read

Didi Tech

Jan 14, 2021 · Cloud Computing

Design and Implementation of Didi's Logi‑KafkaManager Multi‑tenant Kafka Cloud Platform

Didi’s Logi‑KafkaManager is a multi‑tenant Kafka cloud platform that consolidates dozens of clusters into a secure, isolated, gateway‑driven service. It offers web‑based topic management, real‑time metrics visualization, automated diagnostics, quota governance, and safe scaling, and it has earned high satisfaction among internal users while also being commercialized for enterprises.

Big Data · Kafka · cloud platform

17 min read

Meituan Technology Team

Jan 14, 2021 · Big Data

Design and Implementation of an SSD‑Based Application‑Layer Cache Architecture for Kafka in Meituan Data Platform

Meituan built an SSD‑based application‑layer cache for Kafka that avoids PageCache contention between real‑time and delayed jobs, classifies log segments across SSD and HDD, limits flush rates, and cuts latency by up to 80% while keeping real‑time consumption stable.

Big Data · Kafka · LogSegment

19 min read

NetEase Smart Enterprise Tech+

Jan 14, 2021 · Big Data

How Yidun Achieves Real-Time, High-Performance Public-Opinion Data Cleaning with Groovy and JVM

Yidun’s public-opinion monitoring platform transforms massive raw web data into a unified format by separating dynamic Groovy-script-driven cleaning from static processing, achieving real-time source integration, high throughput, scalability, and high availability while addressing format diversity, team coordination, and performance-flexibility trade-offs.

Big Data · ETL · Groovy

5 min read

Architect's Tech Stack

Jan 8, 2021 · Backend Development

Comprehensive Guide to Spring Kafka: Integration, Advanced Features, and Usage

This article provides a detailed tutorial on integrating Kafka with Spring using Spring‑Kafka, covering simple setup, embedded Kafka testing, topic creation, message sending and receiving, transaction support, listener configurations, manual acknowledgment, error handling, retry and dead‑letter queues, and related code examples.

Kafka · Messaging · Microservices

21 min read

dbaplus Community

Jan 5, 2021 · Big Data

How Ctrip Built a Scalable Unified Log Framework for Payment Data

Facing massive, heterogeneous logs from numerous payment services, Ctrip’s data team designed a unified logging framework that extends log4j2, streams logs via Kafka to HDFS using a customized Camus pipeline, partitions and stores data in ORC for efficient Hive analysis, while addressing format, storage, and performance challenges.

Big Data · Camus · Hadoop

16 min read

Programmer DD

Jan 5, 2021 · Backend Development

Understanding Kafka Partition Assignment: Strategies and Code Walkthrough

This article explains how Kafka determines which partition a producer sends a record to, how partition counts are configured, and how consumer groups assign partitions using the default, range, and round‑robin strategies, complemented by detailed Java code examples.

Kafka · Partition Assignment · Producer

21 min read
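
For comparison with range assignment, here is a hedged Python sketch of round-robin assignment, assuming all consumers subscribe to all topics (`round_robin_assign` is a hypothetical helper, not a Kafka class). All topic-partitions are sorted and dealt out one at a time across the sorted consumer list.

```python
from typing import Dict, List, Tuple


def round_robin_assign(consumers: List[str],
                       partitions_per_topic: Dict[str, int]) -> Dict[str, List[Tuple[str, int]]]:
    """Deal every (topic, partition) pair across sorted consumers in turn."""
    members = sorted(consumers)
    all_partitions = [(topic, p)
                      for topic in sorted(partitions_per_topic)
                      for p in range(partitions_per_topic[topic])]
    assignment: Dict[str, List[Tuple[str, int]]] = {m: [] for m in members}
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment
```

For two topics of three partitions and two consumers, each consumer receives three partitions, whereas range assignment of the same input gives one consumer four and the other two.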

JavaEdge

Jan 1, 2021 · Backend Development

Inside Kafka’s Network Stack: How SocketServer, Acceptor, and Processor Work

This article breaks down Kafka’s network communication layer, detailing the roles of SocketServer, the Acceptor thread, Processor threads, and related classes such as RequestChannel, KafkaRequestHandlerPool, and key configuration parameters, while illustrating their interactions with diagrams.

Backend · Kafka · Reactor

7 min read

Top Architect

Dec 30, 2020 · Backend Development

Using Kafka as a Storage System for Twitter’s Account Activity Replay API

The article explains how Twitter built the Account Activity Replay API by repurposing Kafka as a storage layer, detailing the system’s architecture, partitioning strategy, request handling, deduplication, and performance optimizations to provide reliable event recovery for developers.

Infrastructure · Kafka · Twitter

8 min read

Code Ape Tech Column

Dec 30, 2020 · Industry Insights

Why Does a Single Kafka Broker Failure Bring Down Your Consumers?

The article explains Kafka's high‑availability architecture, covering multi‑replica redundancy, ISR mechanisms, producer acknowledgment settings, and a real‑world case where a broker crash halted consumption due to the __consumer_offsets topic's replication factor, then offers concrete remediation steps.

Consumer Offsets · ISR · Kafka

10 min read
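
One way to see why a single broker can stall consumption: each consumer group's committed offsets (and its coordinator) live in one partition of `__consumer_offsets`, chosen by hashing the group ID. The sketch below mirrors the mapping Kafka's group coordinator is commonly documented to use, `abs(groupId.hashCode()) % offsets.topic.num.partitions` (default 50), reimplementing Java's `String.hashCode` in Python; the function names are hypothetical. If that partition's only replica sits on the failed broker, the group cannot reach a coordinator until the broker returns.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): h = 31*h + c over UTF-16 code units, as a signed 32-bit int."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF  # emulate 32-bit overflow
    return h - (1 << 32) if h >= (1 << 31) else h


def kafka_abs(n: int) -> int:
    """Like Kafka's Utils.abs: Integer.MIN_VALUE maps to 0 instead of overflowing."""
    return 0 if n == -(1 << 31) else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets that holds this group's offsets."""
    return kafka_abs(java_string_hash(group_id)) % num_partitions
```

Note that the broker setting `offsets.topic.replication.factor` (default 3) applies when `__consumer_offsets` is first created; an existing topic created with a replication factor of 1 needs a partition reassignment to gain replicas.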

Selected Java Interview Questions

Dec 27, 2020 · Operations

Kafka Outage and High Availability Mechanisms

This article examines a Kafka outage at a fintech company, explains Kafka’s multi‑replica redundancy design, leader‑follower architecture, and ISR mechanism, shows how a misconfigured __consumer_offsets topic can cause cluster‑wide consumer failures, and provides solutions for achieving true high availability.

ACK · Consumer Offset · ISR

10 min read

Code Ape Tech Column

Dec 25, 2020 · Backend Development

RabbitMQ vs Kafka: Which Messaging System Wins for Your Architecture?

This article compares RabbitMQ and Apache Kafka by examining their internal designs, messaging models, ordering guarantees, routing, timing, retention, fault‑tolerance, scalability, and consumer complexity, then provides concrete guidance on when to choose each technology for real‑world systems.

Comparison · Kafka · Message Queue

24 min read

Code Ape Tech Column

Dec 18, 2020 · Operations

Why Message Queues Power Scalable Systems: Use Cases, Architecture & Top Solutions

This article explains the fundamentals of message‑queue middleware, outlines key scenarios such as asynchronous processing, application decoupling, traffic shaping, log handling and messaging, compares popular products like ActiveMQ, RabbitMQ, ZeroMQ and Kafka, and details JMS concepts and programming models.

JMS · Kafka · RabbitMQ

18 min read

High Availability Architecture

Dec 17, 2020 · Big Data

How Kafka Achieves Million‑Level TPS: Sequential Disk I/O, MMAP, Zero‑Copy, and Batch Processing

This article explains how Kafka attains million‑level transactions per second by using sequential disk reads/writes, memory‑mapped files, DMA‑based zero‑copy transfers, and batch data transmission, detailing each technique and its impact on throughput and latency.

Kafka · Sequential I/O · TPS

11 min read

macrozheng

Dec 15, 2020 · Big Data

How Kafka Achieves Million‑TPS Through Sequential I/O, MMAP, and Zero‑Copy

Kafka can sustain millions of transactions per second by writing data sequentially to disk, leveraging memory‑mapped files, employing zero‑copy DMA transfers, and batching messages, each technique reducing I/O overhead and CPU involvement, which together enable its high‑throughput performance in big‑data pipelines.

Big Data · High Throughput · Kafka

11 min read

JavaEdge

Dec 5, 2020 · Big Data

How Kafka Chooses Its Partition Leaders: ZAB, Raft, and Controller Election Explained

This article explains the leader election mechanisms used in big‑data systems—ZAB in Zookeeper, Raft’s role‑based election, their drawbacks such as split‑brain and ZooKeeper overload, and how Kafka’s controller‑based design solves these issues with efficient partition leader selection.

Big Data · Kafka · Raft

7 min read
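
A simplified Python sketch of the rule Kafka's controller is generally described as applying when a partition's leader goes offline: choose the first replica, in assigned-replica order, that is both alive and in the ISR, and fall back to any live replica only when unclean leader election (`unclean.leader.election.enable`) is allowed. `elect_leader` and its parameters are hypothetical names, not Kafka's code.

```python
from typing import Optional, Sequence, Set


def elect_leader(assigned: Sequence[int], isr: Set[int], live: Set[int],
                 unclean_enabled: bool = False) -> Optional[int]:
    """Pick a new leader in assigned-replica order; None means the partition stays offline."""
    for replica in assigned:
        if replica in live and replica in isr:
            return replica
    if unclean_enabled:
        # An out-of-sync replica may be missing committed records, so data can be lost.
        for replica in assigned:
            if replica in live:
                return replica
    return None
```

Because the controller makes this choice centrally from state it already holds, no per-partition vote among replicas is needed, which is the contrast with ZAB- or Raft-style elections that the article draws.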

Programmer DD

Dec 2, 2020 · Backend Development

How Kafka Uses a Timing Wheel for Efficient Timeout Handling

Many Kafka requests need asynchronous processing or must wait for a condition, so they carry a timeout parameter; if the condition is not met within the timeout, Kafka returns a timeout response. Kafka implements this efficiently with a hierarchical Timing Wheel data structure that offers O(1) insertion and fast expiration checks.

Backend · Kafka · Scala

12 min read
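
A minimal single-level timing wheel in Python, assuming a fixed tick and wheel size (`TimingWheel`, `add`, and `advance` are illustrative names, not Kafka's Scala classes). Insertion computes a bucket index with one division and one modulo, which is where the O(1) insertion comes from. Tasks whose delay exceeds the wheel's span are rejected here; a hierarchical design would hand them to a coarser overflow wheel instead.

```python
class TimingWheel:
    """Single-level timing wheel: `wheel_size` buckets, each covering `tick_ms` of time."""

    def __init__(self, tick_ms: int, wheel_size: int, start_ms: int = 0):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.current_ms = start_ms - start_ms % tick_ms  # round down to a tick boundary
        self.buckets = [[] for _ in range(wheel_size)]

    def add(self, task, expire_ms: int) -> bool:
        """Place a task in its bucket in O(1); returns False if it is already due."""
        if expire_ms < self.current_ms + self.tick_ms:
            return False
        if expire_ms >= self.current_ms + self.tick_ms * self.wheel_size:
            raise ValueError("delay exceeds the wheel's span")
        index = (expire_ms // self.tick_ms) % self.wheel_size
        self.buckets[index].append(task)
        return True

    def advance(self, now_ms: int) -> list:
        """Move the clock tick by tick up to now_ms and return the tasks that expired."""
        fired = []
        while self.current_ms + self.tick_ms <= now_ms:
            self.current_ms += self.tick_ms
            index = (self.current_ms // self.tick_ms) % self.wheel_size
            fired.extend(self.buckets[index])
            self.buckets[index] = []
        return fired
```

This sketch visits every tick, including empty ones; Kafka's implementation is reported to keep non-empty buckets in a `DelayQueue` so the clock can jump directly to the next bucket that holds tasks.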

21CTO

Dec 1, 2020 · Big Data

How Kafka Implements Transactions: Inside the TC Service and Producer Workflow

This article provides a comprehensive walkthrough of Kafka's transaction mechanism, covering the transaction coordinator, producer initialization, partition handling, commit and abort processes, state management, high‑availability design, timeout handling, and relevant source code snippets.

Distributed Systems · Kafka · Producer

22 min read

JavaEdge

Dec 1, 2020 · Backend Development

How Kafka’s OffsetIndex and TimeIndex Optimize Message Retrieval

This article explains Kafka’s internal index files—OffsetIndex and TimeIndex—including their file formats, how they store relative offsets and timestamps, the space‑saving optimizations, the processes for appending, truncating, and looking up entries, and best‑practice cautions for handling these indexes.

Kafka · OffsetIndex · TimeIndex

8 min read
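
A sketch of the fixed-size entry layouts, assuming the commonly described formats: an OffsetIndex entry is 8 bytes (a 4-byte offset relative to the segment's base offset plus a 4-byte physical file position), and a TimeIndex entry is 12 bytes (an 8-byte timestamp plus a 4-byte relative offset), both big-endian. Storing offsets relative to the base offset is the space saving: a 4-byte int suffices instead of an 8-byte long. The function names are hypothetical.

```python
import struct
from typing import Tuple

# Big-endian layouts, matching Java ByteBuffer's default byte order.
OFFSET_ENTRY = struct.Struct(">ii")  # relative offset, physical position -> 8 bytes
TIME_ENTRY = struct.Struct(">qi")    # timestamp in ms, relative offset -> 12 bytes


def _relative(base_offset: int, offset: int) -> int:
    rel = offset - base_offset
    if not 0 <= rel <= 0x7FFFFFFF:
        raise ValueError("offset does not fit in a 4-byte relative offset")
    return rel


def encode_offset_entry(base_offset: int, offset: int, position: int) -> bytes:
    return OFFSET_ENTRY.pack(_relative(base_offset, offset), position)


def decode_offset_entry(base_offset: int, raw: bytes) -> Tuple[int, int]:
    rel, position = OFFSET_ENTRY.unpack(raw)
    return base_offset + rel, position  # restore the absolute offset


def encode_time_entry(base_offset: int, timestamp_ms: int, offset: int) -> bytes:
    return TIME_ENTRY.pack(timestamp_ms, _relative(base_offset, offset))
```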

JavaEdge

Nov 30, 2020 · Backend Development

How Kafka’s Index Uses Binary Search and Cache‑Friendly Optimizations

This article explains Kafka's index architecture, the AbstractIndex class hierarchy, how entry sizes are chosen, the use of memory‑mapped files, the binary‑search algorithm for locating index entries, and a cache‑friendly improvement that reduces page faults and I/O latency.

Binary Search · Cache Optimization · Kafka

13 min read
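
A Python sketch of the lookup, assuming the index is modeled as a sorted list of offsets. It finds the last entry whose offset is at most the target, and first checks whether the target falls in the most recent "warm" tail, so lookups for recent data only touch pages near the end of the file. The default tail of 1024 entries assumes 8192 bytes of 8-byte entries, in line with how the cache-friendly variant is usually described; `lookup_slot` is a hypothetical name.

```python
from typing import Sequence


def lookup_slot(offsets: Sequence[int], target: int, warm_entries: int = 1024) -> int:
    """Index of the last entry whose offset is <= target, or -1 if target precedes them all."""
    n = len(offsets)
    if n == 0:
        return -1
    first_hot = max(0, n - 1 - warm_entries)
    if offsets[first_hot] < target:
        lo, hi = first_hot, n - 1  # recent target: search only the warm tail
    elif offsets[0] > target:
        return -1
    else:
        lo, hi = 0, first_hot  # older target: search the cold part
    # Invariant: offsets[lo] <= target; narrow the range until lo == hi.
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if offsets[mid] <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Since consumers near the head of the log mostly look up recent offsets, searches usually stay inside the warm tail, whose pages are likely already resident, which is the kind of page-fault reduction the article describes.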

System Architect Go

Nov 30, 2020 · Databases

Five Ways to Sync MySQL Data to Elasticsearch, Redis, MQ, etc.

This article outlines five practical methods for synchronizing MySQL data to external systems such as Elasticsearch, Redis, and message queues, covering business‑layer hooks, middleware integration, scheduled tasks using updated_at, binlog parsing with ROW format, and handling mixed or statement binlog formats, plus open‑source tools.

Binlog · Elasticsearch · Kafka

5 min read
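
For the scheduled-task approach, here is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for MySQL. The `product` table and `fetch_changes` are hypothetical. Using a watermark of `(updated_at, id)` rather than `updated_at` alone lets the sync resume in fixed-size batches without skipping or repeating rows that share the same `updated_at` value.

```python
import sqlite3
from typing import Tuple

Watermark = Tuple[int, int]  # (updated_at, id) of the last row already synced


def fetch_changes(conn: sqlite3.Connection, mark: Watermark, batch_size: int = 100):
    """Return the next batch of rows changed after `mark`, plus the advanced watermark."""
    updated_at, last_id = mark
    rows = conn.execute(
        "SELECT id, name, updated_at FROM product "
        "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
        "ORDER BY updated_at, id LIMIT ?",
        (updated_at, updated_at, last_id, batch_size),
    ).fetchall()
    if not rows:
        return rows, mark
    last = rows[-1]
    return rows, (last[2], last[0])
```

Each poll would push the returned rows to Elasticsearch, Redis, or an MQ and then persist the new watermark. Polling on `updated_at` cannot observe hard deletes, which is one reason binlog parsing is often preferred.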

DataFunTalk

Nov 27, 2020 · Big Data

Evolution of Kafka‑Based Data Pipeline at Chehaoduo Group: Architecture, Scaling, and Best Practices

This article chronicles the four‑year evolution of Chehaoduo Group’s Kafka ecosystem—from its initial role as a simple data‑ingestion layer to becoming the core of the company’s large‑scale data pipeline—detailing cluster management, upgrade strategies, multi‑cluster deployment, AVRO schema handling, SDK development, and operational lessons learned.

Avro · Cluster Management · Kafka

21 min read

Full-Stack Internet Architecture

Nov 23, 2020 · Backend Development

Message Middleware: Benefits, Drawbacks, and Design Patterns for Concurrency, Ordering, Duplicate, and Transactional Messaging

This article explains the advantages and disadvantages of using message middleware in microservice architectures and details practical solutions for handling concurrency, ordered processing, duplicate messages, and transactional messaging using patterns like partitioning, outbox tables, CDC, and RocketMQ's two‑phase commit.

Kafka · Microservices · RocketMQ

12 min read

Tencent Cloud Developer

Nov 19, 2020 · Backend Development

Kafka Message Queue Reliability Design and Implementation

The article thoroughly explains Kafka’s message‑queue reliability design and implementation, covering use‑case scenarios, core concepts, storage format, producer acknowledgment settings, broker replication mechanisms (ISR, HW, LEO), consumer delivery semantics, the leader‑epoch solution to replica synchronization problems, and practical configuration guidelines for various consistency and availability requirements.

Broker · Consistency · Consumer

15 min read
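
A simplified Python sketch of the high-watermark rule: the leader sets HW to the smallest LEO among ISR members, and consumers only see records below HW. It ignores how ISR membership itself is maintained (for example via `replica.lag.time.max.ms`); `high_watermark` and `is_committed` are hypothetical names.

```python
from typing import Dict, Set


def high_watermark(leo: Dict[int, int], isr: Set[int]) -> int:
    """HW = smallest log end offset (LEO) among in-sync replicas."""
    return min(leo[replica] for replica in isr)


def is_committed(offset: int, leo: Dict[int, int], isr: Set[int]) -> bool:
    """A record is committed, and visible to consumers, once its offset is below HW."""
    return offset < high_watermark(leo, isr)
```

With LEOs of 10, 8, and 5 on replicas 1, 2, and 3, HW is 5; once the lagging replica 3 leaves the ISR, HW moves to 8. Settings such as `acks=all` and `min.insync.replicas` decide how small the ISR may become before writes are refused.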

Java High-Performance Architecture

Nov 18, 2020 · Big Data

Why Pulsar Might Outperform Kafka: Key Advantages and Drawbacks

This article examines Apache Pulsar, an open‑source messaging platform created by Yahoo, and compares it with Kafka by outlining Kafka’s common pain points, highlighting Pulsar’s multi‑tenant architecture, tiered storage, built‑in functions, and security features, and discussing the trade‑offs of each solution.

Apache Pulsar · Big Data · Distributed Systems

6 min read

DataFunSummit

Nov 15, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Using Hadoop, Flume, Kafka, Spark, and Flink

This article details the three‑stage evolution of 58.com’s commercial data warehouse, describing its massive scale, four‑layer architecture, technical challenges, migrations from MapReduce to Hive and Flink, real‑time streaming upgrades, and the resulting improvements in stability, accuracy, and timeliness.

Big Data · Data Architecture · Flink

10 min read

dbaplus Community

Nov 15, 2020 · Big Data

Mastering Real‑Time Stream Processing with Flink: From Fundamentals to Kuaishou Production

This article walks through the evolution of big‑data systems to modern stream processing, explains core Flink concepts such as state, checkpoints, event‑time and windowing, and details Kuaishou’s real‑time UV calculation and fast‑failover techniques for high‑availability streaming jobs.

Big Data · Flink · Kafka

21 min read

Architecture Digest

Nov 14, 2020 · Big Data

Kafka Crash and High‑Availability Issues: Replica Design, ISR, and Consumer Offset Problems

The article explains why a single Kafka broker failure can render the whole cluster unavailable, detailing Kafka's multi‑replica architecture, ISR mechanism, leader election, producer acknowledgment settings, and the special handling required for the __consumer_offsets topic.

Consumer Offsets · ISR · Kafka

10 min read

Laravel Tech Community

Nov 12, 2020 · Backend Development

PHP Kafka Client Library (longlang/phpkafka) Overview

The PHP Kafka client library supports PHP‑FPM and Swoole environments, implements all 50 Kafka APIs with compression, SSL, and SASL features, requires PHP ≥ 7.1 and Kafka ≥ 1.0.0, and can be installed via Composer.

Kafka · Message Queue · client

2 min read

Tencent Cloud Middleware

Nov 12, 2020 · Backend Development

How We Migrated Our Self‑Built Message Queue to Tencent Cloud CKafka

This article details why the e‑commerce platform built its own Corgi message queue, the operational and cost drawbacks that prompted a move to Tencent Cloud CKafka, and the three‑phase migration strategy—including dual‑write, cut‑read, and cut‑write—while preserving message safety and low latency.

CKafka · Kafka

11 min read

Big Data Technology & Architecture

Nov 8, 2020 · Big Data

Flume Tuning Guide for High‑Throughput Data Ingestion

This article explains how to identify and resolve performance bottlenecks in Apache Flume by configuring Taildir sources, optimizing channel capacities, tuning Kafka sinks, adjusting JVM options, and using simple monitoring scripts, enabling a single Flume‑NG agent to sustain over 50,000 RPS in production.

Big Data · Configuration · Flume

10 min read

System Architect Go

Nov 7, 2020 · Operations

Request Log Analysis System: Collected Fields, Derived Data, and Metrics

This article outlines a request log analysis system that records core request fields, adds proxy‑related data, derives IP‑based ASN and geographic information, parses user‑agent details, and provides comprehensive metrics such as PV/QPS, UV, traffic, latency, status monitoring, and business‑specific insights, all visualized via an ELK‑Kafka architecture.

Backend · ELK · Kafka

5 min read

Big Data Technology & Architecture

Nov 2, 2020 · Big Data

Log Collection and Processing Architecture with Flume and Kafka for Big Data Platforms

This article explains how to design a scalable log collection system for big‑data platforms by combining Flume for data ingestion, Kafka for buffering and high‑throughput transport, and downstream processing components, providing configuration examples and best‑practice recommendations.

Big Data · Flume · Kafka

9 min read

Big Data Technology Architecture

Nov 1, 2020 · Big Data

Practical Application of Flink + Kafka in NetEase Cloud Music Real‑Time Computing Platform

This article presents NetEase Cloud Music's real‑time computing platform built on Flink and Kafka, covering background, architectural design, Kafka and Flink selection reasons, platformization, warehouse usage, encountered challenges, and the solutions implemented to improve reliability and performance.

Flink · Kafka · Real-time Streaming

11 min read

21CTO

Oct 30, 2020 · Big Data

Which Log Collection System Wins? Scribe, Chukwa, Kafka, Flume & ELK Compared

This article reviews the background, requirements, and architectural designs of major open‑source log collection systems—including Facebook’s Scribe, Apache’s Chukwa, LinkedIn’s Kafka, Cloudera’s Flume—and evaluates mature monitoring tools such as ELK, highlighting their features, use cases, advantages, and drawbacks for large‑scale log processing.

Big Data · ELK · Flume

18 min read

Programmer DD

Oct 29, 2020 · Backend Development

Master Kafka Interview Questions: Architecture, Configurations, and Best Practices

This article provides a comprehensive overview of Kafka as a distributed messaging middleware, covering its core concepts, architecture, producer and consumer mechanics, common interview questions, configuration options, high‑availability guarantees, and performance optimizations for backend developers.

Consumer · Distributed Messaging · Kafka

20 min read

Big Data Technology & Architecture

Oct 29, 2020 · Fundamentals

Zero-Copy Data Transfer Mechanism: Principles, Implementations, and Applications in Java, Kafka, and Spark

This article explains the zero‑copy data transfer technique, compares it with traditional read/write approaches, shows Java NIO code examples, and discusses its use in high‑performance systems such as Kafka and Spark, highlighting the reductions in context switches and memory copies.

Data Transfer · Java NIO · Kafka

16 min read

Efficient Ops

Oct 26, 2020 · Operations

Secure Production ELK Stack with Kafka: Step‑by‑Step Deployment Guide

This guide walks through building a secure, production‑grade logging pipeline by deploying an ELK stack (Elasticsearch, Logstash, Kibana) with X‑Pack security, a Kafka message queue with SASL authentication, and Filebeat agents, covering environment preparation, certificate generation, configuration files, and startup scripts.

Deployment · ELK · Kafka

31 min read

Architecture Digest

Oct 22, 2020 · Backend Development

Kafka Timing Wheel: Design, Operation, and Code Walkthrough

The article explains how Kafka handles timeout‑based requests using a Timing Wheel data structure, detailing its design, parameters, operation principles, overflow handling, and providing Scala code examples that illustrate O(1) task insertion compared to traditional O(logN) delay queues.

Data Structures · Kafka · Scala

10 min read
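
A compact hierarchical timing wheel in Python, loosely modeled on the design the article walks through. Each level has `wheel_size` buckets of `tick_ms`; a delay beyond a level's span goes to a lazily created overflow wheel whose tick equals the lower level's full interval; and a heap of bucket expirations (standing in for a Java `DelayQueue`) lets the clock jump straight to the next non-empty bucket. When a bucket expires, its tasks are re-added from the top level, so each one either fires or cascades into a finer bucket. Class and method names are illustrative, not Kafka's.

```python
import heapq


class Bucket:
    """Tasks that expire within one tick-sized slot of a wheel."""

    def __init__(self):
        self.expiration = -1  # start time of the slot; -1 means "not scheduled"
        self.tasks = []


class TimingWheel:
    def __init__(self, tick_ms, wheel_size, start_ms, queue):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval = tick_ms * wheel_size  # time span covered by this level
        self.current_ms = start_ms - start_ms % tick_ms
        self.buckets = [Bucket() for _ in range(wheel_size)]
        self.queue = queue  # heap shared by all levels, ordered by bucket expiration
        self.overflow = None  # next, coarser level, created on demand

    def add(self, expire_ms, task):
        """Return False if the task is already due; otherwise store it in O(1) per level."""
        if expire_ms < self.current_ms + self.tick_ms:
            return False
        if expire_ms < self.current_ms + self.interval:
            virtual_id = expire_ms // self.tick_ms
            bucket = self.buckets[virtual_id % self.wheel_size]
            bucket.tasks.append((expire_ms, task))
            slot_start = virtual_id * self.tick_ms
            if bucket.expiration != slot_start:  # bucket now covers a new slot: schedule it
                bucket.expiration = slot_start
                heapq.heappush(self.queue, (slot_start, id(bucket), bucket))
            return True
        if self.overflow is None:
            self.overflow = TimingWheel(self.interval, self.wheel_size, self.current_ms, self.queue)
        return self.overflow.add(expire_ms, task)

    def advance(self, time_ms):
        if time_ms >= self.current_ms + self.tick_ms:
            self.current_ms = time_ms - time_ms % self.tick_ms
            if self.overflow is not None:
                self.overflow.advance(self.current_ms)


class Timer:
    def __init__(self, tick_ms, wheel_size, start_ms=0):
        self.queue = []
        self.wheel = TimingWheel(tick_ms, wheel_size, start_ms, self.queue)

    def add(self, expire_ms, task):
        return self.wheel.add(expire_ms, task)

    def advance_clock(self, now_ms):
        """Expire every bucket due by now_ms; re-adding its tasks fires them or moves them down."""
        fired = []
        while self.queue and self.queue[0][0] <= now_ms:
            expiration, _, bucket = heapq.heappop(self.queue)
            if bucket.expiration != expiration:
                continue  # stale heap entry for a bucket that was already flushed
            self.wheel.advance(expiration)
            tasks, bucket.tasks, bucket.expiration = bucket.tasks, [], -1
            for expire_ms, task in tasks:
                if not self.wheel.add(expire_ms, task):
                    fired.append(task)
        self.wheel.advance(now_ms)
        return fired
```

In the usage below, a task due at 10 ms does not fit the first level (span 4 ms), so it waits in the overflow wheel's bucket for slot 8 and drops into the fine-grained level when that bucket expires.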

Selected Java Interview Questions

Oct 21, 2020 · Backend Development

Message Queue Interview Guide: Why Use MQ, Pros & Cons, and Comparison of Kafka, ActiveMQ, RabbitMQ, and RocketMQ

This article explains why message queues are used, their advantages and disadvantages, compares Kafka, ActiveMQ, RabbitMQ, and RocketMQ, and provides interview-focused guidance on high availability, idempotency, reliability, ordering, and handling of message backlog.

ActiveMQ · Kafka · MQ

63 min read

dbaplus Community

Oct 13, 2020 · Big Data

How to Build a Real‑Time Data Warehouse with Flink: Principles, Architecture, and Best Practices

This article explains why real‑time data warehouses are needed, outlines their core principles, compares them with offline warehouses, describes typical use cases such as real‑time OLAP, dashboards, feature generation and monitoring, and provides a step‑by‑step guide to designing, implementing, and operating a Flink‑based streaming warehouse with Kafka, HBase, and metadata management.

Flink · Kafka · OLAP

29 min read

Didi Tech

Oct 12, 2020 · Backend Development

Understanding Kafka's Time Wheel Implementation for Efficient Delayed Task Processing

The article explains how Kafka implements a hierarchical time‑wheel—a ring of bucket sets with a map for O(1) insertion, removal and expiration—to replace costly per‑tick scans, enabling efficient management of millions of delayed tasks across multiple timeout layers.

Backend · Data Structure · Kafka

16 min read

Top Architect

Oct 9, 2020 · Backend Development

Implementing Delayed Queues with Redis and Other Technologies

This article explains how Redis can be used to implement delayed queues, compares its advantages with other solutions such as RabbitMQ, RocketMQ, Kafka, Netty and Java DelayQueue, and provides practical guidance on using sorted sets and timestamps for time‑based task scheduling.

Kafka · Message Queue · RabbitMQ

8 min read
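
A minimal Python sketch of the sorted-set pattern, using an in-memory heap as a stand-in for Redis so that it runs without a server. The score is the due timestamp (as with `ZADD`), and a poller takes every member whose score is at most "now" (as with `ZRANGEBYSCORE` followed by `ZREM`). The `DelayQueue` class and the `delay_queue` key in the comments are hypothetical, and the class is not thread-safe.

```python
import heapq
import itertools


class DelayQueue:
    """In-memory stand-in for a Redis sorted set whose score is the due timestamp."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so equal timestamps keep insertion order

    def add(self, task, due_ts: float) -> None:
        # Redis analogue: ZADD delay_queue <due_ts> <task>
        heapq.heappush(self._heap, (due_ts, next(self._seq), task))

    def poll_due(self, now_ts: float) -> list:
        # Redis analogue: ZRANGEBYSCORE delay_queue -inf <now_ts>, then ZREM each claimed task
        due = []
        while self._heap and self._heap[0][0] <= now_ts:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

With a real Redis shared by several workers, fetching and removing must be made atomic, for example by treating a `ZREM` that returns 1 as the claim or by wrapping both steps in a Lua script, so that two workers do not run the same task.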

MaGe Linux Operations

Sep 29, 2020 · Backend Development

Understanding Message Middleware: Core Architecture and Kafka Basics

This article explains the fundamental architecture of message middleware, its key roles such as peak shaving, asynchronous processing and decoupling, the two consumption models (publish‑subscribe and point‑to‑point), and introduces core Kafka concepts with practical Java code examples.

Distributed Systems · Kafka · Message Queue

7 min read

IT Architects Alliance

Sep 21, 2020 · Big Data

Guide to Installing and Configuring Alibaba Canal for MySQL Binlog Data Synchronization

This guide provides a step‑by‑step tutorial on downloading, installing, configuring, and starting Alibaba Canal and its adapter to achieve real‑time incremental data synchronization from MySQL binlog to destinations such as Kafka, including code snippets and configuration details.

Binlog · Canal · Docker

10 min read

Java Architect Essentials

Sep 21, 2020 · Backend Development

Design and Implementation of a Scalable Long‑Connection Gateway

This article details the architecture, protocol design, permission control, reliability mechanisms, and scaling strategies of a long‑connection gateway built with OpenResty, Kafka, and Redis, illustrating how to share persistent connections across multiple business services while ensuring high performance and fault tolerance.

Backend · Kafka · Messaging

13 min read

MaGe Linux Operations

Sep 20, 2020 · Big Data

Mastering Alibaba Canal: Step‑by‑Step Setup for Real‑Time MySQL Binlog Sync

This guide explains what Canal is, its key features and limitations, the underlying binlog replication principle, and provides detailed, step‑by‑step instructions for downloading, configuring, and launching both Canal Server and Canal Adapter to achieve high‑performance real‑time data synchronization.

Binlog · Canal · Docker

0 likes · 10 min read

Big Data Technology & Architecture

Sep 19, 2020 · Big Data

Understanding Kafka Consumer Group Rebalance and Timeout Mechanisms

This article explains how Kafka consumer groups assign partitions, the four situations that trigger a rebalance, the impact of consumer poll timeouts, and practical ways to tune max.poll.interval.ms and max.poll.records to avoid rebalance‑related errors.

Big Data · Kafka · Timeout

0 likes · 12 min read

Big Data Technology & Architecture

Sep 18, 2020 · Big Data

Understanding Kafka Consumer Groups, Partition Assignment, and Offset Management

This article explains how Kafka consumer groups accelerate message consumption by distributing partitions across multiple consumers, details the three key characteristics of consumer groups, and provides in‑depth guidance on partition assignment strategies and offset management with practical Java code examples.

Big Data · Kafka · Offset Management

0 likes · 13 min read

Architect

Sep 17, 2020 · Big Data

Kafka Exactly-Once Semantics and Transaction API Overview

This article explains Kafka's exactly‑once semantics and transaction support, detailing the new producer API methods, related exceptions, configuration parameters, and a sample application illustrating how to initialize, begin, process, and commit or abort transactions while ensuring idempotent and atomic message handling.

Configuration · Exactly-Once · Idempotence

0 likes · 19 min read

Big Data Technology & Architecture

Sep 17, 2020 · Big Data

Monitoring Kafka Consumer Groups with kafka-consumer-groups and Kafka Manager

This article explains how to monitor Kafka consumer groups using the built‑in kafka‑consumer‑groups tool and the Kafka Manager UI, providing commands, field explanations, and setup steps to ensure real‑time data availability for downstream services such as MongoDB or Elasticsearch.

Big Data · Kafka · Kafka Manager

0 likes · 4 min read

IT Architects Alliance

Sep 15, 2020 · Backend Development

Step‑by‑Step Guide to Deploying a Multi‑Node Kafka Cluster

This tutorial walks through setting up a four‑node Kafka cluster—including Zookeeper installation, broker configuration, service startup, replication settings, fault handling, and leader election—using Linux commands and detailed code snippets to help readers build a production‑ready streaming platform.

Backend · Cluster Deployment · Kafka

0 likes · 14 min read

JavaEdge

Sep 15, 2020 · Backend Development

How Kafka Uses ZooKeeper for Metadata Management and Client Coordination

This article explains how Kafka relies on ZooKeeper to store cluster metadata, detailing the ZK node hierarchy, the process by which clients locate brokers, the broker‑side handling of metadata requests, and recommended practices for large‑scale deployments.

Kafka · backend-development · metadata

0 likes · 8 min read

dbaplus Community

Sep 14, 2020 · Operations

How iQIYI Scaled Real‑Time Log Monitoring for 100M+ Users with Spark, Flink and Druid

Facing a surge to over 100 million members, iQIYI rebuilt its monitoring stack by ingesting four log types, adopting Spark Streaming, Flink and Druid for real‑time analysis, and optimizing resource usage, which cut incident resolution time by more than 80% while supporting billion‑level data volumes.

Druid · Flink · Kafka

0 likes · 12 min read

DataFunTalk

Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps—including stream union, state‑timer processing, and output formatting—covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation

0 likes · 11 min read

MaGe Linux Operations

Sep 12, 2020 · Big Data

How to Deploy a Multi‑Node Kafka Cluster with Zookeeper from Scratch

This tutorial walks through installing Zookeeper and Kafka, configuring broker IDs, listeners, and replication, starting each service, handling firewall rules, and explains Kafka's replication, failure handling, and leader election mechanisms for a production‑grade cluster.

Cluster Deployment · Kafka · Replication

0 likes · 16 min read

Architect's Alchemy Furnace

Sep 9, 2020 · Backend Development

Why Kafka’s session.timeout.ms vs heartbeat.interval.ms Matters for Real‑Time Alerts

This article explains how Kafka consumer parameters like session.timeout.ms, heartbeat.interval.ms, and max.poll.interval.ms affect group rebalance, consumer liveness, and real‑time alert processing in high‑availability microservice architectures.

Consumer · Distributed Systems · Kafka

0 likes · 15 min read

dbaplus Community

Sep 1, 2020 · Big Data

Mastering Real‑Time MySQL Binlog Sync with Debezium, Kafka & Hive

This article presents a systematic guide to real‑time MySQL binlog ingestion, outlining three core principles—decoupling from business data, handling schema changes, and ensuring traceability—followed by concrete Debezium‑Kafka‑Hive solutions, scenario‑specific tactics, and practical tips for reliable data pipelines.

Debezium · Kafka · data ingestion

0 likes · 15 min read

DataFunTalk

Sep 1, 2020 · Big Data

NetEase Real-Time Computing Platform (Sloth): Architecture, Practices, and Future Outlook

This article introduces NetEase's real-time computing platform Sloth, detailing its architecture, component layers, integrated IDE, operational tooling, unified metadata management, challenges such as Kudu write amplification, and proposes a tiered real‑time data‑warehouse model with a vision for storage‑compute separation and unified batch‑stream APIs.

Big Data · Flink · Kafka

0 likes · 13 min read

Full-Stack Internet Architecture

Aug 27, 2020 · Fundamentals

An Overview of Message Queues: History, Concepts, and Comparison

This article provides a comprehensive introduction to message queues, covering their historical origins, key standards such as JMS and AMQP, reasons for adoption, advantages and drawbacks, and a comparative analysis of popular implementations like ActiveMQ, RabbitMQ, RocketMQ, and Kafka.

AMQP · JMS · Kafka

0 likes · 9 min read

Youzan Coder

Aug 26, 2020 · Mobile Development

How We Built a Real‑Time Crash Feedback Platform for Mobile Apps

This article details the design and implementation of a comprehensive crash feedback platform for mobile applications, covering the motivation behind replacing third‑party services, the system architecture using Flink, Kafka and HBase, crash interception on Android, automated grouping and assignment, version filtering, daily reporting, and future enhancements.

Android · Flink · Kafka

0 likes · 15 min read

Full-Stack Internet Architecture

Aug 26, 2020 · Backend Development

Interview Experience and Technical Q&A for a Java Backend Position at Tencent Cloud (Xi'an)

The article shares the author's recent move to Xi'an, discusses the local job market, and provides detailed interview questions and answers on Java backend topics such as Redis replication, Kafka performance, MySQL transactions, and JVM garbage collection to help job seekers prepare effectively.

JVM · Kafka · backend-development

0 likes · 8 min read

Java Architect Essentials

Aug 25, 2020 · Backend Development

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

This article explains Kafka's role as a message system, details its fundamental components such as topics, partitions, producers, consumers, and replicas, describes how Zookeeper coordinates the cluster, and explores performance optimizations like sequential writes, zero‑copy, and network design.

Distributed Systems · Kafka · Message Queue

0 likes · 12 min read

DataFunTalk

Aug 25, 2020 · Databases

Real‑time Data Ingestion and Optimization with ClickHouse at ByteDance

This article details ByteDance's engineering practices for using ClickHouse to ingest, store, and query massive real‑time recommendation and advertising data, covering early external‑transaction mechanisms, the risks of direct INSERTs, the design and evaluation of Kafka Engine versus Flink pipelines, and a series of performance and reliability improvements implemented to support high‑frequency workloads.

Database Optimization · Kafka · Real-time analytics

0 likes · 20 min read

Big Data Technology & Architecture

Aug 25, 2020 · Big Data

Understanding Kafka's Segment Storage and Index Design

This article explains how Kafka partitions data into segments, stores each segment as paired index and log files, and uses sparse indexing to enable efficient queries, illustrating the process with examples and diagrams of segment layout and offset lookup.

Big Data · Kafka · Segment

0 likes · 4 min read

Didi Tech

Aug 24, 2020 · Big Data

Evolution and Architecture of DiDi Data Channel Service

DiDi’s Data Channel Service evolved from a fragmented component system into a unified, SLA‑driven platform with a UI‑based Sync Center and Flink‑powered StreamSQL engine, dramatically improving task creation speed, resource utilization, and reliability while automating issue diagnosis for company‑wide real‑time and offline data synchronization.

Big Data · ETL · Flink

0 likes · 12 min read

Big Data Technology & Architecture

Aug 23, 2020 · Big Data

Integrating Flink 1.11 with Hive Streaming, Kafka, and Table API

This article demonstrates how to use Flink 1.11's enhanced Hive integration to stream data from a Kafka source, write it into partitioned Hive tables with checkpoint‑driven commits, and read Hive tables as a continuous source using dynamic table options and table hints.

Big Data · Flink · Kafka

0 likes · 13 min read

Java Architect Essentials

Aug 21, 2020 · Big Data

Design and Integration of Flume, Kafka, Storm, Drools, and Redis for Real‑Time ETL Log Analysis

This article presents a modular architecture for real‑time ETL log analysis that combines Flume for log collection, Kafka as a buffering layer, Storm for stream processing, Drools for rule‑based data transformation, and Redis for fast storage, detailing installation, configuration, and code integration steps.

Big Data · Drools · Flume

0 likes · 23 min read

Architect

Aug 21, 2020 · Backend Development

Message Queue Interview Guide: Benefits, Drawbacks, Choosing the Right MQ, and Ensuring High Availability

This article explains why and when to use message queues, outlines their advantages and disadvantages, compares popular MQ products such as Kafka, RabbitMQ, RocketMQ and ActiveMQ, and provides practical advice on high‑availability, duplicate‑consumption prevention, and idempotent design for interview preparation.

Kafka · MQ · RabbitMQ

0 likes · 19 min read

vivo Internet Technology

Aug 19, 2020 · Operations

Linux Page Cache Optimization for Kafka: Concepts, Parameter Tuning, and Performance Evaluation

The article explains Linux page cache fundamentals, shows how to inspect and reclaim cache, and provides detailed tuning of vm.dirty_* and vm.swappiness parameters to smooth Kafka write traffic, reduce I/O spikes, and improve overall performance, illustrated with before‑and‑after benchmarks.

IO optimization · Kafka · Linux

0 likes · 14 min read

Big Data Technology & Architecture

Aug 18, 2020 · Big Data

End-to-End Real-Time Web Log Processing with Flume, Kafka, Spark Streaming, HBase, and Spring Boot

This tutorial demonstrates how to generate simulated web access logs in Python, schedule them with Crontab, collect them in real time using Flume, forward them to Kafka, process the streams with Spark Streaming, store results in HBase, and visualize the data via a Spring Boot application with ECharts.

Big Data · ECharts · Flume

0 likes · 36 min read

Big Data Technology & Architecture

Aug 15, 2020 · Big Data

Step-by-Step Guide to Building an ELK Stack with Kafka, Zookeeper, Logstash, and Filebeat for Log Collection

This tutorial provides a comprehensive, step-by-step procedure for setting up a log‑collection pipeline using Filebeat, Kafka, Zookeeper, Logstash, Elasticsearch, and Kibana across multiple servers, covering hardware preparation, system tuning, software installation, configuration files, and verification commands.

Big Data · ELK · Filebeat

0 likes · 11 min read

Top Architect

Aug 14, 2020 · Big Data

Billion‑Row MySQL to HBase Synchronization: Load Data, Kafka‑Thrift, and Flink Solutions

This article presents a comprehensive guide for transferring massive MySQL datasets to HBase, covering environment setup on Ubuntu, three synchronization methods—MySQL LOAD DATA, a Kafka‑Thrift pipeline using Maxwell, and real‑time Flink processing—along with performance comparisons and practical tips for Hadoop, HBase, Kafka, Zookeeper, Phoenix, and related tools.

DataSync · Flink · HBase

0 likes · 24 min read

Programmer DD

Aug 14, 2020 · Big Data

How Kafka Achieves High Throughput: Architecture, GC Tweaks, and Memory Buffer Pools

This article explains how Kafka’s architecture, use of OS page cache, Sendfile optimization, and a custom memory buffer pool work together to minimize JVM garbage collection overhead and deliver the massive throughput required by big‑data messaging workloads.

GC optimization · High Throughput · Kafka

0 likes · 23 min read

Architecture Digest

Aug 13, 2020 · Big Data

Synchronizing Billion-Row MySQL Data to HBase: Three Practical Schemes and Implementation Guide

This comprehensive guide details three practical methods for syncing massive MySQL datasets to HBase—including Sqoop, Kafka‑Thrift, and Flink pipelines—covering environment setup, configuration, code examples, performance comparisons, and optimization tips for large‑scale data ingestion and querying.

Big Data · Flink · HBase

0 likes · 24 min read

Tencent Cloud Middleware

Aug 12, 2020 · Big Data

How Serverless Functions Can Replace Traditional Kafka Data Pipelines for Lower Cost and Easier Scaling

This article explains how Tencent Cloud CKafka works, describes the challenges of traditional open‑source data‑flow solutions, and demonstrates a Serverless Function approach—complete with architecture diagrams and code examples—to achieve low‑cost, auto‑scaling Kafka‑to‑Elasticsearch pipelines.

Big Data · CKafka · Elasticsearch

0 likes · 12 min read

IT Architects Alliance

Aug 12, 2020 · Big Data

Introduction to Confluent KSQL for Real-Time Stream Processing

This article introduces Confluent KSQL, a SQL‑based real‑time stream processing engine for Kafka, covering its architecture, stream vs table concepts, query lifecycle, Docker‑based setup, DDL commands, example joins, windowed aggregations, connectors, and its advantages and limitations.

Big Data · Docker · KSQL

0 likes · 9 min read

Big Data Technology & Architecture

Aug 12, 2020 · Big Data

Real‑time User Behavior Collection Using Flume, Kafka, and Spark Streaming on Hadoop

This guide explains how to continuously collect web‑service user behavior logs, route them through Flume agents to Kafka, and finally ingest them with Spark Streaming into HDFS, covering environment preparation, configuration files, deployment steps, and verification procedures.

Big Data · Flume · Hadoop

0 likes · 9 min read

Top Architect

Aug 11, 2020 · Big Data

Kafka Basics and Cluster Architecture Overview

This article provides a comprehensive introduction to Kafka, covering its role as a messaging system, core concepts such as topics, partitions, producers, consumers, and messages, and then delves into the cluster architecture including replicas, consumer groups, controller coordination with Zookeeper, performance optimizations, log segmentation, and network design.

Cluster Architecture · Kafka · Message Queue

0 likes · 11 min read

New Oriental Technology

Aug 11, 2020 · Backend Development

Engineering Case Study of New Oriental Cloud Classroom Backend Architecture and Scaling During the Pandemic

The article details how New Oriental's Cloud Classroom backend, built with Java, Spring, MySQL, Redis, Kafka, Sentinel, and other modern technologies, scaled to support millions of users and a hundred‑fold surge in demand during the pandemic through architectural optimizations, distributed caching, traffic control, and rapid performance improvements.

Distributed Systems · Kafka · java

0 likes · 7 min read

Big Data Technology & Architecture

Aug 11, 2020 · Big Data

Consuming Kerberos‑Protected Kafka Data with Spark Streaming and Storing into Kudu

This guide demonstrates how to configure a Spark Streaming application running on YARN in cluster mode to securely consume Kerberos‑protected Kafka topics and write the processed data into Kudu tables, including necessary Java code, Kerberos keytab setup, Kafka client configuration, and spark‑submit commands.

Big Data · Kafka · Kerberos

0 likes · 11 min read

Open Source Linux

Aug 11, 2020 · Backend Development

Build a Docker‑Based Kafka Cluster and Integrate It with Spring Boot

This guide walks you through creating a three‑node Kafka cluster with Zookeeper using Docker‑Compose, configuring the necessary YAML, launching the containers, and then integrating the cluster into a Spring Boot application by adding dependencies, setting Kafka properties, defining message, sender, and receiver classes, and testing the message flow.

Docker · Docker Compose · Kafka

0 likes · 6 min read

Big Data Technology & Architecture

Aug 10, 2020 · Big Data

Real-time Hot Item, PV, and UV Statistics Using Apache Flink, Kafka, and Bloom Filter

This article demonstrates how to implement real-time hot item ranking, page view counting, and unique visitor estimation using Apache Flink with Kafka sources, sliding windows, custom aggregation functions, and a Bloom filter backed by Redis, providing complete Scala code examples.

Big Data · Flink · Kafka

0 likes · 15 min read

IT Architects Alliance

Aug 10, 2020 · Operations

Step‑by‑Step Guide to Building a Filebeat‑Kafka‑ELK Logging Pipeline

This tutorial walks through installing and configuring Filebeat, Kafka, Logstash, Elasticsearch, and Kibana, detailing version requirements, file permissions, YAML settings, startup commands, topic verification, and how to ingest and visualize log data in Kibana.

ELK · Elasticsearch · Filebeat

0 likes · 13 min read

JavaEdge

Aug 9, 2020 · Fundamentals

When Does Data Compression Boost System Performance? A Deep Dive into Kafka and RocketMQ

This article explains the significance of data compression, outlines when it should be applied, compares lossless algorithms, discusses segment selection, and details how Kafka and RocketMQ implement message compression to improve throughput while balancing CPU, storage, and network resources.

Kafka · Message Queue · RocketMQ

0 likes · 9 min read

MaGe Linux Operations

Aug 6, 2020 · Backend Development

Build a Docker‑Based Kafka Cluster and Integrate It with Spring Boot

This guide shows how to set up a Docker‑Compose Kafka cluster with Zookeeper, run it, and integrate the cluster into a Spring Boot application using Spring Kafka, including required dependencies, configuration, and sample producer‑consumer code, plus testing via a REST endpoint.

Docker · Kafka · Spring Boot

0 likes · 6 min read

Big Data Technology & Architecture

Aug 6, 2020 · Big Data

Flink Configuration Parameters and Related Tuning for Kafka and Yarn

This article provides a comprehensive guide to configuring Apache Flink—including job manager and task manager settings, high‑availability via Zookeeper, metrics reporting, as well as Kafka producer tuning and Yarn resource adjustments—to help practitioners optimize big‑data streaming jobs.

Big Data · Configuration · Flink

0 likes · 8 min read

Efficient Ops

Aug 4, 2020 · Operations

Mastering Filebeat: How to Collect and Ship Container Logs to Kafka

This article introduces Filebeat as a lightweight log shipper, explains its core components and processing flow, and provides step‑by‑step configuration examples for gathering container logs and forwarding them to Kafka or Elasticsearch in cloud‑native environments.

Elasticsearch · Filebeat · Go

0 likes · 13 min read

Architects Research Society

Aug 4, 2020 · Backend Development

Apache Kafka vs RabbitMQ: Architecture, Pull vs Push, Performance, and Best Use Cases

This article compares Apache Kafka and RabbitMQ, detailing their architectural differences, message handling models (pull vs push), performance characteristics, and ideal use cases, helping readers choose the appropriate messaging system for streaming, high‑throughput, or legacy protocol scenarios.

Kafka · Messaging · RabbitMQ

0 likes · 10 min read

Big Data Technology & Architecture

Aug 4, 2020 · Big Data

Manual Kafka Offset Management in Spark Streaming using createDirectStream (Java & Scala)

This article explains how to use Spark Streaming's Direct Approach with Kafka, manually manage offsets, and provides complete Java and Scala implementations—including a JavaKafkaManager class, a demo application, and a Scala KafkaManager—illustrating the creation of DirectKafkaInputDStream, offset handling, and integration with Spark.

Kafka · Offset Management · Scala

0 likes · 14 min read