Tagged articles
1273 articles
Architects Research Society
Apr 25, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka's transaction API, describes how exactly‑once processing is achieved in stream‑processing applications, outlines the Java client usage, and discusses the internal components, performance considerations, and best‑practice tips for developers.

Distributed Systems · Exactly-Once · Kafka
0 likes · 16 min read
Tencent Cloud Middleware
Apr 25, 2023 · Big Data

How to Achieve High Availability for Kafka Across Data Centers: Architectures, Trade‑offs, and Solutions

This article explains Kafka's cross‑data‑center high‑availability options, compares stretched and connected cluster designs, outlines typical failure scenarios, and reviews both community and commercial replication solutions, helping architects choose the most suitable deployment for their specific requirements.

Connected Cluster · Cross‑Data‑Center · Kafka
0 likes · 24 min read
Architecture Digest
Apr 23, 2023 · Backend Development

Kafka Core Concepts, Architecture, Performance Optimizations, and Production Deployment Guide

This article provides a comprehensive technical overview of Kafka, covering its core message‑queue value, architecture components such as producers, consumers, topics, partitions and replication, high‑performance mechanisms like zero‑copy and OS cache, resource planning for disks, memory, CPU and network, operational tools and commands, consumer‑group management, rebalance strategies, and internal scheduling mechanisms such as the time‑wheel.

Backend Architecture · Distributed Systems · Kafka
0 likes · 30 min read
Qunar Tech Salon
Apr 19, 2023 · Operations

Heimdall Exception Statistics System: Architecture, Implementation, and Practice

This article describes the design, implementation, and evolution of Heimdall, an exception‑statistics platform built on Kafka, Flink, and HBase that provides minute‑level anomaly aggregation, stack trace querying, and integration with release and alerting workflows to improve service reliability across thousands of micro‑services.

Exception Monitoring · Kafka · log aggregation
0 likes · 14 min read
Tencent Cloud Middleware
Apr 13, 2023 · Fundamentals

RocketMQ, Kafka, Pulsar: Core Concepts, Architecture & Transactional Messaging

This article provides a comprehensive overview of major message‑queue middleware—including RocketMQ, Kafka, Pulsar, and RabbitMQ—covering fundamental concepts such as tags, groups, offsets, architectural components, storage mechanisms, transaction workflows, rebalance strategies, and recent developments, while comparing their features and performance characteristics.

Kafka · Message Queue · Pulsar
0 likes · 19 min read
政采云技术
Apr 13, 2023 · Backend Development

Understanding the AKF Scale Cube: X, Y, Z Axes for System Scalability and Their Application to Kafka and Redis

The article explains the AKF Scale Cube model—horizontal replication (X axis), functional decomposition (Y axis), and data/service partitioning (Z axis)—and demonstrates how these three scaling dimensions can be applied to backend systems such as Kafka and Redis to achieve high availability, performance, and fault isolation.

AKF Scale Cube · Distributed Systems · Kafka
0 likes · 19 min read
Code Ape Tech Column
Apr 11, 2023 · Backend Development

Comprehensive Comparison of Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ

This article provides a detailed side‑by‑side comparison of five popular message‑queue systems (Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ), covering documentation, programming languages, supported protocols, storage, transactions, load balancing, clustering, management UI, availability, duplication handling, throughput, subscription models, ordering, acknowledgments, replay, retry, and concurrency. It also includes promotional information from the author.

ActiveMQ · Comparison · Kafka
0 likes · 24 min read
High Availability Architecture
Apr 10, 2023 · Cloud Computing

Serverless Architecture for Homework Photo Processing at Hangzhou Mingshitang Using Alibaba Cloud Function Compute

This case study describes how Hangzhou Mingshitang transformed its homework photo evaluation system from a Kubernetes‑based solution to an Alibaba Cloud Function Compute serverless architecture, achieving higher elasticity, lower latency, and reduced costs while handling peak traffic of over one million daily images.

Education Technology · Function Compute · Kafka
0 likes · 11 min read
HomeTech
Apr 5, 2023 · Backend Development

Design and Implementation of a Real‑Time Cache Update System Based on Kafka and Distributed Cache

This article presents a comprehensive design and implementation of a real‑time cache update system that leverages Kafka‑driven database change streams, a centralized cache scheduling center, executor registration, broadcast and fail‑over scheduling, and a lightweight SDK to achieve millisecond‑level cache consistency for C‑end services.

Backend · Cache · Distributed Systems
0 likes · 10 min read
Tencent Cloud Middleware
Apr 4, 2023 · Backend Development

Why Kafka’s High Reliability and Performance Matter for Asynchronous Decoupling and Load Smoothing

This article explains Kafka’s core concepts, architecture, and the mechanisms—such as ACK policies, replication, HW/LEO management, zero‑copy I/O, batching, compression, and load‑balancing—that together ensure high reliability and high throughput for asynchronous decoupling and peak‑shaving scenarios.

Kafka · asynchronous decoupling · high performance
0 likes · 33 min read
dbaplus Community
Apr 3, 2023 · Operations

How to Guarantee Zero Message Loss in Kafka: Practical Detection and Prevention Strategies

This article explains why MQ middleware like Kafka is introduced for system decoupling and traffic control, outlines the three key challenges of message loss detection, loss points, and prevention, and provides detailed configurations, monitoring tips, and code examples to ensure reliable, loss‑free message delivery.

Configuration · Consumer · Data Consistency
0 likes · 12 min read
Top Architect
Mar 30, 2023 · Backend Development

Understanding Kafka Idempotent Producer and How to Prevent Message Duplicates

This article explains why message duplication occurs in Kafka, describes the three delivery semantics, and provides practical solutions—including idempotent producers, transactions, and consumer-side idempotence—along with configuration tips and code examples to achieve exactly‑once delivery.

Configuration · Idempotence · Kafka
0 likes · 11 min read
Code Ape Tech Column
Mar 30, 2023 · Backend Development

How to Ensure No Message Loss in MQ Systems – Interview Guide and Practical Solutions

This article explains the common interview question of guaranteeing 100% message reliability in MQ middleware such as Kafka or RabbitMQ, outlines the three lifecycle stages of a message, discusses detection mechanisms, ID generation, idempotent consumption, and handling message backlog, providing concrete design patterns and practical examples.

Distributed Systems · Idempotency · Kafka
0 likes · 12 min read
Volcano Engine Developer Services
Mar 29, 2023 · Backend Development

How ByteHouse Achieves High‑Availability Real‑Time Data Ingestion with HaKafka

ByteHouse evolved its real‑time import pipeline from a community ClickHouse architecture to a custom HaKafka engine and a cloud‑native design, addressing node failures, read‑write conflicts, scaling costs, and latency by introducing two‑level concurrency, memory tables, exactly‑once semantics, and robust fault‑tolerance.

Distributed Systems · Kafka · Real-time Ingestion
0 likes · 15 min read
DataFunTalk
Mar 29, 2023 · Big Data

Evolution of ByteHouse Real‑Time Ingestion: From Internal Demands to a Cloud‑Native Architecture

This article details the motivation, architectural evolution, and technical implementations of ByteHouse's real‑time ingestion pipeline, covering internal business requirements, distributed‑system challenges, the custom HaKafka engine, memory‑table optimizations, and the transition to a cloud‑native design that delivers high availability, low latency, and exactly‑once semantics.

ByteHouse · Kafka · Real-time Ingestion
0 likes · 13 min read
Top Architect
Mar 27, 2023 · Big Data

Kafka Architecture, Performance Optimization, and Production Deployment Guide

This article provides a comprehensive overview of Kafka’s core concepts, high‑performance design, cluster planning, resource evaluation, deployment steps, producer and consumer configurations, fault‑tolerance mechanisms, and operational tools, offering practical guidance for building and managing a high‑throughput Kafka production environment.

Cluster Deployment · Consumer · Distributed Systems
0 likes · 31 min read
Java High-Performance Architecture
Mar 24, 2023 · Backend Development

Explore Echo: Open-Source Java Community Platform & Deployment Guide

Echo is a full‑stack open‑source Java community system built with Spring Boot, MyBatis, MySQL, Redis, Kafka and Elasticsearch, offering modules like posts, comments and notifications, and the article provides its core tech stack, development environment, local setup steps, deployment architecture, demo screenshots and source code access.

Elasticsearch · Kafka · Spring Boot
0 likes · 5 min read
dbaplus Community
Mar 15, 2023 · Backend Development

How to Prevent Message Loss in Kafka: Practical Tips and Configurations

This guide explains why message queues are introduced for decoupling and traffic control, identifies three key areas where message loss can occur—in producers, brokers, and consumers—and provides concrete Kafka configurations, monitoring practices, and operational steps to ensure reliable, loss‑free message delivery.

Consumer Monitoring · Kafka · Message Loss
0 likes · 12 min read
Architects Research Society
Mar 15, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Considerations

This article explains why exactly‑once semantics are needed for stream‑processing applications, describes Kafka's transactional model and semantics, details the Java transaction API and its usage, and discusses the internal components, performance trade‑offs, and practical guidelines for building reliable Kafka‑based pipelines.

Distributed Systems · Exactly-Once · Kafka
0 likes · 17 min read
dbaplus Community
Mar 7, 2023 · Operations

How We Rescued a ClickHouse Logging Cluster After Zookeeper‑Induced Read‑Only Failure

A production logging system became unavailable due to Kafka backlog alerts, prompting an investigation that uncovered read‑only ClickHouse tables caused by mismatched Zookeeper metadata after a TTL policy change, leading to a step‑by‑step recovery involving Zookeeper restarts, metadata fixes, and table reconstruction.

Cluster Recovery · Flink · Kafka
0 likes · 9 min read
ShiZhen AI
Mar 1, 2023 · Cloud Native

Why We Chose Kafka for Our Open‑Source Real‑Time Streaming Platform

The article explains how market trends, data‑driven enterprise needs, and internal platform experience led Didi to build Know Streaming—a zero‑intrusion, plugin‑based real‑time streaming solution built on Kafka—to address scalability, operability, and community adoption challenges.

Cloud Native · Data Platform · Kafka
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Mar 1, 2023 · Big Data

How We Built a Scalable Real‑Time Data Architecture for a Complex Supply Chain

This article describes the challenges of a highly complex supply‑chain system, the evolution from early MySQL‑based reporting to a modern real‑time data platform using Flink, Kafka, ClickHouse, Hologres and other cloud services, and the tools and lessons learned to achieve low‑latency, high‑throughput analytics.

Flink · Kafka · Streaming
0 likes · 11 min read
Su San Talks Tech
Feb 24, 2023 · Backend Development

Why We’re Dropping RabbitMQ for Kafka: A Complete Migration Blueprint

Facing chaotic usage, maintenance challenges, partition tolerance issues, and performance bottlenecks with RabbitMQ, our middleware team decided to fully migrate to Kafka, outlining reasons, comparative models, migration strategies, and verification steps to ensure a smooth, high‑availability, high‑performance transition.

Backend · Kafka · Message Queue
0 likes · 13 min read
Architecture Digest
Feb 9, 2023 · Big Data

Understanding Kafka Messages, Topics, Partitions, and Consumers

This article explains Kafka's core concepts—including messages as byte arrays, optional keys for partition control, topic and partition organization, producer and consumer roles, offsets, consumer groups, and broker clusters—providing a concise technical overview for developers learning Kafka.

Consumer · Kafka · Message
0 likes · 6 min read
Architect's Journey
Feb 6, 2023 · Backend Development

Simplify Kafka Consumer Code with a Custom Method Argument Resolver

The article demonstrates how to use Spring's HandlerMethodArgumentResolver to automatically convert Kafka messages into typed method parameters, eliminating repetitive JSON parsing code and showing a complete implementation, usage example, and notes on version support and performance considerations.

HandlerMethodArgumentResolver · Kafka · Message Conversion
0 likes · 6 min read
dbaplus Community
Jan 14, 2023 · Backend Development

How to Minimize Data Movement When Scaling Kafka Replicas

This article explores strategies for batch scaling Kafka replicas with minimal data migration, presenting two design ideas, detailed calculations of broker lists, partition counts, start indexes, and replica shifts, and provides step‑by‑step algorithms and code snippets to compute optimal replica assignments for both expansion and contraction scenarios.

Backend · Kafka · Partition Assignment
0 likes · 15 min read
vivo Internet Technology
Jan 11, 2023 · Cloud Native

Practices of Distributed Message Middleware at vivo: From RocketMQ to Kafka and Pulsar

vivo’s Internet Storage team details how it operates RocketMQ for low‑latency online services and Kafka for massive big‑data pipelines, outlines resource isolation, traffic balancing, intelligent throttling, and governance practices, and describes its migration from RabbitMQ and planned shift from Kafka to cloud‑native Pulsar.

Big Data · Cloud Native · Kafka
0 likes · 22 min read
Efficient Ops
Jan 10, 2023 · Big Data

Why a Single Kafka Broker Failure Can Halt All Consumers – Deep Dive into HA

This article explains Kafka's multi‑replica design, ISR mechanism, leader election rules, and producer acknowledgment settings, then shows how the built‑in __consumer_offsets topic with a single replica can cause a whole cluster to become unavailable when one broker crashes, and offers practical fixes.

Consumer Offsets · ISR · Kafka
0 likes · 9 min read
Top Architect
Jan 2, 2023 · Big Data

Optimizing Kafka at Meituan: Challenges and Solutions for a Large‑Scale Data Platform

This article details Meituan's use of Kafka as a unified data cache and distribution layer, outlines the challenges of massive scale and latency, and presents comprehensive optimizations across application, system, and cluster management layers, including disk balancing, migration acceleration, fetcher isolation, and full‑link monitoring.

Big Data · Distributed Systems · Kafka
0 likes · 22 min read
MaGe Linux Operations
Dec 30, 2022 · Big Data

Mastering Kafka: Core Concepts, Architecture, and Performance Optimizations

This comprehensive guide explores Kafka as a distributed messaging middleware, detailing its core concepts, architecture, producer and consumer mechanisms, configuration options, ZooKeeper integration, controller responsibilities, network model, and performance optimizations such as zero‑copy, page‑cache usage, batching, compression, and partition concurrency.

Distributed Messaging · Kafka · ZooKeeper
0 likes · 41 min read
Data Thinking Notes
Dec 23, 2022 · Big Data

How Real-Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explains why real‑time data warehouses are becoming essential, outlines their goals, compares them with traditional offline warehouses, and presents detailed design patterns, naming conventions, and case studies from Didi, Kuaishou, Tencent, Youzan and other enterprises, highlighting challenges and solutions for streaming, storage, and query layers.

Big Data Architecture · Data Lake · ETL
0 likes · 49 min read
Architecture Digest
Dec 2, 2022 · Big Data

Design and Implementation of Vivo's Bees Log Collection Agent

This article presents the design principles, core techniques, and practical solutions of Vivo's self‑developed Bees log collection agent, covering file discovery, unique identification, real‑time and offline ingestion, checkpointing, resource control, platform management, and a comparison with open‑source alternatives.

Agent Design · Kafka · Resource Management
0 likes · 25 min read
High Availability Architecture
Nov 30, 2022 · Big Data

Design and Implementation of Vivo's Bees Log Collection Agent

This article presents the design principles, core features, and implementation details of Vivo's self‑developed Bees log collection agent, covering file discovery, unique identification, real‑time and offline ingestion, resource control, platform management, and comparisons with open‑source solutions.

HDFS · Kafka · Resource Management
0 likes · 22 min read
Architecture Digest
Nov 30, 2022 · Backend Development

Meituan Kafka at Scale: Challenges and Optimizations for Latency, Cluster Management, and Reliability

This article details Meituan's large‑scale Kafka deployment—over 15,000 machines and petabyte‑level daily traffic—its operational challenges such as slow nodes, load imbalance, and resource contention, and the comprehensive read/write latency, system‑level, and cluster‑management optimizations implemented to improve performance and reliability.

Cluster Management · Distributed Systems · Kafka
0 likes · 22 min read
Top Architect
Nov 26, 2022 · Backend Development

Comprehensive Overview of RabbitMQ, RocketMQ, and Kafka: Architecture, Features, and Best Practices

This article provides an in-depth comparison of RabbitMQ, RocketMQ, and Kafka, detailing their core components, exchange types, message durability, acknowledgment mechanisms, TTL, dead‑letter queues, load balancing, ordering, transaction handling, high‑availability configurations, and practical solutions for common messaging challenges.

Distributed Systems · Kafka · RabbitMQ
0 likes · 33 min read
Tencent Cloud Developer
Nov 24, 2022 · Backend Development

Kafka Stability Best Practices: Prevention, Monitoring, and Fault Resolution

This guide outlines Kafka stability best practices across three phases—pre‑prevention with tuning, producer/consumer guidelines, and cluster configuration; runtime monitoring using white‑box and black‑box metrics and alerts; and fault resolution strategies for backlogs, consumption blocks, and message loss, plus cost control and idempotence techniques.

Distributed Messaging · Kafka · backend-development
0 likes · 29 min read
vivo Internet Technology
Nov 23, 2022 · Big Data

Design and Implementation of Vivo's Bees Log Collection Agent

Vivo’s Bees‑agent is a custom, lightweight log‑collection service that discovers rotating files via inotify, uniquely identifies them with inode and hash signatures, supports real‑time and offline ingestion to Kafka and HDFS, offers checkpoint‑resume, resource isolation, rich metrics, and a centralized management platform, outperforming open‑source collectors in latency, memory usage, and scalability.

Agent Design · HDFS · Kafka
0 likes · 24 min read
21CTO
Nov 20, 2022 · Big Data

How Meituan’s Logan Real‑Time Log System Boosts Debugging Across Mobile, Web, and IoT

This article details the design, architecture, and implementation of Meituan's Logan real‑time logging platform, covering its workflow, multi‑terminal collection SDK, ingestion, Flink‑based processing, consumption layers, stability measures, and future roadmap, illustrating how it improves fault diagnosis and system reliability.

Elasticsearch · Flink · Kafka
0 likes · 18 min read
Liulishuo Tech Team
Nov 17, 2022 · Big Data

Real‑time Data Warehouse Architecture and Technical Solution at Liulishuo

This article describes Liulishuo's migration to a Flink‑based real‑time data warehouse, covering background, benefits, technology selection (storage, Flink platform, dimension table connectors), overall architecture, concrete Hudi and Elasticsearch ingestion examples, processing SQL, and future outlook for unified batch‑streaming storage.

Elasticsearch · Flink · Hudi
0 likes · 15 min read
ShiZhen AI
Nov 16, 2022 · Big Data

Inside Kafka Consumer SyncGroupRequest: How Rebalance Works

The article walks through the complete lifecycle of a Kafka consumer SyncGroupRequest, detailing request headers and bodies, coordinator selection, state handling on the GroupCoordinator, metadata persistence, and the client‑side response processing that transitions members to a stable state.

Consumer Rebalance · GroupCoordinator · Kafka
0 likes · 17 min read
macrozheng
Nov 15, 2022 · Backend Development

From ActiveMQ to RocketMQ: My Journey Through Message Queues and Lessons Learned

This article chronicles the author's four‑stage evolution with message queues—from early experiments with ActiveMQ, through Redis and RabbitMQ, to MetaQ and finally RocketMQ—highlighting practical challenges, architectural decisions, performance tuning, and insights for building robust, high‑throughput backend systems.

ActiveMQ · Kafka · Message Queue
0 likes · 28 min read
Big Data Technology & Architecture
Nov 14, 2022 · Big Data

Kafka Consumer Group Rebalance: Mechanisms, Strategies, Protocols, and Java Implementation

This article provides a comprehensive overview of Kafka consumer group rebalance, covering version compatibility, rebalance triggers, assignment strategies, generation handling, protocol details, the full rebalance workflow, listener usage, and complete Java code examples for offset management with database integration.

ConsumerGroup · Kafka · bigdata
0 likes · 19 min read
Java Architect Essentials
Nov 11, 2022 · Big Data

Meituan Kafka at Scale: Challenges and Optimizations for Latency, Cluster Management, and Reliability

This article details Meituan's large‑scale Kafka deployment, describing the current state, performance challenges such as slow nodes and disk imbalance, and the comprehensive optimizations applied—including read/write latency reductions, migration pipelines, fetcher isolation, SSD caching, RAID acceleration, cgroup isolation, full‑link monitoring, service lifecycle management, and TOR disaster recovery—to improve reliability and prepare for future growth.

Cluster Management · Kafka · Latency Reduction
0 likes · 21 min read
ShiZhen AI
Nov 10, 2022 · Backend Development

Query Kafka Messages with Know Streaming: A Step‑by‑Step Guide

This article explains how to use Know Streaming’s zero‑intrusion plugin to query Kafka messages, detailing the UI flow, multi‑dimensional filter options such as offset, partition, and key/value selection, the data truncation feature, and the underlying implementation that builds a KafkaConsumer and leverages KafkaAdminClient.

Kafka · KafkaAdminClient · KafkaConsumer
0 likes · 2 min read
High Availability Architecture
Nov 7, 2022 · Backend Development

Design and Implementation of Meituan's Logan Real-Time Log System

This article describes how Meituan built Logan, a high‑performance, end‑to‑end real‑time logging platform for mobile, web, mini‑programs and IoT, covering its background, architecture, data collection, processing, consumption, monitoring, deployment strategies, achieved results and future roadmap.

Backend Architecture · Elasticsearch · Flink
0 likes · 15 min read
Java High-Performance Architecture
Nov 5, 2022 · Big Data

Why Can Kafka Process 20 Million Messages per Second? Inside Its High‑Performance Architecture

This article explains how Kafka achieves extremely high throughput—up to 20 million messages and 600 MB per second per node—by optimizing the producer, broker, and consumer components through batch sending, custom protocols, page‑cache usage, zero‑copy transfers, and efficient compression algorithms.

Broker · Consumer · Kafka
0 likes · 7 min read
dbaplus Community
Nov 3, 2022 · Big Data

Why Kafka Stores Data the Way It Does: A Deep Dive into Its Log Architecture

This article thoroughly examines Kafka's storage system, explaining why it uses sequential log writes combined with sparse indexing, how different log formats evolved, and the mechanisms for log retention and compaction that enable high‑throughput, fault‑tolerant streaming at massive scale.

Big Data · Distributed Systems · Kafka
0 likes · 22 min read
Meituan Technology Team
Nov 3, 2022 · Backend Development

Design and Implementation of Logan Real-Time Log System at Meituan

The article details Meituan’s end‑to‑end design and implementation of Logan, a high‑performance real‑time logging service for mobile apps, web, mini‑programs and IoT, covering background challenges, architecture layers, technology choices such as Flink and Elasticsearch, stability measures, deployment practices, achieved results and future plans.

Blue‑Green deployment · Elasticsearch · Flink
0 likes · 21 min read
IT Services Circle
Nov 2, 2022 · Backend Development

Implementing Gray (Canary) Messaging for RabbitMQ and Kafka

This article describes how to design and implement a gray (canary) messaging capability for RabbitMQ and Kafka, covering background, gray scenarios, two consumption strategies, and detailed production and consumption flows with code snippets for header tagging, requeue handling, and consumer group management.

Kafka · Message Queue · RabbitMQ
0 likes · 8 min read
ShiZhen AI
Nov 2, 2022 · Operations

How to Quickly Scale Kafka Topic Replicas with Know Streaming

This guide explains how Know Streaming adds a non‑native Kafka feature that lets users batch‑scale replicas for one or multiple topics, customize target brokers, preview and edit the reassignment plan, and throttle the operation to minimize impact on the cluster.

Distributed Systems · Kafka · Kafka Operations
0 likes · 5 min read
Mike Chen's Internet Architecture
Oct 27, 2022 · Backend Development

Understanding Kafka: Architecture, Principles, Features, and Use Cases

This article explains Kafka's distributed publish‑subscribe architecture, detailing its core components, underlying mechanisms with Zookeeper coordination, key features such as high throughput and fault tolerance, and common application scenarios like log collection, user activity tracking, and stream processing.

Backend Architecture · Distributed Messaging · Kafka
0 likes · 5 min read
360 Tech Engineering
Oct 26, 2022 · Backend Development

Lightweight Intelligent Monitoring Platform Architecture and Component Overview

This article details a lightweight intelligent monitoring platform built on the open‑source WVP framework, describing its modular architecture, edge‑computing workflow with KubeEdge, SIP registration process, real‑time streaming setup, core features, and technical innovations such as MongoDB adoption and flexible pod scheduling.

AI · Edge Computing · Kafka
0 likes · 9 min read
IT Services Circle
Oct 26, 2022 · Databases

Debezium: Open‑Source Change Data Capture Platform – Overview, Architecture, Use Cases, and Installation Guide

This article introduces Debezium, an open‑source low‑latency change data capture platform that streams database row changes via Kafka, explains its architecture and common scenarios such as cache invalidation and CQRS, and provides step‑by‑step Docker commands to install ZooKeeper, Kafka, MySQL and the Debezium connector.

CDC · Data Integration · Debezium
0 likes · 15 min read
ShiZhen AI
Oct 25, 2022 · Operations

How to Diagnose Unexpected Errors When Adding a New Kafka Consumer Group

Starting a new Kafka consumer group fails with an unexpected SyncGroup error caused by a RecordTooLargeException. The article walks through log inspection, identifies the oversized __consumer_offsets record, and resolves the issue by increasing the message.max.bytes configuration.

Kafka · RecordTooLargeException · SyncGroup
0 likes · 5 min read
Selected Java Interview Questions
Oct 23, 2022 · Big Data

Building a Cost‑Effective Data Analysis Platform: ClickHouse vs Elasticsearch and Deployment Guide for Zookeeper, Kafka, Filebeat, and ClickHouse

This article compares Elasticsearch and ClickHouse for log analytics, presents cost‑benefit calculations, and provides a step‑by‑step deployment guide for Zookeeper, Kafka, Filebeat, and ClickHouse to build a scalable, low‑cost data analysis platform for SaaS services.

Big Data · Deployment · Elasticsearch
0 likes · 12 min read
DataFunTalk
Oct 23, 2022 · Backend Development

Design and Implementation of a Lightweight Asynchronous Message Processing Framework for Data Catalog

This article describes the motivation, requirements, design, and implementation of a lightweight asynchronous message processing framework built by ByteDance to handle near‑real‑time metadata changes for DataLeap's Data Catalog, detailing its architecture, thread model, state management, delay handling, monitoring, and operational experiences.

Backend Framework · Kafka · Message Queue
0 likes · 11 min read
Architect's Guide
Oct 22, 2022 · Big Data

Meituan’s Kafka Optimizations: Reducing Read/Write Latency and Managing Large‑Scale Clusters

This article describes how Meituan’s data platform tackles the growing challenges of a 15,000‑plus‑node Kafka deployment by detailing current bottlenecks, latency‑reduction techniques across application and system layers, large‑scale cluster management strategies, and future directions for robustness and cloud‑native migration.

Big Data · Kafka · Large-Scale Clusters
0 likes · 21 min read
ShiZhen AI
Oct 19, 2022 · Big Data

Deep Dive into Kafka Consumer JoinGroupRequest Flow

This article walks through the complete Kafka consumer JoinGroupRequest lifecycle, detailing how the client builds and sends the request, how the group coordinator selects the coordinator node, processes unknown and known members, elects a leader, chooses a partition assignment protocol, and transitions group states.

Consumer · Group Coordination · JoinGroupRequest
0 likes · 26 min read
Java High-Performance Architecture
Oct 14, 2022 · Backend Development

How to Guarantee Zero Message Loss in MQ Systems – Interview Mastery

Interviewers frequently ask how to ensure 100% message reliability in MQ systems such as Kafka or RabbitMQ. This guide walks through the underlying concepts, potential loss points, detection mechanisms, idempotent design, backlog handling, and practical ID generation strategies to help candidates answer such questions.

Idempotency · Interview Preparation · Kafka
0 likes · 13 min read
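
The idempotent design mentioned in this summary can be illustrated with a minimal sketch. It assumes every message carries a unique ID and that the consumer records the IDs it has already applied; the `IdempotentHandler` class, the in-memory `seen_ids` set, and the balance example are hypothetical and not taken from the article.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentHandler:
    """Applies each message at most once, keyed by a unique message ID."""

    # Hypothetical in-memory store; a real system would need a durable one.
    seen_ids: set = field(default_factory=set)
    balance: int = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False  # redelivery after a retry: skip the side effect
        self.balance += amount
        self.seen_ids.add(message_id)
        return True


handler = IdempotentHandler()
# "m-1" is delivered twice, as can happen with at-least-once delivery.
results = [
    handler.handle("m-1", 10),
    handler.handle("m-1", 10),
    handler.handle("m-2", 5),
]
```

With this kind of deduplication, retrying on the producer or redelivering on the consumer side does not change the final state, which is what makes at-least-once delivery safe to use.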
ShiZhen AI
Oct 11, 2022 · Backend Development

How Kafka Consumer Clients Send and Manage Heartbeat Requests

The article walks through the Kafka consumer heartbeat lifecycle, detailing how the HeartbeatThread is started, paused, and used to send heartbeat and LeaveGroup requests, how the GroupCoordinator validates and processes these requests, the client’s response handling, and the resulting state transitions illustrated with diagrams.

Consumer · GroupCoordinator · Heartbeat
0 likes · 14 min read
IT Architects Alliance
Oct 9, 2022 · Backend Development

Event‑Driven Messaging Patterns at Wix: Consumption, Projection, End‑to‑End Streaming, In‑Memory KV Stores, Scheduling, Transactions, and Aggregation

The article describes how Wix engineers built a robust, Kafka‑based event‑driven messaging infrastructure for over 1,400 microservices, detailing patterns such as consumption and projection, end‑to‑end streaming with websockets, in‑memory KV stores, schedule‑and‑forget jobs, exactly‑once transactions, and event aggregation to achieve scalability, resilience, and low‑latency data access.

Data Streaming · Distributed Systems · Event-Driven Architecture
0 likes · 16 min read
dbaplus Community
Oct 9, 2022 · Operations

How Ping An Health Scaled SkyWalking to Billions of Traces: A Full‑Link Monitoring Journey

This article recounts the end‑to‑end design, implementation, and iterative optimization of a billion‑scale full‑link tracing system at Ping An Health using SkyWalking, covering why full‑link monitoring is needed, the selection of SkyWalking, architecture choices, performance bottlenecks, and the roadmap for future enhancements.

APM · Elasticsearch · Full‑Link Tracing
0 likes · 21 min read
Top Architect
Oct 2, 2022 · Big Data

Optimizing Kafka at Meituan: Challenges and Solutions for Large‑Scale Cluster Management

This article details Meituan's Kafka deployment, describing the current massive scale and associated challenges, and presents a series of optimizations—including read/write latency reductions, application‑ and system‑level improvements, large‑scale cluster management strategies, full‑link monitoring, service lifecycle management, and future directions—to enhance performance, reliability, and scalability of the streaming platform.

Kafka · Meituan · big-data
0 likes · 23 min read
ITFLY8 Architecture Home
Sep 29, 2022 · Backend Development

Scaling Event‑Driven Messaging at Wix with Kafka: Key Patterns

This article explains how Wix uses Kafka‑based event‑driven messaging to decouple microservices, improve scalability, and achieve exactly‑once processing through patterns such as consume‑and‑project, end‑to‑end event streams, in‑memory KV stores, scheduled jobs, transactional events, and event aggregation.

Data Streaming · Distributed Systems · Event-Driven Architecture
0 likes · 16 min read
Shopee Tech Team
Sep 28, 2022 · Backend Development

Shopee Off-Platform Ads Delay Service: Architecture and Implementation

Shopee’s off‑platform ads delay service combines Redis Zsets for expiration tracking, HBase for payload storage, and Kafka for queuing to reliably process up to 6 million tasks per minute with minute‑level delays ranging from one minute to thirty days, achieving horizontal scalability, fault tolerance, and a 75 % reduction in Kubernetes resource usage.

Kafka · Marketing Automation · architecture
0 likes · 17 min read
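
The split described in this summary (expiration tracking kept separate from payload storage, with due tasks handed to a queue) can be sketched roughly as follows. This is an in-memory stand-in only: a heap replaces the Redis Zset, a dict replaces HBase, and the returned list stands in for publishing to Kafka. The `DelayQueue` class and its method names are hypothetical.

```python
import heapq


class DelayQueue:
    """In-memory sketch of a delay service; not Shopee's implementation."""

    def __init__(self):
        self._due = []       # min-heap of (due_timestamp, task_id), Zset stand-in
        self._payloads = {}  # task_id -> payload, payload-store stand-in

    def schedule(self, task_id, payload, now, delay_seconds):
        """Store the payload and track when the task expires."""
        self._payloads[task_id] = payload
        heapq.heappush(self._due, (now + delay_seconds, task_id))

    def pop_due(self, now):
        """Return every task whose due time has passed, earliest first."""
        ready = []
        while self._due and self._due[0][0] <= now:
            _, task_id = heapq.heappop(self._due)
            ready.append((task_id, self._payloads.pop(task_id)))
        return ready


queue = DelayQueue()
queue.schedule("a", {"ad": 1}, now=0, delay_seconds=60)
queue.schedule("b", {"ad": 2}, now=0, delay_seconds=120)
early = queue.pop_due(now=59)    # nothing is due yet
first = queue.pop_due(now=60)    # task "a" becomes due
second = queue.pop_due(now=200)  # task "b" becomes due
```

Keeping only the task ID and due time in the ordered structure keeps the expiration index small, while the larger payloads live in a store better suited to bulk data.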
ShiZhen AI
Sep 27, 2022 · Big Data

What Is a Kafka Consumer Group Coordinator?

The article explains the role of Kafka's consumer group coordinator and consumer coordinator, details how group coordinators are selected, and walks through the JoinGroup, SyncGroup, LeaveGroup, and heartbeat processes, as well as partition assignment strategies and common Q&A.

GroupCoordinator · JoinGroup · Kafka
0 likes · 11 min read
Tencent Cloud Developer
Sep 26, 2022 · Big Data

Kafka Architecture Overview and Core Concepts

Kafka’s architecture consists of brokers forming clusters, producers publishing to topics split into partitions with replicas, consumers organized in groups pulling messages by offset, ZooKeeper managing metadata, and log‑based storage using append‑only files, indexes, and zero‑copy, while configurable acknowledgment, batching, and replication ensure high throughput and fault‑tolerant reliability.

Consumer · Kafka · Producer
0 likes · 18 min read
Tencent Cloud Developer
Sep 20, 2022 · Information Security

Data Classification and Grading Architecture for Enterprise Data Security

The article details a practical, reusable enterprise architecture for data classification and grading that combines scanning tools, a rule‑engine with hot‑updates, a high‑performance identification service, and a security enforcement platform, addressing massive real‑time data volumes, diverse storage types, cross‑department isolation, and compliance with China’s data security laws.

Big Data · Cloud Native · Kafka
0 likes · 14 min read
Top Architect
Sep 17, 2022 · Big Data

Meituan's Kafka Architecture: Challenges and Optimizations at Massive Scale

This article details how Meituan's Kafka platform, serving over 15,000 machines and handling petabytes of daily traffic, faces read/write latency, slow nodes, and large‑scale cluster management challenges, and describes a series of application‑layer, system‑layer, and operational optimizations—including disk balancing, migration pipelines, fetcher isolation, consumer async, SSD caching, isolation strategies, full‑link monitoring, lifecycle management, and TOR disaster recovery—to improve performance and reliability.

Kafka · Meituan · Streaming
0 likes · 22 min read
Architect
Sep 15, 2022 · Big Data

Meituan's Kafka Optimizations: Challenges, Latency Improvements, and Large‑Scale Cluster Management

This article describes how Meituan's massive Kafka deployment—over 15,000 machines and petabytes of daily traffic—faces scalability challenges such as slow nodes, load imbalance, and resource contention, and details the multi‑layer optimizations applied at the application, system, and cluster‑management levels to reduce read/write latency and improve reliability.

Kafka · Latency · big-data
0 likes · 22 min read
Programmer DD
Sep 9, 2022 · Big Data

Why Kafka and Pulsar Lead the Distributed Streaming Landscape

This article introduces Apache Kafka and Apache Pulsar, compares their core features such as publish/subscribe messaging, storage, real‑time pipelines, and stream processing, outlines key characteristics like high throughput, scalability and fault tolerance, and explains fundamental concepts and architecture components unique to each platform.

Big Data · Distributed Streaming · Kafka
0 likes · 14 min read
IT Architects Alliance
Sep 6, 2022 · Operations

How to Guarantee Zero Message Loss with Kafka: Best Practices and Configurations

This article explains why introducing a message queue like Kafka helps decouple systems and control traffic, then dives into the three key questions of detecting, locating, and preventing message loss, offering concrete monitoring methods, configuration settings, and troubleshooting steps for producers, brokers, and consumers.

Broker · Configuration · Consumer
0 likes · 13 min read
IT Xianyu
Sep 6, 2022 · Backend Development

Dynamic Flow Orchestration with Nacos, Docker, and SpringBoot Microservices

This article demonstrates how to build a lightweight, plug‑and‑play flow‑orchestration system for microservices by installing Nacos with Docker, configuring SpringBoot services with Kafka and Nacos, and using dynamic Nacos listeners to adjust Kafka topics at runtime without redeployment.

Docker · Flow Orchestration · Kafka
0 likes · 9 min read
Bilibili Tech
Sep 6, 2022 · Big Data

Lancer: Evolution of Bilibili's Real-Time Streaming Architecture

Lancer, Bilibili’s real‑time streaming backbone, has evolved from a monolithic Flume pipeline to a log‑id‑isolated, Kubernetes‑native architecture where Go edge agents feed synchronous Kafka‑proxied gateways into per‑logid topics processed by dedicated Flink‑SQL jobs, delivering exactly‑once, back‑pressured, highly scalable data ingestion for billions of daily requests.

Big Data · Flink · Kafka
0 likes · 29 min read
dbaplus Community
Sep 3, 2022 · Backend Development

Can Redis Streams Replace Kafka for Your Messaging Needs?

The article explains how Redis Streams offers a lightweight, memory‑based alternative to Kafka, detailing its features, consumer‑group model, performance advantages, and suitable use cases while acknowledging scenarios where a full‑featured message queue remains preferable.

Backend · Kafka · Message Queue
0 likes · 7 min read
Wukong Talks Architecture
Sep 2, 2022 · Big Data

Preventing Data Loss in Kafka: Message Semantics, Failure Scenarios, and Reliability Solutions

This article explains Kafka's message delivery semantics, analyzes potential data‑loss scenarios across producer, broker, and consumer components, and provides concrete configuration and coding practices—such as idempotent producers, proper ACK settings, replication factors, and manual offset commits—to maximize message durability and reliability.

Broker · Consumer · Data loss
0 likes · 18 min read
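
The settings named in this summary map onto a handful of Kafka configuration keys. The sketch below collects them as plain Python dictionaries; the keys are standard Kafka settings, but the specific values (replication factor 3, `min.insync.replicas` of 2) are illustrative assumptions, not figures from the article, and the `accepts_writes` helper is hypothetical.

```python
# Producer side: wait for all in-sync replicas and deduplicate retries.
PRODUCER_CONFIG = {
    "acks": "all",
    "enable.idempotence": "true",
}

# Topic/broker side (assumed values for illustration).
REPLICATION_FACTOR = 3
TOPIC_CONFIG = {"min.insync.replicas": 2}

# Consumer side: commit offsets manually, only after processing succeeds.
CONSUMER_CONFIG = {"enable.auto.commit": "false"}


def accepts_writes(replication_factor, min_insync_replicas, failed_replicas):
    """With acks=all, a write is accepted only while the number of
    in-sync replicas is at least min.insync.replicas."""
    return replication_factor - failed_replicas >= min_insync_replicas


one_down = accepts_writes(
    REPLICATION_FACTOR, TOPIC_CONFIG["min.insync.replicas"], failed_replicas=1
)
two_down = accepts_writes(
    REPLICATION_FACTOR, TOPIC_CONFIG["min.insync.replicas"], failed_replicas=2
)
```

Under these assumed values the topic keeps accepting writes with one replica down and rejects them with two down, trading availability for durability.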
Huolala Tech
Sep 1, 2022 · Big Data

How HuoLala Built a Real‑Time Metrics Monitoring Platform for Flink

This article explains how HuoLala’s real‑time R&D platform redesigns Flink metric collection, routing, and alerting using a custom Kafka‑based pipeline, flexible dashboards, and multi‑level metric governance to improve observability, reduce latency, and ensure data quality.

Flink · Kafka · Real-Time
0 likes · 22 min read
NiuNiu MaTe
Sep 1, 2022 · Backend Development

How to Choose the Right Message Queue: Scenarios, Features, and Comparison

This article explains what message queues are, outlines key application scenarios such as asynchronous processing, message distribution, and traffic shaping, compares popular solutions like ActiveMQ, RabbitMQ, RocketMQ, and Kafka across performance and reliability dimensions, and provides practical guidance for selecting the most suitable queue for different business needs.

Kafka · Message Queue · RabbitMQ
0 likes · 8 min read
IT Architects Alliance
Aug 30, 2022 · Big Data

Understanding Kafka: Architecture, Topics, Partitions, Producers, Consumers, Offsets, Transactions, and Configuration

This article provides a comprehensive overview of Apache Kafka, explaining its distributed message‑queue architecture, the role of topics and partitions, producer and consumer workflows, leader election, offset management, consumer‑group rebalancing, delivery semantics, transaction processing, file organization, and key configuration settings.

Big Data · Distributed Messaging · Kafka
0 likes · 17 min read
Mike Chen's Internet Architecture
Aug 28, 2022 · Backend Development

Why Is Kafka So Fast? Uncover the 4 Core Performance Secrets

This article explains the four key techniques—page‑cache usage, sequential disk writes, zero‑copy transfers, and partitioned segment indexing—that enable Kafka to achieve exceptionally high write performance, detailing how each mechanism reduces latency and maximizes throughput.

Kafka · Partitioning · Sequential Write
0 likes · 5 min read
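
Of the four techniques in this summary, segment indexing is the easiest to show in a few lines. The sketch assumes a sparse index that maps some message offsets to byte positions within a log segment, and looks up the nearest indexed entry at or before a target offset with a binary search. The `locate` function and the sample index values are hypothetical.

```python
from bisect import bisect_right


def locate(index, target_offset):
    """Return the byte position of the greatest indexed offset <= target_offset.

    `index` is a list of (message_offset, byte_position) pairs sorted by offset.
    A reader would start scanning the segment file from the returned position.
    """
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise ValueError("target offset precedes the first indexed entry")
    return index[i][1]


# Sparse index: only some offsets are recorded (values are made up).
sparse_index = [(0, 0), (40, 4096), (85, 8192)]
start = locate(sparse_index, 60)  # offset 60 falls between entries 40 and 85
```

Because the index is sparse, it stays small enough to search quickly, and the short sequential scan from the returned position finds the exact message.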