Tagged articles

1273 articles

Aug 3, 2020 · Backend Development

Mastering Kafka Producer API: Tips, Configurations, and Common Pitfalls

This article provides a comprehensive guide to Kafka's producer API, covering core concepts, client‑side workflow, essential configurations, idempotent and transactional producers, and practical Java code examples to help developers avoid common pitfalls and optimize message publishing.

Distributed Systems · Idempotent Producer · Kafka

0 likes · 21 min read

JavaEdge

Aug 1, 2020 · Backend Development

How to Choose the Right Message Queue: RabbitMQ vs RocketMQ vs Kafka

This guide outlines key criteria for selecting a message queue—open source, ecosystem, reliability, clustering, and performance—and compares RabbitMQ, RocketMQ, and Kafka, highlighting each system's strengths, weaknesses, and ideal use‑cases.

Kafka · RabbitMQ · RocketMQ

0 likes · 10 min read

Full-Stack Internet Architecture

Jul 31, 2020 · Backend Development

Why Kafka Is Fast: Partition Parallelism, Sequential Disk Writes, Page Cache, Zero‑Copy, Batching and Compression

The article explains how Kafka achieves high throughput by using partition‑level parallelism, sequential disk writes with segment files, extensive use of the OS page cache, zero‑copy data paths, request batching and optional compression, while also discussing the underlying disk I/O principles.

Backend · Kafka · Partitioning

0 likes · 14 min read

Top Architect

Jul 30, 2020 · Backend Development

RabbitMQ vs Apache Kafka: Architectural Differences, Pros & Cons, and How to Choose

This article compares RabbitMQ and Apache Kafka, explaining their internal mechanisms, key differences in ordering, routing, timing, retention, fault tolerance, scalability, and consumer complexity, and provides guidance on when to choose each technology for modern micro‑service architectures.

Comparison · Kafka · Message Queue

0 likes · 23 min read

Tencent Cloud Developer

Jul 29, 2020 · Big Data

Case Study: Optimizing Tencent Cloud Elasticsearch for High‑Volume Game Log Analytics

To handle a gaming company's million‑QPS log stream, the team built a hot‑cold Tencent Cloud Elasticsearch cluster with ILM‑driven tiering, scaled CPU/heap, reduced shard count via shrink and replica tweaks, tuned Logstash‑Kafka pipelines, and employed COS snapshots and searchable snapshots, achieving stable performance and lower cost.

Big Data · Elasticsearch · ILM

0 likes · 29 min read

Efficient Ops

Jul 26, 2020 · Operations

Build a Billion-Scale ELK Logging Platform with Filebeat, Kafka, Elasticsearch

This guide walks through the complete design and step‑by‑step deployment of a billion‑scale ELK logging platform, covering architecture, component roles, version selection, configuration files, and command‑line installation for Filebeat, Kafka, Logstash, Elasticsearch, and Kibana.

ELK · Elasticsearch · Filebeat

0 likes · 12 min read

Big Data Technology & Architecture

Jul 25, 2020 · Big Data

Kafka Transactions, Replication Issues, HW/LEO Evolution, and Reliability Mechanisms

This article explains how Kafka implements transactions, handles under‑replicated partitions, manages high‑watermark and log‑end‑offset evolution, uses leader epochs for consistency, discusses read‑committed isolation, explains why read‑write separation is not supported, and describes delay queues, dead‑letter/retry queues, auditing, tracing, lag calculation, key metrics, and performance‑optimising design features.

DelayQueue · HighWatermark · Kafka

0 likes · 25 min read

Big Data Technology & Architecture

Jul 24, 2020 · Big Data

Key Concepts and Internal Mechanisms of Apache Kafka

This article provides an in‑depth overview of Apache Kafka’s internal topics, preferred replicas, partition allocation mechanisms, log directory structure, index files, offset and timestamp lookup, log retention and compaction policies, storage architecture, delayed operations, controller role, consumer rebalance process, and producer idempotence.

Consumer Rebalance · Distributed Systems · Idempotence

0 likes · 18 min read

Java Captain

Jul 24, 2020 · Operations

Enterprise Log Monitoring System Architecture and Implementation

To address the challenges of managing logs across hundreds of microservices in production, the article presents an enterprise log monitoring solution that centralizes collection via Filebeat, processes logs with Kafka Streams, visualizes data using Grafana and Kibana, and integrates Elastic APM for tracing and performance metrics.

ELK · Kafka · Log Monitoring

0 likes · 8 min read

Big Data Technology & Architecture

Jul 23, 2020 · Big Data

Comprehensive Kafka FAQ: Uses, Architecture, Offsets, and Partition Management

This article provides an extensive overview of Apache Kafka, covering its use cases, key concepts such as ISR, AR, HW, LEO, and LW, message ordering, the roles of partitioners, serializers and interceptors, producer and consumer client architecture, offset handling, multithreaded consumption, and topic partition management.

Big Data · Kafka · Message Queue

0 likes · 16 min read

Big Data Technology & Architecture

Jul 22, 2020 · Big Data

Ensuring Message Reliability, Idempotence, and Transactions in Kafka

The article explains Kafka's reliability mechanisms, detailing how committed messages are persisted, common producer and consumer data‑loss scenarios, best‑practice configurations for acks, retries, replication, and offset handling, and describes idempotent and transactional producer setups for atomic writes.

Distributed Systems · Idempotence · Kafka

0 likes · 7 min read

Big Data Technology & Architecture

Jul 22, 2020 · Big Data

Kafka Architecture and Core Concepts: Producers, Brokers, and Consumers

This article explains Kafka's fundamental architecture, including the roles of producers, brokers, and consumers, key concepts such as topics, partitions, replicas, ISR, and controller, as well as detailed mechanisms of producer client structure, interceptors, serializers, partitioners, and consumer group rebalancing strategies.

Big Data · Distributed Systems · Kafka

0 likes · 22 min read

Programmer DD

Jul 22, 2020 · Big Data

How to Sync Billions of MySQL Records to HBase: 3 Powerful Methods Using Hadoop, Kafka, and Flink

This comprehensive guide walks you through setting up a pseudo‑distributed Hadoop environment, loading massive MySQL data with LOAD DATA, Python scripts, and multithreading, and then synchronizing the data to HBase using three approaches—Sqoop, a Kafka‑Thrift pipeline, and a real‑time Kafka‑Flink pipeline—while also comparing query performance of HBase and Phoenix.

Flink · HBase · Kafka

0 likes · 28 min read

Big Data Technology & Architecture

Jul 20, 2020 · Big Data

Kafka Workflow and File Storage Mechanism: Topics, Partitions, Segments, Index and Log Files

This article explains Kafka’s workflow, detailing how topics, partitions, and segments are organized, the structure of index and log files, message composition, offset-based retrieval, and the overall data directory layout, providing a comprehensive overview of Kafka’s storage architecture.

Big Data · Kafka · OFFSET

0 likes · 8 min read

Tencent Cloud Developer

Jul 20, 2020 · Cloud Native

Tencent Eagle Eye Distributed Logging System Cloud Migration Practice

Tencent’s Eagle Eye distributed real‑time monitoring and log analysis platform was migrated to the cloud by rebuilding its LogSender and Kafka‑to‑ES components, switching to cloud CKafka and Elasticsearch, which boosted throughput fourfold, cut resource usage by about half, saved roughly 20 million RMB annually, and set the stage for further enhancements such as comprehensive monitoring and exactly‑once delivery.

Elasticsearch · Kafka · Tencent

0 likes · 9 min read

Architects Research Society

Jul 19, 2020 · Backend Development

Comparing Kafka and Mosquitto for Microservice Communication

This article examines the challenges of microservice communication, explains why REST APIs are unsuitable, and compares two messaging broker solutions—Kafka and Mosquitto—highlighting their architectures, persistence, scalability, and suitability for high‑traffic, reliable event‑driven systems.

Backend · Event-driven · Kafka

0 likes · 6 min read

360 Tech Engineering

Jul 17, 2020 · Big Data

Qbus Service Overview: Architecture, Use Cases, and Implementation Details

This article introduces Qbus, a cloud‑based queue service built on Kafka, covering its architecture, core components such as log collection, SDKs, HDFS persistence, monitoring with Prometheus, business integration methods, use‑case scenarios, and future development directions.

Cloud Queue · HDFS · Kafka

0 likes · 6 min read

Big Data Technology & Architecture

Jul 16, 2020 · Big Data

Understanding Kafka Replication: ISR, HW, LEO, and Acknowledgement Mechanisms

This article explains Kafka's replication process, covering producer acknowledgments, replica synchronization strategies, the roles of ISR and AR, the meanings of HW, LEO, LSO, LW, different ack levels, and how failures are handled to balance reliability and performance.

Acknowledgement · Distributed Systems · HW

0 likes · 9 min read

Architects Research Society

Jul 16, 2020 · Big Data

Differences Between MQTT and Kafka: Protocol Design, Use Cases, and Integration

The article explains how MQTT, a lightweight IoT messaging protocol, and Kafka, a distributed streaming platform, differ in architecture, purpose, and design goals despite both using publish/subscribe, and discusses their complementary integration via bridges such as EMQ X.

IoT · Kafka · MQTT

0 likes · 5 min read

Big Data Technology & Architecture

Jul 15, 2020 · Big Data

Root Causes and Solutions for Kafka Duplicate Consumption

This article analyzes the common causes of Kafka duplicate consumption, such as uncommitted offsets due to forced thread termination, auto‑commit settings, session timeouts, rebalancing, and slow processing, and provides practical solutions including disabling auto‑commit, adjusting consumer configurations, and using new consumer groups.

Consumer Configuration · Duplicate Consumption · Kafka

0 likes · 7 min read

Architects Research Society

Jul 15, 2020 · Big Data

Introduction to Apache Kafka: A Distributed Streaming Platform

This article provides a comprehensive overview of Apache Kafka, explaining its distributed, fault‑tolerant architecture, horizontal scalability, disk‑based commit log, replication mechanisms, Streams API, KSQL, and why it is widely adopted as the backbone of event‑driven, high‑throughput systems.

Distributed Systems · Kafka · Message Queue

0 likes · 15 min read

Selected Java Interview Questions

Jul 10, 2020 · Backend Development

Message Queue Interview Questions and Technical Guide

This article provides a comprehensive overview of message queue concepts, covering usage scenarios, advantages, drawbacks, technology selection, high‑availability architectures, duplicate handling, data loss prevention, ordering guarantees, latency management, and design principles, supplemented with interview‑style questions and code examples.

Kafka · MQ · Message Queue

0 likes · 20 min read

Selected Java Interview Questions

Jul 8, 2020 · Backend Development

Message Queue Applications and Comparison of Common MQs (ActiveMQ, RabbitMQ, RocketMQ, Kafka)

This article explains the role of message queues in distributed systems, illustrates four typical scenarios—asynchronous processing, application decoupling, traffic shaping, and message communication—and compares the features of popular middleware such as ActiveMQ, RabbitMQ, RocketMQ, and Kafka.

Kafka · Message Queue · RabbitMQ

0 likes · 9 min read

Big Data Technology & Architecture

Jul 2, 2020 · Big Data

KSQL Quick Start: Deploying and Querying Kafka Data with Streaming SQL

This article introduces KSQL as a lightweight streaming SQL engine for Apache Kafka, explains its architecture and core concepts of streams and tables, and provides step‑by‑step deployment instructions, command‑line examples for creating streams/tables, querying data, and managing persistent queries.

Apache Kafka · Big Data · KSQL

0 likes · 10 min read

Big Data Technology Architecture

Jun 29, 2020 · Fundamentals

Kafka Storage Mechanism and Reliability Guarantees

This article explains Kafka's internal storage architecture—including topics, partitions, segments, .log and .index files—how data is read, and the various reliability mechanisms such as ISR/OSR, LEO/HW, producer acknowledgment levels, leader election strategies, and delivery semantics.

Kafka · Producer Acks · Reliability

0 likes · 9 min read

Full-Stack Internet Architecture

Jun 23, 2020 · Backend Development

Common Kafka Interview Questions and Answers

This article reviews common Kafka interview questions, covering delay queues, idempotence, replica states, offsets, message ordering, and handling duplicate consumption, and includes example code for enabling idempotent producers along with explanations of time‑wheel mechanisms and practical solutions to consumer rebalance issues.

Consumer · Idempotence · Kafka

0 likes · 9 min read

Suning Technology

Jun 19, 2020 · Big Data

How Suning’s Big Data Engine Powered a Record‑Breaking 618 Sale

Suning’s 618 shopping festival showcased a massive sales surge backed by its big‑data platform, which processed over 200 billion requests, handled 38.5 PB of daily data, and delivered 31.5 trillion computations, while Kafka and HBase sustained tens of millions of TPS to ensure a seamless consumer experience.

618 Sale · HBase · Kafka

0 likes · 5 min read

High Availability Architecture

Jun 19, 2020 · Backend Development

Design and Implementation of a Traffic Replay System for Bilibili Membership Purchase Service

This article describes how Bilibili's Membership Purchase team built a Java‑based traffic replay platform using JVM‑Sandbox AOP, Kafka, and MySQL to capture real‑world request/response data, serialize it with JSON, and replay it for comprehensive regression testing of complex backend services.

Kafka · Microservices · Spring Cloud

0 likes · 14 min read

Architect

Jun 18, 2020 · Backend Development

Applying Message Queues for Decoupling in E‑commerce Architecture

The article explains why and how to use message queues to achieve low‑coupling, better performance, fault tolerance, and eventual consistency in an e‑commerce order‑processing flow, discusses common pitfalls such as message loss and duplication, and compares popular queue products like RabbitMQ, Kafka, and RocketMQ.

Backend Architecture · Decoupling · Distributed Systems

0 likes · 10 min read

Full-Stack Internet Architecture

Jun 18, 2020 · Big Data

Kafka Interview Questions: High Availability, Reliability, Consistency, Performance, and Usage Rationale

This article explains common Kafka interview questions by analyzing the system's high‑availability design, reliability mechanisms, consistency model, performance tricks such as sequential writes and zero‑copy, and the reasons for using Kafka and message queues, providing both conceptual insight and practical details.

Consistency · Distributed Systems · Kafka

0 likes · 12 min read

Ops Development Stories

Jun 18, 2020 · Operations

Forward Zabbix Alerts to WeChat via Kafka – Complete Step‑by‑Step Guide

This guide shows how to route Zabbix alarm messages through a Kafka cluster and then deliver them to Enterprise WeChat using Python scripts, covering host configuration, Kafka/Zookeeper startup, topic creation, alert‑sending scripts, and Zabbix action setup.

Alerting · Enterprise WeChat · Kafka

0 likes · 6 min read

Java Backend Technology

Jun 16, 2020 · Big Data

How Kafka’s Architecture and Memory Pool Reduce JVM GC for High Throughput

This article explains how Kafka’s design—its broker architecture, use of sequential disk I/O, PageCache, Sendfile, and a custom memory buffer pool—optimizes JVM garbage collection and achieves massive throughput in big‑data messaging scenarios.

Big Data · GC optimization · High Throughput

0 likes · 21 min read

Big Data Technology & Architecture

Jun 13, 2020 · Big Data

Achieving Exactly-Once Semantics in Kafka and Spark Streaming

This article explains the three message delivery semantics in distributed stream processing, compares Kafka‑Spark Streaming integration methods (receiver vs direct stream), and details how to achieve exactly‑once guarantees through idempotent or transactional writes, including code examples.

Big Data · Exactly-Once · Kafka

0 likes · 8 min read

58 Tech

Jun 10, 2020 · Big Data

Real‑time Data Warehouse Practices at 58 Tongcheng Bao: From Spark Streaming 1.0 to Flink‑based 2.0

This article details the evolution of 58 Tongcheng Bao's real‑time data warehouse, describing the initial Spark‑Streaming architecture, its limitations, and the redesign using Flink with a layered ODS‑DWD‑DWS‑APP model, data‑quality monitoring, join techniques, and the resulting improvements in latency and accuracy.

Big Data · Data Quality · Flink

0 likes · 9 min read

Full-Stack Internet Architecture

Jun 10, 2020 · Backend Development

Understanding Kafka Consumer Groups, Repartition Triggers, and Consumption Guarantees

This article explains the relationship between Kafka consumers and consumer groups, when repartition occurs, how consumers interact with Zookeeper, the overall consumer workflow, and the three delivery semantics (at‑least‑once, at‑most‑once, exactly‑once) in a concise, technical overview.

Backend · Kafka · Message Queue

0 likes · 8 min read

Big Data Technology & Architecture

Jun 4, 2020 · Big Data

Kafka for Data Ingestion and Event Distribution: Production‑Consumer and Publish‑Subscribe Patterns

This article explains how Kafka can be used for data ingestion and event distribution by illustrating production‑consumer and publish‑subscribe models, describing core concepts such as topics, partitions and consumer groups, and offering practical design options for handling different event scenarios.

Big Data · Event Distribution · Kafka

0 likes · 9 min read

Programmer DD

Jun 4, 2020 · Backend Development

How I Traced a Sudden Data Drop After a Feature Release: 14 Debugging Steps

After a new feature caused a sharp decline in data volume, I walked through a fourteen‑step troubleshooting process—verifying the issue, inspecting code, consulting DBAs, testing locally, checking configurations, logging, packet capture, load testing, and finally identifying a Kafka partition bottleneck—to restore normal operation.

Kafka · troubleshooting

0 likes · 9 min read

Big Data Technology & Architecture

Jun 3, 2020 · Big Data

Designing a Unified User Behavior Data Collection System for Mobile and Web Applications

The article explains how to build a unified user‑behavior data collection platform that standardizes event definitions, front‑end reporting, and back‑end storage using Kafka pipelines and Elasticsearch, enabling comprehensive analysis of user interactions across Android, iOS, and web clients.

Backend · Elasticsearch · Kafka

0 likes · 12 min read

Big Data Technology Architecture

Jun 3, 2020 · Big Data

Comprehensive Kafka Interview Questions and Answers

This article compiles essential Kafka interview topics, covering cluster sizing, partition and replica configuration, offset management, topic creation, log structure, election mechanisms, partition assignment strategies, handling data backlog, exactly‑once semantics, idempotence, transactions, and performance tuning with practical command examples.

Kafka · Messaging · interview

0 likes · 15 min read

Miss Fresh Tech Team

Jun 3, 2020 · Operations

How to Build a Scalable, Low‑Cost Log Platform for Massive Data Volumes

This article details the design and implementation of a unified log platform that handles peak write rates of up to one million events per second, balances performance, stability, and cost, and leverages Filebeat, Kafka, Flink, and Elasticsearch across multi‑cloud environments.

Elasticsearch · Kafka · Operations

0 likes · 13 min read

MaGe Linux Operations

May 30, 2020 · Big Data

What Is Kafka? A Deep Dive into Distributed Streaming and Messaging

Kafka is an Apache‑hosted distributed streaming platform that provides high‑throughput, durable, publish‑subscribe messaging, originally developed by LinkedIn; this article explains its core concepts, message system classifications, architecture components, APIs, replication, consumer groups, and guarantees, comparing it with other messaging solutions.

Big Data · Distributed Streaming · Kafka

0 likes · 17 min read

Architects' Tech Alliance

May 27, 2020 · Fundamentals

Message Queue Overview, Models, and Comparison of ActiveMQ, RabbitMQ, RocketMQ, and Kafka

This article introduces the fundamentals of message queues, explains their characteristics, delivery models, transmission modes, and push‑pull patterns, then compares four popular implementations—ActiveMQ, RabbitMQ, RocketMQ, and Kafka—highlighting each system's strengths, weaknesses, deployment requirements, and typical use cases.

ActiveMQ · Kafka · Message Queue

0 likes · 17 min read

MaGe Linux Operations

May 27, 2020 · Operations

Key DevOps Interview Q&A: Git, MySQL Replication, Kafka, Kubernetes

This article compiles essential DevOps interview questions covering version control differences between Git and SVN, MySQL master‑slave replication mechanics, Kafka versus traditional MQ, Kubernetes service types, pod communication, health checks, resource limits, link types, and permanent mounting techniques.

DevOps · Git · Kafka

0 likes · 17 min read

Big Data Technology & Architecture

May 24, 2020 · Big Data

Analyzing and Resolving Kafka Consumer Rebalance Errors Caused by max.poll.interval.ms

The article examines a Kafka consumer rebalance error caused by exceeding max.poll.interval.ms, explains the underlying mechanics of poll intervals, offset handling, and provides practical solutions such as adjusting max.poll.interval.ms, limiting poll records, and committing offsets per message to prevent frequent rebalances.

Kafka · java · max.poll.interval.ms

0 likes · 9 min read

Laravel Tech Community

May 22, 2020 · Big Data

Understanding Kafka Architecture: Topics, Partitions, Consumption Model, Network and Storage

This article explains Kafka's core architecture, covering how topics and partitions are stored, the advantages of its consumption model, the internal network and threading design, and the high‑reliability distributed log storage and replication mechanisms that ensure data durability and scalability.

Distributed Messaging · Kafka · Partitions

0 likes · 11 min read

Big Data Technology & Architecture

May 22, 2020 · Big Data

Understanding Kafka's ZooKeeper Paths and Their Stored Metadata

This article explains how ZooKeeper stores Kafka's coordination data by detailing the predefined ZK paths, the JSON structures for broker, topic, partition, controller, and consumer information, and the auxiliary nodes used for replica election and partition reassignment.

Big Data · Broker metadata · Kafka

0 likes · 8 min read

macrozheng

May 21, 2020 · Big Data

Mastering Kafka: Core Concepts, Architecture, and Reliability Guarantees

This comprehensive guide covers Kafka's definition, publish/subscribe model, key components, storage mechanisms, producer and consumer strategies, and reliability features such as ACK levels, ISR, and exactly‑once semantics, providing a solid foundation for real‑time big‑data processing.

Big Data · Distributed Systems · Kafka

0 likes · 16 min read

Big Data Technology Architecture

May 19, 2020 · Big Data

Design and Implementation of a Unified Data Lake Platform Using HBase, Kafka, and Elasticsearch

This article summarizes the design, architecture, and key modules of a company-wide data lake platform—named “Tianchi”—built on HBase, Kafka, and Elasticsearch, detailing data ingestion, strategy output, metadata management, indexing, monitoring, and offline analysis, and shares lessons learned and future plans.

Data Platform · Elasticsearch · HBase

0 likes · 11 min read

Selected Java Interview Questions

May 16, 2020 · Big Data

How Reddit Counts Page Views at Scale Using HyperLogLog and Kafka

The article explains Reddit's large‑scale page‑view counting system, detailing its real‑time requirements, the challenges of naive hash‑set storage, and how a hybrid approach using linear probabilistic counting and HyperLogLog, together with Kafka, Redis, and Cassandra, achieves accurate, low‑memory, near‑real‑time analytics.

Big Data · HyperLogLog · Kafka

0 likes · 7 min read

Top Architect

May 14, 2020 · Big Data

Kafka Overview, Architecture, Installation, and Operational Guide

This article provides a comprehensive introduction to Kafka, covering its definition, message queue concepts, architecture components, installation steps, configuration details, startup procedures, operational commands, producer and consumer mechanisms, reliability guarantees, partition strategies, offset management, and performance optimizations.

Big Data · Consumer · Installation

0 likes · 22 min read

DataFunTalk

May 11, 2020 · Big Data

Designing a Real-Time Data System with Flink: Architecture, Data Modeling, and UV Metric Computation

This article outlines a comprehensive real‑time data system built on Apache Flink, covering its application scenarios, layered architecture, data model stratification, construction methods, and a concrete Flink SQL example for calculating UV metrics from Kafka‑sourced page‑view data.

Data Architecture · Flink · Kafka

0 likes · 24 min read

Architecture Digest

May 3, 2020 · Big Data

Kafka Concept Overview

This article provides a comprehensive introduction to Kafka, covering its definition, message‑queue models, architecture components, installation steps, configuration details, producer and consumer mechanisms, reliability guarantees, partition assignment strategies, offset management, and high‑performance read/write techniques.

Big Data · Consumer · Kafka

0 likes · 20 min read

Big Data Technology Architecture

Apr 29, 2020 · Databases

Enhancing HBase CAP Model and MTTR with Kafka‑Based IO Decoupling and Native AP Support

The article analyzes HBase's CP‑oriented CAP limitations, proposes native AP support via Replica, decouples WAL IO to Kafka, optimizes MTTR, introduces multi‑datacenter active/active disaster recovery, and redesigns client write paths and LogSplit processing for higher availability and throughput.

CAP · Database Architecture · HBase

0 likes · 11 min read

Big Data Technology & Architecture

Apr 28, 2020 · Big Data

Big Data Practice Exercises: Spark, Kafka, and MySQL Integration with Scala and Java

This article presents a series of hands‑on big‑data exercises, including Spark Scala data analysis, Kafka topic creation and custom partitioning, and MySQL table design with Scala‑based streaming calculations, providing complete source code and step‑by‑step solutions for each task.

Big Data · Kafka · Scala

0 likes · 25 min read

Tencent Cloud Developer

Apr 28, 2020 · Big Data

Evolution of Ctrip Vacation Pricing Engine: Architecture, Challenges, and Optimizations

Ctrip’s vacation pricing engine evolved from a MySQL‑based synchronous queue to a Kafka‑driven, Spark‑parallelized architecture using HBase, dramatically cutting task generation from five hours to 1.5 hours, boosting price‑accuracy above 90 % while handling billions of calculations and external API constraints.

Distributed Systems · Kafka · Spark

0 likes · 18 min read

Tencent Cloud Developer

Apr 24, 2020 · Backend Development

Mask Reservation Mini‑Program: From Perfect Experience to Lossy Service – Architecture and Design

During the COVID‑19 pandemic, Tencent and Guangzhou built the “Suikang” mask‑reservation mini‑program in two days, handling 1.7 billion visits by shifting from real‑time inventory checks to a four‑layer “lossy” architecture—CDN caching, batch releases, Redis, Kafka queues, and asynchronous processing—to trade consistency for high availability and rapid response.

CAP theorem · Kafka · Lossy Service

0 likes · 23 min read

Mask Reservation Mini‑Program: From Perfect Experience to Lossy Service – Architecture and Design

Big Data Technology Architecture

Apr 22, 2020 · Big Data

Key Kafka Producer Configuration Parameters and Tuning Recommendations

This article explains the most important Kafka producer configuration parameters, including acks, max.request.size, retries, compression.type, buffer.memory, batch.size, linger.ms, request.timeout.ms, and max.in.flight.requests.per.connection, provides practical tuning advice, and presents a recommended setup to achieve high throughput without message loss.

Configuration · Kafka · Message Reliability

0 likes · 9 min read

Key Kafka Producer Configuration Parameters and Tuning Recommendations

Big Data Technology & Architecture

Apr 19, 2020 · Big Data

Understanding the Backpressure Mechanism in Spark Streaming

This article explains Spark Streaming's backpressure mechanism, detailing how batch intervals can cause data accumulation, the role of Receivers versus DirectKafkaInputDStream, configuration to enable backpressure, and the internal workings of RateController, ReceiverRateController, ReceiverSupervisor, BlockGenerator, and rate calculations for Kafka streams.

Big Data · Kafka · RateController

0 likes · 12 min read

Understanding the Backpressure Mechanism in Spark Streaming

Big Data Technology Architecture

Apr 15, 2020 · Big Data

Real-Time Data Warehouse Practices: Case Studies from Meituan, NetEase, Zhihu, and OPPO

This article reviews the evolution of data warehouses from traditional offline models to modern real‑time architectures, presenting detailed case studies of Meituan, NetEase, Zhihu, and OPPO, and discusses layer designs, technology choices such as Flink, Kafka, and storage options, and key lessons for building scalable real‑time warehouses.

Big Data · Flink · Kafka

0 likes · 13 min read

Real-Time Data Warehouse Practices: Case Studies from Meituan, NetEase, Zhihu, and OPPO

Big Data Technology Architecture

Apr 13, 2020 · Backend Development

Understanding Kafka Producer: Architecture, Data Structures, Serialization, Partitioning, and Buffering

This article provides a comprehensive overview of Kafka's Producer side, covering its architecture, the ProducerRecord data structure, serialization mechanisms, partitioning logic, and the accumulator buffer, while comparing old and new Producer clients and illustrating key configurations with code examples.

Accumulator · Kafka · Partitioning

0 likes · 9 min read

Understanding Kafka Producer: Architecture, Data Structures, Serialization, Partitioning, and Buffering

dbaplus Community

Apr 12, 2020 · Databases

Why and How to Migrate from MongoDB to Elasticsearch: A Practical Guide

This article explains the motivations for moving a high‑volume operation‑log system from MongoDB to Elasticsearch, outlines the existing architecture, details capacity planning, index design, and a step‑by‑step migration process using Kafka, DataX, and Spring Boot, and shares the performance gains and lessons learned.

Data Migration · DataX · Database Architecture

0 likes · 14 min read

Why and How to Migrate from MongoDB to Elasticsearch: A Practical Guide

Tencent Cloud Middleware

Apr 9, 2020 · Operations

Scaling Kafka to Support Millions of Partitions Without Downtime

This article explains the metadata, controller, and ZooKeeper challenges of supporting a million‑plus Kafka partitions and presents practical solutions such as parallel ZK fetching, metadata‑via‑topic redesign, logical cluster assembly, and physical cluster splitting to achieve large‑scale, stable Kafka deployments.

Kafka · ZooKeeper · cluster operations

0 likes · 15 min read

Scaling Kafka to Support Millions of Partitions Without Downtime

Qunar Tech Salon

Apr 9, 2020 · Backend Development

RabbitMQ vs Kafka: Key Differences and How to Choose Between Them

This article compares RabbitMQ and Apache Kafka across architecture, message ordering, routing, timing, retention, fault handling, scalability, and consumer complexity, providing guidance on when to prefer each system based on functional and non‑functional requirements.

Comparison · Kafka · Message Queue

0 likes · 17 min read

RabbitMQ vs Kafka: Key Differences and How to Choose Between Them

Programmer DD

Mar 28, 2020 · Backend Development

Why Is Kafka So Fast? Uncover the 11 Performance Secrets

Kafka achieves its remarkable speed by combining sequential I/O, batch processing, compression, zero‑copy, careful client‑side work, and a design that avoids costly fsync and garbage collection, while maintaining durability, ordering, and at‑least‑once delivery, making it a high‑throughput, low‑latency event streaming platform.

Batch Processing · Distributed Systems · Kafka

0 likes · 15 min read

Why Is Kafka So Fast? Uncover the 11 Performance Secrets

Programmer DD

Mar 26, 2020 · Backend Development

How Zhihu Built a Scalable Long‑Connection Gateway for Real‑Time Messaging

Zhihu’s infrastructure team designed a high‑performance, scalable long‑connection gateway that decouples business logic via publish‑subscribe, leverages OpenResty, Kafka, and Redis, implements fine‑grained ACL, sliding‑window flow control, and ensures message reliability and horizontal scalability for millions of concurrent devices.

Kafka · Message Reliability · OpenResty

0 likes · 15 min read

How Zhihu Built a Scalable Long‑Connection Gateway for Real‑Time Messaging

Efficient Ops

Mar 25, 2020 · Operations

How JD Logistics Built a 300‑Million‑Metric Real‑Time Monitoring System for 99.999% Uptime

This article details JD Logistics' journey to design and implement a massive, AI‑enhanced monitoring platform that handles over three million metrics across hundreds of warehouses, addressing challenges of scale, network complexity, frequent asset changes, and integrating AIOps for proactive fault detection and resolution.

CMDB · Kafka · LSTM

0 likes · 23 min read

How JD Logistics Built a 300‑Million‑Metric Real‑Time Monitoring System for 99.999% Uptime

Architecture Digest

Mar 24, 2020 · Backend Development

RabbitMQ vs Kafka: Understanding Asynchronous Messaging Patterns and Choosing the Right Solution

This article explains the fundamentals of asynchronous messaging, compares the architectural differences between RabbitMQ and Apache Kafka, and provides guidance on selecting the appropriate technology based on use‑case requirements such as scalability, durability, and processing semantics.

Kafka · Message Queue · RabbitMQ

0 likes · 9 min read

RabbitMQ vs Kafka: Understanding Asynchronous Messaging Patterns and Choosing the Right Solution

Qunar Tech Salon

Mar 19, 2020 · Big Data

Apache Kafka Overview: Architecture, Features, and Usage

This article provides a comprehensive introduction to Apache Kafka, covering its high‑throughput distributed architecture, core concepts such as topics, partitions, brokers, producers and consumers, design goals, performance characteristics, deployment steps, configuration, and example code for producers, consumers, and Spring Boot integration.

Big Data · Distributed Systems · Kafka

0 likes · 39 min read

Apache Kafka Overview: Architecture, Features, and Usage

Big Data Technology & Architecture

Mar 17, 2020 · Big Data

Quick Guide to Building a Canal‑Based Real‑Time Data Synchronization Platform on CentOS 7

This article walks through the end‑to‑end setup of a small‑scale data platform using Alibaba's Canal for MySQL binlog capture, covering the installation and configuration of MySQL, ZooKeeper, Kafka, and Canal itself, and demonstrates real‑time change capture with sample DML operations.

Big Data · Canal · CentOS

0 likes · 20 min read

Quick Guide to Building a Canal‑Based Real‑Time Data Synchronization Platform on CentOS 7

Architecture Digest

Mar 15, 2020 · Big Data

Quick Guide to Deploying Alibaba Canal for Real‑Time MySQL Binlog Synchronization with Kafka and Zookeeper

This article provides a step‑by‑step tutorial on building a small‑scale data platform by installing MySQL, ZooKeeper, Kafka and the open‑source Canal middleware, configuring Canal to capture MySQL binlog events, and forwarding the structured data to Kafka for downstream processing.

Canal · Kafka · ZooKeeper

0 likes · 20 min read

Quick Guide to Deploying Alibaba Canal for Real‑Time MySQL Binlog Synchronization with Kafka and Zookeeper

Top Architect

Mar 13, 2020 · Big Data

Three Billion‑Scale MySQL‑to‑HBase Synchronization Solutions and Practical Implementation

This article presents a comprehensive guide for synchronizing massive MySQL datasets to HBase, covering environment preparation, fast MySQL data loading techniques, and three practical pipelines—Sqoop, Kafka‑Thrift, and Kafka‑Flink—along with performance comparisons and optimization tips for large‑scale data processing.

Big Data · Flink · HBase

0 likes · 24 min read

Three Billion‑Scale MySQL‑to‑HBase Synchronization Solutions and Practical Implementation

Big Data Technology & Architecture

Mar 13, 2020 · Backend Development

Kafka Idempotent and Transactional Messaging Overview

This article explains how Kafka implements idempotent producers and transactional messaging to achieve exactly‑once semantics, detailing the producer session identifiers, sequence numbers, broker checks, two‑phase commit workflow, consumer isolation levels, and the limitations of atomic reads.

Idempotent Producer · Kafka · Transactional Messaging

0 likes · 9 min read

Kafka Idempotent and Transactional Messaging Overview

ITFLY8 Architecture Home

Mar 7, 2020 · Big Data

How Kafka’s Reactor Thread Model Powers High‑Throughput Messaging

Kafka’s high‑throughput network architecture—built on NIO, a Reactor thread model, and a TCP‑based protocol—evolves from a simple synchronous processor design to a decoupled handler‑pool system, offering valuable lessons for designing scalable backend communication layers in big‑data applications.

High Throughput · Kafka · Reactor Model

0 likes · 7 min read

How Kafka’s Reactor Thread Model Powers High‑Throughput Messaging

Top Architect

Mar 6, 2020 · Big Data

Design and Integration of a Real-Time Log Analysis System Using Flume, Kafka, Storm, Drools, and Redis

This article details the design, installation, and modular integration of Flume, Kafka, Storm, Drools, and Redis to build a real‑time log analysis pipeline for ETL systems, discussing architecture, configuration, code examples, and practical considerations for scalability and fault tolerance.

Big Data · Drools · Flume

0 likes · 24 min read

Design and Integration of a Real-Time Log Analysis System Using Flume, Kafka, Storm, Drools, and Redis

Tencent Cloud Middleware

Mar 6, 2020 · Operations

Choosing the Right Disk Strategy for High‑Throughput Kafka Clusters

This article examines how to select and configure disk solutions—single‑disk, multi‑directory, RAID, and LVM—for Apache Kafka deployments, comparing performance, cost, scalability, and reliability to help operators build stable, high‑throughput messaging infrastructures.

Big Data · Disk Design · Kafka

0 likes · 16 min read

Choosing the Right Disk Strategy for High‑Throughput Kafka Clusters

ITFLY8 Architecture Home

Mar 6, 2020 · Big Data

Mastering Kafka: Build Producers, Consumers, and Custom Partitioners

This extensive tutorial walks through Kafka producer and consumer fundamentals, demonstrates how to configure key‑based and custom partitioning, explains offset management, consumer group coordination, and essential configuration parameters, and includes complete Java code examples for real‑world e‑commerce scenarios.

Consumer · Kafka · Partitioning

0 likes · 24 min read

Mastering Kafka: Build Producers, Consumers, and Custom Partitioners

ITFLY8 Architecture Home

Mar 5, 2020 · Big Data

How to Build a High‑Performance Kafka Production Cluster: Sizing, Config, and Best Practices

This guide explains how to design and deploy a Kafka production cluster—including capacity planning for 1 billion daily messages, hardware sizing, key configuration parameters, command‑line operations, and useful management tools—to achieve reliable high‑throughput streaming.

Cluster Deployment · Kafka · performance tuning

0 likes · 15 min read

How to Build a High‑Performance Kafka Production Cluster: Sizing, Config, and Best Practices

58 Tech

Mar 4, 2020 · Big Data

Applying Flink State Management to Real‑Time Recommendation Scenarios

This article explains how Flink's flexible state management, including Broadcast, Keyed, and Operator states, can be used to solve real‑time recommendation challenges such as per‑minute UV, click, and exposure counting, while addressing locality mapping and data‑delay issues with Druid as the downstream store.

Broadcast State · Druid · Flink

0 likes · 13 min read

Applying Flink State Management to Real‑Time Recommendation Scenarios

ITFLY8 Architecture Home

Mar 4, 2020 · Big Data

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

This article introduces Kafka's role as a message system, explains its fundamental components such as topics, partitions, producers, consumers, and replicas, and dives into cluster architecture, consumer groups, ZooKeeper coordination, and performance optimizations like sequential writes, zero‑copy, log segmentation, and network design.

Kafka · Message Queue · performance

0 likes · 13 min read

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

dbaplus Community

Mar 3, 2020 · Big Data

How MaFengWo Scaled Kafka for Real‑Time Big Data: Lessons and Best Practices

This article details MaFengWo's practical experience with Kafka in its big‑data platform, covering three core usage scenarios, a four‑stage evolution roadmap—including version upgrades, resource isolation, security and monitoring—and future plans such as transaction‑based deduplication and consumer throttling.

Big Data · Kafka · Resource Isolation

0 likes · 17 min read

How MaFengWo Scaled Kafka for Real‑Time Big Data: Lessons and Best Practices

Mafengwo Technology

Feb 28, 2020 · Backend Development

How We Achieve Real‑Time MySQL‑to‑Elasticsearch Sync with Binlog and Kafka

This article explains how a large e‑commerce platform replaced a MySQL‑centric intermediate table with a binlog‑driven pipeline that streams changes through Kafka into Elasticsearch, ensuring ordered, complete, and low‑latency data synchronization while addressing schema evolution and operational monitoring.

Backend · Binlog · Elasticsearch

0 likes · 11 min read

How We Achieve Real‑Time MySQL‑to‑Elasticsearch Sync with Binlog and Kafka

Big Data Technology & Architecture

Feb 27, 2020 · Fundamentals

Message Queue Usage, Advantages, Disadvantages, and High‑Availability Design

The article explains why message queues are used, outlines their core scenarios—decoupling, asynchronous processing, and traffic shaping—compares Kafka, ActiveMQ, RabbitMQ, and RocketMQ, and details how to ensure high availability, reliability, idempotency, ordering, and scalability in production systems.

Idempotency · Kafka · Message Queue

0 likes · 34 min read

Message Queue Usage, Advantages, Disadvantages, and High‑Availability Design

Big Data Technology Architecture

Feb 26, 2020 · Big Data

Comprehensive Guide to Kafka Architecture, Messaging Mechanisms, Replication, Controllers, and Consumer Rebalance

This article provides an in‑depth yet approachable overview of Kafka's core concepts—including its architecture, terminology, message‑sending pipeline, replication strategy, controller role, and consumer group rebalance mechanisms—helping readers quickly grasp how Kafka works as a high‑throughput distributed messaging and streaming platform.

Consumer Rebalance · Distributed Messaging · Kafka

0 likes · 21 min read

Comprehensive Guide to Kafka Architecture, Messaging Mechanisms, Replication, Controllers, and Consumer Rebalance

Tencent Cloud Developer

Feb 18, 2020 · Backend Development

Technical Overview of Tencent Cloud CKafka for High-Scale Online Classroom Messaging

Tencent Cloud CKafka powers Tencent Classroom’s pandemic‑era online teaching by replacing a custom queue with a high‑performance, highly available, partition‑based message bus that scales to millions of real‑time interactions, offers configurable replication and tuning for reliability, and integrates with big‑data and streaming tools for analytics.

CKafka · Kafka · Message Queue

0 likes · 15 min read

Technical Overview of Tencent Cloud CKafka for High-Scale Online Classroom Messaging

Big Data Technology & Architecture

Feb 16, 2020 · Big Data

Implementing User Purchase Behavior Tracking with Flink Broadcast State

This article explains how to use Flink's Broadcast State to track user purchase paths in real time, detailing the design, required Kafka streams, Java APIs, state management, dynamic configuration, code implementation, deployment steps, and example results for a big‑data streaming application.

Big Data · Broadcast State · Flink

0 likes · 19 min read

Implementing User Purchase Behavior Tracking with Flink Broadcast State

Java Captain

Feb 12, 2020 · Backend Development

Comprehensive Comparison of Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ

This article provides a detailed side‑by‑side comparison of five popular message‑queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ and ActiveMQ—covering documentation, supported languages, protocols, storage, transactions, load balancing, clustering, management interfaces, availability, throughput, subscription models, ordering, acknowledgments, replay, retry, concurrency and more.

ActiveMQ · Comparison · Kafka

0 likes · 21 min read

Comprehensive Comparison of Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ

Big Data Technology & Architecture

Feb 11, 2020 · Backend Development

Message Queue Interview Guide: Benefits, Drawbacks, High Availability, Idempotency, Ordering and Design Strategies

This article provides a comprehensive interview‑oriented overview of message queues, explaining why they are used, their core advantages and disadvantages, comparing Kafka, RabbitMQ, ActiveMQ and RocketMQ, and detailing high‑availability, reliability, idempotency, ordering, backlog handling and architectural design considerations.

Idempotency · Kafka · Message Queue

0 likes · 33 min read

Message Queue Interview Guide: Benefits, Drawbacks, High Availability, Idempotency, Ordering and Design Strategies

Big Data Technology & Architecture

Feb 10, 2020 · Big Data

Real‑time MySQL Binlog Capture with Canal: Principles, Architecture, Deployment and Comparison with Maxwell

This article explains how to use Alibaba's Canal to capture MySQL binlog changes in real time, covering its underlying protocol, component architecture, HA design with ZooKeeper, configuration steps, deployment examples, and a detailed comparison with alternative tools such as Maxwell and mysql_streamer.

Big Data · Binlog · Canal

0 likes · 17 min read

Real‑time MySQL Binlog Capture with Canal: Principles, Architecture, Deployment and Comparison with Maxwell

Yanxuan Tech Team

Feb 10, 2020 · Backend Development

How Yanxuan’s Unified Message Center Scales with RocketMQ, Kafka, and K8s

This article details Yanxuan's evolution from a chaotic, multi‑queue setup to a unified, cloud‑native message center built on RocketMQ and Kafka, describing current services, scheduling mechanisms, publish‑subscribe implementation, and future plans for platformization and Kubernetes‑based resource management.

Kafka · Kubernetes · Messaging

0 likes · 13 min read

How Yanxuan’s Unified Message Center Scales with RocketMQ, Kafka, and K8s

Big Data Technology Architecture

Feb 1, 2020 · Big Data

Apache Hudi 0.5.1 Release Highlights and Upgrade Guide

The Apache Hudi 0.5.1 release introduces upgraded Spark, Avro, Parquet and Kafka dependencies, new Scala support, timeline layout changes, CLI enhancements, DeltaStreamer parameter updates, Kafka offset enum revisions, key‑generator package relocation, Hive sync options, dynamic Bloom filter, bulk‑insert support, and AWS cloud storage compatibility.

Apache Hudi · DeltaStreamer · Kafka

0 likes · 6 min read

Apache Hudi 0.5.1 Release Highlights and Upgrade Guide

Architecture Digest

Jan 18, 2020 · Backend Development

Comprehensive Comparison of Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ Across 17 Aspects

This article provides a detailed side‑by‑side comparison of five popular message‑queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ and ActiveMQ—covering documentation, language support, protocols, storage, transactions, load balancing, clustering, management UI, availability, duplication handling, throughput, subscription models, ordering, acknowledgments, replay, retry mechanisms and concurrency.

Kafka · Message Queue · RabbitMQ

0 likes · 25 min read

Comprehensive Comparison of Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ Across 17 Aspects

Big Data Technology & Architecture

Jan 16, 2020 · Big Data

Kafka Interview Guide: Core Concepts, Architecture, and Practical Tips

This article compiles essential Kafka interview material, covering its role as a message queue, usage scenarios, architectural components, storage mechanisms, consumer group rebalancing, high‑availability features, replication details, ordering guarantees, producer/consumer client design, topic management, log retention, performance optimizations, and key monitoring metrics.

Big Data · Distributed Systems · Kafka

0 likes · 16 min read

Kafka Interview Guide: Core Concepts, Architecture, and Practical Tips

360 Tech Engineering

Jan 16, 2020 · Big Data

Real-Time and Offline Integrated Solution for Channel Analysis Data Processing

This article presents a comprehensive real‑time and offline integrated solution for a channel analysis system, detailing challenges, architecture, implementation using Flink, Spark Streaming, Kafka, Elasticsearch, and Hive, and demonstrating minute‑level latency and high accuracy through performance evaluations.

Big Data · Elasticsearch · Flink

0 likes · 10 min read

Real-Time and Offline Integrated Solution for Channel Analysis Data Processing

Big Data Technology & Architecture

Jan 13, 2020 · Big Data

130 Essential Big Data and Distributed Systems Interview Questions

This article compiles 130 interview questions spanning big data technologies, distributed systems, and core computer science concepts to help candidates prepare for technical interviews, offering a comprehensive resource for self‑study and review.

Flink · Hadoop · Kafka

0 likes · 12 min read

130 Essential Big Data and Distributed Systems Interview Questions

Programmer DD

Jan 13, 2020 · Backend Development

How Kafka’s Broker Controller Keeps Your Data Flowing – Inside the Replication Engine

This article dives deep into Kafka’s internal mechanics, explaining how brokers replicate data, how the controller coordinates the cluster via ZooKeeper, the roles of leader and follower replicas, ISR management, request handling, fail‑over strategies, and consumer group rebalancing, all illustrated with diagrams.

Backend · Broker Controller · ISR

0 likes · 36 min read

How Kafka’s Broker Controller Keeps Your Data Flowing – Inside the Replication Engine

Architect's Tech Stack

Jan 12, 2020 · Backend Development

Comprehensive Guide to Spring‑Kafka Integration and Advanced Features

This article provides a systematic tutorial on using Spring‑Kafka, covering basic setup, embedded Kafka for testing, topic creation methods, message sending with KafkaTemplate, transactional messaging, request‑reply patterns, advanced @KafkaListener configurations, manual acknowledgment, listener lifecycle control, SendTo forwarding, and retry with dead‑letter queues, all illustrated with complete code examples.

EmbeddedKafka · Kafka · Messaging

0 likes · 19 min read

Comprehensive Guide to Spring‑Kafka Integration and Advanced Features

ITPUB

Jan 10, 2020 · Big Data

How MaFengWo Scales Kafka for Real‑Time Big Data: Lessons and Best Practices

This article details MaFengWo’s practical experience using Kafka across three core scenarios—real‑time storage, analytical data source, and business data subscription—while describing a four‑stage evolution that includes version upgrades, resource isolation, security and monitoring enhancements, and a comprehensive subscription platform, followed by future improvement plans.

Big Data · Data Replay · Kafka

0 likes · 16 min read

How MaFengWo Scales Kafka for Real‑Time Big Data: Lessons and Best Practices

Big Data Technology & Architecture

Jan 8, 2020 · Big Data

Real-Time Data Warehouse Architecture and Challenges Using Flink, Kafka, and HBase

This article examines the design of a real-time data warehouse built on Flink, Kafka, and HBase, compares it with traditional offline warehouses, and discusses key challenges such as data accuracy, latency, and the complexity of maintaining real-time dimension tables.

Big Data · Flink · HBase

0 likes · 10 min read

Real-Time Data Warehouse Architecture and Challenges Using Flink, Kafka, and HBase

Big Data Technology & Architecture

Jan 7, 2020 · Big Data

Real-time Data Processing with Kafka, Spark Streaming, and HBase: Implementation Guide

This article presents a step‑by‑step guide for building a real‑time data pipeline using Kafka as a message buffer, Spark‑Streaming's Direct Approach for processing, and HBase for storage, including code examples, Maven configuration, local cluster setup, and troubleshooting tips.

Big Data · HBase · Kafka

0 likes · 12 min read

Real-time Data Processing with Kafka, Spark Streaming, and HBase: Implementation Guide

Java High-Performance Architecture

Jan 7, 2020 · Backend Development

How to Build a Scalable Reporting Service in a Microservice Architecture

The article compares four approaches to generating a user‑enriched order report in a microservice system (direct DB access, REST data aggregation, batch pulling, and an event‑driven model), highlighting their trade‑offs in coupling, performance, scalability, and resilience, and recommends the event‑push solution.

Data Integration · Event-driven · Kafka

0 likes · 5 min read

How to Build a Scalable Reporting Service in a Microservice Architecture