Tagged articles
1273 articles
Big Data Technology & Architecture
Dec 21, 2019 · Big Data

Kafka Offset Management and Replication Mechanisms Explained

This article provides a comprehensive technical overview of Kafka's offset handling, covering the request entry point, in‑memory offset sources, offset commit and fetch implementations, file storage layout, and the leader‑follower synchronization process that ensures data replication and high‑watermark updates.

Big Data · Distributed Systems · High Watermark
0 likes · 16 min read
Java High-Performance Architecture
Dec 17, 2019 · Backend Development

Understanding Kafka Topic Architecture: Partitions, Replication, and Failover

This article explains Kafka's topic architecture, detailing how topics are split into partitions for scalability and parallelism, the role of logs, key-based and round-robin partitioning, replication with leaders, followers, ISR, and how these mechanisms enable fault‑tolerance and high‑performance consumer failover.

Backend · Kafka · Partition
0 likes · 7 min read
Big Data Technology & Architecture
Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-Once
0 likes · 11 min read
Java High-Performance Architecture
Dec 7, 2019 · Backend Development

How Zookeeper Powers Kafka: Key Roles Explained

This article explains how Zookeeper functions as an essential part of Kafka by managing broker status, controller election, quotas, ISR tracking, node and topic registration, as well as consumer offset storage and registration, providing a comprehensive overview for interview preparation.

Broker · Consumer · Kafka
0 likes · 4 min read
Architecture Digest
Nov 25, 2019 · Big Data

Introduction to Apache Kafka: Core Concepts, Architecture, and APIs

This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers, illustrated with diagrams.

APIs · Big Data · Distributed Systems
0 likes · 8 min read
Big Data Technology & Architecture
Nov 24, 2019 · Big Data

Common Apache Kafka Exceptions and Their Causes

This article lists frequent Apache Kafka exceptions such as UnknownTopicOrPartitionException, LEADER_NOT_AVAILABLE, NotLeaderForPartitionException, TimeoutException, RecordTooLargeException, and others, explaining each error message, typical reasons, and practical troubleshooting steps for producers and consumers.

Big Data · Consumer · Error Handling
0 likes · 5 min read
Architect's Tech Stack
Nov 21, 2019 · Backend Development

Comprehensive Guide to Spring‑Kafka: Integration, Configuration, and Advanced Features

This article provides a thorough tutorial on using Spring‑Kafka, covering basic Maven setup, producer and consumer code, embedded Kafka for testing, programmatic topic creation, synchronous and asynchronous message sending, transaction handling, request‑reply patterns, advanced @KafkaListener options, manual acknowledgments, lifecycle control, message forwarding with @SendTo, and retry with dead‑letter queues.

Kafka · Messaging · Microservices
0 likes · 16 min read
G7 EasyFlow Tech Circle
Nov 21, 2019 · Big Data

How G7 Combines AI, Big Data, and IoT to Transform Logistics

This article presents a detailed overview of G7's AI‑plus‑Big‑Data‑plus‑IoT platform for logistics, describing its neutral open architecture, real‑time data pipelines using Kafka and Flink, Lambda‑style storage in HBase/Hive, and the resulting safety‑insurance and analytics capabilities.

AI · Flink · IoT
0 likes · 10 min read
Java Captain
Nov 20, 2019 · Backend Development

Exploring Advanced Features of Spring‑Kafka: Integration, Embedded Server, Topic Management, Transactions, and Message Handling

This article provides a comprehensive guide to using Spring‑Kafka, covering simple integration, embedded Kafka for testing, creating topics programmatically, advanced KafkaTemplate usage, transactional messaging, ReplyingKafkaTemplate for request‑reply, listener configurations, manual acknowledgments, lifecycle control, message forwarding, and retry with dead‑letter queues.

Embedded Kafka · Kafka · Message Queue
0 likes · 19 min read
Programmer DD
Nov 19, 2019 · Backend Development

17‑Point Comparison: Kafka vs RabbitMQ vs ZeroMQ vs RocketMQ vs ActiveMQ

This article provides a comprehensive 17‑point comparison of five popular message queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ—covering documentation, supported languages, protocols, storage, transactions, load balancing, clustering, management UI, availability, message duplication, throughput, subscription models, ordering, acknowledgments, replay, retries, concurrency, and more.

ActiveMQ · Backend · Kafka
0 likes · 23 min read
Qunar Tech Salon
Nov 18, 2019 · Databases

Data Synchronization Architecture and Refactoring for Large-Scale Travel Data at Qunar

This article describes the challenges of handling billions of travel records in Qunar's MySQL databases, compares open‑source data sync solutions like Databus and Canal, outlines the legacy system’s issues, and presents a refactored architecture that introduces Otter, ES gateway, and improved aggregation to achieve low‑latency, reliable, and scalable data synchronization.

ETL · Elasticsearch · Kafka
0 likes · 19 min read
Java High-Performance Architecture
Nov 12, 2019 · Backend Development

How Kafka Consumer Groups Boost Performance and Fault Tolerance

Kafka consumer groups enable multiple consumers to share partition workloads, ensuring exclusive consumption within a group, flexible consumption patterns like broadcast and unicast, and automatic fault‑tolerance through rebalancing, ultimately improving throughput, scalability, and resilience of streaming applications.

Kafka · backend-development · consumer groups
0 likes · 4 min read
Big Data Technology & Architecture
Nov 7, 2019 · Big Data

Real‑time Dashboard with Flink: Streaming Order Data, Site Metrics, and Top‑N Merchandise Rankings

This article demonstrates how to build a one‑second‑refresh real‑time dashboard for e‑commerce order data using Apache Flink, Kafka, and Redis, covering JSON message parsing, processing‑time windows, stateful aggregation for site‑level KPIs, and efficient top‑N product ranking via Redis sorted sets.

Dashboard · Flink · Kafka
0 likes · 11 min read
DataFunTalk
Nov 7, 2019 · Big Data

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

Big Data · Flink · Kafka
0 likes · 14 min read
Java Backend Technology
Nov 6, 2019 · Backend Development

Kafka vs RabbitMQ vs ZeroMQ vs RocketMQ vs ActiveMQ: Comprehensive Feature Comparison

This article systematically compares Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ across documentation availability, supported programming languages, protocols, storage models, transaction capabilities, load‑balancing mechanisms, clustering approaches, management interfaces, availability, duplicate handling, throughput, subscription patterns, ordering guarantees, acknowledgment strategies, replay options, retry mechanisms, and concurrency limits, helping engineers choose the right message queue for their needs.

ActiveMQ · Kafka · Message Queue Comparison
0 likes · 24 min read
Architecture Digest
Nov 5, 2019 · Big Data

Architecture Overview of Taobao, Meituan, and Didi Big Data Platforms

This article examines the big‑data architectures of three leading Chinese internet companies—Taobao, Meituan, and Didi—detailing their data sources, synchronization mechanisms, batch and streaming processing layers, and the common scheduling components that unify their Hadoop‑based ecosystems.

Big Data · Data Architecture · Didi
0 likes · 7 min read
Big Data Technology & Architecture
Oct 30, 2019 · Big Data

Building a Real‑Time Data Processing Pipeline with Apache Kafka, Spark Streaming, and Cassandra

This tutorial explains how to create a highly scalable, fault‑tolerant real‑time data processing platform by configuring a Kafka topic, a Cassandra keyspace, adding Spark and connector dependencies, developing a Java‑based Spark Streaming pipeline, enabling checkpoints, and deploying the application with spark‑submit.

Big Data · Kafka · Real-Time
0 likes · 8 min read
DataFunTalk
Oct 25, 2019 · Big Data

Migrating Data from HBase to Kafka Using MapReduce

This article explains how to reverse the typical data flow by extracting massive Rowkeys from HBase with MapReduce, storing them on HDFS, and then using batch Get operations to retrieve the full records and write them into Kafka, while handling retries and monitoring progress.

Big Data · Data Migration · HBase
0 likes · 9 min read
Architects Research Society
Oct 22, 2019 · Big Data

Continuous Delivery of Event Streaming Pipelines with Spring Cloud Data Flow

This article explains how to build, deploy, and continuously update event streaming pipelines using Spring Cloud Data Flow and Apache Kafka, covering common topologies, named destinations, parallel and partitioned streams, function composition, multiple input/output bindings, and practical shell commands for registration and management.

Continuous Deployment · Event Streaming · Kafka
0 likes · 19 min read
dbaplus Community
Oct 20, 2019 · Big Data

Mastering Kafka: Concepts, Installation, Optimization, and Security

This comprehensive guide covers Kafka's core concepts, design principles, installation steps, configuration tweaks, performance optimizations, permission management, common operational commands, cluster scaling, log retention settings, and monitoring scripts to help you build and maintain a robust Kafka ecosystem.

Big Data · Configuration · Installation
0 likes · 20 min read
JD Retail Technology
Oct 14, 2019 · Databases

Overview of JDNoSQL Platform and Its Real-Time Advertising Use Cases

The article introduces JDNoSQL, a distributed column‑oriented key‑value store built on HDFS, outlines its core features, describes various business scenarios including real‑time ad computation, details the system architecture with Kafka and Flink, and presents table designs for ad impression and click statistics.

Big Data · Flink · Kafka
0 likes · 13 min read
UCloud Tech
Oct 11, 2019 · Big Data

Real‑Time Student Performance Analytics with Flink and Spark

This article demonstrates how to build a real‑time education analytics system by streaming answer data through Kafka into Flink or Spark, performing per‑question, per‑grade, and per‑subject aggregations, and optionally accelerating development with UFlink SQL.

Education Analytics · Flink · Kafka
0 likes · 17 min read
dbaplus Community
Oct 8, 2019 · Big Data

How to Master Large-Scale Cluster Management: 10 Real-World Troubleshooting Cases

This article shares a senior data‑platform engineer's hands‑on experience managing dozens of thousand‑node clusters, detailing nine common cluster problems and step‑by‑step solutions—including performance tuning, RPC fixes, HDFS cleanup, Hive metadata repair, Spark shuffle optimization, HBase region recovery, and Kafka bottleneck mitigation.

Big Data · Cluster Management · HBase
0 likes · 17 min read
Big Data Technology & Architecture
Sep 23, 2019 · Backend Development

Design and Evolution of Feed Stream Architecture for High‑Throughput Applications

This article analyzes the business requirements, technical challenges, and mainstream architectural solutions for large‑scale feed streams, and proposes a step‑by‑step evolution path—from a simple push model using cloud Kafka and HBase to hybrid push‑pull and recommendation‑driven designs—suitable for startups and rapidly growing platforms.

Backend · HBase · Kafka
0 likes · 15 min read
Sohu Tech Products
Sep 11, 2019 · Backend Development

Design and Implementation of a High‑Concurrency Flash‑Sale System for Online Real‑Estate Opening

The article explains how to handle massive simultaneous user requests in a flash‑sale scenario by using rate limiting, caching, asynchronous processing, distributed locks, load balancing, and anti‑cheat mechanisms, illustrated with the Sohu Focus online opening system architecture.

Backend Architecture · Distributed Systems · Kafka
0 likes · 12 min read
Big Data Technology & Architecture
Sep 6, 2019 · Big Data

Big Data Development Interview Guide and Skill Tree Overview

This article provides a comprehensive interview roadmap for big data developers, outlining essential Java fundamentals, JVM internals, Linux basics, distributed theory, core frameworks such as Hadoop, Spark, Flink, Kafka, Netty, HBase, Hive, and practical algorithm topics, while also offering resume and career advice for aspiring candidates.

Flink · Hadoop · Kafka
0 likes · 15 min read
DataFunTalk
Sep 5, 2019 · Big Data

Apache Beam Architecture Principles and Practical Application

This article introduces Apache Beam as a unified programming model for batch and streaming data processing, explains its architecture, core components, advantages, extensibility, and demonstrates practical usage with KafkaIO, BeamSQL, and AIoT scenarios across multiple runners.

Apache Beam · Kafka · Streaming
0 likes · 16 min read
dbaplus Community
Sep 4, 2019 · Operations

Running Kafka on Kubernetes: Practical Tips, Pitfalls, and Best Practices

This guide explains how to run Kafka on Kubernetes, covering runtime resource needs, storage considerations, network requirements, configuration with Pods, StatefulSets, Helm charts and Operators, performance testing, monitoring, logging, health checks, rolling updates, scaling, and backup strategies.

Kafka · Kubernetes · Ops
0 likes · 12 min read
dbaplus Community
Sep 3, 2019 · Backend Development

How We Built Real-Time MySQL-to-Elasticsearch Sync with Binlog and Kafka

To meet growing e‑commerce search demands, the team replaced a MySQL‑based intermediate table with a real‑time binlog‑driven pipeline that streams changes through Kafka into Elasticsearch, detailing design choices, ordering and completeness guarantees, custom modules, and monitoring for sub‑second sync latency.

Binlog · Elasticsearch · Kafka
0 likes · 13 min read
Architecture Digest
Aug 14, 2019 · Big Data

Kafka Overview: Architecture, Storage Mechanism, Replication, and Consumer/Producer Model

Kafka is a distributed, partitioned, replicated messaging system originally developed by LinkedIn, offering high throughput, low latency, fault tolerance, and scalability; this article explains its core concepts, file storage design, partition replication, leader election, consumer groups, delivery guarantees, and operational considerations for big‑data pipelines.

Big Data · Distributed Systems · Kafka
0 likes · 56 min read
360 Quality & Efficiency
Aug 8, 2019 · Big Data

An Introduction to Kafka: Architecture, Design Principles, and Common Issues

This article introduces Kafka, covering its definition, core concepts such as topics, partitions, offsets, producers and consumers, typical use cases, underlying design principles including message‑partition allocation and retention policies, processing mechanisms, and common troubleshooting questions for real‑world deployments.

Big Data · Distributed Messaging · Kafka
0 likes · 7 min read
Architecture Digest
Aug 8, 2019 · Big Data

Kafka Practical Guide: Concepts, Architecture, Configuration, Monitoring, and Management

This article provides a comprehensive overview of Kafka, covering its basic concepts, architecture, deployment, configuration, monitoring, producer and consumer settings, offset management, high availability, replication, leader election, and practical tips for deployment, tuning, and troubleshooting in production environments.

Distributed Systems · Kafka · Message Queue
0 likes · 37 min read
vivo Internet Technology
Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging
0 likes · 36 min read
Big Data Technology Architecture
Aug 5, 2019 · Big Data

Zookeeper in Distributed Systems: Roles in Kafka, Hadoop, HBase, and Solr

This article explains Zookeeper’s core concepts, its ZAB consensus protocol, and surveys its essential roles in major big‑data components such as Kafka, Hadoop, HBase, and Solr, illustrating how it provides configuration, naming, coordination, leader election, and high‑availability services across distributed architectures.

Distributed Systems · HBase · Hadoop
0 likes · 5 min read
Big Data Technology Architecture
Aug 3, 2019 · Big Data

Kafka Architecture: Design Principles for High Throughput and Reliability

This article explains Kafka's design background, persistence mechanisms, disk sequential I/O optimizations, network and compression strategies, and stability features such as partitioning, replication, and ISR, illustrating how these techniques enable high‑throughput, low‑latency real‑time log processing in big‑data environments.

Disk I/O · High Throughput · Kafka
0 likes · 9 min read
NetEase Media Technology Team
Aug 2, 2019 · Backend Development

Delayed Message Queue Implementation: Use Cases, Comparison, and NetEase Open Course Practice

The article explains how delayed message queues replace inefficient scheduled‑task scans in distributed systems, outlines common use cases such as order timeouts and retries, compares RabbitMQ, RocketMQ, Kafka, ActiveMQ and Redis implementations, and details NetEase’s ActiveMQ‑based solution with idempotent processing and traceability.

ActiveMQ · Distributed Systems · Kafka
0 likes · 13 min read
dbaplus Community
Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka
0 likes · 14 min read
Big Data Technology & Architecture
Jul 28, 2019 · Operations

Comprehensive Guide to Building an ELK Log Management Platform with Kafka and Filebeat

This article provides a detailed tutorial on designing, deploying, and operating an ELK log management platform—including Elasticsearch, Logstash, Kibana, Kafka, and Filebeat—covering architecture options, configuration files, command‑line operations, cluster setup, and best‑practice recommendations for scalable, real‑time log collection and analysis.

ELK · Elasticsearch · Filebeat
0 likes · 22 min read
ITPUB
Jul 26, 2019 · Backend Development

How to Gracefully Handle Kill Signals in Java Backend Processes

On Linux, this guide shows how to locate and terminate a Java background process using ps and kill, then explains why force‑killing can leave resources open, and provides a complete Java SignalHandler implementation to receive termination signals, safely close files, database connections, and Kafka consumers.

Kafka · Kill Signal · Linux
0 likes · 8 min read
Big Data Technology & Architecture
Jul 15, 2019 · Backend Development

Why Use Message Queues? Benefits, Drawbacks, and Comparison of Kafka, ActiveMQ, RabbitMQ, and RocketMQ

The article explains why message queues are employed for decoupling, asynchronous processing, and peak shaving, outlines their advantages and disadvantages, and compares popular MQ products such as Kafka, ActiveMQ, RabbitMQ, and RocketMQ to guide technology selection.

Asynchronous · Backend Architecture · Decoupling
0 likes · 5 min read
Big Data Technology Architecture
Jul 12, 2019 · Big Data

Why Kafka Is So Popular: Features, Use Cases, and Architecture Overview

This article explains why Apache Kafka has become a cornerstone of modern big‑data pipelines by detailing its high‑throughput, fault‑tolerant publish‑subscribe architecture, real‑time processing capabilities, extensive language support, scalability mechanisms, and the wide range of use cases adopted by leading enterprises.

Distributed Streaming · Kafka · Message Queue
0 likes · 9 min read
Mafengwo Technology
Jul 11, 2019 · Backend Development

How We Achieved Near‑Real‑Time MySQL‑to‑Elasticsearch Sync Using Binlog and Kafka

This article explains why traditional MySQL queries no longer meet the growing e‑commerce data needs, describes the limitations of a MySQL‑to‑Elasticsearch intermediate table, and details a binlog‑driven, Kafka‑based pipeline with custom modules, upsert handling, filtering, and monitoring to ensure fast, reliable data synchronization.

Backend · Binlog · Elasticsearch
0 likes · 11 min read
Big Data Technology Architecture
Jul 9, 2019 · Big Data

Kafka Best Practices for High Throughput: 20 Recommendations

This article presents New Relic's 20 best‑practice recommendations for operating Apache Kafka at high throughput, covering partitions, consumers, producers, and brokers, and explains key concepts, configuration tuning, monitoring, and architectural considerations to ensure reliable, scalable streaming pipelines.

Brokers · Consumers · High Throughput
0 likes · 14 min read
dbaplus Community
Jul 4, 2019 · Databases

How Tencent’s TDSQL Multi‑Source Sync Achieves High‑Performance, Consistent Data Distribution

This article explains the financial‑industry driven requirements for real‑time data sync, describes the TDSQL‑MULTISRCSYNC architecture—including producer, store, and consumer components—and details core designs such as row‑hash concurrency, idempotent binlog handling, and a lock‑based ordering mechanism that ensure high throughput and consistency.

Database Replication · Idempotency · Kafka
0 likes · 13 min read
Big Data Technology & Architecture
Jul 1, 2019 · Big Data

How to Ensure High Availability of Message Queues (RabbitMQ and Kafka)

This article explains the concept of high availability for message queues, analyzes interview expectations, and details the HA mechanisms of RabbitMQ (including single, normal cluster, and mirrored modes) and Kafka (partition replication and leader election), highlighting their advantages, drawbacks, and practical considerations.

Distributed Systems · Kafka · Message Queue
0 likes · 11 min read
Beike Product & Technology
Jun 28, 2019 · Backend Development

EPX: Real-Time MySQL Change Capture and Kafka Sync Architecture

EPX is a high‑availability, high‑performance data pipeline that captures MySQL binlog changes in real time, parses and filters them, and streams unified JSON events to Kafka for downstream services, while providing monitoring, alerting, backup, and migration capabilities across many business units.

Backend · Kafka · high availability
0 likes · 7 min read
Java Captain
Jun 15, 2019 · Backend Development

Why Message Queues Are Needed and an Introduction to Kafka

This article explains the motivations behind using message‑queue middleware, outlines its benefits such as decoupling, asynchronous processing and peak‑shaving, describes point‑to‑point and publish‑subscribe communication models, and provides a detailed overview of Kafka’s architecture, terminology, data flow, storage strategy, and consumer group mechanics.

Backend · Kafka · Publish-Subscribe
0 likes · 15 min read
dbaplus Community
Jun 13, 2019 · Backend Development

Kafka vs RabbitMQ vs ZeroMQ vs RocketMQ vs ActiveMQ: 17‑Point Comparison

This article provides a comprehensive 17‑point comparison of five popular message‑queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ—covering documentation, language support, protocols, storage, transactions, load balancing, clustering, management UI, availability, duplication guarantees, throughput, subscription models, ordering, acknowledgements, replay, retry mechanisms, and concurrency characteristics.

ActiveMQ · Kafka · RabbitMQ
0 likes · 26 min read
Architecture Digest
Jun 10, 2019 · Backend Development

Design and Implementation of Zhihu's Long Connection Gateway

This article explains how Zhihu designed a scalable long‑connection gateway that decouples business logic via a publish‑subscribe model, implements ACL‑based authorization, ensures message reliability with acknowledgments and sliding windows, and leverages OpenResty, Redis, and Kafka for load‑balanced, fault‑tolerant backend services.

Kafka · OpenResty · Scalability
0 likes · 13 min read
DataFunTalk
Jun 3, 2019 · Big Data

Choosing a Real-Time Computing Engine Based on Kafka: Spark vs Flink

This article examines the need for real‑time computation, explains streaming versus real‑time concepts, and compares Apache Spark and Apache Flink—covering their architectures, micro‑batch and continuous processing, advantages, limitations, windowing, event‑time handling, and watermarks—to guide engine selection for Kafka‑driven workloads.

Flink · Kafka · Spark
0 likes · 15 min read
Architect's Tech Stack
May 31, 2019 · Big Data

Kafka Architecture Overview: Producers, Consumers, Partitions, Replication, and Transactions

This article provides a comprehensive overview of Apache Kafka's architecture, covering topics such as producer and consumer workflows, partition and replica management, leader election, offset handling, message delivery semantics, transaction support, and file organization, illustrating how Kafka achieves high performance and scalability.

Consumer · Distributed Systems · Kafka
0 likes · 18 min read
ITPUB
May 29, 2019 · Big Data

How to Build a Trillion-Scale Real-Time Data Platform: Lessons from DTCC 2019

In a DTCC 2019 keynote, Zhao Qun, director of big‑data platform at Percent Point, outlines the challenges of trillion‑scale real‑time analytics and presents a transparent, fine‑grained architecture built on Kafka, Spark Streaming, ClickHouse, HBase, Ceph and Elasticsearch, detailing design principles, component sizing, multi‑center deployment, performance testing and operational safeguards.

Big Data · Kafka · Real-time analytics
0 likes · 17 min read
Architecture Digest
May 23, 2019 · Backend Development

Comprehensive Comparison of Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ

This article provides a detailed side‑by‑side analysis of five popular message‑queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ—covering documentation, programming languages, protocols, storage, transactions, load balancing, clustering, management interfaces, availability, duplication handling, throughput, subscription models, ordering, acknowledgements, backtracking, retry mechanisms, and concurrency characteristics.

ActiveMQ · Kafka · Message Queue Comparison
0 likes · 21 min read
Big Data Technology Architecture
May 18, 2019 · Big Data

Key Concepts of Kafka, Hadoop Shuffle, Spark Cluster Modes, HDFS I/O, and Spark RDD Operations

This article explains Kafka message structure and offset retrieval, details Hadoop's map and reduce shuffle processes, outlines Spark's deployment modes, describes HDFS read/write mechanisms, compares reduceByKey and groupByKey performance, and discusses Spark streaming integration with Kafka and data loss prevention.

HDFS · Hadoop · Kafka
0 likes · 10 min read
Big Data Technology & Architecture
May 17, 2019 · Backend Development

Understanding Kafka Producer Idempotence: PID, Sequence Numbers, and Implementation Details

This article explains how Apache Kafka implements producer idempotence by introducing Producer IDs (PID) and sequence numbers, describes the request‑response flow for PID allocation, details server‑side PID management, shows the exactly‑once guarantee mechanism, and answers common configuration questions with code examples.

Backend · Idempotence · Kafka
0 likes · 32 min read
NetEase Media Technology Team
May 16, 2019 · Backend Development

Design and Implementation of a Configurable, Extensible Content Processing System (Apollo)

Apollo is a configurable, extensible content‑processing platform that models each step as a node defined in a configuration file, supports multiple implementations for A/B testing, decouples producers and consumers via Kafka, ensures fault‑tolerant retries and replay, captures fine‑grained metrics through Canal‑to‑TiDB pipelines, and cuts new‑type development effort to roughly ten percent of the original cost while delivering high‑quality data to downstream teams.

Backend Architecture · Kafka · TiDB
0 likes · 9 min read
Architecture Digest
May 5, 2019 · Big Data

Kafka Architecture Overview: Topics, Partitions, Producers, Consumers, Replication, Leader Election, Offsets, Rebalance, Delivery Semantics, and Transactions

This article provides a comprehensive overview of Kafka's architecture, covering topics, partitions, producer and consumer workflows, replication and leader election, offset management, consumer group coordination, rebalance processes, delivery semantics (at‑most‑once, at‑least‑once, exactly‑once), transactional messaging, and underlying file and configuration details.

Distributed Messaging · Exactly-Once · Kafka
0 likes · 16 min read
Youzan Coder
Apr 12, 2019 · Industry Insights

How Youzan Scaled Its Log Platform to Handle Billions of Daily Logs

This article details Youzan's evolution from a simple Flume‑based log collector to a multi‑tenant, Kafka‑buffered, Spark‑processed, HBase‑backed logging architecture that now handles hundreds of billions of log entries per day, highlighting challenges, design decisions, and future improvements.

Distributed Systems · Elasticsearch · HBase
0 likes · 10 min read
Java Captain
Apr 9, 2019 · Big Data

Kafka FAQs: Zookeeper Dependency, Retention Policies, Cleanup Rules, Performance Bottlenecks, and Cluster Best Practices

This article answers common Kafka questions, explaining why Kafka cannot operate without Zookeeper, describing its two retention strategies based on time and size, detailing how simultaneous time‑ and size‑based cleanup works, listing performance bottlenecks, and offering practical guidelines for sizing and configuring Kafka clusters.

Big Data · Cluster Design · Kafka
0 likes · 2 min read