Tagged articles

1273 articles

Jan 2, 2020 · Operations

Performance Bottleneck Analysis and Optimization of an Erlang Service with High CPU Usage

The article details a performance bottleneck investigation of an Erlang‑based service experiencing high CPU usage, describing the use of recon tools, load testing, analysis of Kafka and Nginx impacts, and the subsequent optimizations that doubled throughput to meet business requirements.

CPU · Erlang · Kafka

0 likes · 10 min read

Mafengwo Technology

Jan 2, 2020 · Big Data

How We Scaled Kafka for Real‑Time Big Data at Mafengwo: Lessons and Practices

This article details Mafengwo's practical experience using Kafka within its big‑data platform, covering application scenarios, evolution through version upgrades, resource isolation, security and monitoring enhancements, and future plans for data duplication handling and consumer throttling.

Big Data · Data Streaming · Kafka

0 likes · 16 min read

Architecture Digest

Jan 1, 2020 · Fundamentals

Message Queue Middleware: Concepts, Application Scenarios, and JMS Programming Model

This article explains the role of message‑queue middleware in distributed systems, describes common use cases such as asynchronous processing, application decoupling, traffic shaping, log handling and messaging, and provides an overview of popular brokers and the JMS programming model.

JMS · Kafka · RabbitMQ

0 likes · 19 min read

Big Data Technology & Architecture

Dec 22, 2019 · Big Data

Implementing Multi‑threaded Kafka Consumer and Producer with Partition Management

This article explains how to build a multi‑threaded Kafka consumer and producer in Java, covering partition concepts, consumer group offsets, thread‑pool configuration, and code examples that demonstrate proper use of Kafka streams, partition keys, and batch message sending for improved throughput.

Big Data · Consumer · Kafka

0 likes · 15 min read

Big Data Technology & Architecture

Dec 21, 2019 · Big Data

Kafka Offset Management and Replication Mechanisms Explained

This article provides a comprehensive technical overview of Kafka's offset handling, covering the request entry point, in‑memory offset sources, offset commit and fetch implementations, file storage layout, and the leader‑follower synchronization process that ensures data replication and high‑watermark updates.

Big Data · Distributed Systems · High Watermark

0 likes · 16 min read

Java High-Performance Architecture

Dec 17, 2019 · Backend Development

Understanding Kafka Topic Architecture: Partitions, Replication, and Failover

This article explains Kafka's topic architecture, detailing how topics are split into partitions for scalability and parallelism, the role of logs, key-based and round-robin partitioning, replication with leaders, followers, ISR, and how these mechanisms enable fault‑tolerance and high‑performance consumer failover.

Backend · Kafka · Partition

0 likes · 7 min read

Big Data Technology & Architecture

Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-Once

0 likes · 11 min read

Java High-Performance Architecture

Dec 7, 2019 · Backend Development

How Zookeeper Powers Kafka: Key Roles Explained

This article explains how Zookeeper functions as an essential part of Kafka by managing broker status, controller election, quotas, ISR tracking, node and topic registration, as well as consumer offset storage and registration, providing a comprehensive overview for interview preparation.

Broker · Consumer · Kafka

0 likes · 4 min read

Architecture Digest

Nov 25, 2019 · Big Data

Introduction to Apache Kafka: Core Concepts, Architecture, and APIs

This article provides a comprehensive overview of Apache Kafka, covering its fundamental capabilities, typical use cases, core components, key APIs, and essential concepts such as topics, partitions, segments, brokers, producers, and consumers, illustrated with diagrams.

APIs · Big Data · Distributed Systems

0 likes · 8 min read

Big Data Technology & Architecture

Nov 24, 2019 · Big Data

Common Apache Kafka Exceptions and Their Causes

This article lists frequent Apache Kafka exceptions such as UnknownTopicOrPartitionException, LEADER_NOT_AVAILABLE, NotLeaderForPartitionException, TimeoutException, RecordTooLargeException, and others, explaining each error message, typical reasons, and practical troubleshooting steps for producers and consumers.

Big Data · Consumer · Error Handling

0 likes · 5 min read

Architect's Tech Stack

Nov 21, 2019 · Backend Development

Comprehensive Guide to Spring‑Kafka: Integration, Configuration, and Advanced Features

This article provides a thorough tutorial on using Spring‑Kafka, covering basic Maven setup, producer and consumer code, embedded Kafka for testing, programmatic topic creation, synchronous and asynchronous message sending, transaction handling, request‑reply patterns, advanced @KafkaListener options, manual acknowledgments, lifecycle control, message forwarding with @SendTo, and retry with dead‑letter queues.

Kafka · Messaging · Microservices

0 likes · 16 min read

G7 EasyFlow Tech Circle

Nov 21, 2019 · Big Data

How G7 Combines AI, Big Data, and IoT to Transform Logistics

This article presents a detailed overview of G7's AI‑plus‑Big‑Data‑plus‑IoT platform for logistics, describing its neutral open architecture, real‑time data pipelines using Kafka and Flink, Lambda‑style storage in HBase/Hive, and the resulting safety‑insurance and analytics capabilities.

AI · Flink · IoT

0 likes · 10 min read

Java Captain

Nov 20, 2019 · Backend Development

Exploring Advanced Features of Spring‑Kafka: Integration, Embedded Server, Topic Management, Transactions, and Message Handling

This article provides a comprehensive guide to using Spring‑Kafka, covering simple integration, embedded Kafka for testing, creating topics programmatically, advanced KafkaTemplate usage, transactional messaging, ReplyingKafkaTemplate for request‑reply, listener configurations, manual acknowledgments, lifecycle control, message forwarding, and retry with dead‑letter queues.

Embedded Kafka · Kafka · Message Queue

0 likes · 19 min read

Architects Research Society

Nov 19, 2019 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the design goals, exactly‑once semantics, Java transaction API, internal components such as the transaction coordinator and log, data‑flow interactions, performance considerations, and practical tips for using Apache Kafka transactions in stream‑processing applications.

Distributed Systems · Exactly-Once · Kafka

0 likes · 15 min read

Programmer DD

Nov 19, 2019 · Backend Development

17‑Point Comparison: Kafka vs RabbitMQ vs ZeroMQ vs RocketMQ vs ActiveMQ

This article provides a comprehensive 17‑point comparison of five popular message queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ—covering documentation, supported languages, protocols, storage, transactions, load balancing, clustering, management UI, availability, message duplication, throughput, subscription models, ordering, acknowledgments, replay, retries, concurrency, and more.

ActiveMQ · Backend · Kafka

0 likes · 23 min read

Qunar Tech Salon

Nov 18, 2019 · Databases

Data Synchronization Architecture and Refactoring for Large-Scale Travel Data at Qunar

This article describes the challenges of handling billions of travel records in Qunar's MySQL databases, compares open‑source data sync solutions like Databus and Canal, outlines the legacy system’s issues, and presents a refactored architecture that introduces Otter, ES gateway, and improved aggregation to achieve low‑latency, reliable, and scalable data synchronization.

ETL · Elasticsearch · Kafka

0 likes · 19 min read

Java High-Performance Architecture

Nov 12, 2019 · Backend Development

How Kafka Consumer Groups Boost Performance and Fault Tolerance

Kafka consumer groups enable multiple consumers to share partition workloads, ensuring exclusive consumption within a group, flexible consumption patterns like broadcast and unicast, and automatic fault‑tolerance through rebalancing, ultimately improving throughput, scalability, and resilience of streaming applications.

Kafka · backend-development · consumer groups

0 likes · 4 min read

Big Data Technology & Architecture

Nov 11, 2019 · Big Data

Connecting Apache Kafka with Flink 1.9 – Overview, Compatibility, and Code Samples

This article explains how to use Flink 1.9's built‑in Kafka connector, covering supported versions, Maven dependencies, consumer and producer configuration in Java and Scala, checkpointing, offset handling, partition discovery, timestamps, watermarks, and provides a complete runnable example.

Connector · Flink · Kafka

0 likes · 12 min read

Big Data Technology & Architecture

Nov 7, 2019 · Big Data

Real‑time Dashboard with Flink: Streaming Order Data, Site Metrics, and Top‑N Merchandise Rankings

This article demonstrates how to build a one‑second‑refresh real‑time dashboard for e‑commerce order data using Apache Flink, Kafka, and Redis, covering JSON message parsing, processing‑time windows, stateful aggregation for site‑level KPIs, and efficient top‑N product ranking via Redis sorted sets.

Dashboard · Flink · Kafka

0 likes · 11 min read

DataFunTalk

Nov 7, 2019 · Big Data

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

Big Data · Flink · Kafka

0 likes · 14 min read

Java Backend Technology

Nov 6, 2019 · Backend Development

Kafka vs RabbitMQ vs ZeroMQ vs RocketMQ vs ActiveMQ: Comprehensive Feature Comparison

This article systematically compares Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ across documentation availability, supported programming languages, protocols, storage models, transaction capabilities, load‑balancing mechanisms, clustering approaches, management interfaces, availability, duplicate handling, throughput, subscription patterns, ordering guarantees, acknowledgment strategies, replay options, retry mechanisms, and concurrency limits, helping engineers choose the right message queue for their needs.

ActiveMQ · Kafka · Message Queue Comparison

0 likes · 24 min read

Architecture Digest

Nov 5, 2019 · Big Data

Architecture Overview of Taobao, Meituan, and Didi Big Data Platforms

This article examines the big‑data architectures of three leading Chinese internet companies—Taobao, Meituan, and Didi—detailing their data sources, synchronization mechanisms, batch and streaming processing layers, and the common scheduling components that unify their Hadoop‑based ecosystems.

Big Data · Data Architecture · Didi

0 likes · 7 min read

Big Data Technology & Architecture

Oct 30, 2019 · Big Data

Building a Real‑Time Data Processing Pipeline with Apache Kafka, Spark Streaming, and Cassandra

This tutorial explains how to create a highly scalable, fault‑tolerant real‑time data processing platform by configuring a Kafka topic, a Cassandra keyspace, adding Spark and connector dependencies, developing a Java‑based Spark Streaming pipeline, enabling checkpoints, and deploying the application with spark‑submit.

Big Data · Kafka · Real-Time

0 likes · 8 min read

DataFunTalk

Oct 25, 2019 · Big Data

Migrating Data from HBase to Kafka Using MapReduce

This article explains how to reverse the typical data flow by extracting massive Rowkeys from HBase with MapReduce, storing them on HDFS, and then using batch Get operations to retrieve the full records and write them into Kafka, while handling retries and monitoring progress.

Big Data · Data Migration · HBase

0 likes · 9 min read

Alibaba Cloud Native

Oct 25, 2019 · Cloud Native

How to Push Messages with Alibaba Cloud Kafka in Knative: A Step‑by‑Step Guide

This tutorial explains how to create an Alibaba Cloud Kafka instance, configure topics and consumer groups, deploy the Knative Kafka addon, define a Knative Service and KafkaSource, and verify message delivery using CloudEvents, enabling serverless event‑driven applications.

Alibaba Cloud · Event-driven · Kafka

0 likes · 6 min read

Architects Research Society

Oct 22, 2019 · Big Data

Continuous Delivery of Event Streaming Pipelines with Spring Cloud Data Flow

This article explains how to build, deploy, and continuously update event streaming pipelines using Spring Cloud Data Flow and Apache Kafka, covering common topologies, named destinations, parallel and partitioned streams, function composition, multiple input/output bindings, and practical shell commands for registration and management.

Continuous Deployment · Event Streaming · Kafka

0 likes · 19 min read

dbaplus Community

Oct 20, 2019 · Big Data

Mastering Kafka: Concepts, Installation, Optimization, and Security

This comprehensive guide covers Kafka's core concepts, design principles, installation steps, configuration tweaks, performance optimizations, permission management, common operational commands, cluster scaling, log retention settings, and monitoring scripts to help you build and maintain a robust Kafka ecosystem.

Big Data · Configuration · Installation

0 likes · 20 min read

Selected Java Interview Questions

Oct 17, 2019 · Backend Development

Ensuring Message Order in MQ Systems: Interview Analysis and Practical Solutions

The article analyzes a common interview question about guaranteeing message ordering in message‑queue systems, explains why order matters using MySQL binlog and MQ examples, illustrates scenarios where ordering breaks in RabbitMQ and Kafka, and proposes concrete architectural solutions.

Backend · Kafka · Message Queue

0 likes · 5 min read

JD Retail Technology

Oct 14, 2019 · Databases

Overview of JDNoSQL Platform and Its Real-Time Advertising Use Cases

The article introduces JDNoSQL, a distributed column‑oriented key‑value store built on HDFS, outlines its core features, describes various business scenarios including real‑time ad computation, details the system architecture with Kafka and Flink, and presents table designs for ad impression and click statistics.

Big Data · Flink · Kafka

0 likes · 13 min read

Big Data Technology & Architecture

Oct 13, 2019 · Big Data

Building a Simple Canal-to-Kafka Demo with Maven Dependencies and Java Code

This guide introduces the canal‑kafka integration package, outlines its constraints, and provides a step‑by‑step tutorial with Maven dependencies and Java source code for a SimpleCanalClient, a Kafka producer, and a server class, enabling a functional demo of Canal to Kafka data streaming.

Big Data · Canal · Data Integration

0 likes · 8 min read

UCloud Tech

Oct 11, 2019 · Big Data

Real‑Time Student Performance Analytics with Flink and Spark

This article demonstrates how to build a real‑time education analytics system by streaming answer data through Kafka into Flink or Spark, performing per‑question, per‑grade, and per‑subject aggregations, and optionally accelerating development with UFlink SQL.

Education Analytics · Flink · Kafka

0 likes · 17 min read

MaGe Linux Operations

Oct 9, 2019 · Operations

How to Build a Real‑Time Nginx Log Analytics Pipeline with ELK, Kafka, and Filebeat

This guide walks through setting up an end‑to‑end log collection and analysis solution for Nginx using ELK (Elasticsearch, Logstash, Kibana), Filebeat, and Kafka, covering service introduction, architecture design, Linux system preparation, configuration of each component, and visualisation in Kibana.

ELK · Filebeat · Kafka

0 likes · 14 min read

dbaplus Community

Oct 8, 2019 · Big Data

How to Master Large-Scale Cluster Management: 10 Real-World Troubleshooting Cases

This article shares a senior data‑platform engineer's hands‑on experience managing dozens of thousand‑node clusters, detailing nine common cluster problems and step‑by‑step solutions—including performance tuning, RPC fixes, HDFS cleanup, Hive metadata repair, Spark shuffle optimization, HBase region recovery, and Kafka bottleneck mitigation.

Big Data · Cluster Management · HBase

0 likes · 17 min read

Big Data Technology & Architecture

Oct 8, 2019 · Big Data

Real‑time MySQL Binlog Capture and Offline Hive Restoration for Data Warehouse Production

This article describes a complete solution that uses Alibaba's Canal for real‑time MySQL binlog collection, Kafka for transport, and a customized Camus pipeline to load and merge binlog data into Hive, addressing performance, consistency, and delete‑event challenges in large‑scale data warehousing.

Binlog · Camus · Canal

0 likes · 12 min read

Big Data Technology & Architecture

Sep 28, 2019 · Big Data

Two-Phase Commit (2PC) in Flink: Mechanism, Implementation, and Kafka Integration

This article explains the fundamentals of the two‑phase commit protocol, details its two stages (prepare and commit), discusses its advantages and drawbacks, and shows how Apache Flink implements 2PC for exactly‑once semantics with Kafka using the TwoPhaseCommitSinkFunction and related code examples.

Distributed Systems · Flink · Kafka

0 likes · 9 min read

Big Data Technology & Architecture

Sep 26, 2019 · Big Data

Comparing Apache Pulsar and Kafka: Messaging Models, Subscriptions, Acknowledgment, and Retention

This article compares Apache Pulsar and Kafka, explaining their messaging models, queue versus stream use cases, subscription types, acknowledgment mechanisms, and message retention/TTL features to help readers choose a high‑performance, highly available streaming platform.

Apache Pulsar · Kafka · Message Acknowledgment

0 likes · 10 min read

Architecture Digest

Sep 24, 2019 · Big Data

Implementation Principles and Architecture of DBus Data Sources (RDBMS and Log Types)

The article explains how DBus ingests data from relational databases and log sources by detailing its extractor, incremental conversion, and full‑pull modules, the use of Canal and Kafka, rule‑based log structuring, the unified UMS message format, and heartbeat monitoring for reliability.

Canal · DBus · Kafka

0 likes · 13 min read

Big Data Technology & Architecture

Sep 23, 2019 · Backend Development

Design and Evolution of Feed Stream Architecture for High‑Throughput Applications

This article analyzes the business requirements, technical challenges, and mainstream architectural solutions for large‑scale feed streams, and proposes a step‑by‑step evolution path—from a simple push model using cloud Kafka and HBase to hybrid push‑pull and recommendation‑driven designs—suitable for startups and rapidly growing platforms.

Backend · HBase · Kafka

0 likes · 15 min read

Big Data Technology & Architecture

Sep 19, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink and Ensuring Exactly‑once Semantics

This article demonstrates how to develop a real‑time ETL job using Apache Flink, covering project setup, Kafka as a source, custom bucket assigners for HDFS, checkpointing, savepoints, and deployment on YARN to achieve exactly‑once processing guarantees.

Apache Flink · Big Data · Exactly-Once

0 likes · 11 min read

Sohu Tech Products

Sep 11, 2019 · Backend Development

Design and Implementation of a High‑Concurrency Flash‑Sale System for Online Real‑Estate Opening

The article explains how to handle massive simultaneous user requests in a flash‑sale scenario by using rate limiting, caching, asynchronous processing, distributed locks, load balancing, and anti‑cheat mechanisms, illustrated with the Sohu Focus online opening system architecture.

Backend Architecture · Distributed Systems · Kafka

0 likes · 12 min read

Big Data Technology & Architecture

Sep 6, 2019 · Big Data

Big Data Development Interview Guide and Skill Tree Overview

This article provides a comprehensive interview roadmap for big data developers, outlining essential Java fundamentals, JVM internals, Linux basics, distributed theory, core frameworks such as Hadoop, Spark, Flink, Kafka, Netty, HBase, Hive, and practical algorithm topics, while also offering resume and career advice for aspiring candidates.

Flink · Hadoop · Kafka

0 likes · 15 min read

DataFunTalk

Sep 5, 2019 · Big Data

Apache Beam Architecture Principles and Practical Application

This article introduces Apache Beam as a unified programming model for batch and streaming data processing, explains its architecture, core components, advantages, extensibility, and demonstrates practical usage with KafkaIO, BeamSQL, and AIoT scenarios across multiple runners.

Apache Beam · Kafka · Streaming

0 likes · 16 min read

dbaplus Community

Sep 4, 2019 · Operations

Running Kafka on Kubernetes: Practical Tips, Pitfalls, and Best Practices

This guide explains how to run Kafka on Kubernetes, covering runtime resource needs, storage considerations, network requirements, configuration with Pods, StatefulSets, Helm charts and Operators, performance testing, monitoring, logging, health checks, rolling updates, scaling, and backup strategies.

Kafka · Kubernetes · Ops

0 likes · 12 min read

dbaplus Community

Sep 3, 2019 · Backend Development

How We Built Real-Time MySQL-to-Elasticsearch Sync with Binlog and Kafka

To meet growing e‑commerce search demands, the team replaced a MySQL‑based intermediate table with a real‑time binlog‑driven pipeline that streams changes through Kafka into Elasticsearch, detailing design choices, ordering and completeness guarantees, custom modules, and monitoring for sub‑second sync latency.

Binlog · Elasticsearch · Kafka

0 likes · 13 min read

Big Data Technology Architecture

Aug 26, 2019 · Big Data

Kafka Architecture and File Storage Mechanism: Design, Performance, and Operational Practices

This article provides a comprehensive overview of Kafka, covering its core features, use‑case scenarios, partition and replica design, file storage structure, consumer‑group coordination, delivery guarantees, performance optimizations, and the role of Zookeeper in managing the cluster.

Distributed Messaging · Kafka · Replication

0 likes · 54 min read

JD Retail Technology

Aug 23, 2019 · Databases

Design and Challenges of CB‑SQL Changefeed for Distributed Cloud‑Native Databases

The article explains CB‑SQL’s distributed changefeed architecture, its CDC implementation, the challenges of horizontal scalability and transactional ordering, and the innovative RangeFeed mechanism that enables ordered row‑level streams, resolved timestamps, and seamless integration with external systems like Kafka.

CB-SQL · CDC · Changefeed

0 likes · 13 min read

HomeTech

Aug 15, 2019 · Big Data

Real‑Time Data Warehouse Development with Flink: Architecture, Implementation, and Lessons Learned

This article describes the motivation, technology selection, implementation details, and practical challenges of building a real‑time data warehouse using Flink, covering stream ingestion, data cleaning, dimension‑table joins, state backend choices, and operational lessons for large‑scale streaming pipelines.

Flink · Kafka · State Backend

0 likes · 8 min read

HomeTech

Aug 14, 2019 · Big Data

Real-Time Data Warehouse Development with Flink: Architecture, Implementation, and Lessons Learned

This article describes the motivation, technology selection, implementation details, and encountered challenges of building a real‑time data warehouse using Flink, covering streaming computation, code examples, dimension‑table caching, state backend choices, and best practices for production deployment.

Flink · Kafka · State Backend

0 likes · 8 min read

Architecture Digest

Aug 14, 2019 · Big Data

Kafka Overview: Architecture, Storage Mechanism, Replication, and Consumer/Producer Model

Kafka is a distributed, partitioned, replicated messaging system originally developed by LinkedIn, offering high throughput, low latency, fault tolerance, and scalability; this article explains its core concepts, file storage design, partition replication, leader election, consumer groups, delivery guarantees, and operational considerations for big‑data pipelines.

Big Data · Distributed Systems · Kafka

0 likes · 56 min read

Big Data Technology & Architecture

Aug 13, 2019 · Big Data

How Kafka Leverages Linux Page Cache for High Throughput and Low Latency

This article explains why Kafka achieves remarkable speed by relying on Linux page cache, detailing the differences between page and buffer caches, Kafka's zero‑copy I/O path, relevant kernel parameters, and tuning recommendations for optimal backend performance.

I/O · Kafka · Linux

0 likes · 8 min read

360 Quality & Efficiency

Aug 8, 2019 · Big Data

An Introduction to Kafka: Architecture, Design Principles, and Common Issues

This article introduces Kafka, covering its definition, core concepts such as topics, partitions, offsets, producers and consumers, typical use cases, underlying design principles including message‑partition allocation and retention policies, processing mechanisms, and common troubleshooting questions for real‑world deployments.

Big Data · Distributed Messaging · Kafka

0 likes · 7 min read

Architecture Digest

Aug 8, 2019 · Big Data

Kafka Practical Guide: Concepts, Architecture, Configuration, Monitoring, and Management

This article provides a comprehensive overview of Kafka, covering its basic concepts, architecture, deployment, configuration, monitoring, producer and consumer settings, offset management, high availability, replication, leader election, and practical tips for deployment, tuning, and troubleshooting in production environments.

Distributed Systems · Kafka · Message Queue

0 likes · 37 min read

vivo Internet Technology

Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging

0 likes · 36 min read

Big Data Technology Architecture

Aug 5, 2019 · Big Data

Zookeeper in Distributed Systems: Roles in Kafka, Hadoop, HBase, and Solr

This article explains Zookeeper’s core concepts, its ZAB consensus protocol, and surveys its essential roles in major big‑data components such as Kafka, Hadoop, HBase, and Solr, illustrating how it provides configuration, naming, coordination, leader election, and high‑availability services across distributed architectures.

Distributed Systems · HBase · Hadoop

0 likes · 5 min read

Big Data Technology & Architecture

Aug 4, 2019 · Big Data

Apache Pulsar vs Apache Kafka: Architecture, Performance, and Advantages

This article compares Apache Kafka and Apache Pulsar, detailing Kafka's scalability challenges, Pulsar's architectural benefits, performance gains, multi‑tenant support, security features, and provides code examples and migration guidance for large‑scale streaming applications.

Apache Pulsar · Big Data · Distributed Systems

0 likes · 11 min read

NetEase Game Operations Platform

Aug 4, 2019 · Big Data

Log Classification and Real-Time Aggregation Architecture Using Flink and Kafka

This article describes a real‑time log‑classification pipeline built on Flink and Kafka that pre‑filters, structures, classifies, and aggregates heterogeneous logs, enabling efficient frequency‑based alerts and statistical analysis without storing raw log data at scale.

Flink · Kafka · Log Processing

0 likes · 11 min read

Big Data Technology Architecture

Aug 3, 2019 · Big Data

Kafka Architecture: Design Principles for High Throughput and Reliability

This article explains Kafka's design background, persistence mechanisms, disk sequential I/O optimizations, network and compression strategies, and stability features such as partitioning, replication, and ISR, illustrating how these techniques enable high‑throughput, low‑latency real‑time log processing in big‑data environments.

Disk I/O · High Throughput · Kafka

0 likes · 9 min read

NetEase Media Technology Team

Aug 2, 2019 · Backend Development

Delayed Message Queue Implementation: Use Cases, Comparison, and NetEase Open Course Practice

The article explains how delayed message queues replace inefficient scheduled‑task scans in distributed systems, outlines common use cases such as order timeouts and retries, compares RabbitMQ, RocketMQ, Kafka, ActiveMQ and Redis implementations, and details NetEase’s ActiveMQ‑based solution with idempotent processing and traceability.

ActiveMQ · Distributed Systems · Kafka

0 likes · 13 min read

dbaplus Community

Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka

0 likes · 14 min read

Big Data Technology & Architecture

Jul 28, 2019 · Operations

Comprehensive Guide to Building an ELK Log Management Platform with Kafka and Filebeat

This article provides a detailed tutorial on designing, deploying, and operating an ELK log management platform—including Elasticsearch, Logstash, Kibana, Kafka, and Filebeat—covering architecture options, configuration files, command‑line operations, cluster setup, and best‑practice recommendations for scalable, real‑time log collection and analysis.

ELK · Elasticsearch · Filebeat

0 likes · 22 min read

ITPUB

Jul 26, 2019 · Backend Development

How to Gracefully Handle Kill Signals in Java Backend Processes

This guide shows how to locate and terminate a Java background process on Linux using ps and kill, explains why force‑killing can leave resources open, and provides a complete Java SignalHandler implementation that receives termination signals and safely closes files, database connections, and Kafka consumers.

Kafka · Kill Signal · Linux

0 likes · 8 min read

Big Data Technology & Architecture

Jul 17, 2019 · Backend Development

Ensuring Message Queue Consumption Order: Issues and Solutions for RabbitMQ and Kafka

This article explains why maintaining message order is critical, describes common scenarios that cause ordering problems in RabbitMQ and Kafka, and presents practical strategies such as queue partitioning, single‑consumer designs, and internal memory queues to guarantee ordered consumption.

Kafka · Message Queue · RabbitMQ

0 likes · 5 min read

Big Data Technology & Architecture

Jul 17, 2019 · Backend Development

Preventing Message Loss in RabbitMQ and Kafka: Principles, Scenarios, and Solutions

This article explains the core principles of message queues, outlines common data‑loss scenarios for RabbitMQ and Kafka, and provides practical techniques—such as transactions, confirm mode, persistence settings, and replication configurations—to ensure reliable, loss‑free messaging.

Data Reliability · Kafka · Message Queue

0 likes · 9 min read

Big Data Technology & Architecture

Jul 15, 2019 · Backend Development

Why Use Message Queues? Benefits, Drawbacks, and Comparison of Kafka, ActiveMQ, RabbitMQ, and RocketMQ

The article explains why message queues are used for decoupling, asynchronous processing, and peak shaving, outlines their advantages and disadvantages, and compares popular MQ products such as Kafka, ActiveMQ, RabbitMQ, and RocketMQ to guide technology selection.

Asynchronous · Backend Architecture · Decoupling

0 likes · 5 min read

Big Data Technology Architecture

Jul 12, 2019 · Big Data

Why Kafka Is So Popular: Features, Use Cases, and Architecture Overview

This article explains why Apache Kafka has become a cornerstone of modern big‑data pipelines by detailing its high‑throughput, fault‑tolerant publish‑subscribe architecture, real‑time processing capabilities, extensive language support, scalability mechanisms, and the wide range of use cases adopted by leading enterprises.

Distributed Streaming · Kafka · Message Queue

0 likes · 9 min read

Mafengwo Technology

Jul 11, 2019 · Backend Development

How We Achieved Near‑Real‑Time MySQL‑to‑Elasticsearch Sync Using Binlog and Kafka

This article explains why traditional MySQL queries no longer meet the growing e‑commerce data needs, describes the limitations of a MySQL‑to‑Elasticsearch intermediate table, and details a binlog‑driven, Kafka‑based pipeline with custom modules, upsert handling, filtering, and monitoring to ensure fast, reliable data synchronization.

Backend · Binlog · Elasticsearch

0 likes · 11 min read

Big Data Technology Architecture

Jul 9, 2019 · Big Data

Kafka Best Practices for High Throughput: 20 Recommendations

This article presents New Relic's 20 best‑practice recommendations for operating Apache Kafka at high throughput, covering partitions, consumers, producers, and brokers, and explains key concepts, configuration tuning, monitoring, and architectural considerations to ensure reliable, scalable streaming pipelines.

Brokers · Consumers · High Throughput

0 likes · 14 min read

Big Data Technology & Architecture

Jul 9, 2019 · Big Data

Understanding Flink State Management and Checkpointing for Exactly-Once Kafka Integration

This article explains how Apache Flink manages state, uses checkpointing for fault-tolerant recovery, and achieves exactly-once semantics when consuming Kafka streams by persisting offsets, describing the checkpoint mechanism, recovery process, and practical considerations for production deployments.

Big Data · Checkpoint · Flink

0 likes · 8 min read

dbaplus Community

Jul 4, 2019 · Databases

How Tencent’s TDSQL Multi‑Source Sync Achieves High‑Performance, Consistent Data Distribution

This article explains the real‑time data sync requirements driven by the financial industry, describes the TDSQL‑MULTISRCSYNC architecture (producer, store, and consumer components), and details core designs such as row‑hash concurrency, idempotent binlog handling, and a lock‑based ordering mechanism that ensure high throughput and consistency.

Database Replication · Idempotency · Kafka

0 likes · 13 min read

Efficient Ops

Jul 2, 2019 · Operations

How to Collect Nginx Logs with Rsyslog, Kafka, and ELK Without Agents

Learn how to set up agent‑less log collection for Nginx using Rsyslog, forward logs via the omkafka module to a Kafka cluster, and process them with Logstash into Elasticsearch for visualization in Kibana, including installation, configuration, and testing steps.

ELK · Kafka · Nginx

0 likes · 15 min read

Big Data Technology & Architecture

Jul 1, 2019 · Big Data

How to Ensure High Availability of Message Queues (RabbitMQ and Kafka)

This article explains the concept of high availability for message queues, analyzes interview expectations, and details the HA mechanisms of RabbitMQ (including single, normal cluster, and mirrored modes) and Kafka (partition replication and leader election), highlighting their advantages, drawbacks, and practical considerations.

Distributed Systems · Kafka · Message Queue

0 likes · 11 min read

Beike Product & Technology

Jun 28, 2019 · Backend Development

EPX: Real-Time MySQL Change Capture and Kafka Sync Architecture

EPX is a high‑availability, high‑performance data pipeline that captures MySQL binlog changes in real time, parses and filters them, and streams unified JSON events to Kafka for downstream services, while providing monitoring, alerting, backup, and migration capabilities across many business units.

Backend · Kafka · high availability

0 likes · 7 min read

dbaplus Community

Jun 20, 2019 · Big Data

How Kafka Hits Million‑Message Throughput Using Page Cache and Zero‑Copy

Kafka achieves its ultra‑high throughput and low latency by writing data to the OS page cache, performing sequential disk writes, and employing zero‑copy techniques that eliminate unnecessary data copies during consumption, enabling tens of thousands to millions of messages per second.

Big Data · High Throughput · Kafka

0 likes · 8 min read

Big Data Technology Architecture

Jun 18, 2019 · Big Data

How Kafka Guarantees Data Reliability and Consistency

This article explains Kafka's mechanisms for ensuring data reliability through partition replicas, producer acknowledgments, and leader election, and describes how the high‑water‑mark and ISR concepts maintain strong data consistency across the cluster.

Consistency · Data Reliability · Kafka

0 likes · 9 min read

Java Captain

Jun 15, 2019 · Backend Development

Why Message Queues Are Needed and an Introduction to Kafka

This article explains the motivations behind using message‑queue middleware, outlines its benefits such as decoupling, asynchronous processing and peak‑shaving, describes point‑to‑point and publish‑subscribe communication models, and provides a detailed overview of Kafka’s architecture, terminology, data flow, storage strategy, and consumer group mechanics.

Backend · Kafka · Publish-Subscribe

0 likes · 15 min read

dbaplus Community

Jun 13, 2019 · Backend Development

Kafka vs RabbitMQ vs ZeroMQ vs RocketMQ vs ActiveMQ: 17‑Point Comparison

This article provides a comprehensive 17‑point comparison of five popular message‑queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ—covering documentation, language support, protocols, storage, transactions, load balancing, clustering, management UI, availability, duplication guarantees, throughput, subscription models, ordering, acknowledgements, replay, retry mechanisms, and concurrency characteristics.

ActiveMQ · Kafka · RabbitMQ

0 likes · 26 min read

Big Data Technology & Architecture

Jun 13, 2019 · Fundamentals

Comparison of Kafka and Pulsar Stream Consumption Models and Rebalance Mechanisms

The article explains Kafka's consumer‑group rebalance and Pulsar's unified queue/stream subscription models, compares their partition assignment strategies, and demonstrates both with Docker‑based Pulsar setups, Java consumer code, and practical failover and exclusive scenarios.

Kafka · Pulsar · consumer-group

0 likes · 6 min read

Architecture Digest

Jun 10, 2019 · Backend Development

Design and Implementation of Zhihu's Long Connection Gateway

This article explains how Zhihu designed a scalable long‑connection gateway that decouples business logic via a publish‑subscribe model, implements ACL‑based authorization, ensures message reliability with acknowledgments and sliding windows, and leverages OpenResty, Redis, and Kafka for load‑balanced, fault‑tolerant backend services.

Kafka · OpenResty · Scalability

0 likes · 13 min read

360 Quality & Efficiency

Jun 6, 2019 · Big Data

An Overview of Kafka: Introduction, Design Principles, and Common Issues

This article introduces Kafka, explains its core concepts and design principles, outlines typical use cases, and discusses common operational problems and troubleshooting tips for this high‑throughput distributed messaging system.

Big Data · Distributed Systems · Kafka

0 likes · 9 min read

Big Data Technology Architecture

Jun 4, 2019 · Big Data

Understanding Kafka Exactly-Once Semantics, Idempotence, and Transactions

This article explains Kafka's Exactly-Once Semantics (EOS), the role of idempotence, and how transactional support works. It covers producer IDs and sequence numbers and the relevant configuration properties, and provides Java code examples for initializing, beginning, committing, and aborting transactions.

Exactly-Once · Idempotence · Kafka

0 likes · 8 min read

DataFunTalk

Jun 3, 2019 · Big Data

Choosing a Real-Time Computing Engine Based on Kafka: Spark vs Flink

This article examines the need for real‑time computation, explains streaming versus real‑time concepts, and compares Apache Spark and Apache Flink—covering their architectures, micro‑batch and continuous processing, advantages, limitations, windowing, event‑time handling, and watermarks—to guide engine selection for Kafka‑driven workloads.

Flink · Kafka · Spark

0 likes · 15 min read

Mafengwo Technology

May 31, 2019 · Operations

How We Built a Scalable Monitoring & Alert System for Large‑Scale Transportation Services

This article explains how the team designed and implemented a unified monitoring and alert platform for a multi‑service transportation business, covering architecture, data collection, storage, rule engine, alert delivery, troubleshooting aids, encountered pitfalls, and future enhancements.

Alerting · Elasticsearch · Kafka

0 likes · 13 min read

Architect's Tech Stack

May 31, 2019 · Big Data

Kafka Architecture Overview: Producers, Consumers, Partitions, Replication, and Transactions

This article provides a comprehensive overview of Apache Kafka's architecture, covering topics such as producer and consumer workflows, partition and replica management, leader election, offset handling, message delivery semantics, transaction support, and file organization, illustrating how Kafka achieves high performance and scalability.

Consumer · Distributed Systems · Kafka

0 likes · 18 min read

ITPUB

May 29, 2019 · Big Data

How to Build a Trillion-Scale Real-Time Data Platform: Lessons from DTCC 2019

In a DTCC 2019 keynote, Zhao Qun, director of big‑data platform at Percent Point, outlines the challenges of trillion‑scale real‑time analytics and presents a transparent, fine‑grained architecture built on Kafka, Spark Streaming, ClickHouse, HBase, Ceph and Elasticsearch, detailing design principles, component sizing, multi‑center deployment, performance testing and operational safeguards.

Big Data · Kafka · Real-time analytics

0 likes · 17 min read

dbaplus Community

May 28, 2019 · Big Data

Mastering Kafka: Deep Dive into Architecture, Production, Consumption, and Transactions

This article provides a comprehensive technical guide to Kafka, covering its distributed architecture, producer and consumer workflows, partition and leader management, message delivery semantics, exactly‑once guarantees, transaction handling, file organization, and key configuration parameters.

Big Data · Kafka · message queues

0 likes · 18 min read

Architecture Digest

May 23, 2019 · Backend Development

Comprehensive Comparison of Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ

This article provides a detailed side‑by‑side analysis of five popular message‑queue systems—Kafka, RabbitMQ, ZeroMQ, RocketMQ, and ActiveMQ—covering documentation, programming languages, protocols, storage, transactions, load balancing, clustering, management interfaces, availability, duplication handling, throughput, subscription models, ordering, acknowledgements, backtracking, retry mechanisms, and concurrency characteristics.

ActiveMQ · Kafka · Message Queue Comparison

0 likes · 21 min read

Big Data Technology & Architecture

May 22, 2019 · Big Data

Kafka Transactions: Implementation Details and End‑to‑End Process

This article explains how Apache Kafka implements transactional messaging, covering the atomic write guarantees across partitions, the role of TransactionCoordinator, transaction state management, producer and consumer handling, code examples, and the end‑to‑end workflow for exactly‑once semantics.

Consumer · Distributed Systems · Kafka

0 likes · 42 min read

Big Data Technology & Architecture

May 20, 2019 · Big Data

Kafka Configuration, Monitoring, and Performance Optimization Best Practices

This article summarizes practical Kafka best‑practice guidelines covering hardware sizing, OS and JVM tuning, disk layout choices, replica and controller settings, broker and topic evaluation, as well as producer and consumer configuration, monitoring metrics, and strategies to prevent data loss.

Kafka · Streaming · bigdata

0 likes · 14 min read

Big Data Technology Architecture

May 18, 2019 · Big Data

Key Concepts of Kafka, Hadoop Shuffle, Spark Cluster Modes, HDFS I/O, and Spark RDD Operations

This article explains Kafka message structure and offset retrieval, details Hadoop's map and reduce shuffle processes, outlines Spark's deployment modes, describes HDFS read/write mechanisms, compares reduceByKey and groupByKey performance, and discusses Spark streaming integration with Kafka and data loss prevention.

HDFS · Hadoop · Kafka

0 likes · 10 min read

Big Data Technology & Architecture

May 17, 2019 · Backend Development

Understanding Kafka Producer Idempotence: PID, Sequence Numbers, and Implementation Details

This article explains how Apache Kafka implements producer idempotence by introducing Producer IDs (PID) and sequence numbers, describes the request‑response flow for PID allocation, details server‑side PID management, shows the exactly‑once guarantee mechanism, and answers common configuration questions with code examples.

Backend · Idempotence · Kafka

0 likes · 32 min read

Big Data Technology Architecture

May 17, 2019 · Big Data

Optimizing Real-Time Kafka Writes in Spark Streaming Using a Broadcasted KafkaProducer

To improve the performance of writing streaming data to Kafka, the article demonstrates how to replace per-partition KafkaProducer creation with a lazily initialized, broadcast producer in Scala, reducing overhead and speeding up writes by a factor of several dozen.

Broadcast Variable · Kafka · Scala

0 likes · 3 min read

NetEase Media Technology Team

May 16, 2019 · Backend Development

Design and Implementation of a Configurable, Extensible Content Processing System (Apollo)

Apollo is a configurable, extensible content‑processing platform that models each step as a node defined in a configuration file, supports multiple implementations for A/B testing, and decouples producers and consumers via Kafka. It ensures fault‑tolerant retries and replay, captures fine‑grained metrics through Canal‑to‑TiDB pipelines, and cuts new‑type development effort to roughly ten percent of the original cost while delivering high‑quality data to downstream teams.

Backend Architecture · Kafka · TiDB

0 likes · 9 min read

Big Data Technology & Architecture

May 15, 2019 · Backend Development

How ByteDance Uses Kafka – Presentation by Gong Yunfei at the 2019 Apache Flink x Apache Kafka Conference

This article presents Gong Yunfei's 2019 talk from the Beijing Apache Flink x Apache Kafka conference, detailing how ByteDance leverages Kafka for its large‑scale streaming and data processing needs.

Apache Flink · ByteDance · Distributed Systems

0 likes · 2 min read

dbaplus Community

May 7, 2019 · Big Data

Why Kafka Achieves Million‑Level Throughput: Sequential Writes, mmap, and Zero‑Copy

This article explains how Kafka attains high throughput by using sequential disk writes, memory‑mapped files, sendfile zero‑copy, and batch compression, detailing both write and read path optimizations and their impact on performance.

Batch Compression · Big Data · High Throughput

0 likes · 8 min read

Architecture Digest

May 5, 2019 · Big Data

Kafka Architecture Overview: Topics, Partitions, Producers, Consumers, Replication, Leader Election, Offsets, Rebalance, Delivery Semantics, and Transactions

This article provides a comprehensive overview of Kafka's architecture, covering topics, partitions, producer and consumer workflows, replication and leader election, offset management, consumer group coordination, rebalance processes, delivery semantics (at‑most‑once, at‑least‑once, exactly‑once), transactional messaging, and underlying file and configuration details.

Distributed Messaging · Exactly-Once · Kafka

0 likes · 16 min read

Java Backend Technology

Apr 17, 2019 · Backend Development

Why Kafka Sticks to Master‑Write/Read: The Real Reason It Doesn’t Support Read‑Write Separation

The article explains Kafka’s master‑write master‑read architecture, why it does not implement a master‑write slave‑read model, the trade‑offs of consistency and latency, and how Kafka achieves load balancing through its leader‑replica design and operational safeguards.

Backend · Kafka · Master‑Slave

0 likes · 7 min read

Youzan Coder

Apr 12, 2019 · Industry Insights

How Youzan Scaled Its Log Platform to Handle Billions of Daily Logs

This article details Youzan's evolution from a simple Flume‑based log collector to a multi‑tenant, Kafka‑buffered, Spark‑processed, HBase‑backed logging architecture that now handles hundreds of billions of log entries per day, highlighting challenges, design decisions, and future improvements.

Distributed Systems · Elasticsearch · HBase

0 likes · 10 min read

System Architect Go

Apr 11, 2019 · Big Data

Introduction to Apache Kafka: Core Concepts, Message Delivery, Partition Storage, and Consumption

This article introduces Apache Kafka as a distributed streaming platform, explaining its three core capabilities, key concepts such as producers, topics, brokers, partitions and consumers, and detailing how messages are delivered, stored in partitions, and consumed by consumer groups.

Big Data · Distributed Streaming · Kafka

0 likes · 8 min read

Java Captain

Apr 9, 2019 · Big Data

Kafka FAQs: Zookeeper Dependency, Retention Policies, Cleanup Rules, Performance Bottlenecks, and Cluster Best Practices

This article answers common Kafka questions, explaining why Kafka cannot operate without Zookeeper, describing its two retention strategies based on time and size, detailing how simultaneous time‑ and size‑based cleanup works, listing performance bottlenecks, and offering practical guidelines for sizing and configuring Kafka clusters.

Big Data · Cluster Design · Kafka

0 likes · 2 min read

MaGe Linux Operations

Mar 30, 2019 · Information Security

How to Build a Real-Time Security Log Collection and Alert System with ELK, Kafka, and Flume

This guide walks through setting up a comprehensive security log collection pipeline—covering WAF, firewall, and Nginx logs—using ELK, Logstash, Kafka, and Flume, and then configuring real‑time alerts with Sentinl or ElastAlert integrated with DingTalk and email notifications.

Alerting · ELK · Flume

0 likes · 16 min read
