Tagged articles
JD Tech Talk
Feb 5, 2021 · Big Data

Design and Implementation of a Real‑Time OLAP Engine Using ClickHouse in JD Energy Management Platform

This article describes how JD's Energy Management Platform leverages ClickHouse as a high‑performance, MPP‑based OLAP engine to provide real‑time, multi‑dimensional analytics on IoT energy data, covering business background, technology selection, system architecture, data ingestion, storage, replication, and a generic query interface with code examples.

Kafka · OLAP · clickhouse
11 min read
Code Ape Tech Column
Jan 21, 2021 · Interview Experience

Master Distributed System Interview Questions: CAP, Redis, Zookeeper, Kafka and More

This article compiles essential interview‑style questions and detailed answers on distributed system fundamentals—including CAP and BASE theories, consistency models, distributed transactions, Redis features and persistence, Zookeeper coordination, Kafka architecture, and common design patterns for high‑concurrency scenarios.

Distributed Systems · Kafka · Message Queue
38 min read
Code Ape Tech Column
Jan 19, 2021 · Operations

Scaling Kafka Clusters to Support Millions of Partitions: Challenges and Solutions

This article examines the technical challenges of scaling Kafka clusters to handle millions of partitions—including Zookeeper node explosion, replication overhead, controller recovery latency, and broker restart delays—and proposes solutions such as parallel ZK fetching, metadata synchronization via internal topics, logical cluster composition, and physical cluster splitting.

Distributed Systems · Kafka · cluster operations
13 min read
21CTO
Jan 16, 2021 · Backend Development

How to Build a Go‑Based Log Collection System with etcd, Context, and Kafka

This article walks through designing and implementing a Go log‑collection agent that uses etcd for configuration storage, context for timeout and metadata handling, and Kafka for message consumption, complete with code examples, setup instructions, and a rate‑limiting utility.

Go · Kafka · context
16 min read
Didi Tech
Jan 14, 2021 · Cloud Computing

Design and Implementation of Didi's Logi‑KafkaManager Multi‑tenant Kafka Cloud Platform

Didi’s Logi‑KafkaManager is a multi‑tenant Kafka cloud platform that consolidates dozens of clusters into a secure, isolated, gateway‑driven service. It offers web‑based topic management, real‑time metrics visualization, automated diagnostics, quota governance, and safe scaling, and it has earned high internal satisfaction and been commercialized for enterprise customers.

Big Data · Kafka · cloud platform
17 min read
Meituan Technology Team
Jan 14, 2021 · Big Data

Design and Implementation of an SSD‑Based Application‑Layer Cache Architecture for Kafka in Meituan Data Platform

Meituan built an SSD‑based application‑layer cache for Kafka that bypasses PageCache contention between real‑time and delayed jobs, classifies log segments across SSD and HDD, limits flush rates, and achieves up to 80% latency reduction while guaranteeing stable real‑time consumption.

Big Data · Kafka · LogSegment
19 min read
NetEase Smart Enterprise Tech+
Jan 14, 2021 · Big Data

How Yidun Achieves Real-Time, High-Performance Public-Opinion Data Cleaning with Groovy and JVM

Yidun’s public-opinion monitoring platform transforms massive raw web data into a unified format by separating dynamic Groovy-script-driven cleaning from static processing, achieving real-time source integration, high throughput, scalability, and high availability while addressing format diversity, team coordination, and performance-flexibility trade-offs.

Big Data · ETL · Groovy
5 min read
Architect's Tech Stack
Jan 8, 2021 · Backend Development

Comprehensive Guide to Spring Kafka: Integration, Advanced Features, and Usage

This article provides a detailed tutorial on integrating Kafka with Spring using Spring‑Kafka, covering simple setup, embedded Kafka testing, topic creation, message sending and receiving, transaction support, listener configurations, manual acknowledgment, error handling, retry and dead‑letter queues, and related code examples.

Kafka · Messaging · Microservices
21 min read
dbaplus Community
Jan 5, 2021 · Big Data

How Ctrip Built a Scalable Unified Log Framework for Payment Data

Facing massive, heterogeneous logs from numerous payment services, Ctrip’s data team designed a unified logging framework that extends log4j2, streams logs via Kafka to HDFS using a customized Camus pipeline, partitions and stores data in ORC for efficient Hive analysis, while addressing format, storage, and performance challenges.

Big Data · Camus · Hadoop
16 min read
Programmer DD
Jan 5, 2021 · Backend Development

Understanding Kafka Partition Assignment: Strategies and Code Walkthrough

This article explains how Kafka determines which partition a producer sends a record to, how partition counts are configured, and how consumer groups assign partitions using the default, range, and round‑robin strategies, complemented by detailed Java code examples.

Kafka · Partition Assignment · Producer
21 min read
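On the consumer side, the range and round-robin strategies mentioned above can be illustrated with a minimal Python sketch. It assumes every consumer in the group subscribes to every topic and that members are ordered by ID; the function names are hypothetical, and this is not the Kafka client's actual assignor code.

```python
def range_assign(consumers, partitions_per_topic):
    """Range strategy: each topic is split into contiguous chunks, in consumer order."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    for topic in sorted(partitions_per_topic):
        n = partitions_per_topic[topic]
        per_consumer, extra = divmod(n, len(members))
        for i, member in enumerate(members):
            # The first `extra` consumers each receive one additional partition.
            start = i * per_consumer + min(i, extra)
            length = per_consumer + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + length))
    return assignment


def round_robin_assign(consumers, partitions_per_topic):
    """Round-robin strategy: all partitions of all topics are dealt out one at a time."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    all_partitions = [(t, p) for t in sorted(partitions_per_topic)
                      for p in range(partitions_per_topic[t])]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment
```

With several topics whose partition counts do not divide evenly, range assignment keeps handing the extra partitions to the same first consumers, whereas round-robin spreads them across the whole group.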
JavaEdge
Jan 1, 2021 · Backend Development

Inside Kafka’s Network Stack: How SocketServer, Acceptor, and Processor Work

This article breaks down Kafka’s network communication layer, detailing the roles of SocketServer, the Acceptor thread, Processor threads, and related classes such as RequestChannel, KafkaRequestHandlerPool, and key configuration parameters, while illustrating their interactions with diagrams.

Backend · Kafka · Reactor
7 min read
Top Architect
Dec 30, 2020 · Backend Development

Using Kafka as a Storage System for Twitter’s Account Activity Replay API

The article explains how Twitter built the Account Activity Replay API by repurposing Kafka as a storage layer, detailing the system’s architecture, partitioning strategy, request handling, deduplication, and performance optimizations to provide reliable event recovery for developers.

Infrastructure · Kafka · Twitter
8 min read
Code Ape Tech Column
Dec 30, 2020 · Industry Insights

Why Does a Single Kafka Broker Failure Bring Down Your Consumers?

The article explains Kafka's high‑availability architecture, covering multi‑replica redundancy, ISR mechanisms, producer acknowledgment settings, and a real‑world case where a broker crash halted consumption due to the __consumer_offsets topic's replication factor, then offers concrete remediation steps.

Consumer Offsets · ISR · Kafka
10 min read
Selected Java Interview Questions
Dec 27, 2020 · Operations

Kafka Outage and High Availability Mechanisms

This article examines a Kafka outage at a fintech company, explains Kafka’s multi-replica redundancy design, leader‑follower architecture, and ISR mechanism, shows how a misconfigured replication factor on the __consumer_offsets topic can cause cluster-wide consumer failures, and provides solutions to ensure true high availability.

ACK · Consumer Offset · ISR
10 min read
Code Ape Tech Column
Dec 25, 2020 · Backend Development

RabbitMQ vs Kafka: Which Messaging System Wins for Your Architecture?

This article compares RabbitMQ and Apache Kafka by examining their internal designs, messaging models, ordering guarantees, routing, timing, retention, fault‑tolerance, scalability, and consumer complexity, then provides concrete guidance on when to choose each technology for real‑world systems.

Comparison · Kafka · Message Queue
24 min read
macrozheng
Dec 15, 2020 · Big Data

How Kafka Achieves Million‑TPS Through Sequential I/O, MMAP, and Zero‑Copy

Kafka can sustain millions of transactions per second by writing data sequentially to disk, leveraging memory‑mapped files, employing zero‑copy DMA transfers, and batching messages, each technique reducing I/O overhead and CPU involvement, which together enable its high‑throughput performance in big‑data pipelines.

Big Data · High Throughput · Kafka
11 min read
Programmer DD
Dec 2, 2020 · Backend Development

How Kafka Uses a Timing Wheel for Efficient Timeout Handling

Kafka handles many requests that require asynchronous processing or waiting for conditions by attaching a timeout parameter; if the condition isn’t met within the timeout, Kafka returns a timeout response, and it implements this efficiently using a hierarchical Timing Wheel data structure that offers O(1) insertion and fast expiration checks.

Backend · Kafka · Scala
12 min read
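To make the hierarchical timing wheel concrete, here is a minimal Python sketch modelled on the design described above. It assumes a 1 ms tick and 20 slots per level (the defaults we believe Kafka's SystemTimer uses); overflow wheels are created lazily with a tick equal to the lower wheel's whole interval. As a simplification, the hypothetical `Timer.advance()` scans for the earliest due bucket where Kafka uses a DelayQueue of buckets, so this is an illustration rather than Kafka's implementation.

```python
class TimingWheel:
    """One level of a hierarchical timing wheel."""

    def __init__(self, tick_ms, wheel_size, start_ms):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval = tick_ms * wheel_size                # span covered by this level
        self.current_time = start_ms - start_ms % tick_ms   # aligned to a tick boundary
        self.buckets = [[] for _ in range(wheel_size)]
        self.expiry = [None] * wheel_size                   # bucket expiration, None = empty
        self.overflow = None                                # coarser level, created lazily

    def add(self, expiration_ms, task):
        """Bucket the task; return False if it is already due and should run now."""
        if expiration_ms < self.current_time + self.tick_ms:
            return False
        if expiration_ms < self.current_time + self.interval:
            virtual_id = expiration_ms // self.tick_ms
            slot = virtual_id % self.wheel_size             # O(1): arithmetic, no sorting
            self.buckets[slot].append((expiration_ms, task))
            self.expiry[slot] = virtual_id * self.tick_ms
            return True
        if self.overflow is None:
            # One tick of the overflow wheel equals this wheel's whole interval.
            self.overflow = TimingWheel(self.interval, self.wheel_size, self.current_time)
        return self.overflow.add(expiration_ms, task)

    def advance_clock(self, time_ms):
        if time_ms >= self.current_time + self.tick_ms:
            self.current_time = time_ms - time_ms % self.tick_ms
            if self.overflow is not None:
                self.overflow.advance_clock(self.current_time)


class Timer:
    """Drives the wheels; a linear scan stands in for Kafka's DelayQueue of buckets."""

    def __init__(self, tick_ms=1, wheel_size=20, start_ms=0):
        self.wheel = TimingWheel(tick_ms, wheel_size, start_ms)

    def schedule(self, expiration_ms, task):
        return self.wheel.add(expiration_ms, task)

    def _earliest_due_bucket(self, now_ms):
        best, wheel = None, self.wheel
        while wheel is not None:
            for slot, exp in enumerate(wheel.expiry):
                if exp is not None and exp <= now_ms and (best is None or exp < best[0]):
                    best = (exp, wheel, slot)
            wheel = wheel.overflow
        return best

    def advance(self, now_ms):
        """Return every task whose expiration is <= now_ms, in firing order."""
        fired = []
        while (due := self._earliest_due_bucket(now_ms)) is not None:
            exp, wheel, slot = due
            tasks = wheel.buckets[slot]
            wheel.buckets[slot], wheel.expiry[slot] = [], None
            self.wheel.advance_clock(exp)
            for expiration_ms, task in tasks:
                # Re-adding either fires the task or cascades it to a finer wheel.
                if not self.wheel.add(expiration_ms, task):
                    fired.append(task)
        self.wheel.advance_clock(now_ms)
        return fired
```

A task due at 450 ms lands in the third level (tick 400 ms), cascades into the second level when that bucket expires, then into the 1 ms wheel, and fires exactly at 450 ms.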
21CTO
Dec 1, 2020 · Big Data

How Kafka Implements Transactions: Inside the TC Service and Producer Workflow

This article provides a comprehensive walkthrough of Kafka's transaction mechanism, covering the transaction coordinator, producer initialization, partition handling, commit and abort processes, state management, high‑availability design, timeout handling, and relevant source code snippets.

Distributed Systems · Kafka · Producer
22 min read
JavaEdge
Dec 1, 2020 · Backend Development

How Kafka’s OffsetIndex and TimeIndex Optimize Message Retrieval

This article explains Kafka’s internal index files—OffsetIndex and TimeIndex—including their file formats, how they store relative offsets and timestamps, the space‑saving optimizations, the processes for appending, truncating, and looking up entries, and best‑practice cautions for handling these indexes.

Kafka · OffsetIndex · TimeIndex
8 min read
JavaEdge
Nov 30, 2020 · Backend Development

How Kafka’s Index Uses Binary Search and Cache‑Friendly Optimizations

This article explains Kafka's index architecture, the AbstractIndex class hierarchy, how entry sizes are chosen, the use of memory‑mapped files, the binary‑search algorithm for locating index entries, and a cache‑friendly improvement that reduces page faults and I/O latency.

Binary Search · Cache Optimization · Kafka
13 min read
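The warm/cold split behind the cache-friendly lookup can be sketched in a few lines of Python. Here a sorted list of offsets stands in for the memory-mapped index file, and `warm_entries` is a parameter (as far as we know, Kafka's AbstractIndex sizes the warm section at 8192 bytes' worth of entries). The function names are hypothetical; the search returns the last slot whose offset is less than or equal to the target, which is where a log scan would start.

```python
def _largest_lower_bound(entries, target, lo, hi):
    """Binary search entries[lo..hi] for the last slot whose key is <= target."""
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so `lo = mid` always makes progress
        if entries[mid] > target:
            hi = mid - 1
        else:
            lo = mid
    return lo


def lookup_slot(entries, target, warm_entries):
    """Return the index slot to start scanning from, or -1 if target precedes all entries.

    `entries` holds the offsets stored in the index, in ascending order.
    """
    if not entries:
        return -1
    first_hot = max(0, len(entries) - 1 - warm_entries)
    # Most lookups (tailing consumers, follower replication) target recent data,
    # so search only the warm tail when possible; its pages are likely resident.
    if entries[first_hot] < target:
        return _largest_lower_bound(entries, target, first_hot, len(entries) - 1)
    if entries[0] > target:
        return -1
    return _largest_lower_bound(entries, target, 0, first_hot)
```

A plain binary search over the whole file would first probe the middle of the index, touching pages that tail reads never need; restricting the common case to the tail keeps the touched pages small and stable.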
System Architect Go
Nov 30, 2020 · Databases

Five Ways to Sync MySQL Data to Elasticsearch, Redis, MQ, etc.

This article outlines five practical methods for synchronizing MySQL data to external systems such as Elasticsearch, Redis, and message queues, covering business‑layer hooks, middleware integration, scheduled tasks using updated_at, binlog parsing with ROW format, and handling mixed or statement binlog formats, plus open‑source tools.

Binlog · Elasticsearch · Kafka
5 min read
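The scheduled-task approach based on `updated_at` can be sketched with Python's built-in sqlite3 standing in for MySQL. The `items` table and its columns are hypothetical. The watermark is the pair (updated_at, id) rather than the timestamp alone, so rows that share a timestamp are not skipped when a batch boundary falls between them.

```python
import sqlite3


def poll_changes(conn, watermark, batch=100):
    """Return rows changed after watermark=(updated_at, id), plus the new watermark."""
    last_ts, last_id = watermark
    rows = conn.execute(
        "SELECT id, name, updated_at FROM items "
        "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
        "ORDER BY updated_at, id LIMIT ?",
        (last_ts, last_ts, last_id, batch),
    ).fetchall()
    if rows:
        last = rows[-1]
        watermark = (last[2], last[0])   # resume after the last row we returned
    return rows, watermark
```

Each run pushes the returned rows to the target system (Elasticsearch, Redis, an MQ) and persists the new watermark. Hard deletes never show up in such a query, which is one reason binlog-based approaches are preferred when deletes matter.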
DataFunTalk
Nov 27, 2020 · Big Data

Evolution of Kafka‑Based Data Pipeline at Chehaoduo Group: Architecture, Scaling, and Best Practices

This article chronicles the four‑year evolution of Chehaoduo Group’s Kafka ecosystem—from its initial role as a simple data‑ingestion layer to becoming the core of the company’s large‑scale data pipeline—detailing cluster management, upgrade strategies, multi‑cluster deployment, AVRO schema handling, SDK development, and operational lessons learned.

Avro · Cluster Management · Kafka
21 min read
Full-Stack Internet Architecture
Nov 23, 2020 · Backend Development

Message Middleware: Benefits, Drawbacks, and Design Patterns for Concurrency, Ordering, Duplicate, and Transactional Messaging

This article explains the advantages and disadvantages of using message middleware in microservice architectures and details practical solutions for handling concurrency, ordered processing, duplicate messages, and transactional messaging using patterns like partitioning, outbox tables, CDC, and RocketMQ's two‑phase commit.

Kafka · Microservices · RocketMQ
12 min read
Tencent Cloud Developer
Nov 19, 2020 · Backend Development

Kafka Message Queue Reliability Design and Implementation

The article thoroughly explains Kafka’s message‑queue reliability design and implementation, covering use‑case scenarios, core concepts, storage format, producer acknowledgment settings, broker replication mechanisms (ISR, HW, LEO), consumer delivery semantics, the leader‑epoch mechanism that fixes replica synchronization problems, and practical configuration guidelines for various consistency and availability requirements.

Broker · Consistency · Consumer
15 min read
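The relationship between LEO, HW, and ISR can be shown with a small Python sketch of the leader's view. The replica names and function names are hypothetical, and the model ignores how followers learn the HW (in Kafka it is propagated through fetch responses, which is part of why leader epochs were introduced).

```python
def high_watermark(leo, isr):
    """Leader-side HW: the smallest log end offset (LEO) across the in-sync replicas."""
    return min(leo[replica] for replica in isr)


def committed(log, hw):
    """Records at offsets below the HW are committed and therefore visible to consumers."""
    return log[:hw]


def acks_all_satisfied(offset, hw):
    """An acks=all write at `offset` can be acknowledged once every ISR member holds it."""
    return offset < hw
```

Keeping a lagging replica in the ISR holds the HW back (and delays acks=all responses), which is the trade-off that settings such as the ISR lag timeout and `min.insync.replicas` tune.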
Java High-Performance Architecture
Nov 18, 2020 · Big Data

Why Pulsar Might Outperform Kafka: Key Advantages and Drawbacks

This article examines Apache Pulsar, an open‑source messaging platform created by Yahoo, compares it with Kafka by outlining Kafka’s common pain points, highlights Pulsar’s multi‑tenant architecture, layered storage, built‑in functions, and security features, and discusses the trade‑offs of each solution.

Apache Pulsar · Big Data · Distributed Systems
6 min read
DataFunSummit
Nov 15, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Using Hadoop, Flume, Kafka, Spark, and Flink

This article details the three‑stage evolution of 58.com’s commercial data warehouse, describing its massive scale, four‑layer architecture, technical challenges, migrations from MapReduce to Hive and Flink, real‑time streaming upgrades, and the resulting improvements in stability, accuracy, and timeliness.

Big Data · Data Architecture · Flink
10 min read
Laravel Tech Community
Nov 12, 2020 · Backend Development

PHP Kafka Client Library (longlang/phpkafka) Overview

The PHP Kafka client library supports PHP‑FPM and Swoole environments, implements all 50 Kafka APIs with compression, SSL, and SASL features, requires PHP ≥ 7.1 and Kafka ≥ 1.0.0, and can be installed via Composer.

Kafka · Message Queue · client
2 min read
Tencent Cloud Middleware
Nov 12, 2020 · Backend Development

How We Migrated Our Self‑Built Message Queue to Tencent Cloud CKafka

This article details why the e‑commerce platform built its own Corgi message queue, the operational and cost drawbacks that prompted a move to Tencent Cloud CKafka, and the three‑phase migration strategy—including dual‑write, cut‑read, and cut‑write—while preserving message safety and low latency.

CKafka · Kafka
11 min read
Big Data Technology & Architecture
Nov 8, 2020 · Big Data

Flume Tuning Guide for High‑Throughput Data Ingestion

This article explains how to identify and resolve performance bottlenecks in Apache Flume by configuring Taildir sources, optimizing channel capacities, tuning Kafka sinks, adjusting JVM options, and using simple monitoring scripts, enabling a single Flume‑NG agent to sustain over 50,000 RPS in production.

Big Data · Configuration · Flume
10 min read
System Architect Go
Nov 7, 2020 · Operations

Request Log Analysis System: Collected Fields, Derived Data, and Metrics

This article outlines a request log analysis system that records core request fields, adds proxy‑related data, derives IP‑based ASN and geographic information, parses user‑agent details, and provides comprehensive metrics such as PV/QPS, UV, traffic, latency, status monitoring, and business‑specific insights, all visualized via an ELK‑Kafka architecture.

Backend · ELK · Kafka
5 min read
Big Data Technology Architecture
Nov 1, 2020 · Big Data

Practical Application of Flink + Kafka in NetEase Cloud Music Real‑Time Computing Platform

This article presents NetEase Cloud Music's real‑time computing platform built on Flink and Kafka, covering background, architectural design, Kafka and Flink selection reasons, platformization, warehouse usage, encountered challenges, and the solutions implemented to improve reliability and performance.

Flink · Kafka · Real-time Streaming
11 min read
21CTO
Oct 30, 2020 · Big Data

Which Log Collection System Wins? Scribe, Chukwa, Kafka, Flume & ELK Compared

This article reviews the background, requirements, and architectural designs of major open‑source log collection systems—including Facebook’s Scribe, Apache’s Chukwa, LinkedIn’s Kafka, Cloudera’s Flume—and evaluates mature monitoring tools such as ELK, highlighting their features, use cases, advantages, and drawbacks for large‑scale log processing.

Big Data · ELK · Flume
18 min read
Programmer DD
Oct 29, 2020 · Backend Development

Master Kafka Interview Questions: Architecture, Configurations, and Best Practices

This article provides a comprehensive overview of Kafka as a distributed messaging middleware, covering its core concepts, architecture, producer and consumer mechanics, common interview questions, configuration options, high‑availability guarantees, and performance optimizations for backend developers.

Consumer · Distributed Messaging · Kafka
20 min read
Big Data Technology & Architecture
Oct 29, 2020 · Fundamentals

Zero-Copy Data Transfer Mechanism: Principles, Implementations, and Applications in Java, Kafka, and Spark

This article explains the zero‑copy data transfer technique, compares it with traditional read/write approaches, shows Java NIO code examples, and discusses its use in high‑performance systems such as Kafka and Spark, highlighting the reductions in context switches and memory copies.

Data Transfer · Java NIO · Kafka
16 min read
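In Java the zero-copy path is `FileChannel.transferTo()`; the closest Python analogue is `socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and falls back to ordinary `send()` otherwise. The minimal sketch below sends a file over a connected socket without reading it into a user-space buffer; `serve_file` and `demo` are hypothetical names, and the local socket pair only stands in for a real client connection.

```python
import os
import socket
import tempfile


def serve_file(conn: socket.socket, path: str) -> int:
    """Send a file over a connected socket, letting the kernel move the bytes."""
    with open(path, "rb") as f:
        # With os.sendfile() underneath, data goes page cache -> socket directly,
        # skipping the read() copy into user space and the write() copy back out.
        return conn.sendfile(f)


def demo(payload: bytes) -> bytes:
    """Round-trip `payload` through a temp file and a local socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    left, right = socket.socketpair()
    try:
        sent = serve_file(left, tmp.name)
        left.shutdown(socket.SHUT_WR)      # signal EOF to the reader
        received = b""
        while chunk := right.recv(4096):
            received += chunk
        assert sent == len(payload)
        return received
    finally:
        left.close()
        right.close()
        os.unlink(tmp.name)
```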
Efficient Ops
Oct 26, 2020 · Operations

Secure Production ELK Stack with Kafka: Step‑by‑Step Deployment Guide

This guide walks through building a secure, production‑grade logging pipeline by deploying an ELK stack (Elasticsearch, Logstash, Kibana) with X‑Pack security, a Kafka message queue with SASL authentication, and Filebeat agents, covering environment preparation, certificate generation, configuration files, and startup scripts.

Deployment · ELK · Kafka
31 min read
Architecture Digest
Oct 22, 2020 · Backend Development

Kafka Timing Wheel: Design, Operation, and Code Walkthrough

The article explains how Kafka handles timeout‑based requests using a Timing Wheel data structure, detailing its design, parameters, operation principles, and overflow handling, and provides Scala code examples that illustrate O(1) task insertion compared with the O(log N) insertion of traditional delay queues.

Data Structures · Kafka · Scala
10 min read
dbaplus Community
Oct 13, 2020 · Big Data

How to Build a Real‑Time Data Warehouse with Flink: Principles, Architecture, and Best Practices

This article explains why real‑time data warehouses are needed, outlines their core principles, compares them with offline warehouses, describes typical use cases such as real‑time OLAP, dashboards, feature generation and monitoring, and provides a step‑by‑step guide to designing, implementing, and operating a Flink‑based streaming warehouse with Kafka, HBase, and metadata management.

Flink · Kafka · OLAP
29 min read
Top Architect
Oct 9, 2020 · Backend Development

Implementing Delayed Queues with Redis and Other Technologies

This article explains how Redis can be used to implement delayed queues, compares its advantages with other solutions such as RabbitMQ, RocketMQ, Kafka, Netty and Java DelayQueue, and provides practical guidance on using sorted sets and timestamps for time‑based task scheduling.

Kafka · Message Queue · RabbitMQ
8 min read
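The sorted-set approach can be sketched in Python against any client exposing redis-py's `zadd`, `zrangebyscore`, and `zrem` methods. The member is a job ID and the score is its due timestamp. Several workers can poll concurrently because only the worker whose `ZREM` actually removes the member (return value 1) processes that job. `DelayQueue` and the in-memory stand-in are hypothetical names; the stand-in exists only so the sketch runs without a Redis server.

```python
import time


class DelayQueue:
    def __init__(self, client, key="delay-queue"):
        self.client = client   # e.g. redis.Redis(decode_responses=True)
        self.key = key

    def put(self, job_id, delay_s, now=None):
        due = (time.time() if now is None else now) + delay_s
        self.client.zadd(self.key, {job_id: due})          # score = due timestamp

    def poll(self, now=None, batch=10):
        now = time.time() if now is None else now
        ready = []
        for job_id in self.client.zrangebyscore(self.key, "-inf", now, start=0, num=batch):
            if self.client.zrem(self.key, job_id) == 1:    # claim: only one worker wins
                ready.append(job_id)
        return ready


class InMemorySortedSets:
    """Tiny stand-in for the three Redis commands used above (local testing only)."""

    def __init__(self):
        self.sets = {}

    def zadd(self, name, mapping):
        self.sets.setdefault(name, {}).update(mapping)

    def zrangebyscore(self, name, min_score, max_score, start=None, num=None):
        lo, hi = float(min_score), float(max_score)
        members = sorted((s, m) for m, s in self.sets.get(name, {}).items() if lo <= s <= hi)
        picked = [m for _, m in members][start or 0:]
        return picked if num is None else picked[:num]

    def zrem(self, name, *values):
        zset = self.sets.get(name, {})
        return sum(zset.pop(v, None) is not None for v in values)
```

With a real Redis connection, pass `decode_responses=True` (or decode the returned bytes) so job IDs come back as strings.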
MaGe Linux Operations
Sep 29, 2020 · Backend Development

Understanding Message Middleware: Core Architecture and Kafka Basics

This article explains the fundamental architecture of message middleware, its key roles such as peak shaving, asynchronous processing and decoupling, the two consumption models (publish‑subscribe and point‑to‑point), and introduces core Kafka concepts with practical Java code examples.

Distributed Systems · Kafka · Message Queue
7 min read
Java Architect Essentials
Sep 21, 2020 · Backend Development

Design and Implementation of a Scalable Long‑Connection Gateway

This article details the architecture, protocol design, permission control, reliability mechanisms, and scaling strategies of a long‑connection gateway built with OpenResty, Kafka, and Redis, illustrating how to share persistent connections across multiple business services while ensuring high performance and fault tolerance.

Backend · Kafka · Messaging
13 min read
Big Data Technology & Architecture
Sep 18, 2020 · Big Data

Understanding Kafka Consumer Groups, Partition Assignment, and Offset Management

This article explains how Kafka consumer groups accelerate message consumption by distributing partitions across multiple consumers, details the three key characteristics of consumer groups, and provides in‑depth guidance on partition assignment strategies and offset management with practical Java code examples.

Big Data · Kafka · Offset Management
13 min read
Architect
Sep 17, 2020 · Big Data

Kafka Exactly-Once Semantics and Transaction API Overview

This article explains Kafka's exactly‑once semantics and transaction support, detailing the new producer API methods, related exceptions, configuration parameters, and a sample application illustrating how to initialize, begin, process, and commit or abort transactions while ensuring idempotent and atomic message handling.

Configuration · Exactly-Once · Idempotence
19 min read
IT Architects Alliance
Sep 15, 2020 · Backend Development

Step‑by‑Step Guide to Deploying a Multi‑Node Kafka Cluster

This tutorial walks through setting up a four‑node Kafka cluster—including Zookeeper installation, broker configuration, service startup, replication settings, fault handling, and leader election—using Linux commands and detailed code snippets to help readers build a production‑ready streaming platform.

Backend · Cluster Deployment · Kafka
14 min read
JavaEdge
Sep 15, 2020 · Backend Development

How Kafka Uses ZooKeeper for Metadata Management and Client Coordination

This article explains how Kafka relies on ZooKeeper to store cluster metadata, detailing the ZK node hierarchy, the process by which clients locate brokers, the broker‑side handling of metadata requests, and recommended practices for large‑scale deployments.

Kafka · backend-development · metadata
8 min read
DataFunTalk
Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps—including stream union, state‑timer processing, and output formatting—covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation
11 min read
dbaplus Community
Sep 1, 2020 · Big Data

Mastering Real‑Time MySQL Binlog Sync with Debezium, Kafka & Hive

This article presents a systematic guide to real‑time MySQL binlog ingestion, outlining three core principles—decoupling from business data, handling schema changes, and ensuring traceability—followed by concrete Debezium‑Kafka‑Hive solutions, scenario‑specific tactics, and practical tips for reliable data pipelines.

Debezium · Kafka · data ingestion
15 min read
DataFunTalk
Sep 1, 2020 · Big Data

NetEase Real-Time Computing Platform (Sloth): Architecture, Practices, and Future Outlook

This article introduces NetEase's real-time computing platform Sloth, detailing its architecture, component layers, integrated IDE, operational tooling, unified metadata management, challenges such as Kudu write amplification, and proposes a tiered real‑time data‑warehouse model with a vision for storage‑compute separation and unified batch‑stream APIs.

Big Data · Flink · Kafka
13 min read
Youzan Coder
Aug 26, 2020 · Mobile Development

How We Built a Real‑Time Crash Feedback Platform for Mobile Apps

This article details the design and implementation of a comprehensive crash feedback platform for mobile applications, covering the motivation behind replacing third‑party services, the system architecture using Flink, Kafka and HBase, crash interception on Android, automated grouping and assignment, version filtering, daily reporting, and future enhancements.

Android · Flink · Kafka
15 min read
Full-Stack Internet Architecture
Aug 26, 2020 · Backend Development

Interview Experience and Technical Q&A for a Java Backend Position at Tencent Cloud (Xi'an)

The article shares the author's recent move to Xi'an, discusses the local job market, and provides detailed interview questions and answers on Java backend topics such as Redis replication, Kafka performance, MySQL transactions, and JVM garbage collection to help job seekers prepare effectively.

JVM · Kafka · backend-development
8 min read
Java Architect Essentials
Aug 25, 2020 · Backend Development

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

This article explains Kafka's role as a message system, details its fundamental components such as topics, partitions, producers, consumers, and replicas, describes how Zookeeper coordinates the cluster, and explores performance optimizations like sequential writes, zero‑copy, and network design.

Distributed Systems · Kafka · Message Queue
12 min read
DataFunTalk
Aug 25, 2020 · Databases

Real‑time Data Ingestion and Optimization with ClickHouse at ByteDance

This article details ByteDance's engineering practices for using ClickHouse to ingest, store, and query massive real‑time recommendation and advertising data, covering early external‑transaction mechanisms, the risks of direct INSERTs, the design and evaluation of Kafka Engine versus Flink pipelines, and a series of performance and reliability improvements implemented to support high‑frequency workloads.

Database Optimization · Kafka · Real-time analytics
20 min read
Didi Tech
Aug 24, 2020 · Big Data

Evolution and Architecture of DiDi Data Channel Service

DiDi’s Data Channel Service evolved from a fragmented component system into a unified, SLA‑driven platform with a UI‑based Sync Center and Flink‑powered StreamSQL engine, dramatically improving task creation speed, resource utilization, and reliability while automating issue diagnosis for company‑wide real‑time and offline data synchronization.

Big Data · ETL · Flink
12 min read
Java Architect Essentials
Aug 21, 2020 · Big Data

Design and Integration of Flume, Kafka, Storm, Drools, and Redis for Real‑Time ETL Log Analysis

This article presents a modular architecture for real‑time ETL log analysis that combines Flume for log collection, Kafka as a buffering layer, Storm for stream processing, Drools for rule‑based data transformation, and Redis for fast storage, detailing installation, configuration, and code integration steps.

Big Data · Drools · Flume
23 min read
Architect
Aug 21, 2020 · Backend Development

Message Queue Interview Guide: Benefits, Drawbacks, Choosing the Right MQ, and Ensuring High Availability

This article explains why and when to use message queues, outlines their advantages and disadvantages, compares popular MQ products such as Kafka, RabbitMQ, RocketMQ and ActiveMQ, and provides practical advice on high‑availability, duplicate‑consumption prevention, and idempotent design for interview preparation.

Kafka · MQ · RabbitMQ
0 likes · 19 min read
Message Queue Interview Guide: Benefits, Drawbacks, Choosing the Right MQ, and Ensuring High Availability
Big Data Technology & Architecture
Aug 15, 2020 · Big Data

Step-by-Step Guide to Building an ELK Stack with Kafka, Zookeeper, Logstash, and Filebeat for Log Collection

This tutorial provides a comprehensive, step-by-step procedure for setting up a log‑collection pipeline using Filebeat, Kafka, Zookeeper, Logstash, Elasticsearch, and Kibana across multiple servers, covering hardware preparation, system tuning, software installation, configuration files, and verification commands.

Big Data · ELK · Filebeat
11 min read
Top Architect
Aug 14, 2020 · Big Data

Billion‑Row MySQL to HBase Synchronization: Load Data, Kafka‑Thrift, and Flink Solutions

This article presents a comprehensive guide for transferring massive MySQL datasets to HBase, covering environment setup on Ubuntu, three synchronization methods—MySQL LOAD DATA, a Kafka‑Thrift pipeline using Maxwell, and real‑time Flink processing—along with performance comparisons and practical tips for Hadoop, HBase, Kafka, Zookeeper, Phoenix, and related tools.

DataSync · Flink · HBase
24 min read
Tencent Cloud Middleware
Aug 12, 2020 · Big Data

How Serverless Functions Can Replace Traditional Kafka Data Pipelines for Lower Cost and Easier Scaling

This article explains how Tencent Cloud CKafka works, describes the challenges of traditional open‑source data‑flow solutions, and demonstrates a Serverless Function approach—complete with architecture diagrams and code examples—to achieve low‑cost, auto‑scaling Kafka‑to‑Elasticsearch pipelines.

Big Data · CKafka · Elasticsearch
12 min read
IT Architects Alliance
Aug 12, 2020 · Big Data

Introduction to Confluent KSQL for Real-Time Stream Processing

This article introduces Confluent KSQL, a SQL‑based real‑time stream processing engine for Kafka, covering its architecture, the stream and table abstractions, the query lifecycle, a Docker‑based setup, DDL commands, example joins, windowed aggregations, connectors, and its advantages and limitations.

Big Data · Docker · KSQL
9 min read
Top Architect
Aug 11, 2020 · Big Data

Kafka Basics and Cluster Architecture Overview

This article provides a comprehensive introduction to Kafka, covering its role as a messaging system and core concepts such as topics, partitions, producers, consumers, and messages. It then delves into the cluster architecture, including replicas, consumer groups, controller coordination with Zookeeper, performance optimizations, log segmentation, and network design.

Cluster Architecture · Kafka · Message Queue
11 min read
New Oriental Technology
Aug 11, 2020 · Backend Development

Engineering Case Study of New Oriental Cloud Classroom Backend Architecture and Scaling During the Pandemic

The article details how New Oriental's Cloud Classroom backend, built with Java, Spring, MySQL, Redis, Kafka, Sentinel, and other modern technologies, scaled to support millions of users and a hundred‑fold surge in demand during the pandemic through architectural optimizations, distributed caching, traffic control, and rapid performance improvements.

Distributed Systems · Kafka · java
7 min read
Open Source Linux
Aug 11, 2020 · Backend Development

Build a Docker‑Based Kafka Cluster and Integrate It with Spring Boot

This guide walks you through creating a three‑node Kafka cluster with Zookeeper using Docker Compose, from writing the YAML configuration to launching the containers. It then integrates the cluster into a Spring Boot application by adding dependencies, setting Kafka properties, defining message, sender, and receiver classes, and testing the message flow.

Docker · Docker Compose · Kafka
6 min read
Efficient Ops
Aug 4, 2020 · Operations

Mastering Filebeat: How to Collect and Ship Container Logs to Kafka

This article introduces Filebeat as a lightweight log shipper, explains its core components and processing flow, and provides step‑by‑step configuration examples for gathering container logs and forwarding them to Kafka or Elasticsearch in cloud‑native environments.

Elasticsearch · Filebeat · Go
13 min read
Big Data Technology & Architecture
Aug 4, 2020 · Big Data

Manual Kafka Offset Management in Spark Streaming using createDirectStream (Java & Scala)

This article explains how to use Spark Streaming's Direct Approach with Kafka and manage offsets manually. It provides complete Java and Scala implementations, including a JavaKafkaManager class, a demo application, and a Scala KafkaManager, that illustrate creating a DirectKafkaInputDStream, handling offsets, and integrating with Spark.

Kafka · Offset Management · Scala
14 min read