Tagged articles
1273 articles
DataFunTalk
Mar 7, 2019 · Big Data

Design and Evolution of Didi's Real‑Time Data Computing Platform

The article details how Didi built and iterated its real‑time data platform, describing the shift from MySQL‑based batch processing to a Kafka‑Samza‑Druid architecture with Spark Streaming and Flink, the challenges addressed, and the current capabilities and operational metrics.

Big Data · Druid · Flink
0 likes · 9 min read
ITFLY8 Architecture Home
Jan 29, 2019 · Operations

How to Optimize Large-Scale Log Systems for Real-Time Monitoring and Scalability

This article examines the design, deployment, and optimization of massive log systems, comparing architectures, discussing real‑time versus near‑real‑time requirements, and presenting practical improvements such as memory, CPU, and network tuning, data partitioning, storage reduction, and component upgrades using ELK, Kafka, Fluentd, and HBase.

Big Data · ELK · Fluentd
0 likes · 18 min read
dbaplus Community
Dec 12, 2018 · Backend Development

How to Choose the Right Message Queue: RabbitMQ vs Kafka

This article examines the role of message‑queue middleware in high‑concurrency IM systems, compares popular open‑source options such as ActiveMQ, RabbitMQ, Kafka, RocketMQ and ZeroMQ, and provides a detailed multi‑dimensional framework—including functionality, performance, reliability, operational management, and ecosystem factors—to help engineers select the most suitable queue for their specific business needs.

Kafka · Message Queue · Middleware Selection
0 likes · 28 min read
Manbang Technology Team
Dec 12, 2018 · Big Data

Kafka Overview: Core Concepts, Architecture, Configuration, and Usage in Real-Time Computing

This article provides a comprehensive technical overview of Kafka, covering its core concepts, producer and consumer models, architecture, configuration parameters, replication mechanisms, performance optimizations, operational monitoring, tooling scripts, and related product implementations for real-time data processing.

Big Data · Kafka · Message Queue
0 likes · 18 min read
ITPUB
Dec 10, 2018 · Big Data

How Meituan Syncs MySQL to Hive in Real-Time Using Binlog, Canal, and Camus

This article explains Meituan's architecture for accurately and efficiently moving MySQL data into a Hive data warehouse by capturing binlog streams with Canal, transporting them via Kafka, and restoring them offline with Camus and a merge process that handles inserts, updates, and deletes.

Binlog · Kafka · hive
0 likes · 14 min read
UCloud Tech
Nov 29, 2018 · Operations

How UCloud’s Physical Network Orchestrator Cuts IDC Build Time from Days to Hours

UCloud’s physical network orchestrator automates large‑scale data‑center switch configuration, reducing IDC network build cycles from 2‑3 days to 2‑3 hours, boosting success rates to 99%, while handling 3000+ switches, 200 Gbps access throughput, and supporting hybrid‑cloud real‑time connectivity through a scenario‑driven, Kafka‑backed architecture.

Configuration Management · Kafka · UCloud
0 likes · 14 min read
Programmer DD
Nov 27, 2018 · Backend Development

How to Prevent Duplicate Message Consumption with Spring Cloud Stream Consumer Groups

This article explains why duplicate message consumption occurs when using Spring Cloud Stream with RabbitMQ or Kafka, introduces the concept of consumer groups, and provides a step‑by‑step Java example showing how to configure and use consumer groups to ensure each message is processed by only one instance.

Kafka · consumer-group · spring-boot
0 likes · 6 min read
dbaplus Community
Nov 20, 2018 · Backend Development

20 Proven Kafka Best Practices for High‑Throughput Clusters

This guide presents New Relic’s 20 practical best‑practice recommendations—covering partitions, consumers, producers, and brokers—to help engineers design, tune, and monitor Apache Kafka deployments for reliable, high‑throughput performance.

Brokers · Consumers · High Throughput
0 likes · 15 min read
21CTO
Nov 20, 2018 · Big Data

What Languages and Tools Do Big Data Experts Use? Insights from 31 IT Leaders

Based on interviews with 31 IT leaders from 28 organizations, this article reveals the most popular programming languages, frameworks, and platforms—such as Python, Scala, Spark, Kafka, TensorFlow, and Tableau—currently driving big‑data extraction, analysis, and reporting, and highlights emerging trends and tool preferences.

Big Data · Kafka · Python
0 likes · 12 min read
Ctrip Technology
Oct 17, 2018 · Big Data

Design and Evolution of Ctrip Flight Ticket Log Tracking System

This article describes how Ctrip's flight ticket team built a massive log‑tracking platform using Elasticsearch, Kafka, and Spark, evaluated storage options such as Cassandra and HBase, introduced secondary indexing and hot‑cold data separation, and continuously evolved the architecture to balance resource usage and query performance.

Kafka · Log Analytics · architecture
0 likes · 7 min read
DataFunTalk
Oct 14, 2018 · Big Data

Exploring Real-Time Data Warehouse Practices Based on HBase

The article details the evolution from an offline to a real‑time HBase data warehouse, covering business scenarios, the use of Maxwell for MySQL‑to‑Kafka ingestion, Phoenix for SQL access, CDH cluster tuning, monitoring, and several production case studies.

HBase · Kafka · Phoenix
0 likes · 14 min read
Efficient Ops
Oct 13, 2018 · Big Data

Boost Your Kafka Integration with KafkaBridge: Multi-Language SDK Overview

KafkaBridge is a lightweight, multi-language SDK that simplifies Kafka read/write operations, offering unified interfaces, long‑connection reuse for PHP‑FPM, and reliable message delivery, with detailed compilation steps, usage examples, and performance benchmarks across C++, Python, PHP, and Go.

Golang · Kafka · PHP
0 likes · 7 min read
Architecture Talk
Sep 30, 2018 · Backend Development

Why Event‑Driven Architecture Beats Command‑Driven Design in Microservices

This article explains how shifting from synchronous command‑driven interactions to asynchronous event‑driven flows reduces coupling, improves scalability, and enables flexible querying in distributed systems, while also discussing hybrid patterns, the single‑writer principle, and practical advantages illustrated with Kafka‑based examples.

Event-Driven Architecture · Events · Kafka
0 likes · 13 min read
21CTO
Sep 14, 2018 · Backend Development

How Message Queues Enable Near Real‑Time Incremental Indexing in Search Engines

This article examines the high‑real‑time requirements of incremental data ingestion for search engines, compares three update schemes, and details how adopting a Kafka subscription‑based message‑queue approach dramatically improves latency and flexibility for the Nuomi search framework.

Kafka · Message Queue · incremental indexing
0 likes · 8 min read
JD Tech
Sep 4, 2018 · Backend Development

Design and Evolution of an Order Dispatch System for Instant Delivery Platforms

This article describes the evolution, architectural design, and key implementation details of an order dispatch system for instant‑delivery services, covering problem analysis, delay‑task mechanisms such as database polling, DelayQueue and TimingWheel, and the final solution that combines Redis with a timing‑wheel scheduler and asynchronous processing.

Kafka · delay queue · instant delivery
0 likes · 11 min read
dbaplus Community
Aug 8, 2018 · Big Data

How to Build a Real‑Time Data Platform: Tech Stack & Design Patterns

This article explains the architecture of a Real‑Time Data Platform (RTDP), details the technical selection of core components such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses data management, security, operations, and four deployment modes—synchronization, flow, rotation and intelligent—illustrating how each fits different business scenarios.

Big Data Architecture · Data Integration · Kafka
0 likes · 24 min read
Architecture Digest
Aug 7, 2018 · Big Data

Apache Kafka Overview, Architecture, and Sample Producer/Consumer Code

This article provides a comprehensive overview of Apache Kafka, comparing it with ActiveMQ, explaining its distributed architecture, topics, partitions, consumption models, high‑availability mechanisms, exactly‑once semantics, and includes detailed Java producer and consumer code examples for practical implementation.

Big Data · Consumer · Distributed Messaging
0 likes · 22 min read
Meituan Technology Team
Jul 5, 2018 · Big Data

Meituan Dianping User Action System (UAS): Architecture and Implementation for Real-time User Behavior Processing

Meituan‑Dianping’s User Action System unifies disparate user‑behavior events with a 5W1H format, ingests them via a proprietary MAPI channel into Kafka, processes them in real‑time using Storm and a Lambda batch‑speed architecture, and delivers millisecond‑level responses for billions of daily events while offering flexible, modular query and storage options.

Kafka · Lambda architecture · Storm
0 likes · 17 min read
Architecture Digest
Jun 18, 2018 · Operations

Design and Optimization of Large‑Scale Log Systems

This article examines the challenges of handling massive log data in high‑traffic e‑commerce platforms and presents a comprehensive architecture, optimization strategies, and practical implementations—including Rsyslog, Kafka, Fluentd, and the ELK stack—to improve scalability, performance, and reliability of log management systems.

Big Data · ELK · Fluentd
0 likes · 17 min read
Programmer DD
Jun 3, 2018 · Backend Development

Designing a China‑Style Microservice Stack 2.0: Practical Component Guide

This article presents a practical, China‑focused microservice reference stack built on Spring Cloud, detailing core support components such as Zuul, Eureka, Apollo, and Spring Boot, as well as monitoring tools like Kafka, ELK, CAT, KairosDB, ZMon, and Hystrix, and explains when and how to apply each in production environments.

Apollo · Backend Architecture · Kafka
0 likes · 20 min read
Java Captain
May 24, 2018 · Big Data

Debugging a Kafka Data Drop: A Step‑by‑Step Troubleshooting Case Study

After a recent feature release caused a sharp decline in a key data metric, the team followed a systematic, fourteen‑step troubleshooting process—including verification, code review, DBA involvement, local debugging, environment comparison, logging, packet capture, service restarts, request mode changes, load testing, and partition resizing—to identify and resolve a Kafka‑related throughput bottleneck.

Kafka · Load Testing · Performance debugging
0 likes · 8 min read
Architecture Digest
May 14, 2018 · Backend Development

Implementing and Optimizing a High‑Concurrency Flash Sale System with Optimistic Lock, Distributed Rate Limiting, Redis Cache, and Kafka

This article walks through building a Java‑based flash‑sale (秒杀) service, diagnosing overselling issues, and progressively enhancing it with optimistic locking, distributed rate limiting, Redis caching, and asynchronous Kafka processing to achieve higher throughput and data consistency under heavy concurrency.

Kafka · Performance Testing · distributed rate limiting
0 likes · 14 min read
Tencent Cloud Developer
May 3, 2018 · Operations

Tencent Cloud Kafka Automated Operations Practices

Tencent Cloud’s senior engineer Yang Yuan explains how their managed Kafka service tackles version diversity, resource allocation, dynamic scaling, broker addition/removal, and partition migration using versioned clusters, bin‑packing algorithms, penalty weighting, and predictive scheduling to sustain trillions of messages and billions of messages per minute.

Kafka · Operations Automation · Resource Management
0 likes · 14 min read
21CTO
Apr 28, 2018 · Big Data

Why Kafka Dominates Real-Time Data Streaming in the Big Data Era

This article explains why Kafka has become essential for real‑time data streaming in the big‑data era, detailing its performance advantages, core use cases, major adopters, multilingual support, and how its scalable storage and retention mechanisms empower modern data pipelines.

Kafka · Real-time Streaming
0 likes · 10 min read
Huawei Cloud Developer Alliance
Apr 17, 2018 · Big Data

How a Big Data Platform Powers Real‑Time Facial Recognition for Billion‑Scale Face Libraries

This case study details how Beijing 恒远华信息技术有限公司 built a dynamic face‑capture and real‑time recognition solution on Huawei FusionInsight HD, leveraging deep‑learning algorithms, distributed storage, and stream processing to handle hundreds of millions of faces with high speed, efficiency, and security.

Apache Storm · HBase · Huawei FusionInsight
0 likes · 17 min read
Didi Tech
Apr 11, 2018 · Backend Development

How to Turn Synchronous RPC into Asynchronous Queues for Reliable Microservices

The article examines the reliability challenges of microservice architectures that rely heavily on synchronous RPC calls, and proposes a comprehensive solution that converts failing RPCs to asynchronous message‑queue workflows, introduces a write‑ahead‑queue for transactional consistency between databases and queues, and outlines offset management to ensure end‑to‑end fault tolerance.

Kafka · Message Queue · Microservices
0 likes · 12 min read
Snowball Engineer Team
Mar 23, 2018 · Big Data

Redesigning Snowball's Log Collection Architecture During Hadoop Cluster Expansion

The article details Snowball's challenges with a saturated CDH Hadoop cluster, outlines the limitations of the original Kafka‑based log pipeline, and explains how a comprehensive redesign using FlumeNG, Spillable Memory Channels, and custom HDFS sinks resolves latency, data loss, and high‑load issues while supporting future growth.

Cluster Migration · FlumeNG · Hadoop
0 likes · 6 min read
Programmer DD
Mar 12, 2018 · Backend Development

How to Choose the Right Message Queue: Practical Insights Beyond the Hype

This article shares a seasoned developer’s perspective on selecting a message‑queue middleware, outlining typical adoption stages, three key evaluation criteria—coder expertise, current and future requirements, and community/ecosystem health—and offering candid advice on avoiding common pitfalls.

Backend Architecture · Kafka · MQ selection
0 likes · 9 min read
Beike Product & Technology
Mar 9, 2018 · Big Data

How Lianjia Built a Low‑Latency Real‑Time Data Platform with Spark Streaming

This article details Lianjia's journey of designing and implementing a low‑latency, stable real‑time computing platform using Spark Streaming on YARN, covering technical selection, architecture components, version compatibility challenges, exactly‑once semantics, graceful shutdown, Kafka tuning, and future enhancements.

Big Data · Exactly-Once · Kafka
0 likes · 11 min read
iQIYI Technical Product Team
Jan 31, 2018 · Big Data

Evolution of iQIYI Real-Time Big Data Collection System

iQIYI’s big‑data collection system has progressed from simple HTTP log uploads to a Flume‑Kafka pipeline and finally to a custom Venus‑Agent architecture with centralized configuration, persistent offsets, dual‑Kafka streams and Flink processing, now handling tens of millions of queries per second and over three hundred billion records daily to power its AI‑driven services.

Big Data · Flink · Flume
0 likes · 15 min read
Hujiang Technology
Jan 29, 2018 · Operations

Design and Implementation of a Low‑Impact Distributed Tracing System for Service Calls

This article describes the background, design goals, architecture, implementation details, and lessons learned from building a low‑overhead, low‑intrusion distributed tracing system using Kafka, Elasticsearch, and OpenTracing to monitor microservice interactions and support performance analysis and DevOps decision‑making.

Distributed Tracing · Elasticsearch · Kafka
0 likes · 9 min read
dbaplus Community
Jan 16, 2018 · Big Data

Kafka MirrorMaker Mastery: Real‑Time Sync, Tuning & Troubleshooting

Kafka MirrorMaker provides near‑real‑time cross‑data‑center replication by consuming from a source cluster and producing to a target cluster, and this guide explains its core features, new vs. old consumer APIs, partition assignment strategies, performance tuning, network considerations, and practical command‑line examples.

Consumer API · Kafka · MirrorMaker
0 likes · 13 min read
Meituan Technology Team
Jan 12, 2018 · Backend Development

Design and Implementation of Meituan Hotel Full-Chain Log and Trace System

To cope with Meituan Hotel’s exploding micro‑service complexity, the infrastructure team built the Satellite System, which combines MTrace with a selective, zero‑intrusion Log4j2‑based logging pipeline that streams enriched logs through Kafka, Storm, Redis and Elasticsearch, delivering trace‑log queries that return within seconds and six‑month retention, dramatically speeding up debugging.

Distributed Tracing · Elasticsearch · Kafka
0 likes · 11 min read
MaGe Linux Operations
Dec 11, 2017 · Big Data

Master Kafka Basics: Architecture, Core Concepts, and Hands‑On Python Experiments

This article explains Kafka’s core concepts—including producers, consumers, topics, partitions, brokers, and consumer groups—describes its distributed architecture with leader‑follower replication, and provides three hands‑on kafka‑python experiments that demonstrate basic messaging, fault‑tolerant consumer groups, and offset management for reliable consumption.

Distributed Streaming · Kafka · Offset Management
0 likes · 9 min read
21CTO
Nov 11, 2017 · Big Data

How We Built a Scalable Seller Log System with Kafka, Storm, ES & HBase

This article explains the design and implementation of a unified seller‑operation logging platform that uses Kafka for ingestion, Storm for real‑time processing, Elasticsearch for hot‑data search, and HBase for cold‑data storage, detailing the challenges faced and the optimizations applied.

Big Data · Elasticsearch · HBase
0 likes · 12 min read
360 Zhihui Cloud Developer
Oct 26, 2017 · Backend Development

Understanding Kafka’s NIO Selector: How the Selector Class Manages Connections

This article delves into Kafka’s network layer implementation, explaining the Selector class’s role in registering socket channels, handling connection events, and orchestrating reads and writes via KafkaChannel and TransportLayer, while illustrating packet structures and providing code snippets for key functions like register, connect, poll, and send.

Kafka · Network I/O · backend-development
0 likes · 7 min read
dbaplus Community
Oct 15, 2017 · Big Data

How JD Built a Scalable Seller Log Platform with Kafka, Storm, ES & HBase

This article details JD's end‑to‑end seller log system architecture, explaining why Kafka, Storm, Elasticsearch and HBase were chosen, the challenges faced during scaling, and the practical solutions implemented to achieve a unified, high‑throughput logging platform for merchants and operations.

Big Data · Elasticsearch · HBase
0 likes · 13 min read
Dada Group Technology
Sep 29, 2017 · Operations

Overwatch: A Distributed System Monitoring Platform for Real‑Time RPC Visibility

Overwatch is an open‑source distributed monitoring platform built by Dada‑Jingdong Home that collects, aggregates, and visualizes RPC traffic across thousands of micro‑services in real time, enabling engineers to quickly pinpoint the root cause of system failures using directed‑graph visualizations and CQRS‑based data queries.

CQRS · Kafka · RPC
0 likes · 10 min read
Qunar Tech Salon
Sep 25, 2017 · Big Data

Comprehensive Guide to Spark Ecosystem: Data Warehouse, Machine Learning, Streaming, and Enterprise Use Cases

This article provides an extensive overview of Apache Spark’s ecosystem—including its data‑warehouse capabilities, ML/MLlib libraries, streaming with Spark Streaming, external frameworks, and real‑world enterprise case studies—while also noting a promotional announcement for a React Native conference.

Big Data · Kafka · Spark
0 likes · 21 min read
21CTO
Sep 14, 2017 · Backend Development

How PhxQueue Achieves High‑Availability, High‑Throughput Distributed Queuing with Paxos

PhxQueue is a Paxos‑based distributed queue open‑sourced by Tencent that provides at‑least‑once delivery, synchronous disk flushing, strict ordering, multi‑subscription, and high throughput, outperforming Kafka in reliability and failover scenarios while supporting massive workloads such as WeChat Pay.

Kafka · Paxos · WeChat
0 likes · 17 min read
WeChat Backend Team
Sep 12, 2017 · Backend Development

How PhxQueue Achieves High‑Throughput, High‑Reliability Distributed Queuing with Paxos

PhxQueue, an open‑source, Paxos‑based distributed queue from WeChat, provides at‑least‑once delivery, synchronous disk flushing, strict ordering, multi‑subscription, and high availability, outperforming Kafka in reliability and latency while maintaining comparable throughput, as demonstrated through detailed design, performance, and failover analyses.

Distributed Systems · Kafka · Paxos
0 likes · 26 min read
dbaplus Community
Sep 5, 2017 · Big Data

Why Kafka Needs High Availability: Deep Dive into Replication and Leader Election

This article explains why Kafka introduced High Availability in version 0.8, covering the necessity of data replication and leader election, the internal replication and ACK mechanisms, Zookeeper metadata structures, broker failover procedures, and the command‑line tools that help manage and rebalance a Kafka cluster.

Kafka · Replication · high availability
0 likes · 36 min read
BiCaiJia Technology Team
Sep 2, 2017 · Backend Development

Integrate Kafka with Spring Boot 1.4 Using Spring Integration – Step‑by‑Step Guide

This guide walks you through setting up Kafka and Zookeeper, adding Spring Integration dependencies, configuring application.yml, creating producer and consumer configurations with @Configuration and @EnableKafka, implementing a @KafkaListener, and testing the integration via a Spring MVC endpoint, while highlighting common pitfalls.

Kafka · Messaging · Spring Boot
0 likes · 6 min read
Architecture Digest
Aug 29, 2017 · Big Data

Introduction to Apache Kafka: Concepts, Architecture, and Core APIs

This article provides a comprehensive overview of Apache Kafka, explaining its role in real‑time data pipelines and stream processing, describing key concepts such as topics, partitions, logs, producers, consumers, replication, guarantees, and how Kafka functions as both a messaging and storage system.

Consumer API · Distributed Streaming · Kafka
0 likes · 13 min read
21CTO
Jul 23, 2017 · Backend Development

Comparing Kafka and RocketMQ: Architecture, Availability, and Reliability Insights

This article examines the architectures of Kafka and RocketMQ, analyzes their availability and reliability mechanisms, evaluates their strengths and weaknesses, and proposes a hybrid MQ design that combines the benefits of both systems while simplifying dependencies and improving fault tolerance.

Availability · Kafka · Message Queue
0 likes · 13 min read
21CTO
Jul 20, 2017 · Backend Development

How Ctrip Built a Real-Time User Data Collection System with Netty and Kafka

This article details Ctrip's design and implementation of a high‑throughput, low‑latency user data collection platform that leverages Java NIO, Netty, and a custom Kafka‑based messaging layer, covering architecture, encryption, compression, disaster‑recovery, performance testing, and downstream analytics products.

Avro · Backend Architecture · Data Streaming
0 likes · 17 min read
Architecture Digest
Jul 18, 2017 · Backend Development

Design and Implementation of Ctrip Real‑Time User Data Collection System

This article describes the design, technology selection, and performance evaluation of Ctrip's real‑time user behavior data collection platform, covering Netty‑based network handling, Kafka/Hermes messaging, encryption, compression, Avro backup, and related analytics products, with detailed feasibility analysis and benchmark results.

Backend Architecture · Distributed Systems · Kafka
0 likes · 17 min read
21CTO
Jul 8, 2017 · Big Data

Ctrip’s Scalable Real‑Time User Behavior System with Kafka, Storm, Redis

This article details Ctrip’s redesign of its real‑time user behavior service, covering the new architecture, data flow, use of Java, Kafka, Storm, Redis, and MySQL, and how it achieves high real‑time performance, availability, scalability, and fault‑tolerance to support massive travel‑industry traffic.

Kafka · Real-Time · Storm
0 likes · 12 min read
21CTO
Jun 11, 2017 · Big Data

How Kafka Guarantees High Reliability – Architecture, Replication & Benchmarks

This article explains Kafka's distributed architecture, topic‑partition model, replication and ISR mechanisms, data durability settings, delivery guarantees, deduplication strategies, and presents benchmark results that illustrate how configuration choices affect throughput and latency in real‑world deployments.

Distributed Messaging · Kafka · Replication
0 likes · 33 min read
Architecture Digest
Jun 11, 2017 · Big Data

Kafka High‑Reliability Architecture, Storage Mechanisms, Replication, and Benchmark Analysis

This article explains Kafka's distributed architecture, its topic‑partition storage model, replication and synchronization mechanisms, reliability guarantees such as ISR and high‑watermark, and presents benchmark results that illustrate how replication factor, acks settings, and partition count affect throughput and latency.

Kafka · Reliability · benchmark
0 likes · 34 min read
Architecture Digest
Jun 9, 2017 · Big Data

A Comprehensive Guide for Big Data Beginners: From Hadoop Fundamentals to Machine Learning

This guide walks beginners through the entire big‑data ecosystem, covering the 4V characteristics, core open‑source frameworks, Hadoop setup, Hive and SQL on Hadoop, data ingestion and export tools, task scheduling, real‑time processing with Kafka, Storm and Spark Streaming, and an introduction to machine‑learning applications.

Hadoop · Kafka · Spark
0 likes · 17 min read
MaGe Linux Operations
May 28, 2017 · Backend Development

Understanding Kafka’s Architecture: Topics, Partitions, and Reliability

This article explains Kafka’s core architecture—including brokers, topics, partitions, offsets, producer and consumer mechanics, replication, availability, consistency, persistence, performance optimizations, and Zookeeper integration—providing a comprehensive guide for building reliable distributed messaging systems.

Distributed Messaging · Kafka · OFFSET
0 likes · 15 min read
Architecture Digest
May 18, 2017 · Backend Development

Design and Architecture of Ctrip's Real‑Time User Behavior Service

The article describes how Ctrip rebuilt its real‑time user behavior platform using a Java‑based stack (Kafka, Storm, Redis, MySQL) to achieve millisecond‑level latency, high availability, scalable performance, and robust handling of traffic spikes, failures, and data back‑pressure.

Backend Architecture · Kafka · Real-Time
0 likes · 12 min read
Architecture Digest
Apr 27, 2017 · Big Data

Kafka High‑Reliability Architecture, Storage Mechanisms, and Performance Benchmark

This article explains Kafka's distributed architecture, its topic‑partition storage model, replication and ISR mechanisms, leader election, delivery guarantees, configuration for high reliability, and presents extensive benchmark results showing how replication factor, acks settings, and partition count affect throughput and latency.

Kafka, high reliability, performance benchmark
0 likes · 39 min read
ITFLY8 Architecture Home
Apr 21, 2017 · Backend Development

Mastering Kafka: Producer‑Consumer vs Pub/Sub Patterns for Scalable Backend Design

This article explains Kafka's core concepts and compares producer‑consumer and publish‑subscribe models, illustrating how to apply each pattern for data ingestion and event distribution in distributed backend systems, and offers practical design alternatives when Kafka’s native capabilities fall short.

Backend Architecture, Kafka, Message Queue
0 likes · 10 min read
Qunar Tech Salon
Apr 21, 2017 · Big Data

Ensuring Exact‑Once Semantics in Spark Streaming with Kafka: Offline Repair and Data Deduplication Strategies

This article explains why Spark Streaming combined with Kafka can only guarantee at‑least‑once delivery, outlines the challenges of delayed and out‑of‑order events, and presents practical offline‑repair, deduplication, and output‑format techniques—including code examples—to achieve exactly‑once semantics in big‑data pipelines.

Exact-Once, HBase, HDFS
0 likes · 11 min read
Tongcheng Travel Technology Center
Apr 10, 2017 · Operations

Sentinel Monitoring System: Real‑Time Business Log Monitoring and Incident Detection for an Airline Ticket Platform

The Sentinel system was built to provide real‑time, zero‑modification monitoring of airline ticket business services by consuming Tianwang logs through a Storm cluster, offering flexible rule configuration, addressing performance pitfalls, and planning future enhancements such as custom monitoring scripts and visual dashboards.

Kafka, Log Processing, Real-Time
0 likes · 6 min read
Efficient Ops
Mar 20, 2017 · Big Data

How eBay Built a Scalable Kafka‑Based Real‑Time Data Transmission Platform

This article details eBay's year‑long development of an enterprise‑grade, Kafka‑driven data transmission platform, covering its architecture, core services, monitoring and automation strategies, as well as performance tuning techniques that enable high throughput, low latency, and reliable cross‑data‑center replication.

Data Streaming, Kafka, Real-time Processing
0 likes · 22 min read
Tencent Cloud Developer
Feb 14, 2017 · Databases

TDSQL Audit Capability: Architecture, Kafka Integration, and Consistency Hash Implementation

TDSQL’s cloud‑based audit solution combines a three‑proxy high‑availability layer, Kafka’s O(1) persistent messaging, and a distributed audit‑server that uses consistent hashing and multi‑coroutine processing to consume data within seconds, while fault‑tolerant offsets, majority acknowledgments, and Tencent Cloud MongoDB storage ensure secure, ordered, scalable, and highly reliable audit logging.

Kafka, MongoDB, TDSQL
0 likes · 7 min read
dbaplus Community
Feb 13, 2017 · Backend Development

Why Message Queues Are Essential for Scalable Distributed Systems

Message queues are a key middleware component in distributed systems: they reduce coupling, enable asynchronous processing, smooth traffic spikes, and improve availability. The article walks through real-world scenarios (asynchronous handling, decoupling, traffic throttling, logging, and communication) and reviews popular options such as ActiveMQ, RabbitMQ, ZeroMQ, and Kafka, along with the JMS API.

Backend Architecture, JMS, Kafka
0 likes · 20 min read
StarRing Big Data Open Lab
Feb 10, 2017 · Information Security

Securing Kafka with Kerberos and ACLs: A Practical Guide

This article explains Kafka's architecture, identifies its security vulnerabilities, and presents Transwarp's Kerberos authentication and ACL-based authorization solutions, including configuration steps, code examples, and best practices for building a secure Kafka service.

ACL, Kafka, Kerberos
0 likes · 12 min read
21CTO
Jan 18, 2017 · Big Data

Build a Lightweight, High‑Availability Real‑Time Stream Processing System

Learn how to construct a simple, high‑availability real‑time stream processing platform using lightweight components such as Kafka, Zookeeper, Thrift/Avro, and optional storage like MongoDB or Elasticsearch, offering a practical alternative to heavyweight frameworks like Storm and Spark Streaming for small‑to‑medium enterprises.

Big Data, Kafka, Real-Time
0 likes · 5 min read