Tagged articles

1273 articles

Jan 5, 2017 · Big Data

Master Kafka: Install, Run a Single Node, Build a Cluster, and Use Kafka Connect

This step‑by‑step guide walks you through installing Kafka, starting a single‑node broker, producing and consuming messages, configuring a multi‑node cluster with replication, testing fault tolerance, and using Kafka Connect to import and export data between files.

Cluster · Installation · Kafka

0 likes · 8 min read

360 Zhihui Cloud Developer

Dec 20, 2016 · Backend Development

How QBus Redefines Messaging: A Deep Dive into 360’s Custom Kafka‑Based Queue

This article introduces QBus, a Kafka‑derived, highly‑available message queue built for 360, detailing its origins, core features, architectural design, performance advantages over traditional queues, and Java SDK usage examples for producers and consumers.

Kafka · Message Queue · QBus

0 likes · 6 min read

dbaplus Community

Dec 18, 2016 · Big Data

How DWS Uses Log‑Based Architecture for Real‑Time Data Integration

This article explains the design and implementation of the DWS platform, detailing its log‑driven architecture with Dbus, Wormhole, and Swifts, the technical choices behind real‑time data extraction, transformation, and delivery, and real‑world use cases in finance.

CDC · Canal · Kafka

0 likes · 22 min read

ITFLY8 Architecture Home

Dec 13, 2016 · Big Data

Umeng’s Mobile Big Data Platform: Architecture, Challenges & Insights

The article details Umeng’s mobile big‑data platform architecture, describing its Lambda‑style hybrid design, data ingestion pipeline with dual Kafka clusters, offline and real‑time processing using Hadoop, Spark, Storm, and storage layers such as HDFS, HBase, MongoDB and Elasticsearch, while also discussing challenges in data collection, cleaning, computation, security, and value‑added services.

Data Architecture · Hadoop · Kafka

0 likes · 13 min read

Meituan Technology Team

Nov 4, 2016 · Big Data

Design and Implementation of a Low-Latency App Exception Monitoring Platform Using Spark Streaming, Kafka, and Elasticsearch

The paper presents a production‑grade, low‑cost mobile‑app exception monitoring platform built on Spark Streaming, Kafka, and Elasticsearch that achieves high availability through exactly‑once processing and checkpointing, minute‑level latency by decoupling raw and symbolized logs, high throughput via reservoir sampling, and dynamic scalability without code changes.

Big Data · Elasticsearch · Exception Monitoring

0 likes · 11 min read
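The summary credits reservoir sampling for keeping throughput high. As a rough illustration, here is the classic single-pass variant (Algorithm R), which keeps a uniform random sample of k items from a stream whose length is not known in advance. How the platform actually applies it (per crash type, per time window, and so on) is not stated here, and the function name is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of k items from a stream of unknown length.

    Single pass, O(k) memory (Algorithm R).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i (0-based) with probability k / (i + 1),
            # overwriting a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because each item is examined once and only k items are retained, memory stays fixed even when a bad release floods the pipeline with near-identical crash reports.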

Efficient Ops

Oct 27, 2016 · Information Security

Tech World Shake‑Up: DNS Outage, Apple ARM Support, Google Strategy, Kafka Updates

A roundup of recent tech developments covering a massive US DNS outage caused by IoT‑based DDoS attacks, Apple’s addition of ARM support to macOS Sierra, Google’s evolving 20% time policy, new multi‑data‑center features in Confluent Kafka, MariaDB’s new member, a critical OpenSSL flaw, China Mobile’s OpenStack award, and Tencent’s rapid Nexus 6P hack.

Apple · DNS · Google

0 likes · 8 min read

dbaplus Community

Oct 19, 2016 · Backend Development

When to Use Kafka, RabbitMQ, or ZeroMQ: A Practical MQ Guide

This article explains the true purpose of message queues, classifies them into broker‑based and broker‑less families, compares Kafka, RabbitMQ, and ZeroMQ in terms of performance, flexibility, and lightweight distribution, and clarifies that MQs can support both asynchronous and synchronous communication.

Kafka · Message Queue · RabbitMQ

0 likes · 8 min read

GF Securities FinTech

Sep 28, 2016 · Backend Development

How Event Sourcing and a Go DSL Power a Scalable Points System

This article explains how a financial e‑commerce platform uses the Event Sourcing architecture pattern, an asynchronous message bus, and a Go‑based domain‑specific language to build a flexible, exactly‑once points system that decouples business rules from application code and simplifies operations.

DSL · Event Sourcing · Go

0 likes · 17 min read

Architecture Digest

Sep 21, 2016 · Big Data

Log Platform Architecture and Scaling Lessons from Vipshop's 419 Promotion

This article presents a detailed case study of Vipshop's log platform during the 419 sales event, analyzing the 2013 architecture, bottlenecks in RabbitMQ and Storm, and the subsequent redesign using Kafka, Impala, and HBase to achieve scalable, reliable big‑data processing.

Big Data · Impala · Kafka

0 likes · 16 min read

Architecture Digest

Sep 12, 2016 · Artificial Intelligence

Design and Implementation of a Real‑Time, Highly Available General Recommendation Platform at YHD

The article describes how YHD's precision recommendation team built a real‑time, highly available, traceable general recommendation platform, detailing its background, overall architecture, visual configuration and traceability subsystems, and reporting significant improvements in development speed, reuse and user satisfaction.

HBase · Kafka · Real-Time

0 likes · 8 min read

Architecture Digest

Sep 10, 2016 · Big Data

Designing a Real-Time Stream Computing Platform for E‑commerce Peak Traffic at Yihaodian

The article describes how Yihaodian built a low‑latency, highly available, and easily scalable streaming computation platform using Storm, Kafka, Linux containers and a custom CGroup management framework to handle massive e‑commerce traffic spikes and real‑time analytics.

Kafka · Resource Isolation · Storm

0 likes · 9 min read

dbaplus Community

Sep 6, 2016 · Big Data

Choosing the Right Log Collection Framework for Massive Data Streams

This article reviews major open‑source log collection tools—Chukwa, Scribe, Flume, Logstash, Kafka, and TT—examining their architectures, strengths, and limitations to help engineers select the most suitable solution for high‑volume, low‑latency data pipelines.

Apache Flume · Distributed Systems · Kafka

0 likes · 13 min read

dbaplus Community

Aug 18, 2016 · Big Data

How Zhejiang Mobile Scaled Billion‑Level Real‑Time Stream Processing with Storm

This article details Zhejiang Mobile's architecture and practical experience in building a billion‑scale real‑time stream computing platform using Storm, Kafka, Flume, and Redis, covering use cases, system design, performance bottlenecks, optimization techniques, and monitoring strategies.

Apache Storm · Big Data Architecture · Flume

0 likes · 20 min read

Architecture Digest

Aug 17, 2016 · Backend Development

Design and Optimization of Bilibili Live Chat (GOIM) System

The article presents a detailed overview of Bilibili's GOIM live chat architecture, covering its high‑stability, high‑availability, low‑latency design, component breakdown, memory and module optimizations, network improvements, and performance testing results to achieve scalable real‑time messaging.

Backend Architecture · Go · Kafka

0 likes · 13 min read

Ctrip Technology

Aug 12, 2016 · Big Data

Ctrip's Real-Time Data Platform: Architecture, Practices, and Lessons Learned

This article details Ctrip's journey building a unified real-time data platform—covering business motivations, architectural requirements, technology choices like Kafka and Storm, implementation of Avro schemas, monitoring, alerting, operational lessons, and future explorations such as Streaming CQL and JStorm.

Alerting · Big Data · Kafka

0 likes · 15 min read

Meituan Technology Team

Aug 5, 2016 · Big Data

Design and Implementation of a Large-Scale User Behavior Analytics Platform

The article outlines Meituan‑Dianping’s “Sensors Analytics” platform, a privately‑deployed, open‑PaaS solution that collects full‑stack user events from iOS, Android, Web and WeChat, maps IDs in near real‑time, stores detailed records in Kudu (real‑time) and Parquet (offline), and serves low‑latency queries via Impala, addressing the architectural and operational challenges of high‑throughput ingestion and data‑security requirements.

Impala · Kafka · Kudu

0 likes · 8 min read

Qunar Tech Salon

Jul 27, 2016 · Big Data

Building a Unified Real-Time Data Platform at Ctrip: Architecture, Practices, and Lessons Learned

This article describes Ctrip's development of a unified real-time data platform, detailing its motivations, architectural choices such as Kafka and Storm, implementation of shared schemas, resource control, monitoring, and operational lessons, as well as experiences with Storm, JStorm, and Streaming CQL.

Big Data · Ctrip · Data Platform

0 likes · 15 min read

Architecture Digest

Jul 26, 2016 · Big Data

Real-Time Order Analytics System Architecture Using Flume, Kafka, Storm, and Redis

This article introduces a beginner-friendly architecture for real-time order analytics in a big‑data environment, detailing how Flume collects logs, Kafka buffers them, Storm processes streams, and Redis stores results, while also covering configuration, code snippets, deployment steps, and troubleshooting tips.

Flume · Kafka · Storm

0 likes · 26 min read

Architect

Jun 15, 2016 · Backend Development

Understanding Kafka's SocketServer: Acceptor, Processor, and RequestChannel Architecture

This article explains the internal design of Kafka's SocketServer, detailing its NIO‑based thread model with Acceptor, Processor, and Handler threads, the startup sequence, how connections are accepted and processed, and the role of RequestChannel in routing requests and responses between processors and handlers.

Backend · Kafka · Scala

0 likes · 17 min read
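The division of labor described above can be modeled with plain queues: Processor threads put parsed requests on one shared request queue, Handler threads take from it, and each response goes back on the queue owned by the Processor that holds the client connection. The Python below is a simplified model of that routing only (no NIO selectors, no Acceptor), and its class and method names are illustrative rather than Kafka's Scala API.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    processor_id: int   # which Processor thread owns the client connection
    payload: str


@dataclass
class Response:
    processor_id: int
    payload: str


class RequestChannel:
    """One request queue shared by all Processors; one response queue per Processor."""

    def __init__(self, num_processors: int) -> None:
        self.requests: "queue.Queue[Request]" = queue.Queue()
        self.responses: List["queue.Queue[Response]"] = [
            queue.Queue() for _ in range(num_processors)
        ]

    def send_request(self, req: Request) -> None:
        self.requests.put(req)

    def receive_request(self, timeout: float) -> Request:
        return self.requests.get(timeout=timeout)  # raises queue.Empty on timeout

    def send_response(self, resp: Response) -> None:
        # Route the reply back to the Processor that read the request.
        self.responses[resp.processor_id].put(resp)

    def receive_response(self, processor_id: int, timeout: float) -> Response:
        return self.responses[processor_id].get(timeout=timeout)


def handler_loop(channel: RequestChannel, handle: Callable[[str], str],
                 stop: threading.Event) -> None:
    """A Handler thread: take any request, handle it, reply to its Processor."""
    while not stop.is_set():
        try:
            req = channel.receive_request(timeout=0.05)
        except queue.Empty:
            continue
        channel.send_response(Response(req.processor_id, handle(req.payload)))
```

The single shared request queue lets any idle Handler pick up work, while per-Processor response queues keep each reply on the thread that owns the socket.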

Architect

May 30, 2016 · Backend Development

Backend Log Management Threads, Log Cleaning, and Compaction in Distributed Kafka Systems

This article explains how Kafka's LogManager loads existing logs and runs background threads for flushing, checkpointing, and cleaning, and walks through the code and strategies behind log retention, segment cleanup, and log compaction in a distributed storage environment.

Kafka · Log Cleaning · Log Management

0 likes · 15 min read
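As an illustration of the retention side of log cleanup, the sketch below picks which closed segments a time limit or a size limit would remove. It is a simplified model, not LogManager's code: it assumes each segment records the timestamp of its newest message, treats the last segment as the active one that is never deleted, and only removes segments from the oldest end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    base_offset: int
    size_bytes: int
    newest_timestamp_ms: int  # assumed: timestamp of the newest message in the segment


def segments_to_delete(segments: List[Segment], now_ms: int,
                       retention_ms: Optional[int] = None,
                       retention_bytes: Optional[int] = None) -> List[int]:
    """Base offsets of segments a retention pass would remove, oldest first.

    `segments` is sorted by base offset; the last entry is the active segment,
    which is never deleted. Deletion only proceeds from the oldest end.
    """
    closed = segments[:-1]
    doomed = set()
    if retention_ms is not None:
        for seg in closed:
            if now_ms - seg.newest_timestamp_ms <= retention_ms:
                break  # stop at the first segment still within the time limit
            doomed.add(seg.base_offset)
    if retention_bytes is not None:
        excess = sum(s.size_bytes for s in segments) - retention_bytes
        for seg in closed:
            if excess < seg.size_bytes:
                break  # removing this segment would drop the log below the size limit
            doomed.add(seg.base_offset)
            excess -= seg.size_bytes
    return sorted(doomed)
```

Deleting whole segments is what makes retention cheap: the broker removes files instead of rewriting them.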

Architecture Digest

May 22, 2016 · Big Data

Design and Architecture of Youzan Unified Log Platform

The article details the design, components, and operational challenges of Youzan's unified log platform, describing its multi‑layer architecture, ingestion methods using rsyslog/logstash and Flume‑NG, Kafka‑based log center, processing pipelines with Storm/Spark, and storage in HDFS and Elasticsearch.

Distributed Systems · Flume · Kafka

0 likes · 10 min read

21CTO

May 16, 2016 · Operations

How to Centralize Logs from Dockerized Services Using Flume and Kafka

This article explains a practical architecture for aggregating logs from distributed Docker containers by employing Flume NG as a lightweight log collector, Kafka as a high‑throughput message bus, and custom sinks to store logs per service, module and day with low latency and minimal resource impact.

Docker · Flume · Kafka

0 likes · 17 min read

Architect

May 16, 2016 · Operations

Centralized Log Collection for Distributed Docker Services Using Flume and Kafka

This article presents a practical architecture for centrally collecting dispersed logs from Docker‑based services in a distributed environment by leveraging Flume NG as a non‑intrusive log agent, Kafka as a high‑throughput message bus, and custom sinks to partition logs by service, module, and day.

Distributed Systems · Docker · Kafka

0 likes · 15 min read

21CTO

May 15, 2016 · Big Data

How LinkedIn Scales Kafka to Trillions of Messages: Lessons in Reliability, Cost, and Security

LinkedIn’s engineering team details how they have scaled Apache Kafka from billions to over a trillion daily messages, focusing on quotas, a new ZooKeeper‑free consumer, reliability enhancements, security features, monitoring frameworks, and ecosystem integrations to improve cost, availability, and performance.

Kafka · LinkedIn · Reliability

0 likes · 13 min read
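The quota work mentioned above comes down to slowing clients that exceed a byte-rate limit. A minimal sketch, assuming the broker measures bytes over a fixed window and delays its response just long enough for the window's average rate to fall back to the quota; the function name and parameters are hypothetical, not LinkedIn's or Kafka's code.

```python
def throttle_delay_s(bytes_in_window: float, window_s: float,
                     quota_bytes_per_s: float) -> float:
    """Seconds to delay a client so its average rate over the window meets the quota.

    Solves bytes_in_window / (window_s + delay) == quota_bytes_per_s for delay;
    returns 0.0 when the client is already within its quota.
    """
    if window_s <= 0 or quota_bytes_per_s <= 0:
        raise ValueError("window and quota must be positive")
    observed = bytes_in_window / window_s
    if observed <= quota_bytes_per_s:
        return 0.0
    return (observed - quota_bytes_per_s) / quota_bytes_per_s * window_s
```

Solving bytes / (window + delay) = quota for the delay gives (observed - quota) / quota * window, so a client sending at twice its quota over a one-second window is held back for one extra second.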

Art of Distributed System Architecture Design

May 11, 2016 · Industry Insights

How LinkedIn Scales Kafka to Over 1 Trillion Messages Daily

LinkedIn’s engineering team details how they scaled Kafka from a few billion to over a trillion daily messages, covering quotas, a new ZooKeeper‑free consumer, reliability upgrades, security roadmaps, monitoring frameworks, failure testing, cluster balancing, and ecosystem integrations.

Kafka · LinkedIn · Reliability

0 likes · 12 min read

Architect

May 4, 2016 · Big Data

Kafka Main Configuration Parameters – Broker, Producer, Consumer, and Topic Settings

This article provides a comprehensive overview of Kafka's core configuration options, detailing default values and descriptions for broker, producer, consumer, and topic‑level settings to help administrators fine‑tune performance, reliability, and resource usage.

Big Data · Broker · Configuration

0 likes · 23 min read

Architect

Apr 28, 2016 · Big Data

Design and Architecture of Youzan Unified Log Platform

The article describes the design, components, and implementation details of Youzan's unified log platform, covering log ingestion via rsyslog, Logstash, and Flume, centralized processing with Kafka, real‑time analysis using Storm/Spark, and storage in HDFS, Elasticsearch, and Hawk, while also discussing challenges and future improvements.

Elasticsearch · HDFS · Kafka

0 likes · 10 min read

21CTO

Apr 14, 2016 · Cloud Computing

How Netflix’s EVCache Powers Global Low‑Latency Caching Across Regions

This article explains how Netflix uses the open‑source EVCache system, built on Memcached and Kafka, to provide highly reliable, low‑latency caching for its micro‑services architecture across multiple AWS regions, handling billions of objects and millions of requests per second.

Distributed Systems · EVCache · Kafka

0 likes · 9 min read

Architecture Digest

Mar 28, 2016 · Big Data

Overview of the Hadoop Ecosystem and Modern Big Data Technologies

This article provides a comprehensive overview of Hadoop and its surrounding ecosystem, detailing core components, storage principles, key algorithms, and a wide range of modern big‑data technologies such as Spark, Flink, Kafka, NoSQL databases, and cloud‑based processing platforms.

Big Data · Hadoop · Kafka

0 likes · 11 min read

MaGe Linux Operations

Mar 28, 2016 · Backend Development

Understanding JMS: Message Models, Consumption, and Popular Middleware

This article explains the JMS standard, its two messaging models (Point‑to‑Point and Publish/Subscribe), how messages are consumed synchronously or asynchronously, the core JMS programming objects, and provides an overview of common middleware such as ActiveMQ, RabbitMQ, ZeroMQ, and Kafka.

ActiveMQ · JMS · Kafka

0 likes · 17 min read

Architect

Mar 22, 2016 · Backend Development

Youzan Search Engine Practice – Engineering Part: Architecture, Indexing, and Performance Optimization

This article describes the practical architecture of Youzan's commercial e‑commerce search engine, covering data source integration, distributed real‑time indexing with Elasticsearch, Hadoop and Kafka, advanced search modules, and several performance‑tuning techniques for large‑scale deployments.

Backend Architecture · Elasticsearch · Kafka

0 likes · 13 min read

Architecture Digest

Mar 22, 2016 · Backend Development

Evolution of LinkedIn’s Backend Architecture: From the Leo Monolith to a Scalable Service‑Oriented Platform

The article chronicles LinkedIn’s journey from a single‑server Leo monolith to a highly distributed, service‑oriented backend architecture, detailing the introduction of member graphs, read‑only replicas, caching layers, Kafka pipelines, Rest.li APIs, super‑blocks, and multi‑data‑center deployments to support billions of daily requests.

Backend Architecture · Distributed Systems · Kafka

0 likes · 9 min read

Architect

Mar 21, 2016 · Big Data

Introduction to Apache Flume: Architecture, Core Concepts, Configuration and Usage

This article provides a comprehensive overview of Apache Flume, covering its design goals, core components, deployment architecture, configuration patterns, and step‑by‑step instructions for integrating Flume with Zookeeper and Kafka to collect and forward massive log data.

Apache Flume · Kafka · ZooKeeper

0 likes · 6 min read

21CTO

Mar 20, 2016 · Backend Development

How LinkedIn Scaled to 350 Million Users: From Leo Monolith to 750+ Microservices

LinkedIn grew from a single monolithic Leo server handling all web requests to a complex ecosystem of over 750 independent services, employing graph databases, read replicas, caching layers, Kafka pipelines, Rest.li APIs, and multi‑data‑center deployments to support billions of daily queries.

Distributed Systems · Kafka · Microservices

0 likes · 9 min read

Architect

Mar 12, 2016 · Backend Development

Design and Evolution of Ctrip's Hermes Message Queue System

This article presents a detailed overview of Ctrip's Hermes message queue system, covering its architectural evolution from a simple Mongo‑based design to a broker‑centric, multi‑storage solution with meta‑server coordination, and discusses practical techniques for building high‑performance, scalable messaging infrastructure.

Cluster Management · Ctrip · Distributed Systems

0 likes · 21 min read

Architect

Mar 8, 2016 · Big Data

Kafka Benchmark: Producer and Consumer Throughput, Replication, Message Size, and Latency Analysis

This article presents a comprehensive Kafka benchmark using six machines to evaluate producer and consumer throughput, replication effects, message size impact, and end‑to‑end latency, providing detailed results, analysis, and reproducible test commands.

Big Data · Kafka · Latency

0 likes · 12 min read

Architect

Mar 8, 2016 · Big Data

In‑Depth Analysis of Apache Kafka: Architecture, Core Concepts, and Benchmark

This article provides a comprehensive technical overview of Apache Kafka, covering its architecture, core concepts, design goals, comparison with other message queues, replication, consumer groups, delivery guarantees, and performance benchmarking, making it a valuable resource for big‑data engineers.

Big Data · Kafka · Replication

0 likes · 30 min read
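Consumer groups are easiest to see with a concrete assignment: within one group each partition is read by exactly one member, while separate groups each receive every message. The sketch below models a range-style assignment for a single topic (members sorted, contiguous blocks, the first members absorbing the remainder); it is a simplified illustration rather than the client's own assignor, and it ignores rebalancing.

```python
from typing import Dict, List


def range_assign(consumers: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Split one topic's partitions into contiguous ranges, one per group member."""
    members = sorted(consumers)
    if not members:
        return {}
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for idx, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        count = per_consumer + (1 if idx < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment
```

With five partitions and two consumers, the first consumer takes three partitions and the second takes two; members beyond the partition count receive nothing and sit idle.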

21CTO

Mar 7, 2016 · Backend Development

When to Choose Kafka Over RabbitMQ: A Practical Comparison

This article compares Kafka and RabbitMQ, examining their design philosophies, throughput capabilities, consumer diversity, message ordering, and handling of individual messages, to help engineers decide which system suits high-volume or flexible-consumer scenarios and understand the trade-offs of each technology.

Kafka · RabbitMQ · Streaming

0 likes · 7 min read

Architecture Digest

Mar 6, 2016 · Backend Development

Message Queue Overview, Application Scenarios, and Middleware Examples

This article introduces the fundamentals of message queues, explains common use cases such as asynchronous processing, system decoupling, traffic shaping, and log handling, and reviews popular middleware implementations including JMS, ActiveMQ, RabbitMQ, ZeroMQ, and Kafka.

Backend · Distributed Systems · JMS

0 likes · 18 min read

Java High-Performance Architecture

Feb 29, 2016 · Backend Development

How Kafka Stores and Retrieves Messages: Inside Partitions, Segments, and Index Files

Kafka persists messages on disk by splitting each topic into partitions and each partition into segments, where every segment is a paired .index and .log file; this layout enables efficient storage and fast offset-based retrieval, using a binary search first across segments and then within a segment's index.

Kafka · Message Queue · storage architecture

0 likes · 5 min read
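The lookup described above takes two binary searches: one over the sorted segment base offsets to pick the segment, and one over that segment's sparse .index entries to find where to start scanning its .log file. A minimal sketch, assuming the index is held in memory as (relative offset, byte position) pairs; the function names, offsets, and positions are illustrative.

```python
import bisect
from typing import List, Tuple


def find_segment(base_offsets: List[int], offset: int) -> int:
    """Base offset of the segment containing `offset` (base_offsets sorted ascending)."""
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise KeyError(f"offset {offset} is older than the first segment")
    return base_offsets[i]


def scan_start(index: List[Tuple[int, int]], relative_offset: int) -> int:
    """Byte position in the .log file from which to scan for `relative_offset`.

    `index` is a sparse list of (relative_offset, byte_position) pairs sorted by
    offset; the largest entry not after the target is the closest safe start.
    """
    keys = [rel for rel, _ in index]
    i = bisect.bisect_right(keys, relative_offset) - 1
    return 0 if i < 0 else index[i][1]


# Illustrative usage:
segments = [0, 368769, 737337]
base = find_segment(segments, 368776)               # 368769
index = [(1, 0), (3, 497), (6, 1407), (8, 1686)]
start = scan_start(index, 368776 - base)            # relative offset 7 -> byte 1407
```

The index has no entry for relative offset 7, so the reader starts at the entry for 6 and scans forward; a sparse index trades a short scan for a much smaller index file.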

Java High-Performance Architecture

Feb 28, 2016 · Big Data

How Kafka Ensures High Availability with Leader‑Follower Replication

Kafka introduced a high‑availability mechanism in version 0.8 by replicating partitions across multiple brokers, designating a leader and followers, using an in‑sync replica (ISR) list to balance synchronous and asynchronous replication, and employing leader election strategies to maintain data integrity during failures.

ISR · Kafka · Replication

0 likes · 4 min read
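The ISR bookkeeping reduces to a few rules: a message counts as committed once every in-sync replica has it, followers that fall too far behind drop out of the ISR, and a new leader is normally chosen from the live ISR members. The sketch below encodes those rules under simplifying assumptions (lag measured in milliseconds, election in replica-list order, an optional unclean fallback to any live replica); it is an illustration, not Kafka's controller logic.

```python
from typing import Dict, List, Optional, Set


def high_watermark(log_end_offsets: Dict[str, int], isr: Set[str]) -> int:
    """Committed offset: the smallest log end offset among in-sync replicas.
    Every ISR member already holds all messages below it."""
    if not isr:
        raise ValueError("a led partition's ISR contains at least the leader")
    return min(log_end_offsets[r] for r in isr)


def shrink_isr(leader: str, lag_ms: Dict[str, int], isr: Set[str],
               max_lag_ms: int) -> Set[str]:
    """Drop followers that have fallen more than max_lag_ms behind; keep the leader."""
    return {r for r in isr if r == leader or lag_ms.get(r, 0) <= max_lag_ms}


def elect_leader(replicas: List[str], isr: Set[str], alive: Set[str],
                 allow_unclean: bool = False) -> Optional[str]:
    """Pick the first live in-sync replica in assignment order. If none exists,
    optionally fall back to any live replica, accepting possible data loss."""
    for r in replicas:
        if r in isr and r in alive:
            return r
    if allow_unclean:
        for r in replicas:
            if r in alive:
                return r
    return None  # partition stays offline until an ISR member returns
```

Taking the minimum over the ISR, rather than over all replicas, is what lets a slow follower be dropped without blocking commits for everyone else.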

Java High-Performance Architecture

Feb 25, 2016 · Backend Development

Understanding Kafka: Architecture, Use Cases, and Core Components

This article explains Kafka's high‑throughput distributed messaging architecture, its typical scenarios such as log and operational data collection, the roles of producers, brokers, topics, consumers, Zookeeper, and provides a practical example of detecting abnormal user transactions.

Data Streaming · Kafka · Log Processing

0 likes · 3 min read

Architecture Digest

Feb 25, 2016 · Backend Development

Ctrip's Hermes Asynchronous Messaging System: Architecture, Evolution, and High‑Performance Practices

The article presents a detailed overview of Ctrip's Hermes asynchronous messaging system, describing its architectural evolution from a simple Mongo‑based queue to a broker‑centric design with MySQL and Kafka back‑ends, and explains optimization techniques for single‑node performance, clustering, lease‑based management, and reliable delivery.

Broker · Ctrip · Hermes

0 likes · 22 min read

Architect

Feb 23, 2016 · Big Data

Kafka High Availability Design: Data Replication and Leader Election

This article explains why Kafka introduced high‑availability features starting with version 0.8, detailing why data replication and leader election are necessary and describing Kafka’s replica distribution algorithm, replication mechanics, acknowledgment requirements, leader‑election strategies, ZooKeeper structures, and the broker failover process.

Kafka · Replication · ZooKeeper

0 likes · 19 min read
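The replica distribution rule usually described for this design is: sort the n brokers, put partition i's first replica (its preferred leader) on broker i mod n, and its j-th replica on broker (i + j) mod n. The function below is a minimal sketch of exactly that rule; Kafka's own assignment also randomizes the starting broker and shifts follower placement, so treat it as a simplified model.

```python
from typing import Dict, List


def assign_replicas(brokers: List[int], num_partitions: int,
                    replication_factor: int) -> Dict[int, List[int]]:
    """Simplified placement: partition i's replica j goes to broker (i + j) mod n.
    Replica 0 of each list is the preferred leader."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    ordered = sorted(brokers)
    return {
        i: [ordered[(i + j) % n] for j in range(replication_factor)]
        for i in range(num_partitions)
    }
```

With brokers 1, 2, 3 and a replication factor of 2, preferred leaders rotate across all three brokers, and no partition ever places two replicas on the same broker.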

21CTO

Feb 23, 2016 · Big Data

Why Kafka Dominates Modern Data Pipelines: Architecture, Benefits, and Guarantees

Kafka, the open‑source distributed messaging system from LinkedIn, offers O(1) persistence, high throughput, partitioned topics, and flexible delivery guarantees, making it a cornerstone for modern big‑data pipelines and real‑time processing alongside Hadoop, Spark, and Storm.

Big Data · Consumer · Delivery Guarantees

0 likes · 21 min read
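Which delivery guarantee a consumer gets depends largely on when it commits its offset relative to processing. The simulation below (a hypothetical helper, not a Kafka API) crashes once mid-stream and restarts from the last committed offset: committing first loses the in-flight message (at most once), while processing first replays it (at least once).

```python
from typing import List


def run_with_crash(log: List[str], commit_before_processing: bool,
                   crash_at: int) -> List[str]:
    """Simulate a consumer that crashes once while handling offset `crash_at`,
    restarts from its last committed offset, and runs to the end of the log.
    Returns every message that was processed, in order."""
    processed: List[str] = []
    committed = 0        # next offset to read after a restart
    crashed = False
    offset = 0
    while offset < len(log):
        if commit_before_processing:
            committed = offset + 1            # commit the offset first...
            if offset == crash_at and not crashed:
                crashed = True                # ...then crash before processing:
                offset = committed            # the message is skipped (at most once)
                continue
            processed.append(log[offset])
        else:
            processed.append(log[offset])     # process first...
            if offset == crash_at and not crashed:
                crashed = True                # ...then crash before committing:
                offset = committed            # the message is replayed (at least once)
                continue
            committed = offset + 1
        offset += 1
    return processed
```

Exactly-once behavior needs more than commit ordering, for example idempotent writes on the consuming side, which this sketch does not model.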

21CTO

Feb 14, 2016 · Backend Development

Unlocking High‑Performance Systems: How Message Queues Transform Backend Architecture

This article provides a comprehensive overview of message queues, covering their core concepts, key application scenarios such as asynchronous processing, system decoupling, traffic shaping, log handling, and communication, and examines popular middleware like ActiveMQ, RabbitMQ, ZeroMQ, and Kafka, along with JMS models and programming details.

JMS · Kafka · Message Queue

0 likes · 22 min read

21CTO

Feb 6, 2016 · Backend Development

How LinkedIn Scaled to 300M Users: Lessons from a Decade of Backend Architecture

This article chronicles LinkedIn's evolution from a monolithic Leo application to a massive micro‑service ecosystem, detailing the introduction of member graphs, read‑only replicas, caching layers, Kafka pipelines, Rest.li APIs, super‑blocks, and multi‑data‑center strategies that enable handling billions of requests daily.

Backend Architecture · Kafka · LinkedIn

0 likes · 8 min read

Qunar Tech Salon

Jan 12, 2016 · Backend Development

Snapdeal Ads: Architecture and Lessons for Building a Scalable Web System Handling 5 Billion Daily Requests

The article details Snapdeal's Ads platform architecture, key strategies, infrastructure, requirements, and technologies that enable a highly available, low‑latency backend capable of processing billions of daily requests with a small engineering team.

ADS · Backend · Kafka

0 likes · 7 min read

21CTO

Jan 9, 2016 · Big Data

How We Scaled Real‑Time Log Analysis to 2 TB Daily with ELK

This article shares the author's practical experience building a real‑time log analysis platform at Sina, covering service scope, ELK architecture, performance optimizations, usability improvements, new features, common pitfalls, and a concise Q&A for engineers handling massive log streams.

ELK · Elasticsearch · Kafka

0 likes · 12 min read

Architect

Dec 30, 2015 · Big Data

Real-Time Big Data Processing with Storm and Kafka on Alibaba Cloud

This article explains how to build a large‑scale, real‑time vehicle monitoring system using Apache Storm and Kafka on Alibaba Cloud, covering the challenges of big‑data ingestion, system architecture, deployment steps, performance testing, and practical lessons learned.

Alibaba Cloud · Big Data · Kafka

0 likes · 12 min read

Architect

Dec 18, 2015 · Big Data

Understanding Apache Kafka’s High‑Throughput Architecture and Performance Optimizations

This article explains Apache Kafka’s core concepts, high‑throughput design choices such as sequential I/O, PageCache, Sendfile, and partitioning, and provides practical performance tips and configuration recommendations for brokers, producers, and consumers in large‑scale data pipelines.

Big Data · Consumer · Distributed Messaging

0 likes · 16 min read
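Sendfile lets a broker push log bytes from the page cache to a socket without copying them through user space. Kafka does this from the JVM with FileChannel.transferTo; the Python sketch below only illustrates the same idea using socket.sendfile(), which calls os.sendfile() when the platform supports it and otherwise falls back to ordinary send() calls. The function name serve_segment is hypothetical.

```python
import socket


def serve_segment(path: str, conn: socket.socket) -> int:
    """Send a whole log segment file over a connected stream socket.

    socket.sendfile() uses os.sendfile() where available, so the kernel moves
    page-cache pages to the socket without a user-space copy; otherwise it
    falls back to read()/send(). Returns the number of bytes sent.
    """
    with open(path, "rb") as segment:
        return conn.sendfile(segment)
```

Because the data never enters the process's own buffers, consumers reading recent messages are mostly served straight from the page cache, which is why sequential writes and sendfile complement each other.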

Qunar Tech Salon

Dec 15, 2015 · Big Data

Real-Time Computing with Apache Storm: Architecture, Code Samples, and Fault Tolerance

This article explains the principles of real-time computing, compares it with offline batch processing, and demonstrates a practical solution using Kafka for ingestion, Apache Storm for continuous computation, and various storage options, while also covering streaming concepts and Storm's high‑availability mechanisms.

Apache Storm · Kafka · Real‑Time Computing

0 likes · 8 min read

21CTO

Dec 14, 2015 · Backend Development

How Wacai Built a Scalable FinTech Architecture: 6 Key Design Strategies

Wacai’s architects outline six critical design decisions—including system layer separation, message passing, asynchronous processing, comprehensive data storage, robust security, and storage redundancy—that together enable a resilient, reactive financial platform capable of handling massive concurrent workloads.

Akka · FinTech · Kafka

0 likes · 8 min read

Art of Distributed System Architecture Design

Nov 30, 2015 · Big Data

LinkedIn’s Kafka at Scale: Architecture, Optimizations, and Operational Practices

The article details how LinkedIn has scaled Kafka from handling billions to trillions of messages daily, describing quota enforcement, a ZooKeeper‑free consumer, reliability enhancements, security plans, monitoring frameworks, fault‑injection testing, cluster balancing, and integration with other internal data systems.

Big Data · Kafka · LinkedIn

0 likes · 12 min read

21CTO

Nov 21, 2015 · Big Data

Why Build a Kafka System? Core Use Cases and Design Principles

This article explains why Kafka is essential for activity and operational data pipelines, outlines key use cases such as news feeds, relevance ranking, security, monitoring, and reporting, and details its deployment topology, design decisions, and message persistence strategies.

Distributed Messaging · Kafka · Real-time Processing

0 likes · 14 min read

21CTO

Nov 19, 2015 · Big Data

Beyond Hadoop: Modern Big Data Platforms and Technologies Explained

This article surveys the evolution of Hadoop and its ecosystem, explains core storage and processing concepts, and introduces contemporary big‑data technologies such as Spark, Flink, Kafka, Lambda architecture, NoSQL databases, and cloud‑native solutions, highlighting their roles and trade‑offs.

Big Data · Flink · Hadoop

0 likes · 17 min read

Efficient Ops

Oct 14, 2015 · Big Data

Spark vs Hadoop, Flink, HBase/Cassandra, Kafka & Tachyon: Expert Q&A

During a lively “Sit and Discuss” session, experts compared Spark and Hadoop, evaluated Flink against Spark, contrasted HBase with Cassandra, explained why Kafka (and sometimes Flink) is preferred for distributed messaging, and shared insights on Tachyon’s role in modern big‑data ecosystems.

Flink · HBase · Hadoop

0 likes · 10 min read

21CTO

Sep 30, 2015 · Operations

How LinkedIn Scaled Kafka to Process Over 1 Trillion Messages Daily

Since 2011, LinkedIn has expanded its Kafka deployment from handling billions to over a trillion messages per day, focusing on quotas, a new ZooKeeper‑free consumer, reliability enhancements, security, monitoring frameworks, fault‑injection testing, cluster balancing, and ecosystem integrations, offering valuable lessons for large‑scale streaming systems.

Kafka · LinkedIn · Reliability

0 likes · 12 min read

21CTO

Sep 27, 2015 · Big Data

How Weidian Built a Scalable Big Data Platform for Mobile Commerce

This article outlines the design and implementation of Weidian’s end‑to‑end big data processing platform, covering dataset definition, data collection via Flume‑based DataAgent, transmission through Databus, storage options such as HDFS, Kafka and Elasticsearch, and the monitoring and resource‑integration strategies that support massive mobile commerce logs.

Elasticsearch · Flume · Hadoop

0 likes · 18 min read

Art of Distributed System Architecture Design

Sep 23, 2015 · Big Data

Overview of Open-Source Real-Time Stream Processing Systems

This article gives a concise overview of several open-source real-time stream processing platforms, including S4, Storm, StreamBase, HStreaming, and Esper/NEsper, along with the related messaging and log-collection systems Kafka, Scribe, and Flume, highlighting their main features, programming languages, and project links for further reference.

Big Data · Kafka · Real-Time

0 likes · 5 min read

Art of Distributed System Architecture Design

Sep 14, 2015 · Industry Insights

Why Kafka Dominates Distributed Messaging: Architecture, Features, and Best Practices

This article provides an in‑depth examination of Apache Kafka’s origins, design goals, core concepts such as brokers, topics, partitions, producers and consumers, compares it with other message queues, explains its storage format, configuration options, delivery guarantees, and includes practical Java code examples for partitioning and consumption.

Distributed Systems · Kafka · Message Queue

0 likes · 22 min read
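Keyed partitioning, the subject of the article's partitioning examples, fits in a few lines. This is a hedged Python illustration, not the article's Java code: it assumes the common rule "hash the key, then take it modulo the partition count" and uses the standard library's CRC32 as a stand-in hash, whereas Kafka's Java producer hashes keys with murmur2, so real partition numbers will differ. `partition_for` is a hypothetical name, and messages without a key, which Kafka spreads across partitions differently, are not covered.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions).

    CRC32 is only a stand-in hash for this sketch; Kafka's Java producer
    uses murmur2, so actual partition numbers will not match.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # zlib.crc32 returns an unsigned value in Python 3, so the result is non-negative.
    return zlib.crc32(key) % num_partitions


for key in [b"user-42", b"user-7", b"user-42"]:
    # Both b"user-42" messages map to the same partition, so their relative
    # order is preserved for whichever consumer reads that partition.
    print(key, partition_for(key, 3))
```

The design point is determinism: because the mapping depends only on the key and the partition count, per-key ordering holds as long as the partition count does not change.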

21CTO

Aug 10, 2015 · Backend Development

How Kafka’s File Storage Mechanism Achieves High Performance

This article explains how Kafka's distributed log stores each partition as a series of segment files with accompanying index files, which enables efficient sequential writes, fast deletion of old data, and quick offset-based lookups. It walks through brokers, topics, partitions, segment structure, file naming rules, and real-world performance experiments.

Kafka · file storage

0 likes · 11 min read
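Finding a message starts with choosing the right segment. Below is a minimal Python sketch, assuming the standard naming rule in which each segment file is named after the offset of its first message (its base offset), zero-padded to 20 digits. Because the names sort in offset order, a binary search over base offsets finds the segment that holds any target offset. The function names and sample offsets are hypothetical.

```python
import bisect


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Name a segment file after its base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets: list[int], target_offset: int) -> int:
    """Return the base offset of the segment that would contain target_offset.

    base_offsets must be sorted ascending, which is also the lexical order
    of the zero-padded file names.
    """
    # Largest base offset that is <= target_offset.
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError(f"offset {target_offset} precedes the oldest segment")
    return base_offsets[i]


segments = [0, 368769, 737337]  # hypothetical base offsets of three segments
print(segment_file_name(368769))       # 00000000000000368769.log
print(find_segment(segments, 368776))  # 368769
```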

Qunar Tech Salon

Jul 8, 2015 · Big Data

Understanding Logs: The Foundation of Distributed Systems, Data Integration, and Stream Processing

This article explains how logs (simple, append-only, time-ordered records) serve as the core abstraction behind databases, distributed systems, data integration pipelines, and stream processing, drawing on systems such as Kafka and Hadoop to illustrate their design, scalability, and practical challenges.

Big Data · Data Integration · Distributed Systems

0 likes · 45 min read
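The log abstraction itself is small enough to sketch. This toy in-memory Python class is a hypothetical illustration, not code from the article. It shows the two properties the summary relies on: records are only ever appended, and each one receives a sequential offset, so every reader that replays from the same offset sees the same records in the same order.

```python
class AppendOnlyLog:
    """Toy in-memory log: records are only appended, never modified, and
    each gets a sequential offset that fixes its position in the order."""

    def __init__(self) -> None:
        self._records: list = []

    def append(self, record) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> list:
        """Return every record at or after offset, in log order."""
        return self._records[offset:]


log = AppendOnlyLog()
for event in ["user_created", "email_changed", "user_deleted"]:
    log.append(event)

# Any two readers replaying from offset 1 see the same sequence, which is
# what lets a log act as the shared source of truth for replicas or
# downstream systems.
print(log.read_from(1))  # ['email_changed', 'user_deleted']
```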

Architect

Jul 6, 2015 · Big Data

Understanding Logs: The Core of Distributed Systems and Data Integration

This article explains how logs (simple, append-only, time-ordered records) serve as the fundamental abstraction behind databases, distributed systems, data integration pipelines, and stream processing, drawing on platforms such as Kafka and Hadoop to illustrate their role in ordering, replication, scalability, and real-time analytics.

Data Integration · Distributed Systems · Hadoop

0 likes · 48 min read

Art of Distributed System Architecture Design

Jun 15, 2015 · Big Data

Designing a Scalable Real‑Time Mobile Analytics Platform with Kafka, Storm, and Amazon EMR

The article describes how a mobile analytics service processes billions of events daily using a Lambda‑style architecture that combines Kafka, Storm, Amazon EMR, and S3 to achieve scalable, fault‑tolerant batch and real‑time computation, while ensuring reliable event ingestion and graceful degradation.

AWS · Big Data · Kafka

0 likes · 8 min read

Art of Distributed System Architecture Design

Jun 1, 2015 · Big Data

Overview of Big Data Technologies and Architectures

This article provides a comprehensive overview of major big‑data platforms such as Hadoop, Spark, Flink, Kafka, and related ecosystem components, explaining their core concepts, storage models, processing frameworks, and architectural patterns for handling massive, distributed datasets.

Hadoop · Kafka · NoSQL

0 likes · 18 min read

MaGe Linux Operations

Apr 28, 2015 · Big Data

How LinkedIn Scales Kafka to Billions of Messages Every Day

This article explains how LinkedIn uses Apache Kafka as a high‑throughput, fault‑tolerant messaging backbone, detailing its architecture, message categories, layered replication, audit mechanisms, and the engineering practices that keep billions of daily messages reliable and fast.

Big Data · Distributed Systems · Kafka

0 likes · 11 min read

Art of Distributed System Architecture Design

Apr 28, 2015 · Big Data

Understanding Kafka High Availability: Data Replication and Leader Election

This article explains why Kafka added high availability in version 0.8 and why that requires both data replication and leader election. It then describes the replica distribution algorithm, replication mechanics, ISR (in-sync replica) handling, the ZooKeeper data structures involved, and the broker failover process that together keep streaming fault tolerant.

Kafka · ZooKeeper · high availability

0 likes · 19 min read
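Replica distribution is often explained with a simple round-robin rule: with n sorted brokers, partition i's first replica goes to broker (i mod n) and its j-th replica to broker ((i + j) mod n). The Python sketch below implements only that rule. Kafka's actual assignment code also randomizes the starting broker and shifts follower placement, so treat this as an illustration with a hypothetical function name, not the broker's implementation.

```python
def assign_replicas(num_partitions: int, brokers: list[int],
                    replication_factor: int) -> dict[int, list[int]]:
    """Round-robin replica placement: replica j of partition i goes to
    broker ((i + j) mod n) in the sorted broker list."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")
    ordered = sorted(brokers)
    n = len(ordered)
    assignment: dict[int, list[int]] = {}
    for i in range(num_partitions):
        # j = 0 is the preferred leader on broker (i mod n); each further
        # replica moves to the next broker, so no broker holds two copies
        # of the same partition.
        assignment[i] = [ordered[(i + j) % n] for j in range(replication_factor)]
    return assignment


print(assign_replicas(4, [1, 2, 3], 2))
# {0: [1, 2], 1: [2, 3], 2: [3, 1], 3: [1, 2]}
```

Spreading leaders across brokers this way also spreads client traffic, since producers and consumers talk to each partition's leader.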

Art of Distributed System Architecture Design

Apr 24, 2015 · Big Data

Pinterest Real-Time Data Pipeline Using Kafka, Spark, and MemSQL

Pinterest built a real‑time data pipeline that streams user engagement events through Apache Kafka into Spark Streaming, enriches them with location and category information, and persists the results in MemSQL to enable fast, SQL‑based analytics for its recommendation engine.

Big Data · Kafka · MemSQL

0 likes · 3 min read

Art of Distributed System Architecture Design

Apr 12, 2015 · Industry Insights

Why Kafka Dominates Distributed Messaging: Architecture, Features, and Comparisons

This article provides a comprehensive overview of Apache Kafka, covering its origin, core design goals, key terminology, architectural components, message routing, consumer groups, delivery guarantees, and a detailed comparison with other popular message queue systems.

Consumer · Delivery Guarantee · Distributed Systems

0 likes · 22 min read

Nightwalker Tech

Mar 14, 2015 · Big Data

Log Collection and Analysis: Architectures Using Flume, Kafka, Storm, Elasticsearch, and MongoDB

This article discusses various log collection and analysis architectures, comparing Flume-Kafka-Storm pipelines, Sentry, MongoDB, the ELK stack, and Hadoop, and shares practical experience, advantages, drawbacks, and deployment tips from several engineers.

Big Data · Flume · Kafka

0 likes · 7 min read

Qunar Tech Salon

Mar 10, 2015 · Big Data

Kafka Overview: Architecture, Core Concepts, and Comparison with Other Message Queues

This article provides a comprehensive overview of Kafka, covering its background, design goals, architecture, key terminology, message routing, consumer groups, delivery guarantees, and a comparison with other popular message queue systems such as RabbitMQ, Redis, ZeroMQ, and ActiveMQ.

Consumer · Kafka · Message Queue

0 likes · 21 min read
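Consumer groups divide a topic's partitions among their members so that each partition is read by exactly one consumer in the group. The Python sketch below mirrors the range-style strategy used by Kafka's RangeAssignor: sort the partitions and consumers, give each consumer a contiguous block, and hand any remainder to the first consumers. The function name is hypothetical, it handles a single topic, and it ignores rebalancing.

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Split one topic's partitions into contiguous ranges, one per consumer."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    ordered_partitions = sorted(partitions)
    ordered_consumers = sorted(consumers)
    per_consumer, extra = divmod(len(ordered_partitions), len(ordered_consumers))
    assignment: dict[str, list[int]] = {}
    start = 0
    for idx, consumer in enumerate(ordered_consumers):
        # The first `extra` consumers (in sorted order) take one more partition.
        count = per_consumer + (1 if idx < extra else 0)
        assignment[consumer] = ordered_partitions[start:start + count]
        start += count
    return assignment


print(range_assign(list(range(5)), ["c2", "c1"]))
# {'c1': [0, 1, 2], 'c2': [3, 4]}
```

When a group has more consumers than partitions, the extra consumers receive empty lists and sit idle, which is why partition count caps a group's read parallelism.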

Meituan Technology Team

Jan 14, 2015 · Big Data

Kafka File Storage Mechanism and Architecture

Kafka stores each topic as partitions, and each partition is split into sequential segments, each made up of a .log data file and a matching .index file. Global offsets and sparse, memory-mapped indexes enable fast offset-based lookups, efficient deletion of old data, and minimal disk I/O in real-world deployments.

Kafka · Message Queue · Partition

0 likes · 9 min read
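Within a segment, the sparse index is consulted before the data file. The Python sketch below models that step under simplifying assumptions: the index and log are in-memory lists rather than memory-mapped files, index entries hold (offset relative to the segment's base offset, byte position) pairs for only every third message, and `Record` and `lookup` are hypothetical names. A binary search finds the last index entry at or before the target, and a short sequential scan from that byte position finishes the lookup.

```python
import bisect
from typing import NamedTuple


class Record(NamedTuple):
    offset: int    # absolute message offset
    position: int  # byte position of the message inside the .log file
    payload: str


def lookup(base_offset: int, index: list[tuple[int, int]],
           records: list[Record], target: int) -> Record:
    """Find the record with absolute offset `target` in one segment."""
    relative = target - base_offset
    # Binary search: last index entry whose relative offset is <= relative.
    index_offsets = [rel for rel, _ in index]
    i = bisect.bisect_right(index_offsets, relative) - 1
    start_pos = index[i][1] if i >= 0 else 0
    # Jump to the first record at or after that byte position, then scan.
    positions = [r.position for r in records]
    j = bisect.bisect_left(positions, start_pos)
    for rec in records[j:]:
        if rec.offset == target:
            return rec
        if rec.offset > target:
            break
    raise KeyError(f"offset {target} not in this segment")


base = 368769
records = [Record(base + k, k * 100, f"msg-{k}") for k in range(6)]
index = [(0, 0), (3, 300)]  # one entry per three messages, so the index is sparse
print(lookup(base, index, records, 368773).payload)  # msg-4
```

Keeping the index sparse trades a short scan for an index small enough to stay in memory.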
