Tagged articles
252 articles
JD Tech
May 31, 2018 · Backend Development

Design and Architecture of a Unified MySQL Data Synchronization Platform

This article details the design of a unified MySQL data synchronization platform that consolidates offline sync, real‑time subscription, and real‑time sync into BatchJob, StreamJob, and PieJob abstractions, describing task implementations, cluster architecture, high‑availability mechanisms, and evolution challenges such as file loss and metadata handling.

Backend Architecture · Batch Processing · data synchronization
0 likes · 10 min read
Java High-Performance Architecture
May 22, 2018 · Big Data

Is Apache Kafka Right for You? Core Features, Stream Processing, and Use Cases

This article explains Apache Kafka’s evolution and adoption by Fortune‑500 firms, outlines its two core capabilities—messaging (queue and publish/subscribe) and stream processing via the Java Stream API—provides example code, typical use cases, and guidance on scenarios where Kafka may not be the best solution.

Apache Kafka · Use Cases · stream processing
0 likes · 5 min read
Alibaba Cloud Developer
May 21, 2018 · Databases

How TcpRT Enables Real‑Time Service Quality Monitoring for Massive Cloud Databases

TcpRT is a real‑time instrumentation and diagnostic system for Alibaba Cloud RDS that non‑intrusively collects TCP trace data, aggregates billions of records per day, applies statistical and Cauchy‑based anomaly detection, and pinpoints root causes across hosts, proxies, and network devices at massive scale.

Cloud Databases · SIGMOD · anomaly detection
0 likes · 27 min read
Architecture Digest
Mar 14, 2018 · Big Data

Attributes Matrix and Data Flow Models of Apache Streaming Platforms

This article presents a comprehensive attributes matrix and data‑flow model overview for major Apache streaming platforms, comparing versions, sponsors, event handling, fault tolerance, processing order, latency, resource management, APIs, and supported connectors to aid practical technology selection.

Apache · Big Data · attributes matrix
0 likes · 16 min read
Meituan Technology Team
Jan 26, 2018 · Big Data

Design and Implementation of a Real-Time Data Processing System at Meituan

Meituan designed a Storm‑based real‑time data processing platform that guarantees at‑least‑once delivery and high availability, employs a custom spout, regression‑driven traffic smoothing, and a low‑latency KV store with atomic operations, persisting results in Kafka, MySQL and Cellar to power merchant dashboards and heat‑tag analytics, while planning broader real‑time analytics expansion.

Big Data · Distributed Systems · Storm
0 likes · 10 min read
dbaplus Community
Dec 26, 2017 · Big Data

Turning Raw Logs into Structured Data with DBus Visual Rule Operators

This article explains how the open‑source DBus platform, combined with the Wormhole streaming engine, captures raw application logs, lets users configure visual rule operators, and transforms the unstructured message part into schema‑driven, Kafka‑ready data for downstream analytics.

Big Data · DBus · Log Processing
0 likes · 15 min read
StarRing Big Data Open Lab
Dec 22, 2017 · Big Data

Slipstream 5.1 Unveiled: New CEP, Session Windows & Event‑Driven Engine

Slipstream 5.1 expands its real‑time stream processing capabilities with richer Complex Event Processing syntax, introduces Session Window support for session‑based analytics, and enhances the Morphling event‑driven engine, all accessible via SQL, making advanced streaming applications easier for both developers and business users.

Real-time analytics · complex event processing · session window
0 likes · 8 min read
Architecture Digest
Dec 16, 2017 · Big Data

Performance Comparison of Apache Flink and Apache Storm for Real‑Time Stream Processing

This report presents a systematic performance evaluation of Apache Flink and Apache Storm across multiple real‑time processing scenarios, measuring throughput, latency, message‑delivery semantics, and state‑backend effects, and provides recommendations for selecting the most suitable engine based on the observed results.

Big Data · Flink · Real-time analytics
0 likes · 21 min read
ITPUB
Nov 23, 2017 · Big Data

7 Typical Big Data Projects Every Hadoop Engineer Should Know

The article outlines seven common big‑data initiatives—data integration, specialized analytics, Hadoop‑as‑a‑service, stream processing, complex event handling, ETL pipelines, and SAS replacement—explaining their goals, typical technologies such as HDFS, Hive, Spark, Storm, Kafka, and practical considerations for enterprises adopting Hadoop ecosystems.

Data Integration · Hadoop · project types
0 likes · 8 min read
ITPUB
Nov 13, 2017 · Big Data

How Real‑Time Big Data Stream Computing Powers Double 11 E‑Commerce Success

The article explains how NetEase’s real‑time big‑data stream computing platform, Sloth, handles massive, continuously generated data during China’s Double 11 shopping festival, covering use cases, architectural shifts from batch to incremental processing, technical challenges, and the role of stream‑SQL for easier development.

Distributed Systems · Real‑Time Computing · SQL
0 likes · 16 min read
ITPUB
Nov 13, 2017 · Big Data

How Real-Time Big Data Streaming Powers Double 11 E‑Commerce Success

The article explains how continuous data generation and real‑time stream processing enable e‑commerce platforms like NetEase Kaola to handle massive Double 11 traffic, showcasing use cases, architectural shifts from batch to incremental computing, and the technical challenges of latency, accuracy, and fault tolerance.

Distributed Systems · Real-time Streaming · SQL
0 likes · 15 min read
dbaplus Community
Oct 15, 2017 · Big Data

How JD Built a Scalable Seller Log Platform with Kafka, Storm, ES & HBase

This article details JD's end‑to‑end seller log system architecture, explaining why Kafka, Storm, Elasticsearch and HBase were chosen, the challenges faced during scaling, and the practical solutions implemented to achieve a unified, high‑throughput logging platform for merchants and operations.

Big Data · Elasticsearch · HBase
0 likes · 13 min read
21CTO
Aug 14, 2017 · Big Data

Unveiling Flink’s Multi‑Layer Execution Graph: From StreamGraph to Physical Deployment

This article explains Flink’s architecture, detailing the roles of Client, JobManager and TaskManager, walks through a SocketTextStreamWordCount example, and clarifies the four‑layer graph model—StreamGraph, JobGraph, ExecutionGraph, and the physical execution graph—highlighting why each layer exists.

Big Data · Execution Graph · Flink
0 likes · 9 min read
Alibaba Cloud Developer
May 25, 2017 · Big Data

How Alibaba’s Blink Engine Redefines Real‑Time Big Data Processing

This article explains how Alibaba’s Blink, built on Apache Flink, transforms batch‑oriented big‑data platforms into a unified, high‑performance real‑time computing engine, detailing its architecture, state management, checkpointing, and successful deployment in e‑commerce, search, recommendation, and online machine‑learning scenarios.

Alibaba · Big Data · Flink
0 likes · 17 min read
Suning Technology
May 18, 2017 · Big Data

Why Apache Flink Beats Spark and Storm in Stream Processing

This article examines Apache Flink's stream‑processing architecture, compares its native streaming model, fault‑tolerance, performance and SQL capabilities with Spark and Storm, and concludes that Flink offers a more powerful and efficient solution despite some maturity gaps.

Apache Flink · Spark · Storm
0 likes · 12 min read
Architecture Digest
May 18, 2017 · Backend Development

Design and Architecture of Ctrip's Real‑Time User Behavior Service

The article describes how Ctrip rebuilt its real‑time user behavior platform using a Java‑based stack (Kafka, Storm, Redis, MySQL) to achieve millisecond‑level latency, high availability, scalable performance, and robust handling of traffic spikes, failures, and data back‑pressure.

Backend Architecture · Kafka · Real-Time
0 likes · 12 min read
Architecture Digest
Feb 11, 2017 · Big Data

LeKe Sports Big Data Platform Evolution: From Early ETL Reporting to 2.0 Streaming Architecture

The article describes how LeKe Sports built and continuously upgraded its Hadoop‑based big data platform—from a manual ETL‑to‑Elasticsearch reporting system to a 2.0 architecture featuring Spark Streaming, SQL‑based query layers, Elasticsearch indexing, and cloud‑native storage and backup solutions—to meet rapidly growing PB‑scale data demands.

Big Data · Data Platform · ETL
0 likes · 5 min read
21CTO
Jan 18, 2017 · Big Data

Build a Lightweight, High‑Availability Real‑Time Stream Processing System

Learn how to construct a simple, high‑availability real‑time stream processing platform using lightweight components such as Kafka, Zookeeper, Thrift/Avro, and optional storage like MongoDB or Elasticsearch, offering a practical alternative to heavyweight frameworks like Storm and Spark Streaming for small‑to‑medium enterprises.

Big Data · Kafka · Real-Time
0 likes · 5 min read
Alibaba Cloud Developer
Jan 9, 2017 · Big Data

How Alibaba Scaled Real‑Time Data Processing for Double 11: Architecture & Lessons

This article details Alibaba's real‑time computing architecture for the 2016 Double 11 event, covering background, core components such as DRC, TT, Galaxy, OTS, XTool and OneService, and explains optimization techniques, fault‑tolerance strategies, stress‑testing practices, and future upgrade plans to handle massive streaming data workloads.

Big Data · Performance Optimization · Real‑Time Computing
0 likes · 14 min read
Ctrip Technology
Aug 12, 2016 · Big Data

Ctrip's Real-Time Data Platform: Architecture, Practices, and Lessons Learned

This article details Ctrip's journey building a unified real-time data platform—covering business motivations, architectural requirements, technology choices like Kafka and Storm, implementation of Avro schemas, monitoring, alerting, operational lessons, and future explorations such as Streaming CQL and JStorm.

Alerting · Big Data · Kafka
0 likes · 15 min read
Architect
Jul 14, 2016 · Big Data

Understanding Custom Stream IDs and Topology Building in Apache Storm

This article explains how to construct Apache Storm topologies with custom stream IDs, demonstrates the classic WordCountTopology example, and provides detailed Java code snippets illustrating spout and bolt configurations, stream declarations, and grouping strategies for real‑time stream processing.

Apache Storm · Big Data · Custom Stream ID
0 likes · 8 min read
Big Data and Microservices
Apr 19, 2016 · Industry Insights

Designing a Scalable Real‑Time Stock Prediction Architecture with Open‑Source Tools

This article outlines a reference architecture for a low‑latency, horizontally scalable real‑time stock prediction system built with open‑source components such as Spring Cloud Data Flow, Apache Geode, Spark MLlib, and Hadoop, and discusses data flow steps, simplified deployment, and algorithm choices for market forecasting.

Big Data · Real-Time · Stock Prediction
0 likes · 7 min read
Architect
Mar 29, 2016 · Big Data

Understanding Apache Storm Architecture, Stream Groupings, and the Acker Mechanism

This article provides a comprehensive overview of Apache Storm’s architecture, including the roles of Nimbus, Supervisor, and ZooKeeper, explains various stream groupings, details the Acker mechanism, and describes task execution, parallelism calculation, and internal data flow within the Storm cluster.

Apache Storm · Big Data · Real-time analytics
0 likes · 19 min read
Qunar Tech Salon
Feb 24, 2016 · Artificial Intelligence

Overview and Architecture of Pora: A Real‑Time Personalization Analytics Platform

The article introduces Pora, an analytics system for personalized search that unifies offline and real‑time processing, combining high‑throughput stream processing, low‑latency computation, online learning algorithms, and a modular architecture to support continuous 24/7 operation and large‑scale performance optimizations.

AI · Online Learning · Real-time analytics
0 likes · 6 min read
21CTO
Jan 25, 2016 · Big Data

How Alibaba’s Pora Powers Real‑Time Personalization at Massive Scale

Pora (Personal Offline Realtime Analyze) is a high‑throughput, low‑latency platform that captures user behavior in real time, enabling Alibaba’s search engine to deliver personalized results, support online learning, and run 24/7 with massive data volumes.

Alibaba · Big Data · Pora
0 likes · 6 min read
Efficient Ops
Jan 5, 2016 · Information Security

How Apache Eagle Secures Hadoop: Real‑Time Big Data Threat Detection

Apache Eagle is an open‑source, distributed, real‑time security monitoring platform for Hadoop that combines stream‑processing, scalable policy enforcement, and machine‑learning user profiling to protect massive data assets across eBay’s production clusters.

Apache Eagle · Big Data · Hadoop
0 likes · 19 min read
Qunar Tech Salon
Dec 15, 2015 · Big Data

Real-Time Computing with Apache Storm: Architecture, Code Samples, and Fault Tolerance

This article explains the principles of real-time computing, compares it with offline batch processing, and demonstrates a practical solution using Kafka for ingestion, Apache Storm for continuous computation, and various storage options, while also covering streaming concepts and Storm's high‑availability mechanisms.

Apache Storm · Kafka · Real‑Time Computing
0 likes · 8 min read
Efficient Ops
Nov 26, 2015 · Big Data

Expert Insights on User Profiling and Stream Processing in Big Data

This article presents expert Q&A on effective user behavior analysis techniques for building detailed user profiles and compares mainstream stream‑processing solutions, outlining key factors such as latency, throughput, parallelism, and fault tolerance for selecting the right real‑time data platform.

Big Data · stream processing · user profiling
0 likes · 11 min read
21CTO
Nov 23, 2015 · Big Data

How Dianping Scales Real‑Time Analytics with Apache Storm

This article explains how Dianping built a millisecond‑level real‑time computation platform using Apache Storm, covering use cases, system architecture, core Storm concepts, performance tuning, best practices, and a detailed Q&A on their production deployment.

Apache Storm · Big Data · Real-time analytics
0 likes · 23 min read

Understanding Storm: A Distributed Real-Time Computation System

The article explains the need for low‑latency, high‑performance, distributed real‑time processing, outlines the challenges such systems must address, and introduces Storm as a Hadoop‑like framework for stream processing, detailing its architecture, fault‑tolerance mechanisms, transactional topology, and large‑scale deployment at Taobao.

Big Data · Distributed Systems · Real-time Processing
0 likes · 14 min read
21CTO
Sep 24, 2015 · Big Data

Comparing Apache Storm, Spark, and Samza: Which Real‑Time Stream Processor Fits Your Needs?

Apache Storm, Spark Streaming, and Samza are three open‑source, low‑latency, scalable distributed systems for real‑time data processing; this article outlines their architectures, key concepts, differences in data handling, state management, delivery guarantees, and typical use‑cases to help you choose the right framework.

Apache Samza · Apache Storm · Big Data
0 likes · 7 min read

Comparative Overview of Apache Storm, Spark Streaming, and Samza for Real-Time Data Processing

This article introduces Apache Storm, Spark Streaming, and Samza, explains their architectures, common features, key differences such as delivery guarantees and state management, and provides guidance on selecting the most suitable framework for various real‑time big‑data use cases.

Apache Storm · Big Data · Comparison
0 likes · 8 min read
Qunar Tech Salon
Jul 8, 2015 · Big Data

Understanding Logs: The Foundation of Distributed Systems, Data Integration, and Stream Processing

This article explains how logs—simple, append‑only, time‑ordered records—serve as the core abstraction behind databases, distributed systems, data integration pipelines, and modern stream‑processing platforms such as Kafka and Hadoop, illustrating their design, scalability, and practical challenges.

Big Data · Data Integration · Distributed Systems
0 likes · 45 min read
Architect
Jul 6, 2015 · Big Data

Understanding Logs: The Core of Distributed Systems and Data Integration

This article explains how logs—simple, append‑only, time‑ordered records—serve as the fundamental abstraction behind databases, distributed systems, data integration pipelines, and stream‑processing platforms like Kafka and Hadoop, illustrating their role in ordering, replication, scalability, and real‑time analytics.

Data Integration · Distributed Systems · Hadoop
0 likes · 48 min read

Storm vs Spark: Which Real‑Time Analytics Platform Wins for Your Business?

The article compares Apache Storm and Apache Spark, examining their origins, architecture, language support, integration capabilities, and performance characteristics, and offers guidance on selecting the right platform for real‑time business intelligence based on specific workload and infrastructure needs.

Apache Spark · Apache Storm · Big Data
0 likes · 11 min read
High Availability Architecture
May 15, 2015 · Big Data

Real-Time Computing at Dianping: Architecture, Use Cases, and Best Practices

During a detailed live session, senior Dianping engineer Wang Xinchun explains the company's real‑time computing platform built on Apache Storm, covering use cases such as dashboards, search and recommendation, system architecture, data ingestion tools like Blackhole and Puma, performance tuning, monitoring, and practical best‑practice recommendations.

Apache Storm · Big Data · Real‑Time Computing
0 likes · 21 min read

Understanding Stream Processing, Event Sourcing, and Complex Event Processing

The article explains the fundamentals of stream processing, event sourcing, and complex event processing, comparing raw event storage with aggregated results, illustrating architectures with Kafka, Samza, and other frameworks, and highlighting benefits such as scalability, flexibility, and decoupling for modern data‑driven systems.

Apache Kafka · Apache Samza · Big Data
0 likes · 11 min read