Tagged articles
181 articles
ITFLY8 Architecture Home
May 3, 2021 · Big Data

Unlocking the Power of Data Middle Platforms: Key Concepts and Best Practices

This article provides a comprehensive overview of data middle platforms, covering data aggregation, collection tools, offline and real‑time development, scheduling, baseline control, heterogeneous storage, data governance, service layers, monitoring, and the architectural differences between offline and real‑time data warehouses.

Data Warehouse · ETL · Real-time Processing
0 likes · 26 min read
Big Data Technology & Architecture
Mar 23, 2021 · Big Data

Practical Implementations of Data Lakes: Huawei Production Scenario, Real-Time Financial Data Lake, and Soul's Delta Lake

This article presents a comprehensive overview of data lake implementations, detailing Huawei's production‑scene platform, a real‑time financial data lake architecture using Kafka, Flink and Iceberg, and Soul's Delta Lake practice with Spark, Hive, and custom ETL tools, highlighting design choices, processing flows, and operational considerations.

Data Lake · Delta Lake · Flink
0 likes · 8 min read
Meituan Technology Team
Jan 14, 2021 · Big Data

Design and Implementation of an SSD‑Based Application‑Layer Cache Architecture for Kafka in Meituan Data Platform

Meituan built an SSD‑based application‑layer cache for Kafka that bypasses PageCache contention between real‑time and delayed jobs, classifies log segments across SSD and HDD, limits flush rates, and achieves up to 80% latency reduction while guaranteeing stable real‑time consumption.

Big Data · Kafka · LogSegment
0 likes · 19 min read
Big Data Technology & Architecture
Dec 16, 2020 · Big Data

Designing a Real‑Time Data Processing Platform with Flink: Architecture, Deployment, and Operations

This article explains how to build a real‑time data processing platform using Flink, covering the Lambda architecture, design approaches, SQL and custom‑Jar task definitions, UI drag‑and‑drop, cluster resource management on Yarn and Kubernetes, submission modes, scheduling, permission and metadata handling, logging, and monitoring with Prometheus and Grafana.

Cluster Management · Flink · Lambda architecture
0 likes · 19 min read
Big Data Technology Architecture
Nov 23, 2020 · Big Data

One‑Stop Data Lake Ingestion Solution with Alibaba Cloud Data Lake Formation (DLF)

The article describes Alibaba Cloud's Data Lake Formation service, presenting a unified, real‑time, and low‑latency solution for ingesting heterogeneous data sources—including RDS, DTS, TableStore, and SLS—into an OSS‑backed data lake using templates, a Spark‑based ingestion engine, and modern file formats such as Delta Lake.

Alibaba Cloud · Delta Lake · Real-time Processing
0 likes · 10 min read
Xianyu Technology
Nov 17, 2020 · Big Data

Xianyu Premium Product Library: Architecture and Implementation

Xianyu’s premium‑product library combines interpretable, multi‑dimensional metric models built from structured product and user attributes with real‑time and offline pipelines to systematically tag high‑quality items, delivering services via HSF and a message bus, and has driven over 20% click‑through growth and nearly doubled conversion rates.

Real-time Processing · data pipeline · feature engineering
0 likes · 7 min read
Baidu App Technology
Sep 7, 2020 · Artificial Intelligence

Real-Time Mobile Super-Resolution Reconstruction in Baidu App

The article describes Baidu App's real-time mobile super-resolution pipeline, which uses a VDSR-based model with pruning and depthwise separable convolutions and is optimized at both the application layer and the inference engine to halve latency and memory use. This enables on-device high‑definition image and video enhancement, reduces server load, and supports iOS and Android integration.

Mobile AI · Real-time Processing · image enhancement
0 likes · 8 min read
58 Tech
Aug 24, 2020 · Big Data

Design and Practice of an Online Real-Time Feature System for Intelligent Risk Control

This article presents the concepts, architecture, and practical techniques of an online real‑time feature system used in intelligent risk‑control, covering feature definition, time‑window types, calculation functions, distributed processing, low‑latency storage, and operational challenges in high‑concurrency environments.

Big Data · Real-time Processing · Streaming
0 likes · 16 min read
Java Architect Essentials
Aug 21, 2020 · Big Data

Design and Integration of Flume, Kafka, Storm, Drools, and Redis for Real‑Time ETL Log Analysis

This article presents a modular architecture for real‑time ETL log analysis that combines Flume for log collection, Kafka as a buffering layer, Storm for stream processing, Drools for rule‑based data transformation, and Redis for fast storage, detailing installation, configuration, and code integration steps.

Big Data · Drools · Flume
0 likes · 23 min read
DataFunTalk
Aug 21, 2020 · Big Data

Design and Implementation of 58.com Commercial DMP Platform

This talk presents the architecture, feature extraction, storage, real-time computation, monitoring, and optimization strategies of 58.com’s commercial DMP platform, detailing business requirements, system design across data, storage, compute, and service layers, and future plans for unified services and advanced analytics.

DMP · Data Platform · Real-time Processing
0 likes · 13 min read
Architects' Tech Alliance
Aug 11, 2020 · Big Data

Comprehensive Overview of Data Middle Platform Architecture, Components, and Practices

This article provides an extensive summary of data middle platform concepts, covering data aggregation, collection tools, offline and real‑time development, data governance, service layers, warehouse construction, and operational practices, illustrating how enterprises build and manage a unified data ecosystem.

Big Data · Data Governance · Data Middle Platform
0 likes · 27 min read
DataFunTalk
Aug 4, 2020 · Artificial Intelligence

Weibo Machine Learning Platform (WML) Overview and Flink Applications

This article presents an in‑depth overview of Weibo's large‑scale machine learning platform, detailing its multi‑layer architecture, development workflow, CTR model evolution, and how Apache Flink is employed for real‑time data processing, sample services, multi‑stream joins, multimedia feature generation, and future roadmap plans.

CTR · Data Platform · Flink
0 likes · 12 min read
DataFunTalk
Feb 17, 2020 · Artificial Intelligence

Building a Closed‑Loop AI System: From Data Collection to Model Deployment in Alibaba’s XiaoMi

This article explains how Alibaba’s XiaoMi team constructs a full‑cycle AI pipeline—covering real‑time and offline data processing, high‑dimensional visualization, model training, iterative feedback, and Spark‑based deployment—to accelerate intelligent product iteration while addressing common engineering pain points.

AI · Big Data · Real-time Processing
0 likes · 10 min read
Youku Technology
Feb 13, 2020 · Artificial Intelligence

AI-Based Follow-Subtitle (Bullet) System for Video Streaming

The article presents an AI‑driven follow‑subtitle system for video streaming that uses server‑side face detection and tracking to attach speech‑bubble bullets to characters, synchronizing trajectories with playback via a client SDK, while addressing cut‑scene handling, latency, and power constraints.

AI · Real-time Processing · algorithm
0 likes · 8 min read
ITPUB
Jan 10, 2020 · Big Data

How MaFengWo Scales Kafka for Real‑Time Big Data: Lessons and Best Practices

This article details MaFengWo’s practical experience using Kafka across three core scenarios—real‑time storage, analytical data source, and business data subscription—while describing a four‑stage evolution that includes version upgrades, resource isolation, security and monitoring enhancements, and a comprehensive subscription platform, followed by future improvement plans.

Big Data · Data Replay · Kafka
0 likes · 16 min read
Big Data Technology & Architecture
Oct 28, 2019 · Big Data

Big Data Technology and Architecture: Leveraging Spark and HBase for Real‑Time and Offline Processing

This article outlines the challenges of various big‑data scenarios such as financial risk control, recommendation systems, and social feeds, explains why Spark is chosen over alternatives, describes a one‑stop data platform architecture with Spark‑HBase integration, and shares best‑practice tips and case studies.

Big Data · Data Architecture · HBase
0 likes · 7 min read
AntTech
Oct 28, 2019 · Databases

Ant Financial's Online Graph Computing: Architecture, Applications, and Core Technologies

This article explains Ant Financial's online graph computing technology, covering its financial‑grade graph database, real‑time anti‑cashout use cases, high‑performance graph cache, flow‑graph fusion, dynamic DAG execution, and how these innovations support massive, low‑latency financial services.

Real-time Processing · financial technology · high-performance cache
0 likes · 13 min read
Beike Product & Technology
Sep 20, 2019 · Big Data

Understanding DStream Construction and Execution in Spark Streaming

This article explains how Spark Streaming's DStream abstraction is built from InputDStream through successive transform operators, details the internal ForEachDStream implementation, describes the job generation and scheduling workflow, and outlines how Beike's real‑time platform leverages these mechanisms for large‑scale streaming tasks.

Big Data · Dstream · Real-time Processing
0 likes · 10 min read
dbaplus Community
Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing
0 likes · 18 min read
Alibaba Cloud Developer
Jul 12, 2019 · Big Data

Designing a Real‑Time Big Data Sentiment System on Alibaba Cloud: From Lambda to Lambda‑Plus

This article explains how massive online data can be captured, structured, and analyzed in real time using a Lambda‑style architecture, then introduces a simplified Lambda‑Plus design built on Alibaba Cloud's Tablestore and Blink to meet both batch and streaming requirements while reducing operational complexity.

Big Data · Lambda architecture · Real-time Processing
0 likes · 18 min read
Big Data Technology Architecture
Jul 12, 2019 · Big Data

Why Kafka Is So Popular: Features, Use Cases, and Architecture Overview

This article explains why Apache Kafka has become a cornerstone of modern big‑data pipelines by detailing its high‑throughput, fault‑tolerant publish‑subscribe architecture, real‑time processing capabilities, extensive language support, scalability mechanisms, and the wide range of use cases adopted by leading enterprises.

Distributed Streaming · Kafka · Message Queue
0 likes · 9 min read
21CTO
Jun 7, 2019 · Big Data

How to Build a Real-Time Big Data Sentiment Analysis Platform Using Lambda & Kappa

This article explores the design of a large‑scale, real‑time sentiment analysis system, detailing the data ingestion, processing, and storage requirements, comparing Lambda and Kappa architectures, and presenting an Alibaba Cloud solution that combines Tablestore and Blink for unified batch‑and‑stream processing.

Big Data · Kappa architecture · Lambda architecture
0 likes · 18 min read
Youzan Coder
Apr 29, 2019 · Big Data

Optimizing Flink Sliding Windows for Super Long Time Ranges

To overcome severe performance degradation of Flink sliding windows over very long time ranges, Youzan engineers applied time‑slicing based on the greatest common divisor of window length and slide step, reducing state writes and timers, which yielded 3‑8× speedups in production.

Big Data · Flink · Real-time Processing
0 likes · 18 min read
Qunar Tech Salon
Feb 20, 2019 · Big Data

Building Real-Time User Behavior Engineering with Apache Flink: Architecture, Features, and Implementation

This article introduces the design and implementation of a real‑time user behavior engineering platform at Qunar using Apache Flink, covering Flink's core characteristics, distributed runtime, DataStream programming model, fault‑tolerance, back‑pressure handling, event‑time processing, windowing, watermarks, and practical code examples for filtering, splitting, joining, and state management.

Checkpoint · DataStream · EventTime
0 likes · 18 min read
Youzan Coder
Jan 11, 2019 · Backend Development

Business Reconciliation Platform Architecture Design for Distributed Systems

The article describes Youzan's business reconciliation platform for distributed systems, which detects and quantifies data inconsistencies. It offers easy plug‑in integration, a four‑step orchestrated workflow, high‑throughput offline processing with Spark, second‑level real‑time event handling, a three‑layer architecture, and health monitoring for transaction chains.

CAP theorem · Data Consistency · Distributed Systems
0 likes · 9 min read
Alibaba Cloud Infrastructure
Nov 20, 2018 · Big Data

A Decade of Alibaba's Big Data Platform Evolution Through Double 11

The article chronicles Alibaba's ten‑year journey of building and scaling its big data platform—from early Oracle clusters and Hadoop‑based Cloud‑Ladder 1 to the self‑developed ODPS/MaxCompute, real‑time Blink engine, and the unified DataWorks ecosystem—highlighting key technical milestones, performance breakthroughs, and operational challenges that powered successive Double 11 shopping festivals.

Alibaba · Data Platform · MaxCompute
0 likes · 22 min read
Alibaba Cloud Developer
Nov 15, 2018 · Big Data

How Alibaba Built a World‑Class Big Data Platform Over a Decade

This article chronicles Alibaba's ten‑year journey of building and scaling its big‑data platform—from early Oracle clusters and Hadoop, through the launch of ODPS and MaxCompute, to global cloud expansion and cutting‑edge streaming innovations that now power billions of transactions each Double‑11.

Alibaba · Data Platform · MaxCompute
0 likes · 23 min read
dbaplus Community
Nov 1, 2018 · Big Data

How Vipshop Scales Real‑Time Data with Flink on Kubernetes

This article details Vipshop's real‑time platform architecture, the migration from Storm and Spark to Flink, Flink's deployment on Kubernetes, and the latest Unified Data Management system that unifies data access across Kafka, Redis, Tair and HDFS.

Big Data · Flink · Kubernetes
0 likes · 12 min read
Meitu Technology
Aug 2, 2018 · Big Data

Spark Streaming vs Flink – Architecture, Scheduling & Fault Tolerance

This article compares Spark Streaming and Flink across runtime models, component roles, programming APIs, task scheduling, time semantics, dynamic Kafka partition detection, fault‑tolerance mechanisms, exactly‑once guarantees, and back‑pressure handling, providing code examples and practical insights for real‑time data processing.

Dynamic Partition Detection · Exactly-Once · Flink
0 likes · 23 min read
360 Tech Engineering
Jul 13, 2018 · Big Data

Titan 2.0 Big Data Processing Platform: Architecture Evolution and Practice

The article describes the evolution of 360's Titan big‑data processing platform through three architectural stages, details its functional modules, explains the DITTO component framework, context and rule‑engine abstractions, and shares practical case studies and personal insights on building a flexible, self‑service data platform.

Big Data · DITTO · ETL
0 likes · 12 min read
Meitu Technology
Jun 25, 2018 · Artificial Intelligence

Meitu Short Video Real-Time Classification Challenge MTSVRC

The Meitu Short Video Real-Time Classification Challenge (MTSVRC), co‑hosted by the PRCV conference, Meitu, and the Chinese Academy of Sciences, releases the industry's largest dataset: over 100,000 videos of 5 to 15 seconds each across 50 categories. Entries are judged on both classification accuracy and real‑time speed, with cash prizes of up to ¥100,000 and presentation opportunities for top teams.

AI research · MTSVRC · PRCV 2018
0 likes · 5 min read
Ctrip Technology
Jun 4, 2018 · Big Data

Real-Time Data Processing Frameworks and Kafka Practices at Ctrip Ticketing

This article examines Ctrip Ticketing's real-time data processing ecosystem, comparing batch and streaming frameworks such as Hadoop, Spark, Storm, Flink, and Spark Streaming, detailing Kafka deployment and configuration, and describing how these technologies are applied in production for log analysis, seat‑occupancy detection, and anti‑crawling.

Flink · Real-time Processing · Spark Streaming
0 likes · 12 min read
JD Retail Technology
May 30, 2018 · Artificial Intelligence

Quick Q&A: Insights from JD JDATA Algorithm Competition

This article presents a rapid Q&A session with JD data scientists and architects, covering the benefits of algorithm contests for students, the unique advantages of the JDATA competition, scoring formulas, ways to improve results, strong feature extraction, real‑time modeling, algorithm selection, and the value of the competition’s special offer for future employment.

Data Science · Real-time Processing · algorithm competition
0 likes · 8 min read
Java Backend Technology
Apr 20, 2018 · Artificial Intelligence

How Do Modern Recommendation Systems Balance Accuracy, Diversity, and Surprise?

This article explains the objectives, methods, architecture, and key algorithms of modern recommendation systems, covering popular, manual, related, and personalized approaches, the data pipeline, real‑time challenges, cold‑start handling, diversity, content quality, and exploration‑exploitation strategies.

Real-time Processing · Recommendation Systems · collaborative filtering
0 likes · 15 min read
Huawei Cloud Developer Alliance
Apr 17, 2018 · Big Data

How a Big Data Platform Powers Real‑Time Facial Recognition for Billion‑Scale Face Libraries

This case study details how Beijing 恒远华信息技术有限公司 built a dynamic face‑capture and real‑time recognition solution on Huawei FusionInsight HD, leveraging deep‑learning algorithms, distributed storage, and stream processing to handle hundreds of millions of faces with high speed, efficiency, and security.

Apache Storm · HBase · Huawei FusionInsight
0 likes · 17 min read
Beike Product & Technology
Mar 9, 2018 · Big Data

How Lianjia Built a Low‑Latency Real‑Time Data Platform with Spark Streaming

This article details Lianjia's journey of designing and implementing a low‑latency, stable real‑time computing platform using Spark Streaming on YARN, covering technical selection, architecture components, version compatibility challenges, exactly‑once semantics, graceful shutdown, Kafka tuning, and future enhancements.

Big Data · Exactly-Once · Kafka
0 likes · 11 min read
Alibaba Cloud Developer
Feb 1, 2018 · Artificial Intelligence

How Alibaba’s Graph‑Based Bundle Mining Doubles Conversion in E‑Commerce

Alibaba’s latest bundle‑mining system leverages weighted graph embedding and real‑time sampling to recommend complementary products, replacing traditional item‑to‑item similarity, boosting click‑through rates by up to 13% offline and 4% online during the Double‑11 promotion while handling billions of edges.

Real-time Processing · bundle mining · e‑commerce
0 likes · 12 min read
StarRing Big Data Open Lab
Jan 5, 2018 · Big Data

What Drove Big Data’s 2017 Surge and What’s Next? Insights & Predictions

Analyzing 2017’s big data boom, the article explores how the 4V characteristics—volume, variety, velocity, and value—spurred innovations like distributed storage, NoSQL, real‑time stream processing, and AI integration, and predicts future hotspots such as SQL resurgence, cloud‑based platforms, and AI‑driven analytics.

Big Data · Real-time Processing · artificial intelligence
0 likes · 11 min read
dbaplus Community
Oct 30, 2017 · Big Data

How to Build a Real‑Time Spam Monitoring System with Apache Storm

This article walks through the design, deployment, and code implementation of a real‑time spam detection pipeline using Apache Storm, comparing it with Hadoop, detailing cluster setup, topology components, data flow, and how to package and run the solution on a distributed Storm cluster.

Apache Storm · Big Data · Hibernate
0 likes · 13 min read
Tongcheng Travel Technology Center
Sep 29, 2017 · Big Data

Evolution of Monitoring Architecture and Traffic Alert Algorithms at Tongcheng Travel

This article describes how Tongcheng Travel’s monitoring system evolved from a monolithic design to a distributed and big‑data‑based architecture, introducing real‑time processing with Storm, machine‑learning‑enhanced alerts, and a multivariate linear regression model that dramatically improves traffic anomaly detection accuracy.

Big Data · Real-time Processing · architecture evolution
0 likes · 10 min read
Meituan Technology Team
Sep 21, 2017 · Big Data

Feature Production Scheduling: Architecture Evolution and Core Technologies

Using Meituan‑Dianping’s hospitality online feature system as a case study, the article describes how feature production scheduling evolved from offline batch ETL to automated, metadata‑driven pipelines and sub‑second streaming, detailing the underlying architecture, incremental updates, storage abstraction, write‑shaving, atomicity, and recovery mechanisms.

Big Data · Real-time Processing · System Architecture
0 likes · 23 min read
21CTO
Aug 18, 2017 · Big Data

How Ctrip Builds a Scalable User Profile Platform for Personalized Travel

This article explains why Ctrip creates user profiles, describes the product and technical architectures, and details the data collection, computation, storage, high‑availability querying, and monitoring components that power its personalized travel recommendations and services.

Ctrip · Real-time Processing · System Architecture
0 likes · 8 min read
ITFLY8 Architecture Home
Jul 17, 2017 · Big Data

Mastering Data Sync, Real‑Time Processing, and Scalable Storage for Modern Systems

This article explores practical techniques for synchronizing heterogeneous data sources, performing batch and incremental analytics with Hadoop and Spark, designing low‑latency real‑time computation pipelines, implementing push notifications, and choosing appropriate storage solutions—from in‑memory caches to distributed databases—while addressing performance, reliability, and scalability challenges.

Big Data · Distributed Systems · Real-time Processing
0 likes · 25 min read
dbaplus Community
Jul 10, 2017 · Big Data

Master Apache Storm: Real‑Time Stream Processing from Basics to Word‑Count and Call‑Log Examples

This tutorial explains Apache Storm’s core principles, architecture, and development workflow, covering its relationship with Hadoop, key concepts such as spouts, bolts, tuples, and topologies, and provides step‑by‑step code examples for a word‑count program and a call‑log analysis application.

Apache Storm · Big Data · Real-time Processing
0 likes · 14 min read
Ctrip Technology
Jul 6, 2017 · Big Data

Evolution of Ctrip's Data Platform: From Version 1.0 to 2.0 for Risk Control

This article describes how Ctrip's information security team redesigned its data platform from a simple RabbitMQ‑MySQL pipeline to a scalable, real‑time and offline big‑data architecture using Kafka, Storm, Hadoop, Spark, and a custom count server, dramatically improving processing capacity and supporting risk‑control operations.

Ctrip · Real-time Processing · risk control
0 likes · 8 min read
Java High-Performance Architecture
Jun 29, 2017 · Big Data

Master Apache Storm: Core Concepts, Real‑Time Word Count & Call Log Analytics

This tutorial introduces Apache Storm’s fundamental principles and development workflow, providing a PDF guide and source code for two practical examples—real‑time word‑count and call‑record aggregation—while covering its definition, use cases, relationship with Hadoop, core concepts, cluster architecture, and step‑by‑step usage.

Apache Storm · Big Data · Real-time Processing
0 likes · 1 min read
Ctrip Technology
Apr 13, 2017 · Big Data

Design and Implementation of Ctrip's Real-Time User Behavior System

The article describes how Ctrip redesigned its real-time user behavior service using a Java‑Kafka‑Storm stack with Redis and MySQL, detailing the architecture, real‑time processing, availability, performance, scalability, and deployment strategies to handle billions of events daily.

Real-time Processing · Storm · System Architecture
0 likes · 13 min read
Efficient Ops
Mar 20, 2017 · Big Data

How eBay Built a Scalable Kafka‑Based Real‑Time Data Transmission Platform

This article details eBay's year‑long development of an enterprise‑grade, Kafka‑driven data transmission platform, covering its architecture, core services, monitoring and automation strategies, as well as performance tuning techniques that enable high throughput, low latency, and reliable cross‑data‑center replication.

Data Streaming · Kafka · Real-time Processing
0 likes · 22 min read
21CTO
Mar 10, 2017 · Big Data

Inside Tencent Analytics: How TA Handles TB‑Scale Real‑Time Web Data

Tencent Analytics (TA) is a free web analytics platform that processes terabytes of daily data in real time, using a custom architecture featuring JavaScript collection, event streaming, in‑memory computation, and NoSQL storage with Redis and LevelDB, offering site owners instant insights and high availability.

Big Data · LevelDB · Real-time Processing
0 likes · 12 min read
Architecture Digest
Feb 28, 2017 · Big Data

Architecture and Real‑Time Processing Design of Tencent Analytics (TA)

This article explains the architecture, real‑time computation framework, and storage solutions of Tencent Analytics, detailing how massive TB‑level web‑traffic data are collected via JavaScript, processed in memory‑centric streaming components, and stored using Redis and LevelDB to achieve second‑level updates.

Big Data · LevelDB · NoSQL
0 likes · 13 min read
Alibaba Cloud Developer
Dec 7, 2016 · Big Data

How Alibaba Handled Real‑Time Billions of Events During Double 11

This article outlines Alibaba Cloud's big‑data platform challenges and solutions during the 2016 Double 11 event, covering sub‑second real‑time processing, multi‑million‑records‑per‑second throughput, full‑day high availability, and massive offline workloads exceeding hundreds of petabytes.

Alibaba · Distributed Systems · MaxCompute
0 likes · 3 min read
Ctrip Technology
Dec 2, 2016 · Big Data

Design and Architecture of Ctrip's Aegis Risk Control System

This article presents a comprehensive overview of Ctrip's Aegis risk control system, detailing its modular architecture, rule engine, data service layer, Chloro analytics platform, and future directions, while highlighting the use of streaming, big‑data processing, and machine‑learning models for real‑time fraud detection.

Big Data · Real-time Processing · machine learning
0 likes · 13 min read
21CTO
May 16, 2016 · Information Security

Building a Scalable Payment Risk Control System: Architecture & CEP

This article outlines the design of a payment risk control system, detailing functional and non‑functional requirements, core components such as real‑time, near‑real‑time, and batch engines, rule and penalty centers, and explains the role of CEP and Drools in achieving flexible, high‑performance fraud detection.

CEP · Drools · Real-time Processing
0 likes · 12 min read
Architecture Digest
May 15, 2016 · Information Security

Design and Architecture of a Payment Risk Control System

The article explains the functional and non‑functional requirements, common pitfalls, and detailed architecture—including real‑time, near‑real‑time, and batch engines, rule and penalty centers, and CEP technology—of a payment risk control system aimed at detecting and mitigating fraud while maintaining performance and flexibility.

CEP · Real-time Processing · System Architecture
0 likes · 12 min read
Java High-Performance Architecture
Apr 18, 2016 · Big Data

Why Spark Is Outpacing Hadoop: Speed, Real‑Time Processing, and ML Advantages

The article explains how Spark has become the leading open‑source big‑data platform, highlighting its superior speed, in‑memory processing, real‑time streaming, and built‑in machine‑learning library compared with Hadoop’s slower, disk‑based MapReduce approach and reliance on external storage and ML tools.

Big Data · Hadoop · Real-time Processing
0 likes · 5 min read
Architect
Feb 29, 2016 · Big Data

Design Principles of Real-Time Distributed Streaming Systems: A Comparison of Spark and Storm

This article examines the design considerations of real-time distributed streaming systems, outlines their background and characteristics, compares the architectures of Spark Streaming and Storm, discusses primitives, message passing, high availability, storage models, and integration with production environments, providing practical insights for architects.

Distributed Systems · Real-time Processing · Spark
0 likes · 20 min read
Baidu Maps Tech Team
Feb 3, 2016 · Big Data

How Baidu Maps Powers Its Open Platform with Big Data Architecture

This article explains how Baidu Maps’ open platform handles massive daily location data through real‑time and offline pipelines, Hadoop‑based offline computing, stream processing, and query engines built on MySQL, Redis, and Apache Kylin, while outlining future big‑data enhancements.

Apache Kylin · Baidu Maps · Hadoop
0 likes · 7 min read
21CTO
Jan 16, 2016 · Backend Development

How Kuaidi Dache Scaled to Millions: Lessons from LBS, Long Connections, and Real‑Time Data Architecture

This article details the architectural evolution of Kuaidi Dache from 2013 to 2014, covering LBS bottlenecks, MongoDB scaling, long‑connection stability, distributed system refactoring, a wireless open platform, real‑time monitoring with Storm and HBase, and a data center built on sharding and synchronization.

Backend · Messaging · Real-time Processing
0 likes · 16 min read
21CTO
Jan 8, 2016 · Backend Development

How Didi Scaled Ride‑Hailing: LBS, Long‑Connection, and Real‑Time Data Solutions

Facing explosive traffic growth in 2014, Didi’s ride‑hailing platform tackled critical challenges by redesigning its LBS architecture, replacing unstable long‑connection services with an AIO‑based framework, partitioning databases, adopting Dubbo and RocketMQ for distributed processing, and building a real‑time monitoring and data center using Storm, HBase, and custom SQL‑to‑HBase translation.

Real-time Processing · Ride Hailing · database sharding
0 likes · 14 min read
Qunar Tech Salon
Jan 6, 2016 · Backend Development

Architecture Evolution and Scaling Solutions of Kuaidi Dache (Fast Taxi) Service

This article details the challenges that rapid traffic growth posed for Kuaidi Dache from 2013 to 2014 and presents representative architectural bottlenecks along with the engineering solutions (LBS optimization, long‑connection redesign, distributed refactoring, a wireless open platform, real‑time monitoring, and data layer transformation) that enabled stable, scalable, high‑performance ride‑hailing services.

Real-time Processing · Ride Hailing · Scalability
0 likes · 13 min read
Architect
Dec 4, 2015 · Operations

Evolution of Qiniu Cloud Data Processing Architecture

The article explains how Qiniu's data processing platform has evolved from a simple real‑time URL‑based model to a more complex architecture featuring separate caching, agent services, discover monitoring, and container‑based elastic scaling to handle massive unstructured data workloads.

Real-time Processing · cloud architecture · containerization
0 likes · 9 min read
21CTO
Nov 21, 2015 · Big Data

Why Build a Kafka System? Core Use Cases and Design Principles

This article explains why Kafka is essential for activity and operational data pipelines, outlines key use cases such as news feeds, relevance ranking, security, monitoring, and reporting, and details its deployment topology, design decisions, and message persistence strategies.

Distributed Messaging · Kafka · Real-time Processing
0 likes · 14 min read

Understanding Storm: A Distributed Real-Time Computation System

The article explains the need for low‑latency, high‑performance, distributed real‑time processing, outlines the challenges such systems must address, and introduces Storm as a Hadoop‑like framework for stream processing, detailing its architecture, fault‑tolerance mechanisms, transactional topology, and large‑scale deployment at Taobao.

Big Data · Distributed Systems · Real-time Processing
0 likes · 14 min read
21CTO
Sep 24, 2015 · Big Data

Comparing Apache Storm, Spark, and Samza: Which Real‑Time Stream Processor Fits Your Needs?

Apache Storm, Spark Streaming, and Samza are three open‑source, low‑latency, scalable distributed systems for real‑time data processing; this article outlines their architectures, key concepts, differences in data handling, state management, delivery guarantees, and typical use‑cases to help you choose the right framework.

Apache Samza · Apache Storm · Big Data
0 likes · 7 min read

Comparative Overview of Apache Storm, Spark Streaming, and Samza for Real-Time Data Processing

This article introduces Apache Storm, Spark Streaming, and Samza, explains their architectures, common features, key differences such as delivery guarantees and state management, and provides guidance on selecting the most suitable framework for various real‑time big‑data use cases.

Apache Storm · Big Data · Comparison
0 likes · 8 min read
Suning Technology
May 22, 2015 · Big Data

Suning’s Big Data Platform Evolution: From SAP BW to Real‑Time Streaming

This article chronicles Suning’s journey from early SAP‑based data warehouses to a modern, open‑source big data platform featuring real‑time collection, Hadoop‑Hive offline processing, Storm‑based streaming, and a visual development environment, highlighting how each layer addresses growing data volume, variety, and business demands.

Data Architecture · Hadoop · Real-time Processing
0 likes · 5 min read