Tagged articles

252 articles

Jul 5, 2021 · Operations

Automated and Intelligent Analysis of Baidu Search Stability Issues

The team automated Baidu Search fault diagnosis by building a side‑index for instant log lookup, streaming incremental analysis, exhaustive rule templates, feature‑engineering pipelines, query‑scene reconstruction, entropy‑based ranking, per‑second timeline views, and chaos‑engineered fault injection, achieving near‑99% accuracy and second‑level, module‑granular stability tracing.

Observability · Search Stability · chaos engineering

15 min read

Tencent Cloud Middleware

Jun 30, 2021 · Fundamentals

Understanding Apache Pulsar Transactions: Core Concepts and Workflow

Apache Pulsar 2.8.0 introduces transaction support, featuring a Transaction Coordinator, Transaction Buffer, Transaction Log, Transaction ID, and Pending Acknowledge State, with a detailed workflow that ensures exactly‑once semantics for stream processing, contrasting its design with Kafka’s approach.

Apache Pulsar · Exactly-Once · Kafka Comparison

13 min read

Yuewen Technology

Jun 25, 2021 · Big Data

Building Yuewen Group’s Overseas Big Data Platform: Architecture, Offline & Real‑Time Processing

This article details how Yuewen Group designed and implemented an overseas big data platform, covering overall system architecture, offline data‑warehouse construction with dimensional modeling, real‑time streaming using Oceanus and ClickHouse, and future plans for cost reduction and data quality assurance.

Big Data · Real-time Processing · architecture

12 min read

NetEase Smart Enterprise Tech+

Jun 17, 2021 · Big Data

Building a Real‑Time Service Monitoring Framework with Flink at NetEase Cloud

This article explains how NetEase Cloud Communication designed and implemented a Flink‑based streaming aggregation framework that processes massive heartbeat logs in real time, handles data skew with two‑stage aggregation, and outputs metrics to Kafka and InfluxDB for monitoring and alerting.

Data Skew · Flink · Metric Computation

11 min read
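Two-stage aggregation is a common way to spread one hot key across parallel workers. The sketch below simulates the idea in plain Python rather than Flink, under stated assumptions: each key is salted with a random suffix (the `buckets` parameter is hypothetical, not a value from the article), partial counts are computed per salted key, and a second stage strips the salt and merges the partials.

```python
import random
from collections import Counter, defaultdict

def two_stage_count(events, buckets=4, seed=0):
    """Count events per key using a salted pre-aggregation stage.

    events: iterable of keys (e.g. service IDs from heartbeat logs).
    buckets: hypothetical number of salt values a hot key is spread over.
    """
    rng = random.Random(seed)

    # Stage 1: append a random salt so one hot key maps to up to `buckets`
    # distinct keys, which a real engine would route to different subtasks.
    partial = Counter()
    for key in events:
        partial[(key, rng.randrange(buckets))] += 1

    # Stage 2: drop the salt and merge the partial counts per original key.
    final = defaultdict(int)
    for (key, _salt), count in partial.items():
        final[key] += count
    return dict(final), partial

final, partial = two_stage_count(["hot"] * 1000 + ["cold"] * 10)
print(final)  # {'hot': 1000, 'cold': 10}
```

Salting only works because counting is associative: partial results can be merged in any order. In a streaming job the first stage would typically be keyed on the salted key and the second on the original key.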

ITFLY8 Architecture Home

May 31, 2021 · Backend Development

Why Apache Pulsar Beats Kafka and RocketMQ for Scalable Messaging Platforms

This article details how Lakala built a distributed, cloud‑native messaging platform using Apache Pulsar, covering functional requirements, architectural advantages, performance testing, and real‑world integration scenarios such as OGG adapters, TiDB pipelines, OpenMessaging, custom sources, functions, Flink connectors, and future plans.

Apache Pulsar · Backend Architecture · Distributed Messaging

18 min read

Baidu Geek Talk

May 24, 2021 · Big Data

Real-Time Quantile Computation Using TDigest: Architecture and Solutions

The article presents a real‑time quantile solution using the TDigest data structure, which clusters data into centroids and stores digests in Redis or Doris, pre‑computes quantiles for all dimension combinations, and provides a reusable API that delivers fast, accurate, low‑memory quantile statistics for diverse business scenarios.

data aggregation · doris · real-time quantile

11 min read
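To make the centroid idea concrete, here is a simplified, in-memory merging digest in the spirit of t-digest. It is not the article's implementation and not a production library. It keeps sorted `(mean, weight)` centroids, caps each centroid's weight at `4 * N * q * (1 - q) / delta` (the size bound from the original t-digest formulation, with the compression parameter written as a divisor) so centroids stay small near the tails, and answers quantiles by interpolating between centroid means. The `merge` method hints at why digests suit pre-aggregated storage: digests for individual dimension values can be combined into a rollup. The class name and parameters are hypothetical.

```python
import random

class SimpleTDigest:
    """Simplified merging t-digest: sorted (mean, weight) centroids."""

    def __init__(self, delta=100, buffer_size=500):
        self.delta = delta              # larger delta -> more, smaller centroids
        self.buffer_size = buffer_size
        self.centroids = []             # (mean, weight), sorted by mean
        self.buffer = []                # raw values not yet merged

    def add(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= self.buffer_size:
            self._compress()

    def merge(self, other):
        """Fold another digest in, e.g. to roll up per-dimension digests."""
        other._compress()
        self.centroids.extend(other.centroids)
        self._compress()

    def _compress(self):
        items = sorted(self.centroids + [(v, 1) for v in self.buffer])
        self.buffer = []
        if not items:
            self.centroids = []
            return
        total = sum(w for _, w in items)
        merged, seen = [], 0.0
        cur_mean, cur_w = items[0]
        for mean, w in items[1:]:
            q = (seen + (cur_w + w) / 2) / total
            limit = max(1.0, 4 * total * q * (1 - q) / self.delta)
            if cur_w + w <= limit:
                # Absorb the neighbour into the current centroid (weighted mean).
                cur_mean = (cur_mean * cur_w + mean * w) / (cur_w + w)
                cur_w += w
            else:
                merged.append((cur_mean, cur_w))
                seen += cur_w
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids = merged

    def quantile(self, q):
        self._compress()
        if not self.centroids:
            raise ValueError("empty digest")
        # Place each centroid's mean at the middle of its cumulative weight.
        centers, seen = [], 0.0
        for mean, w in self.centroids:
            centers.append((seen + w / 2, mean))
            seen += w
        target = q * seen
        if target <= centers[0][0]:
            return centers[0][1]
        if target >= centers[-1][0]:
            return centers[-1][1]
        for (c0, m0), (c1, m1) in zip(centers, centers[1:]):
            if target <= c1:
                # Linear interpolation between neighbouring centroid means.
                return m0 + (m1 - m0) * (target - c0) / (c1 - c0)
        return centers[-1][1]

rng = random.Random(42)
values = list(range(10_000))
rng.shuffle(values)
digest = SimpleTDigest()
for v in values:
    digest.add(v)
print(len(digest.centroids), digest.quantile(0.5), digest.quantile(0.99))
```

The digest keeps far fewer centroids than input values while staying close to the exact quantiles, which is the memory and accuracy trade-off the summary describes.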

Laravel Tech Community

May 20, 2021 · Big Data

Flink 1.13 Release Highlights: Passive Scaling and Performance Analysis Features

Flink 1.13 introduces passive scaling (Reactive Mode), in which a job adjusts its parallelism to the resources available, so adding or removing TaskManagers resizes the job. It also adds visual tools such as load and back‑pressure charts, CPU flame graphs, and state‑backend metrics for deeper performance insight, along with numerous community optimizations for easier upgrades and operation.

Flink · State Backend · passive scaling

5 min read

Baidu Geek Talk

May 17, 2021 · Artificial Intelligence

Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

The Imazon platform unifies Baidu’s image acquisition, feature extraction, and ANN‑based multimodal retrieval into a cloud‑native, real‑time pipeline that ingests billions of images daily, optimizes storage and GPU usage, reduces message‑queue costs, and ensures high‑throughput, low‑latency search across text, visual, and voice queries.

Cloud Native · DAG · Image Processing

13 min read

Laravel Tech Community

Apr 23, 2021 · Big Data

Jet Release Notes – New Features, Enhancements, Fixes, and Major Changes

Jet, an open‑source in‑memory distributed batch and stream processing engine, introduces dynamic ClassDefinition for SQL, JDBC sink batch size option, SQL aggregation memory optimizations, improved sliding‑window pickAny performance, numerous extension updates, critical bug fixes, and a major API change to DAG.toDotString for enhanced parallelism control.

Jet · distributed computing · stream processing

4 min read

dbaplus Community

Apr 20, 2021 · Big Data

10 Common Pitfalls When Migrating Spark Jobs to Flink (And How to Avoid Them)

This article shares ten practical pitfalls encountered when moving hourly Spark session jobs to Flink, covering parallelism load imbalance, state TTL, checkpointing strategies, logging, JMX debugging, state migration risks, reduce vs process choices, input data validation, event‑time handling, and external storage considerations, along with concrete configuration snippets and performance tips.

Flink · Spark migration · State Management

20 min read

DataFunTalk

Apr 5, 2021 · Big Data

Bigo Real‑Time Computing Platform: Architecture, Features, and Performance Improvements

This article presents the evolution, architecture, and key innovations of Bigo's real‑time computing platform—covering its migration from Spark Streaming to Flink, unified platform design, development tools, operational enhancements, and the efficiency gains achieved in business scenarios such as ETL and AB‑testing.

AB testing · Bigo · Flink

13 min read

Ctrip Technology

Mar 25, 2021 · Big Data

Challenges and Approaches for Real‑Time Data Aggregation Analysis

The article examines the key challenges of real‑time data aggregation—data freshness, timely processing, and result visibility—and surveys common solutions such as timestamp‑based sync, CDC, full and incremental computation, storage formats, and trigger mechanisms.

Big Data · CDC · Incremental Computation

11 min read

Sohu Tech Products

Feb 17, 2021 · Big Data

Dynamic Data Partitioning in Apache Flink: A Fraud Detection Demo

This article explains how to implement dynamic data partitioning in Apache Flink using a fraud‑detection demo, covering the system architecture, rule‑driven runtime reconfiguration, custom ProcessFunction code, and the underlying key‑by logic that enables flexible, real‑time stream processing.

Apache Flink · Dynamic Partitioning · KeyBy

11 min read
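The core of dynamic partitioning is that the grouping key is not hard-coded but derived at runtime from each rule's list of grouping fields, and each event is fanned out once per active rule before the key-by. The plain-Python simulation below illustrates that idea under assumptions: the `Rule` class, the field names `payer`, `payee` and `amount`, and the `key=value;...` key format are hypothetical, and a dict of running totals stands in for Flink's keyed state.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    rule_id: int
    grouping_keys: tuple  # event fields this rule partitions by

def dynamic_key(event, rules):
    """Fan one event out to a (key, rule_id, event) record per active rule."""
    for rule in rules:
        key = ";".join(f"{name}={event[name]}" for name in rule.grouping_keys)
        yield key, rule.rule_id, event

def run(stream):
    """stream: ("rule", Rule) control messages mixed with ("event", dict)."""
    rules = {}
    totals = defaultdict(int)  # (rule_id, key) -> running sum, like keyed state
    for kind, payload in stream:
        if kind == "rule":
            rules[payload.rule_id] = payload  # add or replace a rule at runtime
        else:
            for key, rule_id, event in dynamic_key(payload, rules.values()):
                totals[(rule_id, key)] += event["amount"]
    return dict(totals)

stream = [
    ("rule", Rule(1, ("payer",))),
    ("event", {"payer": "a", "payee": "x", "amount": 10}),
    ("rule", Rule(2, ("payer", "payee"))),  # added while the job runs
    ("event", {"payer": "a", "payee": "y", "amount": 5}),
]
print(run(stream))  # {(1, 'payer=a'): 15, (2, 'payer=a;payee=y'): 5}
```

Because rule 2 arrives after the first event, only the second event is counted under it, which mirrors how a newly broadcast rule affects only subsequent data.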

DataFunTalk

Jan 28, 2021 · Big Data

Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank

This talk by Ba Xueyu, a senior big data platform engineer at Zhongyuan Bank, outlines the background, architecture, and engineering practices of a real‑time financial data lake, highlighting its open, timely, and integrated design, streaming platform implementation, and use cases such as anti‑fraud and real‑time BI.

Flink · anti-fraud · financial analytics

15 min read

Alibaba Cloud Developer

Jan 25, 2021 · Big Data

Why 2020 Was the Breakthrough Year for Apache Flink’s Ecosystem

In 2020, Apache Flink surged to become the most active Apache project, releasing three major versions that advanced its unified stream‑batch engine, introduced cloud‑native K8s support, expanded AI capabilities with PyFlink, and fostered a thriving Chinese community, solidifying its role as the de‑facto standard for real‑time computing.

AI integration · Apache Flink · Big Data

21 min read

Architects Research Society

Jan 9, 2021 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka’s transaction API, detailing how it enables exactly‑once processing for stream‑processing applications, the role of transaction coordinators and logs, Java API usage, performance considerations, and best‑practice guidance.

Apache Kafka · Big Data · Java API

19 min read
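The consumer side of Kafka transactions can be pictured with a toy log model: producers append data records followed later by a commit or abort marker, and a consumer configured with `isolation.level=read_committed` returns only records from committed transactions and never reads past the last stable offset (LSO), the offset of the first record whose transaction is still open. The sketch below is a deliberately simplified plain-Python model with an invented tuple layout, not the Kafka client API; real brokers use control batches and an aborted-transaction index.

```python
from collections import defaultdict

def read_committed(log):
    """Toy model of what a read_committed consumer can see in one partition.

    log entries (hypothetical layout): ("data", producer_id, value),
    ("commit", producer_id) or ("abort", producer_id). A producer's records
    belong to its current transaction until the next marker it writes.
    """
    txn_seq = defaultdict(int)  # producer_id -> index of its open transaction
    records = []                # (offset, txn, value)
    outcome = {}                # txn -> "commit" | "abort"
    for offset, entry in enumerate(log):
        kind, pid = entry[0], entry[1]
        txn = (pid, txn_seq[pid])
        if kind == "data":
            records.append((offset, txn, entry[2]))
        else:
            outcome[txn] = kind
            txn_seq[pid] += 1  # the producer's next record starts a new transaction
    # Last stable offset: first record of a transaction with no marker yet.
    lso = min((off for off, txn, _ in records if txn not in outcome), default=len(log))
    return [v for off, txn, v in records if off < lso and outcome.get(txn) == "commit"]

log = [
    ("data", "p1", "a"), ("data", "p2", "x"), ("commit", "p1"),
    ("data", "p2", "y"), ("abort", "p2"),
    ("data", "p1", "b"), ("data", "p3", "z"), ("data", "p1", "c"),
    ("commit", "p1"),
]
print(read_committed(log))  # ['a', 'b']
```

Here `x` and `y` are hidden because p2 aborted, and `c` is hidden even though p1 committed, because p3's open transaction at offset 6 holds the LSO back.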

DataFunTalk

Jan 5, 2021 · Big Data

Highlights of Flink Forward Asia 2020: Stream‑Batch Integration, AI Fusion, and Cloud‑Native Advances

The 2020 Flink Forward Asia conference showcased Apache Flink's rapid growth, community milestones, industry adoption, and technical breakthroughs such as unaligned checkpoints, approximate failover, the Nexmark benchmark, stream‑batch unification, AI integration via PyFlink and Alink, and deep cloud‑native support on Kubernetes, illustrated through case studies from Alibaba, Meituan, Kuaishou, and Dell.

AI integration · Apache Flink · Cloud Native

20 min read

Full-Stack Internet Architecture

Dec 20, 2020 · Big Data

Using FlinkX for Data Synchronization in Sharded MySQL Environments

This article explains how to leverage FlinkX and the Flink DataStream API to create a unified data‑sync task that extracts data from sharded MySQL tables, splits the workload, and pushes it to an MQ cluster, while detailing the underlying InputFormat and Reader architecture.

Big Data · Flink · FlinkX

8 min read

DataFunTalk

Dec 11, 2020 · Big Data

My Journey and Contributions in the Apache Flink Community

The author shares his personal journey from first encountering Flink to becoming an Apache Flink Committer at ByteDance, detailing community involvement, code contributions, bug fixes, lessons learned, advice for newcomers, and concluding with promotional offers for Flink services.

Apache Flink · Blink Planner · SQL

12 min read

Xianyu Technology

Nov 26, 2020 · Backend Development

Design and Implementation of a Multi‑Platform DSL Translator for Omega Real‑Time Reach System

The Omega real‑time reach system uses a custom DSL that unifies input data, standardizes CEP APIs, and leverages an Antlr V4‑based translation framework to generate Python, C++, JavaScript, and Java code for cloud, edge, and front‑end containers, cutting rule development from days to minutes.

ANTLR · CEP · Code Generation

9 min read

Alibaba Cloud Developer

Nov 22, 2020 · Big Data

How Flink’s Stream‑Batch Integration Powered Alibaba’s Record‑Breaking Double‑11

Alibaba’s 2020 Double‑11 achieved unprecedented real‑time processing of 4 billion records per second and 7 TB of data per second using Flink, showcasing the stability, performance and efficiency of its stream‑batch unified architecture across diverse business scenarios.

Alibaba · Batch Processing · Big Data

15 min read

dbaplus Community

Oct 29, 2020 · Big Data

Inside Didi’s Real-Time Data Warehouse for Ride-Sharing: Architecture & Lessons

This article details Didi’s end‑to‑end construction of a real‑time data warehouse for the Ride‑Sharing (顺风车) business, covering motivations, layer‑by‑layer architecture, naming conventions, StreamSQL capabilities, operational tooling, achieved results, challenges, and future batch‑stream integration plans.

Didi · Flink · real-time data warehouse

21 min read

Big Data Technology & Architecture

Oct 23, 2020 · Big Data

Overview of Real-Time Big Data Processing: Spark Structured Streaming, CarbonData, Flink, and Cloud Stream

This article provides a comprehensive overview of modern real‑time big‑data solutions, detailing Spark Structured Streaming capabilities, CarbonData’s storage architecture, Meituan’s Flink deployments, and Huawei Cloud Stream’s unified streaming service, highlighting their features, challenges, and future directions.

CarbonData · Flink · Real-time analytics

17 min read

DataFunTalk

Oct 2, 2020 · Big Data

Single-Task Recovery in Flink: Design and Implementation for Real‑Time Stream Processing

This article describes ByteDance's single‑task recovery solution for Flink's real‑time computation, detailing the problem of global job restarts, the proposed network‑layer enhancements, upstream and downstream optimizations, JobManager restart strategy, implementation challenges, and the measurable latency and availability benefits achieved in production.

Flink · Single-Task Recovery · fault tolerance

11 min read

Alibaba Cloud Developer

Sep 15, 2020 · Big Data

Designing Nexmark: A Standard Benchmark for Stream Processing Performance

This article examines the challenges of existing stream‑processing benchmarks, introduces the open‑source Nexmark framework designed for reproducible, comprehensive performance testing, describes its metrics, query set, workload configurability, and presents experimental results on Flink, highlighting its role in advancing big‑data stream benchmarking.

Benchmark · CPU · Flink

14 min read

DataFunTalk

Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps—including stream union, state‑timer processing, and output formatting—covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation

11 min read

Big Data Technology & Architecture

Sep 12, 2020 · Big Data

Technical Architecture and Component Selection of a Real‑time Data Platform (RTDP)

This article details the technical architecture of a Real‑time Data Platform (RTDP), covering component selection such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses design considerations, data management, security, operational practices, and various deployment modes for big‑data applications.

Big Data Architecture · RTDP · data security

22 min read

Big Data Technology & Architecture

Sep 12, 2020 · Big Data

Designing a Real‑time Data Platform for Modern Data Warehouses

This article explores the evolution from traditional to modern data warehouses, outlines the key capabilities of real‑time data platforms (real‑time data, data virtualization, data democratization, and collaboration), and presents a comprehensive architecture design with unified collection, streaming, compute, and visualization layers, while discussing functional, quality, stability, cost, agility, and management considerations.

architecture · data virtualization · real-time data

18 min read

Didi Tech

Aug 26, 2020 · Big Data

Real-time Data Warehouse Construction at Didi: Architecture, Practices, and Lessons

To support Didi’s fast‑growing car‑pool service, a real‑time data warehouse was built using a streamlined layered architecture—ODS, DWD, DIM, DWM, and APP—leveraging Flink‑based StreamSQL, Kafka, Druid and ClickHouse to deliver minute‑level analytics, dashboards, monitoring, and cross‑business interfaces while planning unified meta‑store integration.

Big Data Architecture · Data Platform · Flink

20 min read

IT Architects Alliance

Aug 12, 2020 · Big Data

Introduction to Confluent KSQL for Real-Time Stream Processing

This article introduces Confluent KSQL, a SQL‑based real‑time stream processing engine for Kafka, covering its architecture, stream vs table concepts, query lifecycle, Docker‑based setup, DDL commands, example joins, windowed aggregations, connectors, and its advantages and limitations.

Big Data · Docker · KSQL

9 min read

DataFunTalk

Aug 10, 2020 · Big Data

Understanding Flink SQL Architecture, Optimizations, and Internal Mechanisms

This article explains the evolution of Apache Flink's SQL support, detailing the Blink Planner architecture, the end‑to‑end Flink SQL workflow, logical and physical planning, code generation, stream‑specific optimizations such as retraction and mini‑batch, and future development directions.

Blink Planner · Flink · SQL

20 min read
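Retraction matters when one streaming aggregation feeds another: whenever an upstream result changes, the row emitted earlier must be withdrawn before the new one is added. The plain-Python sketch below models the two-level query `SELECT cnt, COUNT(*) FROM (SELECT word, COUNT(*) AS cnt FROM words GROUP BY word) GROUP BY cnt` as a changelog of `+`/`-` messages. It is an illustration of the concept under that assumed query, not the Blink planner's actual operators.

```python
from collections import Counter

def word_counts(words):
    """Inner GROUP BY word: emit a changelog of ('+' | '-', (word, count))."""
    counts = Counter()
    for w in words:
        if counts[w]:
            yield ("-", (w, counts[w]))  # retract the previously emitted row
        counts[w] += 1
        yield ("+", (w, counts[w]))

def count_of_counts(changelog):
    """Outer GROUP BY cnt: apply inserts and retractions to the result."""
    freq = Counter()
    for op, (_word, cnt) in changelog:
        freq[cnt] += 1 if op == "+" else -1
    return {cnt: n for cnt, n in freq.items() if n}

changelog = list(word_counts(["a", "b", "a"]))
print(changelog)  # [('+', ('a', 1)), ('+', ('b', 1)), ('-', ('a', 1)), ('+', ('a', 2))]
print(count_of_counts(changelog))  # {1: 1, 2: 1}
```

Without the `-` message the outer query would still count word `a` under `cnt = 1` and report `{1: 2, 2: 1}`, which is the incorrect result retraction prevents.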

Youzan Coder

Jul 15, 2020 · Big Data

Design and Implementation of Youzan ABTest System for Data‑Driven Growth

Youzan created an internal A/B testing platform—combining Java/Node SDKs, a real‑time data pipeline, and a metadata‑driven workflow—to enable data‑driven product iteration, granular traffic allocation, automated logging, statistical analysis, and scalable growth insights across its merchant services, while planning further automation and integration.

A/B testing · Big Data · Experiment Platform

19 min read

Alibaba Cloud Developer

Jul 13, 2020 · Big Data

What’s New in Apache Flink 1.11? A Deep Dive into Features and Performance

Apache Flink 1.11.0, released after four months of development, brings major ecosystem, usability, and stability improvements—including CDC support, a new JDBC catalog, real‑time Hive integration, a redesigned source API, PyFlink enhancements, application mode for Kubernetes, and checkpoint optimizations—while highlighting the growing contribution of Chinese developers.

Apache Flink · Checkpoint · Feature Release

20 min read

DataFunTalk

Jul 10, 2020 · Big Data

Apache Flink Practice at NetEase: Architecture, Scale, and Future Directions

This article details NetEase's evolution from Storm to Flink for real‑time computing, describing the Sloth platform's architecture, large‑scale deployment, diverse business scenarios, monitoring, alerting, and future development plans, illustrating how Flink powers data synchronization, real‑time warehousing, and e‑commerce analytics and recommendation.

Data Warehouse · Flink · NetEase

15 min read

Big Data Technology Architecture

Jul 8, 2020 · Big Data

Apache Flink 1.11.0 Release: New Features and Optimizations

Apache Flink 1.11.0 introduces a suite of major enhancements—including unaligned checkpoints, a unified source interface, CDC support in Table API/SQL, performance‑boosted PyFlink, a new application deployment mode, and numerous UI, Docker, and catalog improvements—aimed at increasing usability, scalability, and integration across streaming and batch workloads.

Flink · SQL · Source Interface

18 min read

Architect

Jun 11, 2020 · Big Data

Understanding Apache Flink Architecture, Data Transfer, Event‑Time Processing, State Management, and Checkpointing

This article explains Apache Flink's distributed system architecture—including JobManager, ResourceManager, TaskManager, and Dispatcher—covers session and job deployment modes, data transfer mechanisms, event‑time handling with watermarks, various state types and backends, scaling strategies, and the checkpoint/savepoint recovery process.

Apache Flink · Big Data · Event Time

15 min read

58 Tech

Jun 10, 2020 · Big Data

Real‑time Data Warehouse Practices at 58 Tongcheng Bao: From Spark Streaming 1.0 to Flink‑based 2.0

This article details the evolution of 58 Tongcheng Bao's real‑time data warehouse, describing the initial Spark‑Streaming architecture, its limitations, and the redesign using Flink with a layered ODS‑DWD‑DWS‑APP model, data‑quality monitoring, join techniques, and the resulting improvements in latency and accuracy.

Big Data · Data Quality · Flink

9 min read

Big Data Technology Architecture

May 22, 2020 · Big Data

Apache Flink 1.11 New Features Overview

The article provides a comprehensive overview of Apache Flink 1.11, detailing enhancements in cluster deployment, resource management, source/sink APIs, state backends, Table & SQL improvements, DataStream extensions, PyFlink/ML support, and runtime optimizations, along with relevant code examples and references.

Apache Flink · Flink 1.11 · Table API

19 min read

Big Data Technology & Architecture

May 18, 2020 · Big Data

Real‑time Data Platform (RTDP): Concepts, Architecture and Design Considerations

This article examines the design of a real‑time data platform, discussing its background concepts, modern data‑warehouse perspective, architectural layers, unified data‑collection, streaming, compute and visualization platforms, and the functional, quality, stability, cost and agility considerations required for building an end‑to‑end real‑time pipeline.

Data Democratization · Data Platform · architecture

17 min read

Architecture Digest

Mar 11, 2020 · Big Data

Apache Flink: Unified Stream and Batch Processing Architecture and Core Concepts

This article provides a comprehensive overview of Apache Flink, explaining how it unifies stream and batch processing on a single runtime, detailing its key features, APIs, libraries, architectural components, fault‑tolerance mechanisms, scheduling, iterative processing, and back‑pressure monitoring.

Apache Flink · Batch Processing · backpressure

20 min read

dbaplus Community

Mar 10, 2020 · Big Data

How OPPO’s ESA DataFlow Handles Billions of Events Daily with High Performance

OPPO's ESA DataFlow is a self‑developed high‑performance data‑flow framework that processes over a trillion events per day, offering flexible routing, scalable sources and sinks, persistent mmap‑based channels, built‑in monitoring, and easy extensibility for diverse data‑collection scenarios.

ESA DataFlow · OPPO · data ingestion

11 min read

Alibaba Cloud Developer

Feb 24, 2020 · Big Data

What’s New in Apache Flink 1.10? Deep Dive into Major Features and Enhancements

Apache Flink 1.10 introduces a major upgrade that merges the Blink engine, boosts performance and stability, adds native Kubernetes support, enhances SQL DDL, delivers production‑ready Hive batch compatibility, optimizes memory management, and expands Python UDF capabilities, with detailed feature breakdowns and code examples.

Apache Flink · Batch Processing · Kubernetes

8 min read

Big Data Technology & Architecture

Feb 15, 2020 · Big Data

Understanding Event Time and Watermarks in Apache Flink

This article explains how Apache Flink uses event‑time timestamps and watermarks to handle out‑of‑order and late data, describes the assignTimestampsAndWatermarks API with periodic and punctuated watermark assigners, and provides practical code examples for window lateness and side‑output handling.

Apache Flink · Event Time · Flink

10 min read
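The interplay of watermarks, window firing, allowed lateness, and side output can be simulated in a few lines. The sketch below is a plain-Python model, not Flink code: the watermark is the largest timestamp seen minus `max_delay` (roughly what a bounded-out-of-orderness assigner computes, updated per element here for simplicity), a tumbling window fires once the watermark reaches its last timestamp, a late element within `allowed_lateness` re-fires its window with an updated result, and anything later goes to a side-output list. The function name and parameters are hypothetical.

```python
from collections import defaultdict

def tumbling_event_time(events, size, max_delay, allowed_lateness=0):
    """events: (timestamp, value) pairs in arrival order, possibly out of order.

    Returns (results, late): results holds (window_start, sum) for every
    firing; late holds elements that missed window end + allowed lateness.
    """
    contents = defaultdict(list)  # window start -> values assigned to it
    fired = set()                 # windows already fired by the watermark
    results, late = [], []
    watermark = float("-inf")
    for ts, value in events:
        start = ts - ts % size
        max_ts = start + size - 1  # last timestamp the window covers
        if max_ts + allowed_lateness <= watermark:
            late.append((ts, value))  # side output: too late even with lateness
            continue
        contents[start].append(value)
        if start in fired:
            # Late but within allowed lateness: fire again with the new sum.
            results.append((start, sum(contents[start])))
        # Watermark = max timestamp seen so far minus the tolerated delay.
        watermark = max(watermark, ts - max_delay)
        for s in sorted(contents):
            if s not in fired and s + size - 1 <= watermark:
                fired.add(s)
                results.append((s, sum(contents[s])))
    return results, late

events = [(1, 1), (12, 1), (5, 1), (25, 1), (3, 1)]
print(tumbling_event_time(events, size=10, max_delay=2, allowed_lateness=5))
# ([(0, 1), (0, 2), (10, 1)], [(3, 1)])
```

Event 12 pushes the watermark to 10 and fires window [0, 10); event 5 is late but within lateness, so that window fires again; event 3 arrives after the watermark reached 23, past 9 + 5, and lands in the side output.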

Big Data Technology Architecture

Feb 12, 2020 · Big Data

Apache Flink 1.10 Release: New Features, Optimizations, and Kubernetes Integration

Apache Flink 1.10 introduces major performance and stability improvements, unified memory configuration, native Kubernetes session mode, enhanced Table API/SQL with production‑ready Hive integration, expanded Python UDF support, and a host of important bug fixes and connector updates, marking the largest community‑driven release to date.

Apache Flink · Hive Integration · Kubernetes

17 min read

DataFunTalk

Feb 10, 2020 · Artificial Intelligence

Real‑Time Intelligent Anomaly Detection Platform at Ctrip: Integrating Flink and TensorFlow (Prophet)

The article describes Ctrip's Prophet platform, which combines Flink real‑time stream processing with TensorFlow deep‑learning models to provide intelligent, low‑latency anomaly detection, replacing traditional rule‑based alerts and addressing challenges such as holiday traffic and model scalability.

AI · Deep Learning · Flink

13 min read

DataFunTalk

Jan 22, 2020 · Big Data

Real-Time Data Engineering Practices for Alibaba 1688 Business

This article explains how Alibaba 1688 achieves real‑time recommendation, advertising, and product statistics through a robust middle‑platform foundation, streaming engines like Blink, data synchronization tools, and scalable storage, illustrating three concrete engineering cases and the end‑to‑end real‑time data service pipeline.

Alibaba · Flink · stream processing

8 min read

Qunar Tech Salon

Dec 20, 2019 · Big Data

Understanding Flink Cluster Startup and Job Execution Process

This article explains the architecture of a Flink cluster, detailing the startup procedures for JobManager and TaskManager, the three deployment modes, and the end‑to‑end flow of a Flink job from client code through StreamGraph, JobGraph, ExecutionGraph to the physical execution on TaskManagers.

Big Data · Cluster Architecture · Flink

10 min read

vivo Internet Technology

Dec 18, 2019 · Big Data

Comprehensive Overview of Big Data Architecture, Lambda/Kappa Models, and End-to-End Data Platform Design

The article surveys modern big‑data architecture, contrasting Lambda and Kappa models, highlights common governance and integration pain points, and proposes an end‑to‑end platform featuring unified metadata, stream‑batch processing, one‑click ingestion, standardized modeling, intelligent query abstraction, and a comprehensive development IDE.

Big Data · Data Platform · ETL

13 min read

Big Data Technology & Architecture

Dec 17, 2019 · Big Data

Understanding Flink Sliding Windows and Performance Optimizations

This article explains Flink's sliding window mechanism, shows how the WindowAssigner and WindowOperator work with code examples, analyzes the performance impact of fine‑grained sliding windows, and proposes a practical workaround using tumbling windows combined with external storage such as Redis for efficient PV/UV aggregation.

Big Data · Flink · Performance Optimization

8 min read
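The cost of fine-grained sliding windows follows from how elements are assigned: each element is copied into `size / slide` windows. The first function below mirrors the assignment loop in Flink's `SlidingEventTimeWindows#assignWindows` for non-negative timestamps. The second sketches the tumbling-bucket workaround, assuming window ends aligned to the bucket size, with a `Counter` standing in for external storage such as Redis. Both function names are hypothetical.

```python
from collections import Counter

def sliding_window_starts(ts, size, slide, offset=0):
    """Starts of every sliding window that contains timestamp `ts`."""
    last_start = ts - (ts - offset + slide) % slide
    starts = []
    start = last_start
    while start > ts - size:
        starts.append(start)
        start -= slide
    return starts

def bucketed_pv(timestamps, bucket, buckets_per_window):
    """Count each event once in a tumbling bucket, then sum the buckets
    covering a window instead of copying the event into every window."""
    per_bucket = Counter(ts - ts % bucket for ts in timestamps)

    def pv(window_end):
        # window_end is assumed to be a multiple of the bucket size.
        first = window_end - bucket * buckets_per_window
        return sum(n for b, n in per_bucket.items() if first <= b < window_end)

    return pv

# A 1-day window sliding every minute copies each element into 1440 windows.
print(len(sliding_window_starts(123_456, size=86_400, slide=60)))  # 1440
print(sliding_window_starts(7, size=10, slide=5))                  # [5, 0]
pv = bucketed_pv([0, 59, 60, 200], bucket=60, buckets_per_window=2)
print(pv(120), pv(240))                                            # 3 1
```

PV is additive across buckets, so summing buckets works directly; UV is not, and would need per-bucket sets or a mergeable sketch in the external store.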

Alibaba Cloud Developer

Dec 16, 2019 · Big Data

Why Apache Flink Became the Fastest‑Growing Open‑Source Big Data Engine in 2019

Apache Flink, the open‑source stream‑and‑batch processing engine, has surged to become one of the most active Apache projects, with rapid community growth in China, unified SQL capabilities, AI‑focused extensions, Kubernetes integration, and benchmark results that outperform Hive by up to seven times.

AI · Apache Flink · Big Data

14 min read

Big Data Technology & Architecture

Dec 4, 2019 · Big Data

Comprehensive Flink Interview Guide: Core Concepts, Advanced Topics, and Source‑Code Insights

This article provides an in‑depth Flink interview guide covering the framework’s core concepts, advanced features such as fault‑tolerance, state management, and checkpointing, as well as detailed explanations of its architecture, APIs, partitioning strategies, and source‑code flow, complete with code examples.

Big Data · Distributed Systems · Flink

29 min read

Xianyu Technology

Dec 3, 2019 · Backend Development

Design and Implementation of Omega System's User Reach Center

The Omega system’s User Reach Center integrates a behavior‑collection hub, a Flink‑based CEP rule engine, and a plug‑in‑driven reach module that routes, filters, and dispatches actions via push, SMS or external calls, delivering sub‑second targeting, higher accuracy, reduced development effort, and plans for offline profiling and data‑loop closure.

Backend · System Architecture · stream processing

7 min read

Big Data Technology & Architecture

Dec 1, 2019 · Big Data

Understanding Flink LatencyMarker: End-to-End Delay Measurement and Implementation Details

This article explains the background, source‑code analysis, and practical implementation of Flink's LatencyMarker feature for measuring end‑to‑end job latency, including metric exposure, configuration options, and code snippets illustrating how latency markers are emitted and processed within the streaming pipeline.

Big Data · End-to-End Latency · Flink

6 min read

Xianyu Technology

Nov 28, 2019 · Big Data

Data‑Driven Seller Activity Enhancement on Xianyu

The Xianyu team built a data‑driven system that monitors seller online status and reply speed, uses Siddhi CEP to match behavior patterns, and orchestrates activities, tasks, and synchronization modules, boosting conversion by three percentage points and allowing new scenarios to launch without developer effort.

CEP · activity optimization · e‑commerce

8 min read

Big Data Technology Architecture

Nov 23, 2019 · Backend Development

State as Database in Apache Flink: QueryableState and Savepoint Processor API

The article examines how Apache Flink's state management features, including QueryableState and the upcoming Savepoint Processor API, can serve as a lightweight database for real‑time applications, discussing their advantages, limitations, and practical usage scenarios.

Flink · QueryableState · Real‑Time Computing

0 likes · 10 min read

Xianyu Technology

Oct 24, 2019 · Backend Development

Design of a Real-Time Complex Event Processing System for Xianyu

The article details Xianyu’s real‑time complex event processing system, which abstracts diverse business scenarios—such as activity coupons, price‑drop alerts, rental suggestions, and promotional offers—into a rule‑driven pipeline comprising log collection, a Blink‑based DSL EPL engine, and configurable result delivery, enabling feature rollout in half a day with ~10‑second latency.

DSL · stream processing

0 likes · 10 min read

Big Data Technology & Architecture

Sep 18, 2019 · Big Data

Understanding Flink Checkpoint Mechanism and Configuration

This article explains Flink's checkpoint mechanism, its execution flow, common configuration options, and the benefits and considerations of incremental checkpoints using the RocksDB state backend, providing practical code examples and YAML settings for reliable stream processing.

Big Data · Checkpoint · Flink

0 likes · 12 min read

Big Data Technology & Architecture

Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · Dataflow

0 likes · 22 min read

dbaplus Community

Sep 10, 2019 · Big Data

Why Exactly‑Once Processing Is So Hard in Distributed Systems (And How to Tackle It)

This article explores the two toughest problems in distributed stream processing—exactly‑once message handling and ordering—by dissecting the underlying impossibility of perfect failure detectors, the liveness‑vs‑safety trade‑off, zombie processes, and the practical solutions employed by systems such as Flink, Kafka Streams, MillWheel, and Spark.

Consensus · Distributed Systems · Exactly-Once

0 likes · 81 min read

Big Data Technology & Architecture

Sep 5, 2019 · Big Data

Applying Flink CEP for Complex Event Processing at Haolo Mobility

This article explains how Flink CEP, a complex event processing library for Apache Flink, is employed at Haolo Mobility to detect intricate patterns in endless data streams by modeling patterns as states and using pattern conditions for state transitions, illustrating its practical application in real‑world big‑data scenarios.

Big Data · CEP · Flink

0 likes · 2 min read

Big Data Technology & Architecture

Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Resources

0 likes · 10 min read

21CTO

Aug 20, 2019 · Big Data

How Mogu’s Advertising Platform Built a Real‑Time Data Pipeline with Storm, Flink, and Kylin

This article explains how Mogu’s advertising system designs and evolves a real‑time data pipeline—covering merchant and operation needs, data collection, cleaning, processing with Storm, Flink, and Kylin, and service guarantees—to enable high‑quality, low‑latency analytics for advertisers and the platform.

Advertising · Big Data · Flink

0 likes · 12 min read

Big Data Technology & Architecture

Aug 18, 2019 · Big Data

Flink Application Scenarios and Scale at Kuaishou

The article details how Kuaishou leverages Apache Flink for large‑scale stream processing, describing its application scenarios, cluster sizing, interval join optimization, RocksDB performance challenges, source throttling strategies, JobManager stability, frequent job failures, and platform‑wide improvements.

Big Data · Flink · Kuaishou

0 likes · 2 min read

HomeTech

Aug 15, 2019 · Big Data

Real‑Time Data Warehouse Development with Flink: Architecture, Implementation, and Lessons Learned

This article describes the motivation, technology selection, implementation details, and practical challenges of building a real‑time data warehouse using Flink, covering stream ingestion, data cleaning, dimension‑table joins, state backend choices, and operational lessons for large‑scale streaming pipelines.

Flink · Kafka · State Backend

0 likes · 8 min read

Big Data Technology & Architecture

Aug 11, 2019 · Big Data

Deep Dive into Flink’s Network Stack: Credit‑Based Flow Control and Thread Model Optimizations

This article examines Flink’s industrial‑scale network stack, detailing the credit‑based flow control introduced in version 1.5, the refactored task‑IO thread collaboration, and serialization optimizations that together improve throughput and latency for large‑scale stream processing workloads.

Big Data · Credit-based Flow Control · Flink

0 likes · 12 min read

Big Data Technology & Architecture

Aug 9, 2019 · Big Data

Understanding Exactly-Once Semantics in Apache Flink: Challenges and Implementation

This article analyzes the difficulties of achieving exactly-once delivery in Apache Flink, explains the distinction between state and end‑to‑end exactly‑once, and details how Flink implements exactly‑once sinks using idempotent and transactional approaches, including a Bucketing File Sink example.

Checkpoint · Flink · State Management

0 likes · 12 min read

Big Data Technology & Architecture

Aug 7, 2019 · Big Data

Why Choose Apache Flink for Real‑Time Stream Processing: Features and Lessons Learned

This article explains why the author chose Apache Flink for real‑time stream processing, highlighting its unique combination of high throughput, low latency, event‑time support, stateful computation, flexible windows, and fault tolerance, while also reflecting on the challenges of adopting a less‑documented technology.

Event Time · Flink · Real-Time

0 likes · 7 min read

Ziru Technology

Aug 1, 2019 · Big Data

How Ziru IM Leverages Flink for Real-Time Conversation Monitoring and Service Quality

The Ziru IM project uses Apache Flink to monitor real-time conversation metrics such as timely reply rates, average session duration, and message counts, employing two dialogue models and session definitions to enhance service quality and operational insight within an in‑app communication platform.

Conversation Analytics · Flink · IM System

0 likes · 6 min read

Big Data Technology & Architecture

Jul 20, 2019 · Big Data

Registering UDF, UDTF, and UDAF Functions in Apache Flink – Common Pitfalls and Solutions

This article explains how to register scalar UDFs, table‑valued UDTFs, and aggregate UDAFs in Apache Flink, illustrates typical compilation and runtime pitfalls with concrete Scala code examples, and provides corrected implementations and best‑practice tips for reliable function registration.

Apache Flink · Big Data · Scala

0 likes · 13 min read

Big Data Technology & Architecture

Jul 2, 2019 · Big Data

Integrating Apache Flink with Apache Pulsar for Scalable Elastic Data Processing

This article explains how Apache Pulsar and Apache Flink can be combined to provide a unified, scalable, and fault‑tolerant data processing platform, covering Pulsar's architecture, its differences from other messaging systems, various integration patterns, and concrete code examples for stream and batch workloads.

Apache Flink · Apache Pulsar · Big Data

0 likes · 13 min read

Big Data Technology & Architecture

Jun 29, 2019 · Big Data

Apache Flink 1.9 Feature Overview – Beijing Meetup (June 29)

On June 29, the Apache Flink Beijing Meetup presented a comprehensive analysis of Flink 1.9’s major architectural changes, new Table API & SQL capabilities, runtime and core enhancements, and future roadmap, with slides and resources made available for download.

Apache Flink · Big Data · Flink 1.9

0 likes · 2 min read

Big Data Technology & Architecture

Jun 20, 2019 · Big Data

Comprehensive Guide to Flink SQL: Background, New Features, Programming Model, Operators, Functions, and a Practical NBA Scoring Leader Example

This article provides an in‑depth overview of Flink SQL, covering its origins, the latest 1.7.0 and 1.8.0 enhancements, the underlying programming model, common operators and built‑in functions, and a complete end‑to‑end example that analyzes NBA scoring‑leader data using Flink SQL.

Apache Flink · Big Data · Flink SQL

0 likes · 27 min read

Big Data Technology & Architecture

Jun 13, 2019 · Fundamentals

Comparison of Kafka and Pulsar Stream Consumption Models and Rebalance Mechanisms

The article explains Kafka's consumer‑group rebalance and Pulsar's unified queue/stream subscription models, compares their partition assignment strategies, and demonstrates both with Docker‑based Pulsar setups, Java consumer code, and practical failover and exclusive scenarios.

Kafka · Pulsar · consumer-group

0 likes · 6 min read

Big Data Technology & Architecture

Jun 12, 2019 · Big Data

Comprehensive Guide to FlinkCEP: API Overview, Pattern Definitions, Quantifiers, Conditions, and Usage Examples

This article provides a detailed introduction to FlinkCEP, covering how to add the library, define simple and composite patterns, use quantifiers and conditions, handle skip strategies, time constraints, and select results, with complete Java and Scala code examples for complex event processing.

Big Data · CEP · Flink

0 likes · 27 min read

360 Zhihui Cloud Developer

Jun 4, 2019 · Big Data

Why Flink Outperforms Storm: Deep Dive into Stream Processing Performance

Based on data transmission and reliability metrics, this article compares Apache Storm and Apache Flink in stream processing, presenting benchmark designs, test environments, results for synthetic and Kafka data, and offers practical recommendations such as operator chaining, object reuse, and checkpoint strategies to maximize Flink performance.

Big Data · Flink · Performance Testing

0 likes · 13 min read

360 Tech Engineering

Jun 3, 2019 · Big Data

Performance Comparison of Apache Storm and Apache Flink from Data Transmission and Reliability Perspectives

This article presents a detailed performance benchmark comparing Apache Storm and Apache Flink in stream processing, focusing on data transmission methods, reliability mechanisms, operator chaining, and both self‑generated and Kafka‑sourced workloads, and provides practical optimization recommendations based on the results.

Big Data · Data Transmission · Flink

0 likes · 10 min read

Big Data Technology & Architecture

Jun 2, 2019 · Big Data

Tencent's Oceanus Real-Time Stream Computing Platform and Flink Optimizations

The article presents Tencent's evolution of real‑time stream processing using Flink, the design of the Oceanus one‑stop visual platform, and a series of deep extensions and optimizations—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, idle detection, and log isolation—aimed at supporting petabyte‑scale data workloads.

Big Data · Flink · Oceanus

0 likes · 16 min read

Big Data Technology & Architecture

May 29, 2019 · Cloud Native

Real-Time Computing Solutions with Flink and HBase: Architecture, Market Analysis, and Use Cases

The article presents Alibaba Cloud's real-time computing solution based on Flink and HBase, covering market competition, open‑source ecosystem, containerized architecture on Kubernetes, and typical applications such as online education video analysis, city‑brain traffic management, and fraud detection.

Big Data · Cloud Native · Flink

0 likes · 12 min read

Big Data Technology & Architecture

May 26, 2019 · Big Data

Apache Flink at Didi: Platformization, Production Practices, and StreamSQL

This article describes how Didi adopted Apache Flink for its real‑time data streams, detailing the platformized architecture, production use cases such as ETL, monitoring and CEP, the evolution of StreamSQL, and the engineering improvements made to support large‑scale, low‑latency processing.

Big Data · Didi · Flink

0 likes · 14 min read

Java Backend Technology

May 26, 2019 · Backend Development

How to Prevent Hot‑Key Crashes in Cache Clusters with Real‑Time Streaming

This article explains why cache clusters are essential, describes the problems caused by hot keys and large values, and presents a multi‑layer solution using streaming analytics, automatic hotspot detection, local JVM caching, and rate‑limiting to keep backend systems stable under massive traffic spikes.

Backend Architecture · Cache · Hot Key

0 likes · 10 min read

Big Data Technology & Architecture

May 25, 2019 · Big Data

Understanding State TTL and Continuous Cleanup in Apache Flink 1.8.0

This article explains how Apache Flink's State TTL feature works, demonstrates configuring TTL for state size control and automatic cleanup, and details the continuous cleanup mechanisms introduced in Flink 1.8.0 for both heap and RocksDB state backends.

Apache Flink · Continuous Cleanup · Java

0 likes · 16 min read

Big Data Technology & Architecture

May 19, 2019 · Big Data

Implementing End-to-End Exactly-Once Semantics in Apache Flink with Apache Kafka Using Two-Phase Commit Sink

This article explains how Apache Flink’s TwoPhaseCommitSinkFunction, introduced in version 1.4, enables end-to-end exactly-once semantics when integrated with Apache Kafka, detailing the checkpoint mechanism and the two-phase commit protocol that ensures reliable data processing.

Apache Flink · Apache Kafka · Big Data

0 likes · 4 min read

Big Data Technology & Architecture

May 13, 2019 · Big Data

Understanding Apache Kafka: Core Concepts, Architecture, and Use Cases

This article explains Apache Kafka as a distributed streaming platform, detailing its key features, core APIs, topic and log architecture, partitioning, consumer groups, guarantees, and how it serves both messaging and storage roles for real‑time and batch processing in big‑data environments.

Apache Kafka · Distributed Systems · message queues

0 likes · 13 min read

G7 EasyFlow Tech Circle

Apr 23, 2019 · Big Data

How We Scaled Fatigue Event Processing to 45K TPS with Apache Flink

By iteratively redesigning the fatigue‑event detection pipeline and leveraging Apache Flink’s stateful stream processing, the team reduced network overhead, cut resource usage to a third, and achieved a stable 45,000 TPS throughput on six containers with 20 GB memory, while outlining three optimization phases and practical lessons.

Apache Flink · Fatigue Detection · IoT

0 likes · 13 min read

JD Retail Technology

Apr 18, 2019 · Big Data

Data Heterogeneity with BinLake, Binlog, and Flink: Approaches for Order, Subscription, and Product Data

The article explains how data heterogeneity is achieved using JD's BinLake to capture MySQL binlogs, with Flink handling sequential and parallel consumption for order, subscription, and product data, discussing challenges such as ordering guarantees, idempotency, IO overhead, and the shift toward stream‑processing architectures.

Binlog · Elasticsearch · Flink

0 likes · 5 min read

ITPUB

Apr 4, 2019 · Big Data

Achieving Sub‑Second Real‑Time Product Selection with Xianyu’s Mach and Blink

Xianyu’s Mach system tackles the e‑commerce challenge of instantly selecting high‑quality items from billions of products by leveraging Blink’s low‑latency stream computing, detailing its architecture—including state, windows, custom UDX functions, data merging, rule execution, and SQL‑to‑MVEL conversion—to achieve sub‑second processing at massive scale.

Flink · Real-Time · blink

0 likes · 18 min read

Xianyu Technology

Mar 21, 2019 · Big Data

Design and Implementation of the Mach Real-Time Product Selection System Using Blink Stream Computing

Mach, Xianyu’s real‑time product selection platform, uses Alibaba’s Blink stream engine to merge item data, evaluate roughly 300 rule‑based filters per item, and emit only changed results, processing 1.4 billion messages per day at up to 50k TPS through a four‑layer, stateful architecture.

Big Data · Flink · Stateful Computation

0 likes · 15 min read

Youzan Coder

Mar 20, 2019 · Big Data

Evolution of Real-Time Computing at Youzan: From Storm to Flink and Future Directions

Youzan’s real‑time computing platform progressed from early Storm deployments through Spark Streaming to a Flink‑based architecture, adding unified task management, monitoring, and dedicated streaming clusters, while now pursuing SQL‑driven jobs, a Druid OLAP engine, and a future real‑time data warehouse.

Big Data · Flink · Spark Streaming

0 likes · 14 min read

Big Data Technology & Architecture

Mar 12, 2019 · Big Data

Understanding Apache Flink’s Core Design: “Batch Is a Special Case of Stream” and Its Architecture

This article explains Apache Flink’s fundamental design principle that treats batch as a special case of stream, compares native streaming with micro‑batching, describes its deployment modes, fault‑tolerance mechanisms, unified data and scheduling layers, and outlines Alibaba’s architectural optimizations for the platform.

Apache Flink · Batch Processing · native streaming

0 likes · 15 min read

DataFunTalk

Jan 30, 2019 · Artificial Intelligence

Real‑Time Metrics Processing Technology for Financial Risk Control and Anti‑Fraud

This article outlines the challenges of financial risk control in the internet era and presents a comprehensive real‑time metrics processing system, covering data leakage, fraud, big‑data opportunities, AI model deployment, and the technical architecture of the Bangsheng real‑time indicator platform.

AI · Big Data · anti‑fraud

0 likes · 17 min read

Alibaba Cloud Developer

Jan 28, 2019 · Big Data

How Alibaba’s Blink Supercharges Flink for Massive Stream and Batch Processing

Alibaba’s Blink, an internal enhancement of Apache Flink, is now open‑sourced, bringing advanced runtime, SQL/TableAPI, Hive compatibility, Zeppelin integration, and a revamped Flink Web UI to dramatically boost performance and scalability for both streaming and batch workloads.

Batch Processing · Big Data · Flink

0 likes · 16 min read

NetEase Game Operations Platform

Jan 25, 2019 · Big Data

Understanding Exactly-Once Semantics in Apache Flink: Challenges and Implementation

This article analyzes the difficulties of achieving exactly-once delivery in Apache Flink, explains the distinction between state and end‑to‑end semantics, and details how idempotent and transactional sinks—illustrated with the Bucketing File Sink—realize exactly‑once guarantees through checkpoint‑based two‑phase commit.

Big Data · Exactly-Once · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Jan 3, 2019 · Big Data

Stateful Stream Processing and Fault‑Tolerance Mechanisms in Apache Flink

This article explains the concept of stateful computation in stream processing, highlights the shortcomings of traditional systems, and details how Apache Flink provides rich state access, various state backends, and robust checkpointing mechanisms to achieve scalable, fault‑tolerant real‑time analytics.

Flink · State Backend · stateful processing

0 likes · 10 min read

Ctrip Technology

Dec 26, 2018 · Operations

Evolution of Ctrip's Hickwall Monitoring and Alerting Platform: Architecture, InfluxDB Cluster, Data Aggregation, and Stream Alerting

This article details the architectural evolution of Ctrip's Hickwall monitoring and alerting platform, describing the transition from an Elasticsearch‑based first generation to an InfluxDB‑driven second generation, the design of the Incluster storage layer, data aggregation strategies, and the implementation of high‑performance stream‑based alerting.

Alerting · InfluxDB · architecture

0 likes · 12 min read

Alibaba Cloud Developer

Nov 29, 2018 · Big Data

Why Apache Flink Became the Fastest‑Growing Big Data Engine in 2018

This article introduces Apache Flink’s rapid rise as the leading open‑source big data engine, explains its role in batch, stream, and interactive analytics, showcases real‑world use cases from Alibaba, Didi, and ByteDance, and outlines how Flink powers both big data and AI workloads.

AI · Apache Flink · Batch Processing

0 likes · 8 min read

21CTO

Nov 7, 2018 · Big Data

Why Data Streams Are the Backbone of Real-Time Big Data Analytics

Data streams, akin to endless rivers, enable continuous, real-time processing of diverse sources such as IoT telemetry, web logs, and e-commerce events, offering advantages over batch processing, while presenting challenges like scalability and fault tolerance, and are supported by tools like Kinesis, Kafka, Flink, and Storm.

Amazon Kinesis · Apache Kafka · Big Data

0 likes · 6 min read

dbaplus Community

Aug 8, 2018 · Big Data

How to Build a Real‑Time Data Platform: Tech Stack & Design Patterns

This article explains the architecture of a Real‑Time Data Platform (RTDP), details the technical selection of core components such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses data management, security, operations, and four deployment modes—synchronization, flow, rotation and intelligent—illustrating how each fits different business scenarios.

Big Data Architecture · Data Integration · Kafka

0 likes · 24 min read

JD Tech Talk

Aug 2, 2018 · Big Data

Real-Time Order Statistics with Apache Flink in a Data Aggregation Platform

This article explains how the data aggregation platform adopts Apache Flink for high‑throughput, low‑latency stream processing, covering the complete workflow from data source integration, transformation operations, windowing and time concepts, to a concrete order‑count example with custom aggregation logic.

Apache Flink · Event Time · Flink

0 likes · 10 min read

Xianyu Technology

Jul 28, 2018 · Big Data

Real-Time Computation Architecture for Non-Timeline Feed Ranking

The article presents a real‑time computation architecture on Alibaba Cloud Blink that scores and ranks non‑timeline feed items within a sliding 72‑hour window, updating rankings every few minutes, using Redis ZSET for fast retrieval, and discusses scaling optimizations such as interval tuning and external join‑and‑rank services.

Big Data · Real‑Time Computing · feed ranking

0 likes · 6 min read

Ctrip Technology

Jul 17, 2018 · Big Data

Meteor: A Real-Time Computation Platform Based on Storm for Ctrip Marketing

The article introduces Meteor, a Storm‑based real‑time computation platform developed by Ctrip Marketing to simplify topology management, automate deployment, and improve resource efficiency for complex marketing scenarios, highlighting its architecture, features, and measurable business impact.

Real‑Time Computing · Storm · marketing platform

0 likes · 10 min read