Tagged articles
252 articles
Baidu Geek Talk
Jul 5, 2021 · Operations

Automated and Intelligent Analysis of Baidu Search Stability Issues

The team automated fault diagnosis for Baidu Search by building a side index for instant log lookup, streaming incremental analysis, exhaustive rule templates, feature-engineering pipelines, query-scene reconstruction, entropy-based ranking, per-second timeline views, and fault injection through chaos engineering. The result is nearly 99% accuracy and stability tracing at second-level latency and module-level granularity.

Tags: Observability, Search Stability, chaos engineering
15 min read
Tencent Cloud Middleware
Jun 30, 2021 · Fundamentals

Understanding Apache Pulsar Transactions: Core Concepts and Workflow

Apache Pulsar 2.8.0 introduces transaction support built on a Transaction Coordinator, Transaction Buffer, Transaction Log, Transaction ID, and Pending Acknowledge State. The article walks through the transaction workflow that provides exactly-once semantics for stream processing and contrasts the design with Kafka’s approach.

Tags: Apache Pulsar, Exactly-Once, Kafka Comparison
13 min read
Yuewen Technology
Jun 25, 2021 · Big Data

Building Yuedu Group’s Overseas Big Data Platform: Architecture, Offline & Real‑Time Processing

This article details how Yuedu Group designed and implemented an overseas big data platform, covering overall system architecture, offline data‑warehouse construction with dimensional modeling, real‑time streaming using Oceanus and ClickHouse, and future plans for cost reduction and data quality assurance.

Tags: Big Data, Real-time Processing, architecture
12 min read
ITFLY8 Architecture Home
May 31, 2021 · Backend Development

Why Apache Pulsar Beats Kafka and RocketMQ for Scalable Messaging Platforms

This article details how Lakala built a distributed, cloud-native messaging platform on Apache Pulsar, covering functional requirements, architectural advantages, performance testing, and real-world integration scenarios (OGG adapters, TiDB pipelines, OpenMessaging, custom sources, Pulsar Functions, and Flink connectors), as well as future plans.

Tags: Apache Pulsar, Backend Architecture, Distributed Messaging
18 min read
Baidu Geek Talk
May 24, 2021 · Big Data

Real-Time Quantile Computation Using TDigest: Architecture and Solutions

The article presents a real-time quantile solution built on the TDigest data structure, which summarizes data as clusters (centroids). Digests are stored in Redis or Doris, quantiles are pre-computed for all dimension combinations, and a reusable API delivers fast, accurate, low-memory quantile statistics for diverse business scenarios.

Tags: data aggregation, Doris, real-time quantile
11 min read
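The centroid idea behind TDigest can be illustrated with a simplified merging digest. This is a minimal sketch, not the article's implementation: the class, the `delta` default, the buffer size, and the particular size bound (proportional to q(1 − q), so clusters stay small near the tails) are our assumptions. The `merge` method shows why digests suit per-dimension storage: digests built separately can be combined to answer quantiles for a coarser combination.

```python
import random


class TDigest:
    """Simplified merging t-digest: a sorted list of [mean, count] centroids."""

    def __init__(self, delta=100):
        self.delta = delta      # compression: larger values keep more, smaller centroids
        self.centroids = []     # [mean, count] pairs, sorted by mean after compression
        self.buffer = []        # raw values not yet folded into centroids

    def add(self, x):
        self.buffer.append(float(x))
        if len(self.buffer) >= 10 * self.delta:
            self._compress()

    def _compress(self):
        points = sorted(self.centroids + [[x, 1] for x in self.buffer])
        self.buffer = []
        if not points:
            return
        n = sum(count for _, count in points)
        merged, emitted = [], 0  # emitted = total weight of finished centroids
        cur_mean, cur_count = points[0]
        for mean, count in points[1:]:
            # Quantile at the centre of the candidate merged cluster.
            q = (emitted + (cur_count + count) / 2) / n
            # Allowed weight is largest near the median and shrinks toward the tails.
            if cur_count + count <= 4 * n * q * (1 - q) / self.delta:
                cur_count += count
                cur_mean += (mean - cur_mean) * count / cur_count
            else:
                merged.append([cur_mean, cur_count])
                emitted += cur_count
                cur_mean, cur_count = mean, count
        merged.append([cur_mean, cur_count])
        self.centroids = merged

    def merge(self, other):
        """Combine two digests, e.g. per-dimension digests into a coarser one."""
        self._compress()
        other._compress()
        out = TDigest(self.delta)
        out.centroids = [c[:] for c in self.centroids + other.centroids]
        out._compress()
        return out

    def count(self):
        return sum(count for _, count in self.centroids) + len(self.buffer)

    def quantile(self, q):
        self._compress()
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * sum(count for _, count in self.centroids)
        # Rank-space position of each centroid's centre, paired with its mean.
        centres, cum = [], 0
        for mean, count in self.centroids:
            centres.append((cum + count / 2, mean))
            cum += count
        if target <= centres[0][0]:
            return centres[0][1]
        if target >= centres[-1][0]:
            return centres[-1][1]
        # Linear interpolation between the two centroids bracketing the target rank.
        for (p0, m0), (p1, m1) in zip(centres, centres[1:]):
            if target <= p1:
                return m0 + (m1 - m0) * (target - p0) / (p1 - p0)


# Usage: two digests (say, one per dimension value) merged into one.
rng = random.Random(42)
values = list(range(1, 10001))
rng.shuffle(values)
low, high = TDigest(), TDigest()
for v in values:
    (low if v <= 5000 else high).add(v)
combined = low.merge(high)
print(combined.count(), round(combined.quantile(0.5)), round(combined.quantile(0.99)))
```

Memory stays bounded by the number of centroids rather than the number of values, which is the trade-off that makes storing one digest per dimension combination practical.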
Baidu Geek Talk
May 17, 2021 · Artificial Intelligence

Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

The Imazon platform unifies Baidu’s image acquisition, feature extraction, and ANN‑based multimodal retrieval into a cloud‑native, real‑time pipeline that ingests billions of images daily, optimizes storage and GPU usage, reduces message‑queue costs, and ensures high‑throughput, low‑latency search across text, visual, and voice queries.

Tags: Cloud Native, DAG, Image Processing
13 min read
Laravel Tech Community
Apr 23, 2021 · Big Data

Jet Release Notes – New Features, Enhancements, Fixes, and Major Changes

Jet, an open‑source in‑memory distributed batch and stream processing engine, introduces dynamic ClassDefinition for SQL, JDBC sink batch size option, SQL aggregation memory optimizations, improved sliding‑window pickAny performance, numerous extension updates, critical bug fixes, and a major API change to DAG.toDotString for enhanced parallelism control.

Tags: Jet, distributed computing, stream processing
4 min read
dbaplus Community
Apr 20, 2021 · Big Data

10 Common Pitfalls When Migrating Spark Jobs to Flink (And How to Avoid Them)

This article shares ten practical pitfalls encountered when moving hourly Spark session jobs to Flink, covering parallelism load imbalance, state TTL, checkpointing strategies, logging, JMX debugging, state migration risks, reduce vs process choices, input data validation, event‑time handling, and external storage considerations, along with concrete configuration snippets and performance tips.

Tags: Flink, Spark migration, State Management
20 min read
Ctrip Technology
Mar 25, 2021 · Big Data

Challenges and Approaches for Real‑Time Data Aggregation Analysis

The article examines the key challenges of real‑time data aggregation—data freshness, timely processing, and result visibility—and surveys common solutions such as timestamp‑based sync, CDC, full and incremental computation, storage formats, and trigger mechanisms.

Tags: Big Data, CDC, Incremental Computation
11 min read
Sohu Tech Products
Feb 17, 2021 · Big Data

Dynamic Data Partitioning in Apache Flink: A Fraud Detection Demo

This article explains how to implement dynamic data partitioning in Apache Flink using a fraud‑detection demo, covering the system architecture, rule‑driven runtime reconfiguration, custom ProcessFunction code, and the underlying key‑by logic that enables flexible, real‑time stream processing.

Tags: Apache Flink, Dynamic Partitioning, KeyBy
11 min read
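Dynamic partitioning of this kind is commonly implemented by emitting each event once per active rule, keyed by the fields that rule names, before `keyBy` routes it. The plain-Python sketch below imitates only that routing step outside Flink; the `Rule` shape, field names, key format, and CRC32-modulo partitioning are illustrative assumptions (Flink's own partitioner hashes keys into key groups differently).

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: int
    grouping_keys: tuple  # event fields this rule groups by, e.g. ("payer_id",)


def dynamic_key(event, rule):
    """Composite key built from the fields the rule asks to group by."""
    return "{" + ";".join(f"{name}={event[name]}" for name in rule.grouping_keys) + "}"


def partition_for(key, parallelism):
    # Deterministic stand-in hash; Python's built-in str hash is salted per process.
    return zlib.crc32(key.encode("utf-8")) % parallelism


def dynamic_key_by(events, rules, parallelism):
    """Emit every event once per active rule, tagged with partition, rule id and key."""
    records = []
    for event in events:
        for rule in rules:
            key = dynamic_key(event, rule)
            records.append((partition_for(key, parallelism), rule.rule_id, key, event))
    return records


rules = [Rule(1, ("payer_id",)), Rule(2, ("payer_id", "beneficiary_id"))]
events = [
    {"payer_id": 7, "beneficiary_id": 3, "amount": 20.0},
    {"payer_id": 7, "beneficiary_id": 9, "amount": 5.0},
]
records = dynamic_key_by(events, rules, parallelism=4)
for partition, rule_id, key, _event in records:
    print(partition, rule_id, key)
```

Because rule 1 keys only on `payer_id`, both transactions share one key and partition for that rule, while rule 2 gives them distinct keys. Adding or changing a rule changes the keying at runtime without redeploying the job.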
DataFunTalk
Jan 28, 2021 · Big Data

Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank

This talk by Ba Xueyu, a senior big data platform engineer at Zhongyuan Bank, outlines the background, architecture, and engineering practices of a real‑time financial data lake, highlighting its open, timely, and integrated design, streaming platform implementation, and use cases such as anti‑fraud and real‑time BI.

Tags: Flink, anti-fraud, financial analytics
15 min read
Alibaba Cloud Developer
Jan 25, 2021 · Big Data

Why 2020 Was the Breakthrough Year for Apache Flink’s Ecosystem

In 2020, Apache Flink surged to become the most active Apache project, releasing three major versions that advanced its unified stream‑batch engine, introduced cloud‑native K8s support, expanded AI capabilities with PyFlink, and fostered a thriving Chinese community, solidifying its role as the de‑facto standard for real‑time computing.

Tags: AI integration, Apache Flink, Big Data
21 min read
DataFunTalk
Jan 5, 2021 · Big Data

Highlights of Flink Forward Asia 2020: Stream‑Batch Integration, AI Fusion, and Cloud‑Native Advances

The 2020 Flink Forward Asia conference showcased Apache Flink's rapid growth, community milestones, industry adoption, and technical breakthroughs such as unaligned checkpoints, approximate failover, the Nexmark benchmark, stream‑batch unification, AI integration via PyFlink and Alink, and deep cloud‑native support on Kubernetes, illustrated through case studies from Alibaba, Meituan, Kuaishou, and Dell.

Tags: AI integration, Apache Flink, Cloud Native
20 min read
DataFunTalk
Dec 11, 2020 · Big Data

My Journey and Contributions in the Apache Flink Community

The author shares his personal journey from first encountering Flink to becoming an Apache Flink Committer at ByteDance, detailing community involvement, code contributions, bug fixes, lessons learned, advice for newcomers, and concluding with promotional offers for Flink services.

Tags: Apache Flink, Blink Planner, SQL
12 min read
dbaplus Community
Oct 29, 2020 · Big Data

Inside Didi’s Real-Time Data Warehouse for Ride-Sharing: Architecture & Lessons

This article details Didi’s end‑to‑end construction of a real‑time data warehouse for the Ride‑Sharing (顺风车) business, covering motivations, layer‑by‑layer architecture, naming conventions, StreamSQL capabilities, operational tooling, achieved results, challenges, and future batch‑stream integration plans.

Tags: Didi, Flink, real-time data warehouse
21 min read
Big Data Technology & Architecture
Oct 23, 2020 · Big Data

Overview of Real-Time Big Data Processing: Spark Structured Streaming, CarbonData, Flink, and Cloud Stream

This article provides a comprehensive overview of modern real‑time big‑data solutions, detailing Spark Structured Streaming capabilities, CarbonData’s storage architecture, Meituan’s Flink deployments, and Huawei Cloud Stream’s unified streaming service, highlighting their features, challenges, and future directions.

Tags: CarbonData, Flink, Real-time analytics
17 min read
DataFunTalk
Oct 2, 2020 · Big Data

Single-Task Recovery in Flink: Design and Implementation for Real‑Time Stream Processing

This article describes ByteDance's single‑task recovery solution for Flink's real‑time computation, detailing the problem of global job restarts, the proposed network‑layer enhancements, upstream and downstream optimizations, JobManager restart strategy, implementation challenges, and the measurable latency and availability benefits achieved in production.

Tags: Flink, Single-Task Recovery, fault tolerance
11 min read
Alibaba Cloud Developer
Sep 15, 2020 · Big Data

Designing Nexmark: A Standard Benchmark for Stream Processing Performance

This article examines the problems with existing stream-processing benchmarks, introduces the open-source Nexmark framework for reproducible, comprehensive performance testing, describes its metrics, query set, and workload configurability, and presents experimental results on Flink, highlighting Nexmark's role in advancing stream-processing benchmarking.

Tags: Benchmark, CPU, Flink
14 min read
DataFunTalk
Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps—including stream union, state‑timer processing, and output formatting—covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Tags: Flink, Kafka, Online Sample Generation
11 min read
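The union-plus-timer step can be sketched in plain Python: impressions and clicks arrive on one unioned stream, each impression is held in state with a timer, a click before the timer flips its label, and the timer emits the finished sample. Field names, the fixed `wait`, the label rule, and unique request ids are assumptions for illustration, not the article's schema; in Flink this logic would typically live in a `KeyedProcessFunction` using keyed state and timers.

```python
import heapq


def generate_samples(events, wait):
    """Label impressions with clicks that arrive before their timer fires.

    events: time-ordered (ts, kind, request_id, payload) tuples from the unioned
    streams, where kind is "impression" or "click". Request ids are assumed unique.
    """
    state = {}    # request_id -> {"features": ..., "clicked": bool}
    timers = []   # min-heap of (fire_ts, request_id)
    samples = []

    def fire_timers(now):
        # Emit every sample whose waiting period has ended by `now`.
        while timers and timers[0][0] <= now:
            _, rid = heapq.heappop(timers)
            entry = state.pop(rid, None)
            if entry is not None:
                samples.append((rid, entry["features"], int(entry["clicked"])))

    for ts, kind, rid, payload in events:
        fire_timers(ts)
        if kind == "impression":
            state[rid] = {"features": payload, "clicked": False}
            heapq.heappush(timers, (ts + wait, rid))
        elif kind == "click" and rid in state:
            state[rid]["clicked"] = True
    fire_timers(float("inf"))  # end of input flushes the remaining timers
    return samples


events = [
    (0, "impression", "r1", {"item": 1}),
    (1, "impression", "r2", {"item": 2}),
    (3, "click", "r1", None),
    (20, "click", "r2", None),  # arrives after r2's wait has expired
]
print(generate_samples(events, wait=10))
# [('r1', {'item': 1}, 1), ('r2', {'item': 2}, 0)]
```

The wait length trades label accuracy (late clicks become false negatives) against sample freshness and state size.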
Big Data Technology & Architecture
Sep 12, 2020 · Big Data

Technical Architecture and Component Selection of a Real‑time Data Platform (RTDP)

This article details the technical architecture of a Real‑time Data Platform (RTDP), covering component selection such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses design considerations, data management, security, operational practices, and various deployment modes for big‑data applications.

Tags: Big Data Architecture, RTDP, data security
22 min read
Big Data Technology & Architecture
Sep 12, 2020 · Big Data

Designing a Real‑time Data Platform for Modern Data Warehouses

This article explores the evolution from traditional to modern data warehouses, outlines the key capabilities of real-time data platforms (real-time data, data virtualization, data democratization, and collaboration), and presents a comprehensive architecture with unified collection, streaming, compute, and visualization layers, while discussing functional, quality, stability, cost, agility, and management considerations.

Tags: architecture, data virtualization, real-time data
18 min read
Didi Tech
Aug 26, 2020 · Big Data

Real-time Data Warehouse Construction at Didi: Architecture, Practices, and Lessons

To support Didi’s fast‑growing car‑pool service, a real‑time data warehouse was built using a streamlined layered architecture—ODS, DWD, DIM, DWM, and APP—leveraging Flink‑based StreamSQL, Kafka, Druid and ClickHouse to deliver minute‑level analytics, dashboards, monitoring, and cross‑business interfaces while planning unified meta‑store integration.

Tags: Big Data Architecture, Data Platform, Flink
20 min read
IT Architects Alliance
Aug 12, 2020 · Big Data

Introduction to Confluent KSQL for Real-Time Stream Processing

This article introduces Confluent KSQL, a SQL‑based real‑time stream processing engine for Kafka, covering its architecture, stream vs table concepts, query lifecycle, Docker‑based setup, DDL commands, example joins, windowed aggregations, connectors, and its advantages and limitations.

Tags: Big Data, Docker, KSQL
9 min read
DataFunTalk
Aug 10, 2020 · Big Data

Understanding Flink SQL Architecture, Optimizations, and Internal Mechanisms

This article explains the evolution of Apache Flink's SQL support, detailing the Blink Planner architecture, the end‑to‑end Flink SQL workflow, logical and physical planning, code generation, stream‑specific optimizations such as retraction and mini‑batch, and future development directions.

Tags: Blink Planner, Flink, SQL
20 min read
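Retraction is easiest to see with a continuous GROUP BY feeding a second aggregation. The Python sketch below models the changelog such a query produces, borrowing the `+I`/`-U`/`+U` row-kind notation Flink uses from 1.11 onward; the functions and the word-count example are illustrative assumptions, not Flink's operator code.

```python
from collections import defaultdict


def group_count(words):
    """Continuous COUNT(*) GROUP BY word, emitting a changelog with retractions."""
    counts = {}
    for w in words:
        old = counts.get(w)
        if old is None:
            counts[w] = 1
            yield ("+I", w, 1)
        else:
            counts[w] = old + 1
            yield ("-U", w, old)      # retract the previously emitted result
            yield ("+U", w, old + 1)  # emit the updated result


def count_of_counts(changelog):
    """Downstream aggregation: how many words have each frequency."""
    freq = defaultdict(int)
    for kind, _word, cnt in changelog:
        if kind in ("+I", "+U"):
            freq[cnt] += 1
        else:  # "-U" (or "-D") removes the stale row's contribution
            freq[cnt] -= 1
    return {c: n for c, n in freq.items() if n}


words = ["a", "b", "a", "c", "a"]
print(list(group_count(["x", "x"])))
# [('+I', 'x', 1), ('-U', 'x', 1), ('+U', 'x', 2)]
print(count_of_counts(group_count(words)))
# {1: 2, 3: 1}
```

Without the `-U` rows, the downstream aggregate would count `a` three times (once each at frequency 1, 2, and 3) and report four words with frequency 1; retraction keeps multi-level aggregations consistent.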
Youzan Coder
Jul 15, 2020 · Big Data

Design and Implementation of Youzan ABTest System for Data‑Driven Growth

Youzan created an internal A/B testing platform—combining Java/Node SDKs, a real‑time data pipeline, and a metadata‑driven workflow—to enable data‑driven product iteration, granular traffic allocation, automated logging, statistical analysis, and scalable growth insights across its merchant services, while planning further automation and integration.

Tags: A/B testing, Big Data, Experiment Platform
19 min read
Alibaba Cloud Developer
Jul 13, 2020 · Big Data

What’s New in Apache Flink 1.11? A Deep Dive into Features and Performance

Apache Flink 1.11.0, released after four months of development, brings major ecosystem, usability, and stability improvements—including CDC support, a new JDBC catalog, real‑time Hive integration, a redesigned source API, PyFlink enhancements, application mode for Kubernetes, and checkpoint optimizations—while highlighting the growing contribution of Chinese developers.

Tags: Apache Flink, Checkpoint, Feature Release
20 min read
DataFunTalk
Jul 10, 2020 · Big Data

Apache Flink Practice at NetEase: Architecture, Scale, and Future Directions

This article details NetEase's evolution from Storm to Flink for real‑time computing, describing the Sloth platform's architecture, large‑scale deployment, diverse business scenarios, monitoring, alerting, and future development plans, illustrating how Flink powers data synchronization, real‑time warehousing, and e‑commerce analytics and recommendation.

Tags: Data Warehouse, Flink, NetEase
15 min read
Big Data Technology Architecture
Jul 8, 2020 · Big Data

Apache Flink 1.11.0 Release: New Features and Optimizations

Apache Flink 1.11.0 introduces a suite of major enhancements—including unaligned checkpoints, a unified source interface, CDC support in Table API/SQL, performance‑boosted PyFlink, a new application deployment mode, and numerous UI, Docker, and catalog improvements—aimed at increasing usability, scalability, and integration across streaming and batch workloads.

Tags: Flink, SQL, Source Interface
18 min read
Architect
Jun 11, 2020 · Big Data

Understanding Apache Flink Architecture, Data Transfer, Event‑Time Processing, State Management, and Checkpointing

This article explains Apache Flink's distributed system architecture—including JobManager, ResourceManager, TaskManager, and Dispatcher—covers session and job deployment modes, data transfer mechanisms, event‑time handling with watermarks, various state types and backends, scaling strategies, and the checkpoint/savepoint recovery process.

Tags: Apache Flink, Big Data, Event Time
15 min read
58 Tech
Jun 10, 2020 · Big Data

Real‑time Data Warehouse Practices at 58 Tongcheng Bao: From Spark Streaming 1.0 to Flink‑based 2.0

This article details the evolution of 58 Tongcheng Bao's real‑time data warehouse, describing the initial Spark‑Streaming architecture, its limitations, and the redesign using Flink with a layered ODS‑DWD‑DWS‑APP model, data‑quality monitoring, join techniques, and the resulting improvements in latency and accuracy.

Tags: Big Data, Data Quality, Flink
9 min read
Big Data Technology Architecture
May 22, 2020 · Big Data

Apache Flink 1.11 New Features Overview

The article provides a comprehensive overview of Apache Flink 1.11, detailing enhancements in cluster deployment, resource management, source/sink APIs, state backends, Table & SQL improvements, DataStream extensions, PyFlink/ML support, and runtime optimizations, along with relevant code examples and references.

Tags: Apache Flink, Flink 1.11, Table API
19 min read
Big Data Technology & Architecture
May 18, 2020 · Big Data

Real‑time Data Platform (RTDP): Concepts, Architecture and Design Considerations

This article examines the design of a real‑time data platform, discussing its background concepts, modern data‑warehouse perspective, architectural layers, unified data‑collection, streaming, compute and visualization platforms, and the functional, quality, stability, cost and agility considerations required for building an end‑to‑end real‑time pipeline.

Tags: Data Democratization, Data Platform, architecture
17 min read
Alibaba Cloud Developer
Feb 24, 2020 · Big Data

What’s New in Apache Flink 1.10? Deep Dive into Major Features and Enhancements

Apache Flink 1.10 introduces a major upgrade that merges the Blink engine, boosts performance and stability, adds native Kubernetes support, enhances SQL DDL, delivers production‑ready Hive batch compatibility, optimizes memory management, and expands Python UDF capabilities, with detailed feature breakdowns and code examples.

Tags: Apache Flink, Batch Processing, Kubernetes
8 min read
Big Data Technology & Architecture
Feb 15, 2020 · Big Data

Understanding Event Time and Watermarks in Apache Flink

This article explains how Apache Flink uses event‑time timestamps and watermarks to handle out‑of‑order and late data, describes the assignTimestampsAndWatermarks API with periodic and punctuated watermark assigners, and provides practical code examples for window lateness and side‑output handling.

Tags: Apache Flink, Event Time, Flink
10 min read
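The semantics described above can be simulated in plain Python: the watermark trails the largest timestamp seen by a fixed out-of-orderness bound, a tumbling window fires once the watermark reaches its last timestamp, stragglers within the allowed lateness re-fire the window, and anything later goes to a side output. Function and parameter names are hypothetical, timestamps are integers, and unlike Flink the watermark here advances after every element rather than on a periodic timer.

```python
from collections import defaultdict


def event_time_windows(events, size, max_out_of_order, allowed_lateness):
    """Tumbling event-time windows summing values; returns (fired, late)."""
    sums = defaultdict(int)  # window start -> running sum
    fired_starts = set()
    fired, late = [], []
    watermark = float("-inf")
    for ts, value in events:
        start = ts - ts % size
        last_ts = start + size - 1  # largest timestamp belonging to the window
        if last_ts + allowed_lateness <= watermark:
            late.append((ts, value))  # beyond allowed lateness: side output
            continue
        sums[start] += value
        if start in fired_starts:
            fired.append((start, sums[start]))  # late but allowed: re-fire
        # Simplification: advance the watermark after every element.
        watermark = max(watermark, ts - max_out_of_order)
        for s in sorted(sums):
            if s not in fired_starts and s + size - 1 <= watermark:
                fired_starts.add(s)
                fired.append((s, sums[s]))
    return fired, late


events = [(1, 1), (5, 1), (12, 1), (3, 1), (14, 1), (9, 1), (25, 1), (4, 1)]
fired, late = event_time_windows(events, size=10, max_out_of_order=2, allowed_lateness=5)
print(fired)  # [(0, 2), (0, 3), (0, 4), (10, 2)]
print(late)   # [(4, 1)]
```

Window [0, 10) fires when the element at 12 pushes the watermark to 10, updates twice for the late elements at 3 and 9, and the element at 4 is diverted once the watermark reaches 23. Window [20, 30) never fires because no later watermark arrives; a real job would emit a final watermark at end of input.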
Big Data Technology Architecture
Feb 12, 2020 · Big Data

Apache Flink 1.10 Release: New Features, Optimizations, and Kubernetes Integration

Apache Flink 1.10 introduces major performance and stability improvements, unified memory configuration, native Kubernetes session mode, enhanced Table API/SQL with production‑ready Hive integration, expanded Python UDF support, and a host of important bug fixes and connector updates, marking the largest community‑driven release to date.

Tags: Apache Flink, Hive Integration, Kubernetes
17 min read
DataFunTalk
Feb 10, 2020 · Artificial Intelligence

Real‑Time Intelligent Anomaly Detection Platform at Ctrip: Integrating Flink and TensorFlow (Prophet)

The article describes Ctrip's Prophet platform, which combines Flink real‑time stream processing with TensorFlow deep‑learning models to provide intelligent, low‑latency anomaly detection, replacing traditional rule‑based alerts and addressing challenges such as holiday traffic and model scalability.

Tags: AI, Deep Learning, Flink
13 min read
DataFunTalk
Jan 22, 2020 · Big Data

Real-Time Data Engineering Practices for Alibaba 1688 Business

This article explains how Alibaba 1688 achieves real‑time recommendation, advertising, and product statistics through a robust middle‑platform foundation, streaming engines like Blink, data synchronization tools, and scalable storage, illustrating three concrete engineering cases and the end‑to‑end real‑time data service pipeline.

Tags: Alibaba, Flink, stream processing
8 min read
Qunar Tech Salon
Dec 20, 2019 · Big Data

Understanding Flink Cluster Startup and Job Execution Process

This article explains the architecture of a Flink cluster, detailing the startup procedures for JobManager and TaskManager, the three deployment modes, and the end‑to‑end flow of a Flink job from client code through StreamGraph, JobGraph, ExecutionGraph to the physical execution on TaskManagers.

Tags: Big Data, Cluster Architecture, Flink
10 min read
vivo Internet Technology
Dec 18, 2019 · Big Data

Comprehensive Overview of Big Data Architecture, Lambda/Kappa Models, and End-to-End Data Platform Design

The article surveys modern big‑data architecture, contrasting Lambda and Kappa models, highlights common governance and integration pain points, and proposes an end‑to‑end platform featuring unified metadata, stream‑batch processing, one‑click ingestion, standardized modeling, intelligent query abstraction, and a comprehensive development IDE.

Tags: Big Data, Data Platform, ETL
13 min read
Big Data Technology & Architecture
Dec 17, 2019 · Big Data

Understanding Flink Sliding Windows and Performance Optimizations

This article explains Flink's sliding window mechanism, shows how the WindowAssigner and WindowOperator work with code examples, analyzes the performance impact of fine‑grained sliding windows, and proposes a practical workaround using tumbling windows combined with external storage such as Redis for efficient PV/UV aggregation.

Tags: Big Data, Flink, Performance Optimization
8 min read
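The cost of fine-grained sliding windows follows from the assignment rule: each element belongs to size/slide overlapping windows, so it touches that many window states. The sketch below reproduces that start-time arithmetic in plain Python (zero window offset assumed) and compares it with counting into tumbling panes of length `slide` and summing panes afterwards. That is one way to realize the tumbling-window workaround the article describes; an in-memory dict stands in for Redis, and the sketch assumes `size` is a multiple of `slide`.

```python
import random
from collections import defaultdict


def sliding_window_starts(ts, size, slide):
    """Starts of every window [start, start + size) containing ts (offset 0)."""
    last_start = ts - ts % slide
    return list(range(last_start, ts - size, -slide))


def naive_counts(timestamps, size, slide):
    counts = defaultdict(int)
    for ts in timestamps:
        for start in sliding_window_starts(ts, size, slide):
            counts[start] += 1  # size / slide state updates per element
    return dict(counts)


def pane_counts(timestamps, size, slide):
    assert size % slide == 0, "sketch assumes size is a multiple of slide"
    panes = defaultdict(int)  # tumbling panes of length `slide`
    for ts in timestamps:
        panes[ts - ts % slide] += 1  # exactly one state update per element
    counts = defaultdict(int)
    for pane_start, n in panes.items():
        # A pane contributes to every window whose range covers it.
        for start in range(pane_start, pane_start - size, -slide):
            counts[start] += n
    return dict(counts)


rng = random.Random(7)
timestamps = [rng.randrange(0, 1000) for _ in range(5000)]
print(sliding_window_starts(7, size=10, slide=5))  # [5, 0]
print(naive_counts(timestamps, 60, 10) == pane_counts(timestamps, 60, 10))  # True
```

Both approaches produce the same per-window counts, but the pane version does one update per element and pays the size/slide fan-out once per pane. This suits additive metrics such as PV; for distinct counts such as UV, panes must hold sets or sketches rather than integers.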
Big Data Technology & Architecture
Dec 4, 2019 · Big Data

Comprehensive Flink Interview Guide: Core Concepts, Advanced Topics, and Source‑Code Insights

This article provides an in‑depth Flink interview guide covering the framework’s core concepts, advanced features such as fault‑tolerance, state management, and checkpointing, as well as detailed explanations of its architecture, APIs, partitioning strategies, and source‑code flow, complete with code examples.

Tags: Big Data, Distributed Systems, Flink
29 min read
Xianyu Technology
Dec 3, 2019 · Backend Development

Design and Implementation of Omega System's User Reach Center

The Omega system’s User Reach Center integrates a behavior-collection hub, a Flink-based CEP rule engine, and a plug-in-driven reach module that routes, filters, and dispatches actions via push, SMS, or external calls. It delivers sub-second targeting, higher accuracy, and reduced development effort; planned next steps include offline user profiling and closing the data loop.

Tags: Backend, System Architecture, stream processing
7 min read
Big Data Technology & Architecture
Dec 1, 2019 · Big Data

Understanding Flink LatencyMarker: End-to-End Delay Measurement and Implementation Details

This article explains the background, source‑code analysis, and practical implementation of Flink's LatencyMarker feature for measuring end‑to‑end job latency, including metric exposure, configuration options, and code snippets illustrating how latency markers are emitted and processed within the streaming pipeline.

Tags: Big Data, End-to-End Latency, Flink
6 min read
Xianyu Technology
Nov 28, 2019 · Big Data

Data‑Driven Seller Activity Enhancement on Xianyu

The Xianyu team built a data‑driven system that monitors seller online status and reply speed, uses Siddhi CEP to match behavior patterns, and orchestrates activities, tasks, and synchronization modules, boosting conversion by three percentage points and allowing new scenarios to launch without developer effort.

Tags: CEP, activity optimization, e‑commerce
8 min read
Xianyu Technology
Oct 24, 2019 · Backend Development

Design of a Real-Time Complex Event Processing System for Xianyu

The article details Xianyu’s real‑time complex event processing system, which abstracts diverse business scenarios—such as activity coupons, price‑drop alerts, rental suggestions, and promotional offers—into a rule‑driven pipeline comprising log collection, a Blink‑based DSL EPL engine, and configurable result delivery, enabling feature rollout in half a day with ~10‑second latency.

Tags: DSL, stream processing
10 min read
Big Data Technology & Architecture
Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Tags: Apache Flink, Big Data, Dataflow
22 min read
dbaplus Community
Sep 10, 2019 · Big Data

Why Exactly‑Once Processing Is So Hard in Distributed Systems (And How to Tackle It)

This article explores the two toughest problems in distributed stream processing—exactly‑once message handling and ordering—by dissecting the underlying impossibility of perfect failure detectors, the liveness‑vs‑safety trade‑off, zombie processes, and the practical solutions employed by systems such as Flink, Kafka Streams, MillWheel, and Spark.

Tags: Consensus, Distributed Systems, Exactly-Once
81 min read
Big Data Technology & Architecture
Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Tags: Apache Flink, Big Data, Resources
10 min read
Big Data Technology & Architecture
Aug 18, 2019 · Big Data

Flink Application Scenarios and Scale at Kuaishou

The article details how Kuaishou leverages Apache Flink for large‑scale stream processing, describing its application scenarios, cluster sizing, interval join optimization, RocksDB performance challenges, source throttling strategies, JobManager stability, frequent job failures, and platform‑wide improvements.

Tags: Big Data, Flink, Kuaishou
2 min read
Big Data Technology & Architecture
Aug 11, 2019 · Big Data

Deep Dive into Flink’s Network Stack: Credit‑Based Flow Control and Thread Model Optimizations

This article examines Flink’s industrial‑scale network stack, detailing the credit‑based flow control introduced in version 1.5, the refactored task‑IO thread collaboration, and serialization optimizations that together improve throughput and latency for large‑scale stream processing workloads.

Tags: Big Data, Credit-based Flow Control, Flink
12 min read
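The core of credit-based flow control can be shown with a toy model: the receiver announces one credit per free buffer, the sender transmits only while it holds credit, and recycling a consumed buffer returns credit. Class names and the round-based scheduling are assumptions. Flink's actual implementation also distinguishes exclusive and floating buffers and lets the sender report its backlog so the receiver can request floating buffers; this sketch omits both.

```python
from collections import deque


class Receiver:
    def __init__(self, buffers):
        self.free = buffers    # free buffers, i.e. credit not yet announced
        self.queue = deque()   # buffers holding received data
        self.consumed = 0

    def announce_credit(self):
        credit, self.free = self.free, 0
        return credit

    def receive(self, data):
        self.queue.append(data)

    def consume(self, n):
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()
            self.consumed += 1
            self.free += 1     # the recycled buffer becomes new credit


class Sender:
    def __init__(self, backlog):
        self.credit = 0
        self.backlog = deque(backlog)

    def send(self, receiver):
        # Only transmit while the receiver has guaranteed buffer space.
        while self.credit > 0 and self.backlog:
            receiver.receive(self.backlog.popleft())
            self.credit -= 1


def run_credit_flow(backlog_size, receiver_buffers, consume_per_round):
    sender, receiver = Sender(range(backlog_size)), Receiver(receiver_buffers)
    rounds = max_queued = 0
    while sender.backlog or receiver.queue:
        sender.credit += receiver.announce_credit()
        sender.send(receiver)
        max_queued = max(max_queued, len(receiver.queue))
        receiver.consume(consume_per_round)
        rounds += 1
    return rounds, max_queued, receiver.consumed


print(run_credit_flow(backlog_size=10, receiver_buffers=2, consume_per_round=1))
# (10, 2, 10)
```

A slow consumer never receives more data than it has buffers for, so back pressure stays on the affected channel instead of blocking a shared TCP connection that other channels also use.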
Big Data Technology Architecture
Aug 7, 2019 · Big Data

Why Choose Apache Flink for Real‑Time Stream Processing: Features and Lessons Learned

This article explains why the author chose Apache Flink for real‑time stream processing, highlighting its unique combination of high throughput, low latency, event‑time support, stateful computation, flexible windows, and fault tolerance, while also reflecting on the challenges of adopting a less‑documented technology.

Tags: Event Time, Flink, Real-Time
7 min read
Big Data Technology & Architecture
Jul 2, 2019 · Big Data

Integrating Apache Flink with Apache Pulsar for Scalable Elastic Data Processing

This article explains how Apache Pulsar and Apache Flink can be combined to provide a unified, scalable, and fault‑tolerant data processing platform, covering Pulsar's architecture, its differences from other messaging systems, various integration patterns, and concrete code examples for stream and batch workloads.

Apache Flink · Apache Pulsar · Big Data
0 likes · 13 min read
Big Data Technology & Architecture
Jun 20, 2019 · Big Data

Comprehensive Guide to Flink SQL: Background, New Features, Programming Model, Operators, Functions, and a Practical NBA Scoring Leader Example

This article provides an in‑depth overview of Flink SQL, covering its origins, the latest 1.7.0 and 1.8.0 enhancements, the underlying programming model, common operators and built‑in functions, and a complete end‑to‑end example that analyzes NBA scoring‑leader data using Flink SQL.

Apache Flink · Big Data · Flink SQL
0 likes · 27 min read
Big Data Technology & Architecture
Jun 12, 2019 · Big Data

Comprehensive Guide to FlinkCEP: API Overview, Pattern Definitions, Quantifiers, Conditions, and Usage Examples

This article provides a detailed introduction to FlinkCEP, covering how to add the library, define simple and composite patterns, use quantifiers and conditions, handle skip strategies, time constraints, and select results, with complete Java and Scala code examples for complex event processing.

Big Data · CEP · Flink
0 likes · 27 min read
360 Zhihui Cloud Developer
Jun 4, 2019 · Big Data

Why Flink Outperforms Storm: Deep Dive into Stream Processing Performance

Based on data transmission and reliability metrics, this article compares Apache Storm and Apache Flink in stream processing, presenting benchmark designs, test environments, results for synthetic and Kafka data, and offers practical recommendations such as operator chaining, object reuse, and checkpoint strategies to maximize Flink performance.

Big Data · Flink · Performance Testing
0 likes · 13 min read
360 Tech Engineering
Jun 3, 2019 · Big Data

Performance Comparison of Apache Storm and Apache Flink from Data Transmission and Reliability Perspectives

This article presents a detailed performance benchmark comparing Apache Storm and Apache Flink in stream processing, focusing on data transmission methods, reliability mechanisms, operator chaining, and both self‑generated and Kafka‑sourced workloads, and provides practical optimization recommendations based on the results.

Big Data · Data Transmission · Flink
0 likes · 10 min read
Big Data Technology & Architecture
Jun 2, 2019 · Big Data

Tencent's Oceanus Real-Time Stream Computing Platform and Flink Optimizations

The article traces how Tencent's real‑time stream processing evolved on Flink. It describes the design of the Oceanus one‑stop visual platform and a series of in‑depth Flink extensions and optimizations aimed at supporting petabyte‑scale workloads: UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, idle detection, and log isolation.

Big Data · Flink · Oceanus
0 likes · 16 min read
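
Local aggregation combines records for the same key inside each upstream subtask before the keyed shuffle. A hot key then sends far fewer records across the network. Assuming Oceanus's LocalKeyBy follows this general idea, the Python sketch below compares the number of records shipped with and without that step. The function names and the count-based aggregate are hypothetical, not Tencent's or Flink's API.

```python
from collections import Counter


def local_pre_aggregate(partition):
    """Combine (key, count) pairs within one upstream subtask (hypothetical helper)."""
    partial = Counter()
    for key, count in partition:
        partial[key] += count
    return list(partial.items())


def shuffle_and_merge(partials, parallelism):
    """Route partial results by key hash and merge them in each downstream subtask."""
    downstream = [Counter() for _ in range(parallelism)]
    for partial in partials:
        for key, count in partial:
            downstream[hash(key) % parallelism][key] += count
    return downstream


# Two upstream subtasks dominated by one hot key.
partitions = [
    [("hot", 1)] * 4 + [("a", 1)],
    [("hot", 1)] * 3 + [("b", 1)],
]
partials = [local_pre_aggregate(p) for p in partitions]
raw_records = sum(len(p) for p in partitions)      # records shuffled without pre-aggregation
shipped_records = sum(len(p) for p in partials)    # records shuffled with it
totals = sum(shuffle_and_merge(partials, parallelism=2), Counter())
print(raw_records, shipped_records)  # 9 4
```

This works only for aggregates that can be merged from partial results, such as counts, sums, or maxima. In an unbounded stream, the partial results would also need to be flushed periodically rather than once at the end.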
Big Data Technology & Architecture
May 29, 2019 · Cloud Native

Real-Time Computing Solutions with Flink and HBase: Architecture, Market Analysis, and Use Cases

The article presents Alibaba Cloud's real-time computing solution based on Flink and HBase, covering market competition, open‑source ecosystem, containerized architecture on Kubernetes, and typical applications such as online education video analysis, city‑brain traffic management, and fraud detection.

Big Data · Cloud Native · Flink
0 likes · 12 min read
Big Data Technology & Architecture
May 19, 2019 · Big Data

Implementing End-to-End Exactly-Once Semantics in Apache Flink with Apache Kafka Using Two-Phase Commit Sink

This article explains how Apache Flink’s TwoPhaseCommitSinkFunction, introduced in version 1.4, enables end-to-end exactly-once semantics when integrated with Apache Kafka, detailing the checkpoint mechanism and the two-phase commit protocol that ensures reliable data processing.

Apache Flink · Apache Kafka · Big Data
0 likes · 4 min read
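
The two-phase commit pattern behind `TwoPhaseCommitSinkFunction` ties transactions to checkpoints. Records go into an open transaction, which is pre-committed when the checkpoint snapshot is taken and committed only after the checkpoint completes. Flink's class exposes hooks such as `preCommit`, `commit`, and `abort` for this. The Python sketch below models the lifecycle in memory. The class, its method names, and the list standing in for an external transactional system such as Kafka are hypothetical simplifications, not Flink's API.

```python
class TwoPhaseCommitSink:
    """In-memory model of a checkpoint-driven two-phase commit sink (hypothetical)."""

    def __init__(self):
        self.committed = []   # what downstream consumers can see
        self.current = []     # open transaction for the in-progress checkpoint interval
        self.pending = {}     # checkpoint id -> pre-committed transaction

    def invoke(self, record):
        # Writes land in the open transaction and stay invisible until commit.
        self.current.append(record)

    def snapshot_state(self, checkpoint_id):
        # Phase 1 (pre-commit): seal the open transaction and start a new one.
        self.pending[checkpoint_id] = self.current
        self.current = []

    def notify_checkpoint_complete(self, checkpoint_id):
        # Phase 2 (commit): make every sealed transaction up to this checkpoint visible.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            self.committed.extend(self.pending.pop(cid))

    def recover(self, last_completed_id):
        # After a failure, transactions covered by the last completed checkpoint are
        # still committed; newer ones and the open transaction are aborted, and the
        # source replays their records from the checkpointed offsets.
        self.notify_checkpoint_complete(last_completed_id)
        self.pending.clear()
        self.current = []


sink = TwoPhaseCommitSink()
sink.invoke("a")
sink.invoke("b")
sink.snapshot_state(1)
sink.invoke("c")                       # written after checkpoint 1's snapshot
sink.notify_checkpoint_complete(1)     # "a" and "b" become visible
sink.recover(last_completed_id=1)      # crash: the open transaction holding "c" is aborted
sink.invoke("c")                       # "c" is replayed from the source
sink.snapshot_state(2)
sink.notify_checkpoint_complete(2)
print(sink.committed)  # ['a', 'b', 'c']
```

"c" is committed exactly once even though it was written twice. The first write was discarded with its aborted transaction.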
G7 EasyFlow Tech Circle
Apr 23, 2019 · Big Data

How We Scaled Fatigue Event Processing to 45K TPS with Apache Flink

By iteratively redesigning the fatigue‑event detection pipeline and leveraging Apache Flink’s stateful stream processing, the team reduced network overhead, cut resource usage to a third, and achieved a stable 45,000 TPS throughput on six containers with 20 GB memory, while outlining three optimization phases and practical lessons.

Apache Flink · Fatigue Detection · IoT
0 likes · 13 min read
JD Retail Technology
Apr 18, 2019 · Big Data

Data Heterogeneity with BinLake, Binlog, and Flink: Approaches for Order, Subscription, and Product Data

The article explains how data heterogeneity is achieved using JD's BinLake to capture MySQL binlogs, with Flink handling sequential and parallel consumption for order, subscription, and product data, discussing challenges such as ordering guarantees, idempotency, IO overhead, and the shift toward stream‑processing architectures.

Binlog · Elasticsearch · Flink
0 likes · 5 min read
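
A common way to handle the ordering and idempotency concerns in the binlog summary has two parts. All events for a primary key are routed to the same consumer, and the sink ignores any event whose version is not newer than the one already stored. The Python sketch below assumes each event carries a monotonically increasing per-row `version`, for example derived from the binlog position. The event shape, `route`, and `IdempotentStore` are hypothetical, not BinLake's or Flink's API.

```python
def route(pk, parallelism):
    # Every event for one primary key goes to the same consumer, preserving per-row order.
    return hash(pk) % parallelism


class IdempotentStore:
    """Target store that applies each row change at most once (hypothetical)."""

    def __init__(self):
        self.rows = {}  # pk -> (version, row); a row of None would mark a delete

    def apply(self, event):
        pk, version, row = event["pk"], event["version"], event["row"]
        stored = self.rows.get(pk)
        if stored is not None and stored[0] >= version:
            return False  # duplicate or stale redelivery: ignore it
        self.rows[pk] = (version, row)
        return True


events = [
    {"pk": 1, "version": 10, "row": {"status": "created"}},
    {"pk": 1, "version": 12, "row": {"status": "paid"}},
    {"pk": 1, "version": 10, "row": {"status": "created"}},  # redelivered after a retry
    {"pk": 2, "version": 11, "row": {"status": "created"}},
]
parallelism = 2
queues = [[] for _ in range(parallelism)]
for event in events:
    queues[route(event["pk"], parallelism)].append(event)

store = IdempotentStore()
applied = [store.apply(e) for q in queues for e in q]
print(applied, store.rows[1])  # [True, True, True, False] (12, {'status': 'paid'})
```

The redelivered version-10 event is rejected, so replaying part of the binlog leaves the target unchanged.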
ITPUB
Apr 4, 2019 · Big Data

Achieving Sub‑Second Real‑Time Product Selection with Xianyu’s Mach and Blink

Xianyu’s Mach system tackles the e‑commerce challenge of instantly selecting high‑quality items from billions of products by leveraging Blink’s low‑latency stream computing, detailing its architecture—including state, windows, custom UDX functions, data merging, rule execution, and SQL‑to‑MVEL conversion—to achieve sub‑second processing at massive scale.

Flink · Real-Time · blink
0 likes · 18 min read
Youzan Coder
Mar 20, 2019 · Big Data

Evolution of Real-Time Computing at Youzan: From Storm to Flink and Future Directions

Youzan’s real‑time computing platform progressed from early Storm deployments through Spark Streaming to a Flink‑based architecture, adding unified task management, monitoring, and dedicated streaming clusters, while now pursuing SQL‑driven jobs, a Druid OLAP engine, and a future real‑time data warehouse.

Big Data · Flink · Spark Streaming
0 likes · 14 min read
Big Data Technology & Architecture
Mar 12, 2019 · Big Data

Understanding Apache Flink’s Core Design: “Batch Is a Special Case of Stream” and Its Architecture

This article explains Apache Flink’s fundamental design principle that treats batch as a special case of stream, compares native streaming with micro‑batching, describes its deployment modes, fault‑tolerance mechanisms, unified data and scheduling layers, and outlines Alibaba’s architectural optimizations for the platform.

Apache Flink · Batch Processing · native streaming
0 likes · 15 min read
DataFunTalk
Jan 30, 2019 · Artificial Intelligence

Real‑Time Metrics Processing Technology for Financial Risk Control and Anti‑Fraud

This article outlines the challenges of financial risk control in the internet era and presents a comprehensive real‑time metrics processing system, covering data leakage, fraud, big‑data opportunities, AI model deployment, and the technical architecture of the Bangsheng real‑time indicator platform.

AI · Big Data · anti‑fraud
0 likes · 17 min read
NetEase Game Operations Platform
Jan 25, 2019 · Big Data

Understanding Exactly-Once Semantics in Apache Flink: Challenges and Implementation

This article analyzes the difficulties of achieving exactly-once delivery in Apache Flink, explains the distinction between state and end‑to‑end semantics, and details how idempotent and transactional sinks—illustrated with the Bucketing File Sink—realize exactly‑once guarantees through checkpoint‑based two‑phase commit.

Big Data · Exactly-Once · Flink
0 likes · 13 min read
Ctrip Technology
Dec 26, 2018 · Operations

Evolution of Ctrip's Hickwall Monitoring and Alerting Platform: Architecture, InfluxDB Cluster, Data Aggregation, and Stream Alerting

This article details the architectural evolution of Ctrip's Hickwall monitoring and alerting platform, describing the transition from an Elasticsearch‑based first generation to an InfluxDB‑driven second generation, the design of the Incluster storage layer, data aggregation strategies, and the implementation of high‑performance stream‑based alerting.

Alerting · InfluxDB · architecture
0 likes · 12 min read
21CTO
Nov 7, 2018 · Big Data

Why Data Streams Are the Backbone of Real-Time Big Data Analytics

Data streams, akin to endless rivers, enable continuous, real-time processing of diverse sources such as IoT telemetry, web logs, and e-commerce events, offering advantages over batch processing, while presenting challenges like scalability and fault tolerance, and are supported by tools like Kinesis, Kafka, Flink, and Storm.

Amazon Kinesis · Apache Kafka · Big Data
0 likes · 6 min read
dbaplus Community
Aug 8, 2018 · Big Data

How to Build a Real‑Time Data Platform: Tech Stack & Design Patterns

This article explains the architecture of a Real‑Time Data Platform (RTDP), details the technical selection of core components such as DBus, Kafka, Wormhole, Moonbox and Davinci, and discusses data management, security, operations, and four deployment modes—synchronization, flow, rotation and intelligent—illustrating how each fits different business scenarios.

Big Data Architecture · Data Integration · Kafka
0 likes · 24 min read
JD Tech Talk
Aug 2, 2018 · Big Data

Real-Time Order Statistics with Apache Flink in a Data Aggregation Platform

This article explains how the data aggregation platform adopts Apache Flink for high‑throughput, low‑latency stream processing, covering the complete workflow from data source integration, transformation operations, windowing and time concepts, to a concrete order‑count example with custom aggregation logic.

Apache Flink · Event Time · Flink
0 likes · 10 min read
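
The windowing and event-time concepts in the order-count summary can be illustrated with a tumbling event-time window. Each order is assigned to the window containing its timestamp. A window's count is emitted once the watermark passes the window end; here the watermark is simply the largest timestamp seen minus a fixed out-of-orderness bound. This Python sketch is a simplification under those assumptions. `tumbling_order_counts` and the `(shop, ts)` input are hypothetical, and Flink's own window operators and watermark generators differ in details such as allowed lateness.

```python
from collections import defaultdict


def tumbling_order_counts(orders, size, max_out_of_orderness):
    """Count (shop, ts) orders per shop in tumbling event-time windows of `size` units."""
    open_windows = defaultdict(int)   # (shop, window_start) -> running count
    emitted = []
    watermark = float("-inf")
    for shop, ts in orders:
        start = ts - ts % size        # window [start, start + size) containing ts
        if start + size <= watermark:
            continue                  # late: the watermark already passed this window's end
        open_windows[(shop, start)] += 1
        # Simplified watermark: largest timestamp seen minus the out-of-orderness bound.
        watermark = max(watermark, ts - max_out_of_orderness)
        # Emit every window whose end the watermark has reached.
        for key in sorted(k for k in open_windows if k[1] + size <= watermark):
            emitted.append((key[0], key[1], open_windows.pop(key)))
    # End of input acts like a final watermark of +infinity: flush what is left.
    for key in sorted(open_windows):
        emitted.append((key[0], key[1], open_windows.pop(key)))
    return emitted


orders = [("s1", 5), ("s1", 30), ("s2", 50), ("s1", 65), ("s1", 58), ("s1", 130), ("s1", 20)]
print(tumbling_order_counts(orders, size=60, max_out_of_orderness=10))
```

The out-of-order order at 58 still lands in the first window, because the watermark has not yet reached 60. The order at 20 arrives after that window fired, so it is dropped.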
Xianyu Technology
Jul 28, 2018 · Big Data

Real-Time Computation Architecture for Non-Timeline Feed Ranking

The article presents a real‑time computation architecture on Alibaba Cloud Blink that scores and ranks non‑timeline feed items within a sliding 72‑hour window, updating rankings every few minutes, using Redis ZSET for fast retrieval, and discusses scaling optimizations such as interval tuning and external join‑and‑rank services.

Big Data · Real‑Time Computing · feed ranking
0 likes · 6 min read
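
The feed-ranking summary combines a sliding 72-hour window with a Redis ZSET for retrieval. The Python sketch below recomputes scores for a single refresh under hypothetical assumptions:
- interactions arrive as `(item, ts, weight)` tuples;
- an item's score is the sum of its weights inside the window;
- a plain dict stands in for the ZSET.

The article's real scoring formula and Redis calls are not reproduced here.

```python
import heapq

WINDOW_SECONDS = 72 * 3600  # 72-hour sliding window


def rank_feed(events, now, top_n):
    """Score (item, ts, weight) interactions inside (now - 72h, now] and return the top N."""
    zset = {}  # stand-in for a Redis ZSET: member -> score
    for item, ts, weight in events:
        if now - WINDOW_SECONDS < ts <= now:
            zset[item] = zset.get(item, 0) + weight
    # Highest scores first; ties broken by member name in descending order.
    return heapq.nlargest(top_n, zset.items(), key=lambda kv: (kv[1], kv[0]))


HOUR = 3600
now = 100 * HOUR
events = [
    ("post-a", 99 * HOUR, 3),
    ("post-b", 98 * HOUR, 5),
    ("post-a", 50 * HOUR, 4),
    ("post-c", 20 * HOUR, 10),  # older than 72 hours, so it falls out of the window
]
print(rank_feed(events, now, top_n=2))  # [('post-a', 7), ('post-b', 5)]
```

Running this on a timer every few minutes mirrors the periodic refresh described in the summary. With Redis itself, the scores would typically be written with `ZADD` and read back with `ZREVRANGE ... WITHSCORES`.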
Ctrip Technology
Jul 17, 2018 · Big Data

Meteor: A Real-Time Computation Platform Based on Storm for Ctrip Marketing

The article introduces Meteor, a Storm‑based real‑time computation platform developed by Ctrip Marketing to simplify topology management, automate deployment, and improve resource efficiency for complex marketing scenarios, highlighting its architecture, features, and measurable business impact.

Real‑Time Computing · Storm · marketing platform
0 likes · 10 min read