Tagged articles
946 articles
Efficient Ops
Dec 17, 2019 · Operations

How Alibaba Scales Flink: Lessons in Big Data Operations

This article details Alibaba's massive Flink deployment, covering its historical background, the operational challenges of managing tens of thousands of nodes, the design of a comprehensive Flink management platform, and the automated solutions for fault handling, resource allocation, and performance testing in a large‑scale big‑data environment.

Big Data Operations · Cluster Management · Flink
0 likes · 20 min read
Big Data Technology & Architecture
Dec 17, 2019 · Big Data

Understanding Flink Sliding Windows and Performance Optimizations

This article explains Flink's sliding window mechanism, shows how the WindowAssigner and WindowOperator work with code examples, analyzes the performance impact of fine‑grained sliding windows, and proposes a practical workaround using tumbling windows combined with external storage such as Redis for efficient PV/UV aggregation.

Big Data · Flink · Sliding Window
0 likes · 8 min read
Java Captain
Dec 17, 2019 · Backend Development

Top 10 Most Popular Java Open‑Source Projects on GitHub in November

This article lists and briefly describes the ten most starred Java open‑source projects on GitHub for November, covering tools such as NLP libraries, learning guides, big‑data frameworks, rapid‑development platforms, algorithm collections, job schedulers, code‑style checkers, traffic‑control systems, the Spring framework, and service‑discovery solutions.

Flink · GitHub · frameworks
0 likes · 5 min read
Big Data Technology & Architecture
Dec 10, 2019 · Big Data

Implementing Real-Time TopN Rankings with Apache Flink

This article explains how to build a real‑time TopN ranking feature with Apache Flink. It covers global and grouped TopN implementations and nested TopN strategies, and provides complete Java code snippets for environment setup, word counting, windowing, and custom TopN functions.

Flink · Real-Time · Streaming
0 likes · 8 min read
Alibaba Cloud Developer
Dec 5, 2019 · Artificial Intelligence

How Alibaba’s Alink Empowers Real‑Time Machine Learning on Flink

Alink, Alibaba’s open‑source machine‑learning platform built on Apache Flink, offers a rich library of batch and streaming algorithms, a Python API, iterative computation optimizations, and real‑world case studies, positioning it as a powerful AI solution for large‑scale, low‑latency data processing.

AI · Alink · Flink
0 likes · 13 min read
Big Data Technology & Architecture
Dec 4, 2019 · Big Data

Comprehensive Flink Interview Guide: Core Concepts, Advanced Topics, and Source‑Code Insights

This article provides an in‑depth Flink interview guide covering the framework’s core concepts, advanced features such as fault‑tolerance, state management, and checkpointing, as well as detailed explanations of its architecture, APIs, partitioning strategies, and source‑code flow, complete with code examples.

Big Data · Distributed Systems · Flink
0 likes · 29 min read
Big Data Technology & Architecture
Dec 1, 2019 · Big Data

Understanding Flink LatencyMarker: End-to-End Delay Measurement and Implementation Details

This article explains the background, source‑code analysis, and practical implementation of Flink's LatencyMarker feature for measuring end‑to‑end job latency, including metric exposure, configuration options, and code snippets illustrating how latency markers are emitted and processed within the streaming pipeline.

Big Data · End-to-End Latency · Flink
0 likes · 6 min read
21CTO
Nov 27, 2019 · Big Data

How Xiaohongshu Scales Real‑Time Personalized Recommendations with Flink

The article summarizes Guo Yi’s 2019 Alibaba Cloud conference talk, outlining Xiaohongshu’s personalized recommendation architecture, detailing the data stack from ingestion to warehouse, and showcasing a Flink‑based real‑time multi‑dimensional user behavior aggregation use case, followed by a vision for the next year’s data architecture evolution.

Data Architecture · Flink · Real-time Streaming
0 likes · 3 min read
Big Data Technology & Architecture
Nov 26, 2019 · Big Data

Understanding Flink SQL Window Functions: Types, Implementation, and Emit Triggers

This article provides a comprehensive overview of Flink SQL window functions, detailing time‑based window types, their underlying implementation in the StreamExecGroupWindowAggregate operator, the processing flow of WindowOperator, timer handling, emit/trigger strategies, and practical code examples for Tumble, Hop, and Session windows.

Big Data · Emit · Flink
0 likes · 20 min read
DataFunTalk
Nov 21, 2019 · Big Data

Evolution of 58.com Real-Time Computing Platform and the One-Stop Streaming Data Processing System Wstream

The article details the technical evolution of 58.com’s real-time computing platform—from Storm and Spark Streaming to a Flink‑based one‑stop solution called Wstream—covering use cases, architecture, stability measures, migration from Storm, operational diagnostics, and future development plans.

Big Data · Flink · Real-time Streaming
0 likes · 11 min read
G7 EasyFlow Tech Circle
Nov 21, 2019 · Big Data

How G7 Combines AI, Big Data, and IoT to Transform Logistics

This article presents a detailed overview of G7's AI‑plus‑Big‑Data‑plus‑IoT platform for logistics, describing its neutral open architecture, real‑time data pipelines using Kafka and Flink, Lambda‑style storage in HBase/Hive, and the resulting safety‑insurance and analytics capabilities.

AI · Flink · IoT
0 likes · 10 min read
Xianyu Technology
Nov 21, 2019 · Big Data

Event-Driven Rule Engine for User Growth at Xianyu

To accelerate growth on Xianyu’s 20 million‑DAU platform, the team built an event‑driven rule engine with a SQL‑like DSL that translates user‑behavior streams into real‑time Flink/Blink queries, cutting rule development from four days to half a day and achieving sub‑5‑second processing latency.

Big Data · DSL · Event Stream
0 likes · 9 min read
58 Tech
Nov 20, 2019 · Big Data

Evolution of 58.com Real-Time Computing Platform and the One‑Stop Streaming Platform Wstream Built on Flink

This article details the technical evolution of 58.com’s real‑time computing platform, describing the shift from Storm and Spark Streaming to Apache Flink, the design of the one‑stop Wstream platform, its large‑scale deployment, stability measures, SQL streaming capabilities, task migration, diagnostics, optimizations, and future plans.

Flink · Real-time Streaming · Task Migration
0 likes · 11 min read
DataFunTalk
Nov 19, 2019 · Big Data

Comprehensive Overview of Data Warehouses: Concepts, Evolution, Architecture, and Real‑time vs Offline Practices

This article provides a thorough introduction to data warehouses, traces their evolution, explains construction methodologies, compares offline, Lambda, and Kappa architectures, and presents real‑time warehouse case studies from Alibaba, Meituan, Xiaomi, Netflix, and OPPO, highlighting practical implementation details and challenges.

ETL · Flink · Kappa architecture
0 likes · 14 min read
Big Data Technology & Architecture
Nov 18, 2019 · Big Data

Understanding JVM Garbage Collection and Flink Memory Management

This article explains the fundamentals of JVM garbage collection, its generational algorithms and associated performance issues, and then details Apache Flink's memory management architecture, including MemorySegment, off‑heap buffers, serialization mechanisms, and type information for efficient big‑data processing.

Big Data · Flink · Garbage Collection
0 likes · 7 min read
Big Data Technology & Architecture
Nov 14, 2019 · Big Data

Comparison of Flink and Spark Structured Streaming: Joins, State Management, Fault Tolerance, and Backpressure

This article compares Flink and Spark Structured Streaming, detailing their differences in join capabilities, state management, fault‑tolerance mechanisms, exactly‑once semantics, back‑pressure handling, and table registration, while providing code examples and practical insights for real‑time big‑data processing.

Big Data · Flink · JOIN
0 likes · 13 min read
Big Data Technology & Architecture
Nov 9, 2019 · Big Data

Comparative Study of Apache Flink and Spark Streaming at Xiaomi: Architecture, Performance, and Serialization

This article examines Xiaomi's migration from Spark Streaming to Apache Flink, comparing scheduling strategies, mini‑batch versus true streaming, resource utilization, latency, and serialization mechanisms, and concludes with practical insights and custom optimization techniques for large‑scale data processing.

Big Data · Flink · Mini-Batch
0 likes · 17 min read
Big Data Technology & Architecture
Nov 7, 2019 · Big Data

Real‑time Dashboard with Flink: Streaming Order Data, Site Metrics, and Top‑N Merchandise Rankings

This article demonstrates how to build a one‑second‑refresh real‑time dashboard for e‑commerce order data using Apache Flink, Kafka, and Redis, covering JSON message parsing, processing‑time windows, stateful aggregation for site‑level KPIs, and efficient top‑N product ranking via Redis sorted sets.

Dashboard · Flink · Kafka
0 likes · 11 min read
DataFunTalk
Nov 7, 2019 · Big Data

Real-Time Computing Engine at Beike: Architecture, Practices, and Future Plans

This article details Beike's real‑time computing engine, covering its background, streaming platform built on Spark Streaming and Flink, data ingestion via Kafka, metadata handling, SQL‑based task development, monitoring, storage solutions, and future roadmap for resource management and AI‑enhanced monitoring.

Big Data · Flink · Kafka
0 likes · 14 min read
dbaplus Community
Oct 22, 2019 · Big Data

How Weibo Built a Billion‑Log Real‑Time Data Platform with Flink

This article details how Weibo’s advertising team designed and implemented a real‑time data platform capable of processing over a hundred billion daily logs, covering technology selection, Flink advantages, architecture evolution, data processing pipelines, component libraries, fault‑tolerance strategies, and the construction of a multi‑layer real‑time data warehouse.

Big Data · Checkpoint · Data Architecture
0 likes · 25 min read
Big Data Technology & Architecture
Oct 22, 2019 · Big Data

Real-Time Data Verification: Building a Log Comparison Solution with Flink, Elasticsearch, and Hive

This article explains how to design and implement a real‑time data verification framework using Flink to generate wide tables, storing detailed records in Elasticsearch or HDFS with Hive for cross‑checking against offline data, ensuring trustworthy metrics for dashboards and stakeholders.

Big Data · Data verification · Elasticsearch
0 likes · 7 min read
Xianyu Technology
Oct 16, 2019 · Big Data

Xianyu's Complex Event Processing (CEP) System Design and Implementation

Xianyu’s Complex Event Processing system, built on Alibaba’s Blink (Flink) and a custom SQL‑like DSL, standardizes event I/O, lets users define sequence, window and aggregation rules, and combines an interactive rule service, SLS source, parser, job manager and MetaQ sink to achieve ~100 k QPS, sub‑second latency, fault tolerance, and a rule‑to‑production turnaround of about thirty minutes.

CEP · DSL · Flink
0 likes · 9 min read
JD Retail Technology
Oct 14, 2019 · Databases

Overview of JDNoSQL Platform and Its Real-Time Advertising Use Cases

The article introduces JDNoSQL, a distributed column‑oriented key‑value store built on HDFS, outlines its core features, describes various business scenarios including real‑time ad computation, details the system architecture with Kafka and Flink, and presents table designs for ad impression and click statistics.

Big Data · Flink · Kafka
0 likes · 13 min read
UCloud Tech
Oct 11, 2019 · Big Data

Real‑Time Student Performance Analytics with Flink and Spark

This article demonstrates how to build a real‑time education analytics system by streaming answer data through Kafka into Flink or Spark, performing per‑question, per‑grade, and per‑subject aggregations, and optionally accelerating development with UFlink SQL.

Education Analytics · Flink · Kafka
0 likes · 17 min read
58 Tech
Oct 10, 2019 · Big Data

Optimizing Real‑Time Feature Extraction at 58.com: Migrating from Spark Streaming to Flink

This article describes how 58.com’s commercial engineering team redesigned its real‑time feature‑mining pipeline—replacing a minute‑level Spark Streaming framework with Flink—to achieve sub‑second latency, higher throughput, stronger fault‑tolerance, and end‑to‑end exactly‑once semantics for user‑profile generation in the second‑hand‑car recommendation scenario.

Big Data · Exactly-Once · Flink
0 likes · 14 min read
Big Data Technology & Architecture
Oct 9, 2019 · Big Data

Choosing and Using Flink State Backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend

This article explains how Flink checkpoints persist state, compares the three built‑in state backends (MemoryStateBackend, FsStateBackend, RocksDBStateBackend), discusses their configurations, advantages, limitations, and provides guidance on selecting the appropriate backend for different big‑data streaming scenarios.

Big Data · Checkpoint · Flink
0 likes · 10 min read
HomeTech
Oct 9, 2019 · Big Data

Design and Implementation of a Flink‑Based Real‑Time Data Platform at Autohome

This article describes how Autohome migrated its real‑time analytics from Storm to a Flink‑SQL platform, detailing the architectural design, development and operational advantages, practical use cases such as recommendation metrics, and future plans for ecosystem expansion and open‑source release.

Flink · Real-time Streaming · data-warehouse
0 likes · 12 min read
Big Data Technology & Architecture
Sep 21, 2019 · Big Data

Deploying Apache Flink on Kubernetes: A Step‑by‑Step Guide

This tutorial explains how to run Apache Flink jobs on Kubernetes by building Docker images, deploying JobManager and TaskManager components with Kubernetes manifests, configuring high‑availability with ZooKeeper and HDFS, and using SavePoints and scaling techniques to manage and extend Flink streaming applications.

Big Data · Docker · Flink
0 likes · 14 min read
Big Data Technology & Architecture
Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · Dataflow
0 likes · 22 min read
Big Data Technology & Architecture
Sep 11, 2019 · Big Data

Evolution of Zhihu's Real-Time Data Warehouse: From Spark Streaming 1.0 to Flink‑Based 2.0

This article details Zhihu's real‑time data warehouse evolution, describing the 1.0 Spark Streaming architecture, its limitations, and the 2.0 redesign that introduces Flink, layered data models, streaming and batch ETL, metric storage choices, and future roadmap for scalable, low‑latency analytics.

Flink · Lambda architecture · Spark Streaming
0 likes · 19 min read
Big Data Technology & Architecture
Sep 6, 2019 · Big Data

Big Data Development Interview Guide and Skill Tree Overview

This article provides a comprehensive interview roadmap for big data developers, outlining essential Java fundamentals, JVM internals, Linux basics, distributed theory, core frameworks such as Hadoop, Spark, Flink, Kafka, Netty, HBase, Hive, and practical algorithm topics, while also offering resume and career advice for aspiring candidates.

Flink · Hadoop · Kafka
0 likes · 15 min read
Xueersi Online School Tech Team
Sep 6, 2019 · Big Data

Real-Time Data Architecture, Evolution, and Applications at an Online School

The article details the six‑layer big‑data architecture of an online school, chronicles its migration from Storm to Spark Streaming and finally to Flink, and showcases concrete real‑time applications such as gateway monitoring, user‑profile tagging, renewal reporting, and advertising analysis, while outlining future development directions.

Analytics · Big Data Architecture · Flink
0 likes · 14 min read
Tongcheng Travel Technology Center
Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink
0 likes · 16 min read
Tencent Cloud Developer
Aug 30, 2019 · Big Data

How Tencent Cloud Leverages Spark, ElasticSearch, and Flink for PB‑Scale Data Warehousing

The cloud+ community and Kuaishou hosted a big‑data technology salon where experts detailed the evolution, architecture, and practical deployments of Spark‑based cloud data warehouses, ElasticSearch, Yarn, and Flink, highlighting trends, optimization techniques, and future directions for enterprise data analytics.

Big Data · Elasticsearch · Flink
0 likes · 22 min read
dbaplus Community
Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing
0 likes · 18 min read
Big Data Technology & Architecture
Aug 25, 2019 · Big Data

Tencent Oceanus: Evolution, Productization, and Optimizations of Real‑Time Stream Computing with Flink

This article recounts Tencent's journey from adopting Flink to building the Oceanus platform, detailing its architecture, product features, and a series of deep extensions—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, watermark idle detection, and log isolation—aimed at supporting trillion‑scale real‑time data processing.

Big Data · Flink · Oceanus
0 likes · 18 min read
Youzan Coder
Aug 23, 2019 · Big Data

How to Build a Robust Event Logging Quality System with Real‑Time Validation

This article outlines common event‑logging quality problems, a systematic registration and real‑time validation framework built on Flink, configurable rule syntax, explainable results, continuous monitoring, targeted optimizations, and an evaluation model that together form a comprehensive quality‑center for big‑data platforms.

Big Data · Data Quality · Flink
0 likes · 11 min read
Big Data Technology & Architecture
Aug 18, 2019 · Big Data

Flink Application Scenarios and Scale at Kuaishou

The article details how Kuaishou leverages Apache Flink for large‑scale stream processing, describing its application scenarios, cluster sizing, interval join optimization, RocksDB performance challenges, source throttling strategies, JobManager stability, frequent job failures, and platform‑wide improvements.

Big Data · Flink · Kuaishou
0 likes · 2 min read
Big Data Technology & Architecture
Aug 11, 2019 · Big Data

Deep Dive into Flink’s Network Stack: Credit‑Based Flow Control and Thread Model Optimizations

This article examines Flink’s industrial‑scale network stack, detailing the credit‑based flow control introduced in version 1.5, the refactored task‑IO thread collaboration, and serialization optimizations that together improve throughput and latency for large‑scale stream processing workloads.

Big Data · Credit-based Flow Control · Flink
0 likes · 12 min read
Big Data Technology & Architecture
Aug 7, 2019 · Big Data

Dynamic Variable Loading in Real-Time Stream Processing: Spark Streaming vs Flink Broadcast Mechanisms

Real-time streaming jobs require dynamic configuration loading without restarts, and this article compares two common approaches—polling pull and push control streams—examining Spark Streaming’s broadcast variables and Flink’s broadcast state, discussing their implementations, advantages, limitations, and practical considerations.

Broadcast Variable · Dynamic Configuration · Flink
0 likes · 10 min read
Big Data Technology Architecture
Aug 7, 2019 · Big Data

Why Choose Apache Flink for Real‑Time Stream Processing: Features and Lessons Learned

This article explains why the author chose Apache Flink for real‑time stream processing, highlighting its unique combination of high throughput, low latency, event‑time support, stateful computation, flexible windows, and fault tolerance, while also reflecting on the challenges of adopting a less‑documented technology.

Event Time · Flink · Real-Time
0 likes · 7 min read
dbaplus Community
Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka
0 likes · 14 min read
Big Data Technology & Architecture
Jun 30, 2019 · Big Data

Curated Collection of Big Data, Flink, Hadoop and Real‑Time Computing Articles from the “Big Data Technology and Architecture” Series

This article presents a carefully organized catalogue of over a hundred technical posts covering Flink source‑code analysis, fundamental and advanced big‑data structures, Hadoop ecosystem components, real‑time streaming with Spark and Kafka, as well as system design guidelines and miscellaneous insights, each linked to its original publication for easy reference.

Big Data · Distributed Systems · Flink
0 likes · 6 min read
Big Data Technology & Architecture
Jun 22, 2019 · Backend Development

Understanding Back Pressure in Flink and Its Implementation

The article explains what back pressure is in Flink streaming jobs, why it occurs when data generation outpaces downstream consumption, how Flink monitors it via stack‑trace sampling, configurable parameters, Web UI visualization, and compares the approach with Spark Streaming's back pressure mechanism.

Flink · Spark · data pipelines
0 likes · 5 min read
Xianyu Technology
Jun 20, 2019 · Big Data

Design of a High-Performance Real-Time Data Processing System for Service Diagnosis

The paper presents a high‑performance real‑time data processing pipeline that collects, transports, preprocesses, and computes service logs and metrics using Alibaba Logtail, LogHub, and an enhanced Flink (Blink) engine, persisting root‑cause graphs in Lindorm, achieving sub‑3‑second latency for tens of millions of events per second and cutting diagnosis time to about five seconds.

Flink · Real-time Streaming · architecture
0 likes · 10 min read
Big Data Technology & Architecture
Jun 18, 2019 · Big Data

Understanding Watermarks, Event Time, and Processing Time in Apache Flink

This article explains the three time concepts in Flink (Processing Time, Event Time, and Ingestion Time), illustrates their impact on windowed computations with examples, introduces watermarks and allowed lateness for handling out‑of‑order data, and provides complete Scala code for both processing‑time and event‑time streaming applications.
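
As a rough illustration of how a watermark lets an event-time window tolerate out-of-order data, the sketch below simulates a tumbling window with a bounded-out-of-orderness watermark in plain Python. It is not Flink code and is not taken from the article: the function name, the `(timestamp, value)` event shape, and the per-event watermark update are assumptions made for the example, and allowed lateness is deliberately not modelled.

```python
def tumbling_event_time_counts(events, window_size, max_out_of_order):
    """Count events per tumbling event-time window (hypothetical helper).

    events: iterable of (timestamp, value) pairs in arrival order.
    A window [start, start + window_size) fires once the watermark
    reaches its last timestamp, start + window_size - 1.
    """
    watermark = float("-inf")
    max_ts = float("-inf")
    pending = {}   # window start -> running count
    fired = []     # (window start, count) in firing order
    dropped = []   # events that arrived after their window had fired

    for ts, value in events:
        start = ts - ts % window_size
        if start + window_size - 1 <= watermark:
            # The window already fired: this event is late and is discarded.
            dropped.append((ts, value))
        else:
            pending[start] = pending.get(start, 0) + 1

        # Watermark = highest timestamp seen so far minus the tolerated delay.
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_out_of_order

        # Fire every window whose last timestamp the watermark has passed.
        for s in sorted(pending):
            if s + window_size - 1 <= watermark:
                fired.append((s, pending.pop(s)))

    return fired, dropped, pending


events = [(1, "a"), (4, "b"), (11, "c"), (8, "d"), (13, "e"), (3, "f")]
fired, dropped, pending = tumbling_event_time_counts(events, 10, 3)
```

With a window of 10 and a tolerated delay of 3, the out-of-order event at timestamp 8 still lands in window 0 because the watermark is only at 8 when it arrives; the event at timestamp 3 arrives after the watermark has reached 10 and is dropped.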

EventTime · Flink · Scala
0 likes · 13 min read
Big Data Technology & Architecture
Jun 12, 2019 · Big Data

Comprehensive Guide to FlinkCEP: API Overview, Pattern Definitions, Quantifiers, Conditions, and Usage Examples

This article provides a detailed introduction to FlinkCEP, covering how to add the library, define simple and composite patterns, use quantifiers and conditions, handle skip strategies, time constraints, and select results, with complete Java and Scala code examples for complex event processing.

Big Data · CEP · Flink
0 likes · 27 min read
360 Zhihui Cloud Developer
Jun 4, 2019 · Big Data

Why Flink Outperforms Storm: Deep Dive into Stream Processing Performance

Based on data transmission and reliability metrics, this article compares Apache Storm and Apache Flink in stream processing, presenting benchmark designs, test environments, results for synthetic and Kafka data, and offers practical recommendations such as operator chaining, object reuse, and checkpoint strategies to maximize Flink performance.

Big Data · Flink · Performance Testing
0 likes · 13 min read
360 Tech Engineering
Jun 3, 2019 · Big Data

Performance Comparison of Apache Storm and Apache Flink from Data Transmission and Reliability Perspectives

This article presents a detailed performance benchmark comparing Apache Storm and Apache Flink in stream processing, focusing on data transmission methods, reliability mechanisms, operator chaining, and both self‑generated and Kafka‑sourced workloads, and provides practical optimization recommendations based on the results.

Big Data · Data Transmission · Flink
0 likes · 10 min read
DataFunTalk
Jun 3, 2019 · Big Data

Choosing a Real-Time Computing Engine Based on Kafka: Spark vs Flink

This article examines the need for real‑time computation, explains streaming versus real‑time concepts, and compares Apache Spark and Apache Flink—covering their architectures, micro‑batch and continuous processing, advantages, limitations, windowing, event‑time handling, and watermarks—to guide engine selection for Kafka‑driven workloads.

Flink · Kafka · Spark
0 likes · 15 min read
Big Data Technology & Architecture
Jun 2, 2019 · Big Data

Tencent's Oceanus Real-Time Stream Computing Platform and Flink Optimizations

The article presents Tencent's evolution of real‑time stream processing using Flink, the design of the Oceanus one‑stop visual platform, and a series of deep extensions and optimizations—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, idle detection, and log isolation—aimed at supporting petabyte‑scale data workloads.

Big Data · Flink · Oceanus
0 likes · 16 min read
Big Data Technology & Architecture
May 29, 2019 · Cloud Native

Real-Time Computing Solutions with Flink and HBase: Architecture, Market Analysis, and Use Cases

The article presents Alibaba Cloud's real-time computing solution based on Flink and HBase, covering market competition, open‑source ecosystem, containerized architecture on Kubernetes, and typical applications such as online education video analysis, city‑brain traffic management, and fraud detection.

Big Data · Cloud Native · Flink
0 likes · 12 min read
Big Data Technology & Architecture
May 28, 2019 · Big Data

Optimizing Flink Shuffle: New Flow‑Control Mechanism, Serialization Improvements, and Architecture Refactoring

The article explains how Flink's shuffle pipeline—from upstream data serialization to downstream consumption—is optimized through a credit‑based flow‑control mechanism, zero‑copy network buffers, broadcast serialization reduction, external shuffle service, and a plugin‑based shuffle manager, resulting in significant performance gains for both streaming and batch jobs.
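The credit‑based flow control mentioned in this summary can be illustrated with a toy model. The sketch below is not Flink's network stack: the `Receiver` and `Sender` classes and all of their methods are hypothetical names chosen for illustration. It only shows the basic idea that a sender may transmit one buffer per credit, and that the receiver hands out a new credit each time it frees a buffer.

```python
from collections import deque


class Receiver:
    """Downstream side of the toy model: owns a fixed number of buffers."""

    def __init__(self, num_buffers: int):
        self.num_buffers = num_buffers
        self.buffers = deque()

    def initial_credits(self) -> int:
        # Announce one credit per currently free buffer.
        return self.num_buffers - len(self.buffers)

    def receive(self, record) -> None:
        # A well-behaved sender never sends more than it has credits for.
        if len(self.buffers) >= self.num_buffers:
            raise RuntimeError("sender exceeded its credits")
        self.buffers.append(record)

    def consume(self):
        # Processing a record frees a buffer, which is worth one new credit.
        record = self.buffers.popleft()
        return record, 1


class Sender:
    """Upstream side of the toy model: sends only while it holds credits."""

    def __init__(self):
        self.credits = 0
        self.backlog = deque()

    def add_credits(self, n: int) -> None:
        self.credits += n

    def enqueue(self, record) -> None:
        self.backlog.append(record)

    def flush(self, receiver: Receiver) -> int:
        # Send queued records until either the backlog or the credits run out.
        sent = 0
        while self.credits > 0 and self.backlog:
            receiver.receive(self.backlog.popleft())
            self.credits -= 1
            sent += 1
        return sent


def demo():
    receiver = Receiver(num_buffers=2)
    sender = Sender()
    sender.add_credits(receiver.initial_credits())
    for i in range(5):
        sender.enqueue(i)
    first = sender.flush(receiver)    # limited by the two initial credits
    stalled = sender.flush(receiver)  # no credits left, nothing is sent
    _, credit = receiver.consume()    # receiver frees one buffer
    sender.add_credits(credit)
    resumed = sender.flush(receiver)  # exactly one more record goes out
    return first, stalled, resumed, len(sender.backlog)
```

In this model the sender stalls at its own backlog instead of overrunning the receiver, so a slow consumer never has more data in flight than it has buffers for.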

Big Data · Flink · Flow Control
0 likes · 15 min read
Alibaba Cloud Developer
May 23, 2019 · Big Data

How Blink Powers Alibaba’s Real‑Time Supply‑Chain Data Warehouse

This article explains how Alibaba's Blink engine addresses the challenges of building a real‑time supply‑chain data warehouse through SQL‑based stream processing and intelligent resource tuning, covering retraction, dimension‑table joins, data skew, timeout statistics, zero‑point optimizations, and future directions.

Data Skew · Dimension join · Flink
0 likes · 14 min read
Youzan Coder
Apr 29, 2019 · Big Data

Optimizing Flink Sliding Windows for Super Long Time Ranges

To overcome severe performance degradation of Flink sliding windows over very long time ranges, Youzan engineers applied time‑slicing based on the greatest common divisor of window length and slide step, reducing state writes and timers, which yielded 3‑8× speedups in production.
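The GCD‑based time slicing described in this summary can be sketched as follows. This is a minimal illustration of the idea, not Youzan's implementation: the class `SlicedSlidingCounter` and its methods are hypothetical, and a plain in‑memory dictionary stands in for Flink state. Each event is written once into a slice whose length is the greatest common divisor of the window length and the slide step, and a window result is assembled by summing the slices it covers.

```python
from collections import defaultdict
from math import gcd


class SlicedSlidingCounter:
    """Counts events per sliding window using GCD-sized time slices."""

    def __init__(self, window_ms: int, slide_ms: int):
        self.window_ms = window_ms
        self.slide_ms = slide_ms
        # Every window boundary is a multiple of the slide step, and the
        # window length is a multiple of the slice, so windows are always
        # made of whole slices.
        self.slice_ms = gcd(window_ms, slide_ms)
        self.slices = defaultdict(int)  # slice start (ms) -> event count

    def add(self, timestamp_ms: int) -> None:
        # One write per event, regardless of how many windows overlap it.
        slice_start = timestamp_ms - timestamp_ms % self.slice_ms
        self.slices[slice_start] += 1

    def window_count(self, window_end_ms: int) -> int:
        # Sum the slices covering [window_end - window, window_end).
        window_start = window_end_ms - self.window_ms
        return sum(
            self.slices.get(start, 0)
            for start in range(window_start, window_end_ms, self.slice_ms)
        )


counter = SlicedSlidingCounter(window_ms=6000, slide_ms=4000)
for ts in (500, 2500, 4100, 7000):
    counter.add(ts)
first_window = counter.window_count(6000)    # covers [0, 6000)
second_window = counter.window_count(10000)  # covers [4000, 10000)
```

For comparison, a sliding window of 24 hours with a 1‑minute slide assigns every element to 1,440 overlapping windows, whereas the sliced layout above performs a single write per element into a 1‑minute slice and defers the aggregation to read time.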

Big Data · Flink · Real-time Processing
0 likes · 18 min read
JD Retail Technology
Apr 18, 2019 · Big Data

Data Heterogeneity with BinLake, Binlog, and Flink: Approaches for Order, Subscription, and Product Data

The article explains how data heterogeneity is achieved using JD's BinLake to capture MySQL binlogs, with Flink handling sequential and parallel consumption for order, subscription, and product data, discussing challenges such as ordering guarantees, idempotency, IO overhead, and the shift toward stream‑processing architectures.

Binlog · Elasticsearch · Flink
0 likes · 5 min read
ITPUB
Apr 4, 2019 · Big Data

Achieving Sub‑Second Real‑Time Product Selection with Xianyu’s Mach and Blink

Xianyu’s Mach system tackles the e‑commerce challenge of instantly selecting high‑quality items from billions of products by leveraging Blink’s low‑latency stream computing, detailing its architecture—including state, windows, custom UDX functions, data merging, rule execution, and SQL‑to‑MVEL conversion—to achieve sub‑second processing at massive scale.

Flink · Real-Time · blink
0 likes · 18 min read
dbaplus Community
Mar 21, 2019 · Big Data

How Real-Time Data Platforms Evolve: From Storm to Flink and Kubernetes

This article summarizes Wang Xinchun's 2018 DAMS China Data Asset Management Summit talk, detailing the current state, core services, responsibilities, evolution, architecture, challenges, and future directions of a large‑scale real‑time data platform built on Storm, Spark, Flink, and Kubernetes, including a unified data management approach.

Data Platform · Flink · Kubernetes
0 likes · 22 min read
Big Data Technology & Architecture
Mar 21, 2019 · Big Data

Apache Flink Table API Tutorial and End‑to‑End Examples

This article provides a comprehensive tutorial on Apache Flink's Table API, explaining its concepts, core features, and a wide range of operators such as SELECT, WHERE, GROUP BY, UNION, JOIN, and various window functions, while offering complete Scala code examples, custom sources, sinks, and an end‑to‑end job that computes page‑view counts per region using event‑time tumbling windows.

Big Data · Flink · Scala
0 likes · 36 min read
Youzan Coder
Mar 20, 2019 · Big Data

Evolution of Real-Time Computing at Youzan: From Storm to Flink and Future Directions

Youzan’s real‑time computing platform progressed from early Storm deployments through Spark Streaming to a Flink‑based architecture, adding unified task management, monitoring, and dedicated streaming clusters, while now pursuing SQL‑driven jobs, a Druid OLAP engine, and a future real‑time data warehouse.

Big Data · Flink · Spark Streaming
0 likes · 14 min read
DataFunTalk
Mar 7, 2019 · Big Data

Design and Evolution of Didi's Real‑Time Data Computing Platform

The article details how Didi built and iterated its real‑time data platform, describing the shift from MySQL‑based batch processing to a Kafka‑Samza‑Druid architecture with Spark Streaming and Flink, the challenges addressed, and the current capabilities and operational metrics.

Big Data · Druid · Flink
0 likes · 9 min read
dbaplus Community
Feb 28, 2019 · Big Data

How Zhihu Built a Real-Time Data Warehouse: From Spark Streaming to Flink

This article details Zhihu's evolution of its real-time data warehouse, covering the 1.0 version built on Spark Streaming, the 2.0 upgrade using Flink Streaming SQL, architectural layers, ETL processes, and future directions such as streaming SQL platformization and automated result validation.

ETL · Flink · Lambda architecture
0 likes · 19 min read
Big Data Technology & Architecture
Feb 26, 2019 · Big Data

Deploying Apache Flink Clusters: Standalone and YARN Modes

This guide explains how to set up an Apache Flink cluster on CentOS 7 using three deployment methods—Local, Standalone, and Flink on YARN/Kubernetes—including host configuration, SSH setup, package distribution, configuration file editing, cluster start/stop commands, YARN resource manager concepts, session commands, job submission, fault‑tolerance settings, and log inspection.

Big Data · Cluster Deployment · Configuration
0 likes · 11 min read