Tagged articles
946 articles
DataFunTalk
Oct 9, 2020 · Big Data

NetEase’s Data Lake Iceberg: Challenges, Core Principles, and Practical Implementation

This article examines the pain points of traditional data warehouse platforms, explains the core concepts and advantages of the Iceberg data lake table format, compares it with Metastore, reviews the current Iceberg community ecosystem, and details NetEase’s practical integration with Hive, Impala, and Flink to improve ETL efficiency and support unified batch‑stream processing.

Data Lake · ETL · Flink
DataFunTalk
Oct 2, 2020 · Big Data

Single-Task Recovery in Flink: Design and Implementation for Real‑Time Stream Processing

This article describes ByteDance's single‑task recovery solution for Flink's real‑time computation, detailing the problem of global job restarts, the proposed network‑layer enhancements, upstream and downstream optimizations, JobManager restart strategy, implementation challenges, and the measurable latency and availability benefits achieved in production.

Flink · Single-Task Recovery · fault tolerance
DataFunTalk
Sep 30, 2020 · Big Data

Real-time Data Warehouse Construction for Didi Ride-hailing's Carpool Service

This article details Didi's end‑to‑end real‑time data warehouse design for the carpool business, covering its objectives, architecture layers from ODS to application, naming conventions, StreamSQL development, operational tooling, challenges faced, and future batch‑stream integration plans.

Big Data · Didi · Flink
Big Data Technology & Architecture
Sep 19, 2020 · Big Data

Understanding Flink Timer Mechanism and Its Internal Implementation

This article explains how Flink's Timer mechanism works, covering its usage in KeyedProcessFunction, the underlying TimerService and InternalTimerService implementations, the role of triggers, and the detailed code paths for processing‑time and event‑time timers, while highlighting performance considerations.

Flink · InternalTimerService · KeyedProcessFunction
DataFunTalk
Sep 17, 2020 · Big Data

Design and Implementation of a Scalable User Tag Production Platform

The article explains how a flexible, high‑performance user‑tagging system is built on a batch‑stream integrated architecture using big‑data technologies such as Impala, HDFS, and Flink to support both offline and real‑time label generation for precise marketing, product improvement, and operational analytics.

Big Data · Flink · Impala
Big Data Technology & Architecture
Sep 16, 2020 · Big Data

Understanding Flink CEP's NFAb Automaton for Complex Event Processing

This article explains how Flink's Complex Event Processing (CEP) library implements pattern matching using a nondeterministic finite automaton with a match buffer (NFAb), covering its theoretical foundation, construction, state transition semantics, event selection strategies, shared versioned match buffers, and computation state details.

Big Data · CEP · Flink
Alibaba Cloud Developer
Sep 15, 2020 · Big Data

Designing Nexmark: A Standard Benchmark for Stream Processing Performance

This article examines the challenges of existing stream‑processing benchmarks, introduces the open‑source Nexmark framework designed for reproducible, comprehensive performance testing, describes its metrics, query set, workload configurability, and presents experimental results on Flink, highlighting its role in advancing big‑data stream benchmarking.

CPU · Flink · Latency
ITPUB
Sep 14, 2020 · Big Data

How Alibaba’s DChain Data Converger Auto‑Generates Real‑Time Wide Tables with SQL Pipelines

This article explains how the ADC (Alibaba DChain Data Converger) project automatically creates large real‑time tables by letting users configure metrics on the front‑end, then generating and publishing SQL through a pipeline that leverages design patterns, priority queues, and tree‑based data structures for efficient cross‑database processing.

Design Patterns · Flink · Real-time analytics
DataFunTalk
Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps (stream union, state‑timer processing, and output formatting), and covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation
DataFunTalk
Sep 10, 2020 · Databases

Graph‑Based Real‑Time Content Update Architecture at Youku: Challenges, Design, and Practice

This technical presentation explains how Youku tackles the massive, real‑time update problem of video‑content graphs by adopting a graph‑database architecture, sub‑graph partitioning, schema‑driven logical views, and Flink‑based pipelines to update billions of entities and attributes within seconds.

Big Data · Flink · Graph Database
DataFunTalk
Sep 7, 2020 · Big Data

Real‑time Data Warehouse Architecture and Best Practices in Alibaba Search Recommendation

This article presents Alibaba's search‑recommendation real‑time data warehouse, describing its business background, typical use cases, key requirements, the evolution from architecture 1.0 to 2.0 with Flink and Hologres, best‑practice patterns such as row/column storage, stream‑batch integration, high‑concurrency updates, and future directions like real‑time joins and persistent dimension storage.

Big Data · Flink · Hologres
DataFunTalk
Sep 6, 2020 · Big Data

OPPO's Real-Time Data Warehouse Architecture and Practices Based on Apache Flink

OPPO's data platform engineer Zhang Jun shares the design and implementation of OPPO's real‑time data warehouse built on Apache Flink, covering background, top‑level architecture, practical deployment, and future directions such as enhanced SQL development, resource scheduling, and automated configuration.

Data Platform · Flink · Streaming
DataFunTalk
Sep 1, 2020 · Big Data

NetEase Real-Time Computing Platform (Sloth): Architecture, Practices, and Future Outlook

This article introduces NetEase's real-time computing platform Sloth, detailing its architecture, component layers, integrated IDE, operational tooling, unified metadata management, challenges such as Kudu write amplification, and proposes a tiered real‑time data‑warehouse model with a vision for storage‑compute separation and unified batch‑stream APIs.

Big Data · Flink · Kafka
Didi Tech
Aug 26, 2020 · Big Data

Real-time Data Warehouse Construction at Didi: Architecture, Practices, and Lessons

To support Didi’s fast‑growing car‑pool service, a real‑time data warehouse was built using a streamlined layered architecture—ODS, DWD, DIM, DWM, and APP—leveraging Flink‑based StreamSQL, Kafka, Druid and ClickHouse to deliver minute‑level analytics, dashboards, monitoring, and cross‑business interfaces while planning unified meta‑store integration.

Big Data Architecture · Data Platform · Flink
Youzan Coder
Aug 26, 2020 · Mobile Development

How We Built a Real‑Time Crash Feedback Platform for Mobile Apps

This article details the design and implementation of a comprehensive crash feedback platform for mobile applications, covering the motivation behind replacing third‑party services, the system architecture using Flink, Kafka and HBase, crash interception on Android, automated grouping and assignment, version filtering, daily reporting, and future enhancements.

Android · Flink · Kafka
Didi Tech
Aug 24, 2020 · Big Data

Evolution and Architecture of DiDi Data Channel Service

DiDi’s Data Channel Service evolved from a fragmented component system into a unified, SLA‑driven platform with a UI‑based Sync Center and Flink‑powered StreamSQL engine, dramatically improving task creation speed, resource utilization, and reliability while automating issue diagnosis for company‑wide real‑time and offline data synchronization.

Big Data · ETL · Flink
Top Architect
Aug 14, 2020 · Big Data

Billion‑Row MySQL to HBase Synchronization: Load Data, Kafka‑Thrift, and Flink Solutions

This article presents a comprehensive guide for transferring massive MySQL datasets to HBase, covering environment setup on Ubuntu, three synchronization methods—MySQL LOAD DATA, a Kafka‑Thrift pipeline using Maxwell, and real‑time Flink processing—along with performance comparisons and practical tips for Hadoop, HBase, Kafka, Zookeeper, Phoenix, and related tools.

DataSync · Flink · HBase
DataFunTalk
Aug 10, 2020 · Big Data

Understanding Flink SQL Architecture, Optimizations, and Internal Mechanisms

This article explains the evolution of Apache Flink's SQL support, detailing the Blink Planner architecture, the end‑to‑end Flink SQL workflow, logical and physical planning, code generation, stream‑specific optimizations such as retraction and mini‑batch, and future development directions.

Blink Planner · Flink · optimization
DataFunTalk
Aug 4, 2020 · Artificial Intelligence

Weibo Machine Learning Platform (WML) Overview and Flink Applications

This article presents an in‑depth overview of Weibo's large‑scale machine learning platform, detailing its multi‑layer architecture, development workflow, CTR model evolution, and how Apache Flink is employed for real‑time data processing, sample services, multi‑stream joins, multimedia feature generation, and future roadmap plans.

CTR · Data Platform · Flink
ITPUB
Jul 23, 2020 · Artificial Intelligence

How Likee Scales Short‑Video Recommendations with Flink, Auto‑Stats, and Cache Tensor

This article details Likee's short‑video recommendation pipeline, covering the evolution of its feature‑engineering framework, the use of Flink for minute‑level statistical and second‑level session features, the integration of automatic statistical features into DNN models, multimodal feature extraction, and the cache‑tensor technique that dramatically improves online inference performance.

AI · Deep Learning · Flink
DataFunTalk
Jul 22, 2020 · Big Data

Building a Real-Time Computing Platform with Apache Flink at iQIYI: Architecture, Improvements, and Business Cases

iQIYI’s senior data engineer shares the evolution of its big‑data services from Hadoop to a Flink‑based real‑time computing platform, detailing architecture, monitoring improvements, StreamingSQL capabilities, business use cases like recommendation and deep‑learning data generation, and future plans for unified stream‑batch processing.

Apache Flink · Data Platform · Flink
Programmer DD
Jul 22, 2020 · Big Data

How to Sync Billions of MySQL Records to HBase: 3 Powerful Methods Using Hadoop, Kafka, and Flink

This comprehensive guide walks you through setting up a pseudo‑distributed Hadoop environment, loading massive MySQL data with LOAD DATA, Python scripts, and multithreading, and then synchronizing the data to HBase using three approaches—Sqoop, a Kafka‑Thrift pipeline, and a real‑time Kafka‑Flink pipeline—while also comparing query performance of HBase and Phoenix.

Flink · HBase · Kafka
Architect
Jul 15, 2020 · Big Data

Understanding Flink Task Slots, Resource Allocation, and Slot Sharing Mechanisms

This article explains how Flink uses task slots to partition TaskManager resources, the benefits of slot sharing, the interaction between Scheduler, SlotPool, and ResourceManager, and the internal classes such as LogicalSlot, PhysicalSlot, and SlotSharingManager that enable resource isolation and sharing in stream processing jobs.

Big Data · Flink · Resource Management
DataFunTalk
Jul 10, 2020 · Big Data

Apache Flink Practice at NetEase: Architecture, Scale, and Future Directions

This article details NetEase's evolution from Storm to Flink for real‑time computing, describing the Sloth platform's architecture, large‑scale deployment, diverse business scenarios, monitoring, alerting, and future development plans, illustrating how Flink powers data synchronization, real‑time warehousing, and e‑commerce analytics and recommendation.

Flink · NetEase · Real-time analytics
Big Data Technology Architecture
Jul 8, 2020 · Big Data

Apache Flink 1.11.0 Release: New Features and Optimizations

Apache Flink 1.11.0 introduces a suite of major enhancements—including unaligned checkpoints, a unified source interface, CDC support in Table API/SQL, performance‑boosted PyFlink, a new application deployment mode, and numerous UI, Docker, and catalog improvements—aimed at increasing usability, scalability, and integration across streaming and batch workloads.

Flink · Source Interface · checkpointing
dbaplus Community
Jul 7, 2020 · Big Data

How Flink + ClickHouse Power Real‑Time Analytics at Scale

This article explains how QuTouTiao builds a high‑performance real‑time analytics pipeline using Flink, Hive, and ClickHouse, covering business scenarios, hour‑level and second‑level Flink‑to‑Hive architectures, streaming file sink mechanics, multi‑user permissions, ClickHouse performance tricks, and future roadmap for unified stream‑batch storage.

Big Data · Flink · Real-Time
Programmer DD
Jul 7, 2020 · Big Data

How to Choose a Worthwhile Technology: Depth, Ecosystem, and Evolution

The article outlines a three‑dimensional framework—technical depth, ecosystem breadth, and evolution capability—to help engineers decide which big‑data or stream‑processing technology (such as Hadoop, Spark, or Flink) is worth investing time in, and provides practical tips like using Google Trends and GitHub awesome lists.

Big Data · Flink · Hadoop
Architect
Jul 4, 2020 · Big Data

Kuaishou Flink Real‑Time Architecture and Spring Festival Gala Assurance Practices

This article details Kuaishou's Flink‑based real‑time computing architecture, its massive cluster scale, and the comprehensive strategies—including overload protection, system stability, pressure testing, and resource guarantees—implemented to ensure reliable streaming for the 2020 Spring Festival Gala and its real‑time dashboard.

Big Data · Flink · Kuaishou
DataFunTalk
Jun 30, 2020 · Big Data

Flink Real‑Time Data Warehouse Practices at Shopee Singapore Data Team

This article details Shopee Singapore Data Team’s implementation of a Flink‑based real‑time data warehouse, covering background challenges, layered architecture integrating Kafka, HBase, Druid, Hive, streaming pipelines, job management, monitoring, and future plans to expand Flink SQL support.

Flink · Real-Time · Shopee
Big Data Technology Architecture
Jun 29, 2020 · Big Data

Real‑time Data Warehouse Construction: Goals, Architecture, and Best Practices with Apache Flink

This article summarizes the objectives, design principles, application scenarios, layer‑by‑layer construction methods, quality assurance mechanisms, and supporting tools for building a real‑time data warehouse using Apache Flink, providing practical guidance for data engineers and architects.

Apache Flink · Data Quality · Flink
DataFunTalk
Jun 18, 2020 · Big Data

Real-time Data Processing at QuTouTiao: Flink + ClickHouse Architecture and Practices

QuTouTiao leverages Flink and ClickHouse to build a high‑performance real‑time analytics platform that supports hourly Hive pipelines and sub‑second ClickHouse queries, achieving sub‑second response for 80% of requests through streaming ingestion, exactly‑once semantics, multi‑cluster coordination, and optimized ClickHouse storage and connector designs.

Big Data · Flink · Real-time analytics
Big Data Technology Architecture
Jun 18, 2020 · Big Data

Understanding Data Lakes, Data Warehouses, and Real-Time Analytics with Hologres

This article analyzes the challenges of traditional data lake and warehouse architectures, explains why unified storage and compute are needed for real‑time and batch workloads, and introduces Hologres as a cloud‑native, high‑performance engine that combines PostgreSQL compatibility with Flink‑driven analytics to deliver a true real‑time data warehouse solution.

Flink · Hologres · Real-time analytics
Big Data Technology Architecture
Jun 16, 2020 · Big Data

Real-time Multi-dimensional Analytics and SlimBase State Backend at Kuaishou: Flink Applications and Optimizations

This article describes how Kuaishou leverages Apache Flink for large‑scale real‑time multi‑dimensional analytics, details the architecture of its analytics platform using Kudu storage and KwaiBI, and introduces SlimBase—a lightweight, embedded shared state backend that replaces RocksDB to reduce I/O, latency, and CPU overhead.

Flink · Kuaishou · Kudu
Beike Product & Technology
Jun 12, 2020 · Big Data

Design and Implementation of SQL on Streaming (SQL 1.0 → SQL 2.0) in a Real‑Time Computing Platform

This article describes the evolution of a real‑time computing platform from SQL 1.0 built on Spark Structured Streaming to SQL 2.0 powered by Flink‑SQL, covering dynamic tables, continuous queries, dimension‑table joins, cache optimization, DDL extensions, platformization, operational challenges and future roadmap.

Big Data · Dimension Table · Flink
DataFunTalk
Jun 11, 2020 · Big Data

Real-time Multi-dimensional Analytics and SlimBase State Backend at Kuaishou: Flink Applications and Optimizations

This article presents Kuaishou's extensive use of Apache Flink for real-time multi-dimensional analytics, detailing the platform's architecture, cluster scale, data processing pipelines, the design of a shared state storage engine called SlimBase, and performance improvements achieved through replacing RocksDB with a customized HBase‑based solution.

Big Data · Flink · Kuaishou
Architect
Jun 10, 2020 · Big Data

Understanding Flink Time Notions: ProcessTime, EventTime, IngestionTime and Watermarks with Code Examples

This article explains the three time notions supported by Apache Flink—ProcessTime, EventTime, and IngestionTime—detailing their semantics, how Watermarks enable event‑time processing, and provides Scala code samples for configuring time characteristics, assigning timestamps, and generating Watermarks in a streaming job.

EventTime · Flink · Scala
58 Tech
Jun 10, 2020 · Big Data

Real‑time Data Warehouse Practices at 58 Tongcheng Bao: From Spark Streaming 1.0 to Flink‑based 2.0

This article details the evolution of 58 Tongcheng Bao's real‑time data warehouse, describing the initial Spark‑Streaming architecture, its limitations, and the redesign using Flink with a layered ODS‑DWD‑DWS‑APP model, data‑quality monitoring, join techniques, and the resulting improvements in latency and accuracy.

Big Data · Data Quality · Flink
dbaplus Community
Jun 2, 2020 · Big Data

How Cainiao Built a Scalable Real‑Time Data Warehouse with Flink

Facing growing order volumes and strict timeliness demands, Cainiao’s tech team overhauled its real‑time data warehouse by redesigning data models, adopting Flink for streaming computation, upgrading data services, and exploring innovative tools, sharing practical lessons and future directions for large‑scale logistics analytics.

Big Data · Flink · Logistics
Architect
May 30, 2020 · Big Data

Understanding Flink’s Unified Programming API for Batch and Streaming Jobs

This article examines Apache Flink’s programming model, comparing its batch DataSet API with the streaming DataStream API, detailing class hierarchies, key code examples such as groupBy and job submission, and explaining how both paradigms are unified into a common JobGraph representation.

Batch Processing · Big Data · Flink
Architect
May 29, 2020 · Artificial Intelligence

Integrating Flink with TensorFlow for End-to-End Machine Learning Pipelines

This article explains how to combine the Flink data‑processing engine with TensorFlow to create a unified, end‑to‑end machine‑learning workflow, covering background, challenges, the Flink‑AI‑extended architecture, ML framework and operator abstractions, and both batch and streaming training and prediction modes.

AI integration · Distributed Training · Flink
Huolala Tech
May 28, 2020 · Big Data

How Flink Powers Real‑Time Risk Control at HuoLaLa: Architecture and Insights

This article explains Flink's role in HuoLaLa's risk‑control system, covering its background, the Lambda‑style architecture that combines batch and streaming, the real‑time data pipeline, machine‑learning models, and operational safeguards that together enable proactive fraud detection.

Big Data Architecture · Flink · Lambda architecture
DataFunTalk
May 14, 2020 · Big Data

Building a Real-Time Data Warehouse at Cainiao: Architecture, Model Upgrades, Engine Enhancements, and Service Innovations

This article shares Cainiao's practical experience in constructing a real-time data warehouse, covering the shortcomings of the previous architecture, the evolution of data models, the migration to Flink with advanced features like retraction and timer services, and the modernization of data services and tooling to support high‑throughput logistics scenarios.

Big Data · Data Service · Flink
Big Data Technology Architecture
Apr 15, 2020 · Big Data

Real-Time Data Warehouse Practices: Case Studies from Meituan, NetEase, Zhihu, and OPPO

This article reviews the evolution of data warehouses from traditional offline models to modern real‑time architectures, presenting detailed case studies of Meituan, NetEase, Zhihu, and OPPO, and discusses layer designs, technology choices such as Flink, Kafka, and storage options, and key lessons for building scalable real‑time warehouses.

Big Data · Flink · Kafka
Dada Group Technology
Apr 15, 2020 · Big Data

Practice Experience of Dada Group's Real-Time Computation SQLization Using Dada Flink SQL

This article details Dada Group's development of the Dada Flink SQL engine, describing its background, architecture, parser design, dimension‑table join strategies, numerous enhancements such as HA support, Kafka keyword handling, metadata integration, Redis and ClickHouse sinks, BINLOG simplification, and future migration plans toward Flink 1.10.

Flink · Real‑Time Computing · SQL Engine
Big Data Technology & Architecture
Apr 8, 2020 · Big Data

Common Apache Flink Exceptions and How to Resolve Them

This article enumerates typical Apache Flink deployment, job, and checkpoint errors—such as JDK version issues, resource shortages, task manager timeouts, and state migration problems—and provides practical troubleshooting steps and configuration tips to help engineers quickly diagnose and fix these failures.

Big Data · Checkpoint · Exception
DataFunTalk
Mar 28, 2020 · Big Data

Applying Flink State Management for Real-Time Recommendation Scenarios

This article explains how Apache Flink's flexible state management can be leveraged to solve data correlation challenges in real‑time recommendation platforms, compares Flink with Spark and Storm, describes the underlying broadcast and managed state mechanisms, and provides a step‑by‑step implementation using Kafka, Druid, and custom broadcast functions.

Big Data · Flink · Real-Time
Alibaba Cloud Developer
Mar 19, 2020 · Big Data

Can Flink Unify Real‑Time and Offline Data Warehouses? A Deep Dive

This article examines the challenges of maintaining separate offline and real‑time data warehouses, explains the three‑layer ODS‑DW‑ADS model, evaluates the traditional Lambda architecture, and explores how a unified Flink stack with Kafka, HiveCatalog and streaming sinks can simplify metadata, SQL development, data import/export, and stateful processing for both batch and streaming workloads.

Flink · Lambda architecture · Real-Time
Top Architect
Mar 13, 2020 · Big Data

Three Billion‑Scale MySQL‑to‑HBase Synchronization Solutions and Practical Implementation

This article presents a comprehensive guide for synchronizing massive MySQL datasets to HBase, covering environment preparation, fast MySQL data loading techniques, and three practical pipelines—Sqoop, Kafka‑Thrift, and Kafka‑Flink—along with performance comparisons and optimization tips for large‑scale data processing.

Big Data · Flink · HBase
DataFunTalk
Mar 8, 2020 · Big Data

Real-Time Log Monitoring and Alerting System for iQIYI Membership Services

This article describes how iQIYI built a real‑time, multi‑dimensional log monitoring platform using Spark Streaming, Flink, Kafka and Druid to handle billions of logs, improve alerting accuracy, reduce incident response time, and outline future intelligent monitoring enhancements.

Druid · Flink · Log Analytics
iQIYI Technical Product Team
Mar 6, 2020 · Big Data

Real-Time Log Monitoring and Alerting for iQIYI Membership Services

To support over 100 million iQIYI members, the team rebuilt a real‑time log monitoring platform that gathers access, exception, Nginx and front‑end logs via a Venus‑Agent, streams them through Kafka to Spark Streaming and Flink, stores metrics in Druid, and provides minute‑level host and business alerts, achieving 80 % faster incident investigation, detecting 90 % of member complaints early, and generating more than 4,800 actionable alerts.

Big Data · Flink · Log Analytics
58 Tech
Mar 4, 2020 · Big Data

Applying Flink State Management to Real‑Time Recommendation Scenarios

This article explains how Flink's flexible state management, including Broadcast, Keyed, and Operator states, can be used to solve real‑time recommendation challenges such as per‑minute UV, click, and exposure counting, while addressing locality mapping and data‑delay issues with Druid as the downstream store.

Broadcast State · Druid · Flink
Big Data Technology & Architecture
Feb 22, 2020 · Big Data

Understanding Flink's Asynchronous Barrier Snapshot (ABS) Algorithm for Checkpointing

This article explains how Apache Flink implements fault‑tolerant checkpointing using the Asynchronous Barrier Snapshot (ABS) algorithm, a variant of the Chandy‑Lamport distributed snapshot algorithm, covering barriers, snapshot alignment, exactly‑once versus at‑least‑once semantics, and handling of cyclic dataflow graphs.

Asynchronous Barrier Snapshot · Distributed Systems · Flink
Alibaba Cloud Developer
Feb 19, 2020 · Artificial Intelligence

How Flink Is Powering Real‑Time AI: From Lambda Architecture to Stream‑Batch Unification

This article examines how Apache Flink embraces AI by leveraging the Lambda architecture and stream‑batch unification to enable real‑time data processing across preprocessing, model training, and inference, discusses the challenges of model updates and code maintenance, and outlines ongoing Flink initiatives that support real‑time AI.

AI · Flink
0 likes · 15 min read
Big Data Technology & Architecture
Feb 15, 2020 · Big Data

Understanding Event Time and Watermarks in Apache Flink

This article explains how Apache Flink uses event‑time timestamps and watermarks to handle out‑of‑order and late data, describes the assignTimestampsAndWatermarks API with periodic and punctuated watermark assigners, and provides practical code examples for window lateness and side‑output handling.

Apache Flink · Event Time · Flink
0 likes · 10 min read
Big Data Technology Architecture
Feb 13, 2020 · Big Data

Evolution of Cainiao's Real-Time Data Warehouse Architecture: Model, Compute Engine, and Data Service Upgrades

The talk details Cainiao’s evolution of its real‑time data warehouse architecture, covering the original 2016 model, compute and service challenges, the 2017 multi‑layer data model redesign, migration to Flink, practical cases of state retraction, timeout statistics, smart optimizations, and the unified data service platform.

Data Service · Flink · Streaming
0 likes · 16 min read
Xianyu Technology
Feb 11, 2020 · Big Data

Client-side Complex Event Processing with Flink CEP and Python

The article describes how Xianyu’s recommendation system shifts complex event processing from server‑side Blink to client‑side Python using Flink CEP concepts, detailing the NFA‑based state and transition model, pattern‑building API, aggregation support, achieving sub‑second execution with modest memory, and outlines future optimizations such as NFA persistence, windowing, DSL script generation, and C++/TensorFlow Lite acceleration.

CEP · ClientSide · Flink
0 likes · 13 min read
DataFunTalk
Feb 10, 2020 · Artificial Intelligence

Real‑Time Intelligent Anomaly Detection Platform at Ctrip: Integrating Flink and TensorFlow (Prophet)

The article describes Ctrip's Prophet platform, which combines Flink real‑time stream processing with TensorFlow deep‑learning models to provide intelligent, low‑latency anomaly detection, replacing traditional rule‑based alerts and addressing challenges such as holiday traffic and model scalability.

AI · Deep Learning · Flink
0 likes · 13 min read
Big Data Technology Architecture
Feb 8, 2020 · Big Data

Meituan-Dianping Real-Time Data Warehouse Platform Built on Apache Flink: Architecture, Practices, and Future Directions

Meituan-Dianping’s senior technical expert shares the evolution, architecture, and implementation of their Apache Flink‑based real‑time data warehouse platform, covering platform evolution, layered design, job and resource management, business warehouse use cases, and future development considerations.

Flink · Meituan-Dianping · Streaming
0 likes · 16 min read
DataFunTalk
Jan 22, 2020 · Big Data

Real-Time Data Engineering Practices for Alibaba 1688 Business

This article explains how Alibaba 1688 achieves real‑time recommendation, advertising, and product statistics through a robust middle‑platform foundation, streaming engines like Blink, data synchronization tools, and scalable storage, illustrating three concrete engineering cases and the end‑to‑end real‑time data service pipeline.

Alibaba · Flink · stream processing
0 likes · 8 min read
Alibaba Cloud Developer
Jan 20, 2020 · Big Data

Alibaba’s Secrets to High‑Throughput Full‑Load and Low‑Latency Search Processing

This article details how Alibaba migrated its massive Taobao‑Tmall search workload to the search offline platform, tackling challenges of massive data volume, one‑to‑many joins, and hotspot sellers through a series of performance optimizations—including local joins, salt‑based data sharding, dynamic aggregation jobs, and asynchronous processing—to achieve high‑throughput full loads and low‑latency incremental updates.

Alibaba · Big Data · Flink
0 likes · 15 min read
dbaplus Community
Jan 14, 2020 · Big Data

How OPPO Built a Real‑Time Data Warehouse with Flink SQL

This article details OPPO's evolution from an offline data warehouse to a real‑time platform, describing the business scale, data middle‑platform architecture, migration strategy using Flink SQL, extensions like AthenaX, and practical use cases such as real‑time ETL, CTR calculation, and tag import.

ETL · Flink · Streaming
0 likes · 18 min read
DataFunTalk
Jan 10, 2020 · Big Data

Design and Evolution of iQIYI's Real-Time Analytics Platform (RAP)

The article details iQIYI's Real-Time Analysis Platform (RAP), describing its motivation, architecture evolution from RAP 1.x to 2.x, OLAP engine selection, product design workflow, integration of Druid KIS and Flink, enhanced diagnostics, and real-world applications in membership monitoring, recommendation evaluation, and smart TV alerting.

Druid · Flink · OLAP
0 likes · 12 min read
Big Data Technology & Architecture
Jan 10, 2020 · Big Data

Async I/O for Dimension Table Joins in Apache Flink

This article explains how to handle dimension table joins in Apache Flink streaming by leveraging Async I/O to perform non‑blocking external lookups, provides detailed code examples for both synchronous and asynchronous functions, discusses configuration parameters, and outlines best practices and pitfalls.

Big Data · Dimension Table Join · Flink
0 likes · 16 min read
iQIYI Technical Product Team
Jan 9, 2020 · Big Data

Design and Evolution of iQIYI Real-Time Analysis Platform (RAP)

iQIYI’s Real‑Time Analysis Platform (RAP) combines Apache Druid with Spark/Flink to deliver minute‑level, low‑latency multidimensional analytics via a web wizard, supporting hundreds of streaming tasks and thousands of reports across membership, recommendation, and TV monitoring, while simplifying development and maintenance.

Apache Druid · Big Data · Flink
0 likes · 13 min read
Tongcheng Travel Technology Center
Jan 7, 2020 · Big Data

Design and Implementation of XFlink: A Flink‑Based Data Migration System on Yarn

The article describes the evolution from the legacy XDATA tool to the new XFlink system, detailing its architecture, core plugins, parser and deployment modules, resource management with Yarn, monitoring via Prometheus and Grafana, and planned enhancements such as Flink SQL configuration and modular plugins.

Big Data · Data Migration · Distributed Systems
0 likes · 10 min read
dbaplus Community
Jan 6, 2020 · Big Data

How 58.com Built a Scalable Flink‑Based Real‑Time Data Platform (Wstream)

The article details how 58.com designed and evolved its one‑stop real‑time computation platform Wstream, migrating from Storm and Spark Streaming to Apache Flink, and describes the architecture, task isolation, stream‑SQL features, monitoring, and ongoing optimizations that enable processing of over 600 billion records daily.

Big Data · Flink · Real-time Streaming
0 likes · 12 min read
Big Data Technology & Architecture
Dec 25, 2019 · Big Data

Understanding Flink StreamPartitioner and Its Implementations

Flink’s StreamPartitioner abstracts data routing in DataStream, offering eight built‑in partitioners—including Global, Shuffle, Rebalance, KeyGroup, Broadcast, Rescale, Forward, and Custom—each with distinct channel selection logic, illustrated with source code snippets and explanations of their runtime behavior.

Big Data · DataStream · Flink
0 likes · 8 min read
Qunar Tech Salon
Dec 20, 2019 · Big Data

Understanding Flink Cluster Startup and Job Execution Process

This article explains the architecture of a Flink cluster, detailing the startup procedures for JobManager and TaskManager, the three deployment modes, and the end‑to‑end flow of a Flink job from client code through StreamGraph, JobGraph, ExecutionGraph to the physical execution on TaskManagers.

Big Data · Cluster Architecture · Flink
0 likes · 10 min read