Tagged articles

946 articles

Oct 12, 2020 · Big Data

Building a General Real‑Time Data Warehouse: Methods and Practices at Meituan Waimai

This article introduces Meituan Waimai's approach to constructing a universal real‑time data warehouse, covering streaming technology choices, Lambda/Kappa architectures, layered design, platformization, SLA management, and a practical Lambda‑style use case for real‑time analytics.

Big Data · Doris OLAP · Flink

16 min read

DataFunTalk

Oct 9, 2020 · Big Data

NetEase’s Data Lake Iceberg: Challenges, Core Principles, and Practical Implementation

This article examines the pain points of traditional data warehouse platforms, explains the core concepts and advantages of the Iceberg data lake table format, compares it with Metastore, reviews the current Iceberg community ecosystem, and details NetEase’s practical integration with Hive, Impala, and Flink to improve ETL efficiency and support unified batch‑stream processing.

Data Lake · ETL · Flink

13 min read

DataFunTalk

Oct 2, 2020 · Big Data

Single-Task Recovery in Flink: Design and Implementation for Real‑Time Stream Processing

This article describes ByteDance's single‑task recovery solution for Flink's real‑time computation, detailing the problem of global job restarts, the proposed network‑layer enhancements, upstream and downstream optimizations, JobManager restart strategy, implementation challenges, and the measurable latency and availability benefits achieved in production.

Flink · Single-Task Recovery · fault tolerance

11 min read

DataFunTalk

Sep 30, 2020 · Big Data

Real-time Data Warehouse Construction for Didi Ride-hailing's Carpool Service

This article details Didi's end‑to‑end real‑time data warehouse design for the carpool business, covering its objectives, architecture layers from ODS to application, naming conventions, StreamSQL development, operational tooling, challenges faced, and future batch‑stream integration plans.

Big Data · Didi · Flink

20 min read

Big Data Technology & Architecture

Sep 29, 2020 · Big Data

Implementing Real-Time TopN Rankings with Apache Flink

This article demonstrates how to develop a real-time TopN ranking feature in Apache Flink, covering stream setup, word count aggregation, global and grouped TopN calculations, and nested TopN strategies to mitigate hotspot issues, complete with Java code examples.

Big Data · Flink · Real-Time

8 min read

Big Data Technology & Architecture

Sep 19, 2020 · Big Data

Understanding Flink Timer Mechanism and Its Internal Implementation

This article explains how Flink's Timer mechanism works, covering its usage in KeyedProcessFunction, the underlying TimerService and InternalTimerService implementations, the role of triggers, and the detailed code paths for processing‑time and event‑time timers, while highlighting performance considerations.

Flink · InternalTimerService · KeyedProcessFunction

16 min read

DataFunTalk

Sep 17, 2020 · Big Data

Design and Implementation of a Scalable User Tag Production Platform

The article explains how a flexible, high‑performance user‑tagging system is built on a batch‑stream integrated architecture using big‑data technologies such as Impala, HDFS, and Flink to support both offline and real‑time label generation for precise marketing, product improvement, and operational analytics.

Big Data · Flink · Impala

15 min read

Big Data Technology & Architecture

Sep 16, 2020 · Big Data

Understanding Flink CEP's NFAb Automaton for Complex Event Processing

This article explains how Flink's Complex Event Processing (CEP) library implements pattern matching using a nondeterministic finite automaton with a match buffer (NFAb), covering its theoretical foundation, construction, state transition semantics, event selection strategies, shared versioned match buffers, and computation state details.

Big Data · CEP · Flink

9 min read

Alibaba Cloud Developer

Sep 15, 2020 · Big Data

Designing Nexmark: A Standard Benchmark for Stream Processing Performance

This article examines the challenges of existing stream‑processing benchmarks, introduces the open‑source Nexmark framework designed for reproducible, comprehensive performance testing, describes its metrics, query set, workload configurability, and presents experimental results on Flink, highlighting its role in advancing big‑data stream benchmarking.

CPU · Flink · Latency

14 min read

dbaplus Community

Sep 14, 2020 · Operations

How iQIYI Scaled Real‑Time Log Monitoring for 100M+ Users with Spark, Flink and Druid

Facing a surge to over 100 million members, iQIYI rebuilt its monitoring stack by ingesting four log types, adopting Spark Streaming, Flink and Druid for real‑time analysis, and optimizing resource usage, which cut incident resolution time by more than 80% while supporting billion‑level data volumes.

Druid · Flink · Kafka

12 min read

ITPUB

Sep 14, 2020 · Big Data

How Alibaba’s DChain Data Converger Auto‑Generates Real‑Time Wide Tables with SQL Pipelines

This article explains how the ADC (Alibaba DChain Data Converger) project automatically creates large real‑time tables by letting users configure metrics on the front‑end, then generating and publishing SQL through a pipeline that leverages design patterns, priority queues, and tree‑based data structures for efficient cross‑database processing.

Design Patterns · Flink · Real-time analytics

15 min read

DataFunTalk

Sep 13, 2020 · Big Data

Online Sample Generation with Flink: Architecture and Implementation

This article explains why Flink is chosen for online sample generation, describes the end‑to‑end implementation steps (stream union, state‑timer processing, and output formatting), and covers state backend choices, monitoring, validation, fault handling, and platformization for scalable real‑time machine‑learning pipelines.

Flink · Kafka · Online Sample Generation

11 min read

DataFunTalk

Sep 10, 2020 · Databases

Graph‑Based Real‑Time Content Update Architecture at Youku: Challenges, Design, and Practice

This technical presentation explains how Youku tackles the massive, real‑time update problem of video‑content graphs by adopting a graph‑database architecture, sub‑graph partitioning, schema‑driven logical views, and Flink‑based pipelines to achieve second‑level updates for billions of entities and attributes.

Big Data · Flink · Graph Database

15 min read

DataFunTalk

Sep 7, 2020 · Big Data

Real‑time Data Warehouse Architecture and Best Practices in Alibaba Search Recommendation

This article presents Alibaba's search‑recommendation real‑time data warehouse, describing its business background, typical use cases, key requirements, the evolution from architecture 1.0 to 2.0 with Flink and Hologres, best‑practice patterns such as row/column storage, stream‑batch integration, high‑concurrency updates, and future directions like real‑time joins and persistent dimension storage.

Big Data · Flink · Hologres

13 min read

DataFunTalk

Sep 6, 2020 · Big Data

OPPO's Real-Time Data Warehouse Architecture and Practices Based on Apache Flink

OPPO's data platform engineer Zhang Jun shares the design and implementation of OPPO's real‑time data warehouse built on Apache Flink, covering background, top‑level architecture, practical deployment, and future directions such as enhanced SQL development, resource scheduling, and automated configuration.

Data Platform · Flink · Streaming

15 min read

DataFunTalk

Sep 1, 2020 · Big Data

NetEase Real-Time Computing Platform (Sloth): Architecture, Practices, and Future Outlook

This article introduces NetEase's real-time computing platform Sloth, detailing its architecture, component layers, integrated IDE, operational tooling, unified metadata management, and challenges such as Kudu write amplification. It also proposes a tiered real‑time data‑warehouse model with a vision for storage‑compute separation and unified batch‑stream APIs.

Big Data · Flink · Kafka

13 min read

Didi Tech

Aug 26, 2020 · Big Data

Real-time Data Warehouse Construction at Didi: Architecture, Practices, and Lessons

To support Didi’s fast‑growing car‑pool service, a real‑time data warehouse was built using a streamlined layered architecture—ODS, DWD, DIM, DWM, and APP—leveraging Flink‑based StreamSQL, Kafka, Druid and ClickHouse to deliver minute‑level analytics, dashboards, monitoring, and cross‑business interfaces while planning unified meta‑store integration.

Big Data Architecture · Data Platform · Flink

20 min read

Youzan Coder

Aug 26, 2020 · Mobile Development

How We Built a Real‑Time Crash Feedback Platform for Mobile Apps

This article details the design and implementation of a comprehensive crash feedback platform for mobile applications, covering the motivation behind replacing third‑party services, the system architecture using Flink, Kafka and HBase, crash interception on Android, automated grouping and assignment, version filtering, daily reporting, and future enhancements.

Android · Flink · Kafka

15 min read

Didi Tech

Aug 24, 2020 · Big Data

Evolution and Architecture of DiDi Data Channel Service

DiDi’s Data Channel Service evolved from a fragmented component system into a unified, SLA‑driven platform with a UI‑based Sync Center and Flink‑powered StreamSQL engine, dramatically improving task creation speed, resource utilization, and reliability while automating issue diagnosis for company‑wide real‑time and offline data synchronization.

Big Data · ETL · Flink

12 min read

Big Data Technology & Architecture

Aug 23, 2020 · Big Data

Integrating Flink 1.11 with Hive Streaming, Kafka, and Table API

This article demonstrates how to use Flink 1.11's enhanced Hive integration to stream data from a Kafka source, write it into partitioned Hive tables with checkpoint‑driven commits, and read Hive tables as a continuous source using dynamic table options and table hints.

Big Data · Flink · Kafka

13 min read

Big Data Technology & Architecture

Aug 17, 2020 · Big Data

Complex Event Processing (CEP) with Flink: Concepts, Pattern API, and a Scala Practical Example

This article introduces Complex Event Processing (CEP), explains its core concepts and features, details Flink's Pattern API with individual, combined, and group patterns, and provides a complete Scala example that detects three consecutive login failures within three seconds using Flink CEP.

Big Data · CEP · Flink

10 min read

Top Architect

Aug 14, 2020 · Big Data

Billion‑Row MySQL to HBase Synchronization: Load Data, Kafka‑Thrift, and Flink Solutions

This article presents a comprehensive guide for transferring massive MySQL datasets to HBase, covering environment setup on Ubuntu, three synchronization methods—MySQL LOAD DATA, a Kafka‑Thrift pipeline using Maxwell, and real‑time Flink processing—along with performance comparisons and practical tips for Hadoop, HBase, Kafka, Zookeeper, Phoenix, and related tools.

DataSync · Flink · HBase

24 min read

Architecture Digest

Aug 13, 2020 · Big Data

Synchronizing Billion-Row MySQL Data to HBase: Three Practical Schemes and Implementation Guide

This comprehensive guide details three practical methods for syncing massive MySQL datasets to HBase—including Sqoop, Kafka‑Thrift, and Flink pipelines—covering environment setup, configuration, code examples, performance comparisons, and optimization tips for large‑scale data ingestion and querying.

Big Data · Flink · HBase

24 min read

Big Data Technology & Architecture

Aug 10, 2020 · Big Data

Real-time Hot Item, PV, and UV Statistics Using Apache Flink, Kafka, and Bloom Filter

This article demonstrates how to implement real-time hot item ranking, page view counting, and unique visitor estimation using Apache Flink with Kafka sources, sliding windows, custom aggregation functions, and a Bloom filter backed by Redis, providing complete Scala code examples.

Big Data · Flink · Kafka

15 min read

DataFunTalk

Aug 10, 2020 · Big Data

Understanding Flink SQL Architecture, Optimizations, and Internal Mechanisms

This article explains the evolution of Apache Flink's SQL support, detailing the Blink Planner architecture, the end‑to‑end Flink SQL workflow, logical and physical planning, code generation, stream‑specific optimizations such as retraction and mini‑batch, and future development directions.

Blink Planner · Flink · optimization

20 min read

Big Data Technology & Architecture

Aug 8, 2020 · Big Data

Setting Up InfluxDB and Grafana for Flink Metrics Monitoring

This guide walks through installing InfluxDB and Grafana on CentOS, configuring InfluxDB for Flink metrics storage, creating databases and retention policies, integrating the Flink InfluxDB reporter, and building Grafana dashboards to visualize real‑time Flink job metrics.

Big Data · Flink · Grafana

8 min read

Big Data Technology & Architecture

Aug 6, 2020 · Big Data

Flink Configuration Parameters and Related Tuning for Kafka and Yarn

This article provides a comprehensive guide to configuring Apache Flink—including job manager and task manager settings, high‑availability via Zookeeper, metrics reporting, as well as Kafka producer tuning and Yarn resource adjustments—to help practitioners optimize big‑data streaming jobs.

Big Data · Configuration · Flink

8 min read

DataFunTalk

Aug 4, 2020 · Artificial Intelligence

Weibo Machine Learning Platform (WML) Overview and Flink Applications

This article presents an in‑depth overview of Weibo's large‑scale machine learning platform, detailing its multi‑layer architecture, development workflow, CTR model evolution, and how Apache Flink is employed for real‑time data processing, sample services, multi‑stream joins, multimedia feature generation, and future roadmap plans.

CTR · Data Platform · Flink

12 min read

Fulu Network R&D Team

Aug 4, 2020 · Big Data

Practical Experience with State Management in Flink Real‑Time Stream Processing

This article shares practical experiences and insights on using different types of state in Apache Flink for real‑time stream processing, covering managed versus raw state, code examples in Scala and Java, handling late data, dimension table joins, distinct semantics, and best‑practice recommendations.

Big Data · Flink · Managed State

15 min read

DataFunTalk

Aug 2, 2020 · Big Data

Building Real-Time Data Warehouses with Apache Flink: Goals, Architecture, and Best Practices

This article presents a comprehensive guide to constructing real-time data warehouses using Apache Flink, covering the motivations, design principles, application scenarios, layer-by-layer architecture, metadata and lineage management, quality assurance, and the supporting toolchain for reliable streaming analytics.

Data Architecture · ETL · Flink

24 min read

ITPUB

Jul 23, 2020 · Artificial Intelligence

How Likee Scales Short‑Video Recommendations with Flink, Auto‑Stats, and Cache Tensor

This article details Likee's short‑video recommendation pipeline, covering the evolution of its feature‑engineering framework, the use of Flink for minute‑level statistical and second‑level session features, the integration of automatic statistical features into DNN models, multimodal feature extraction, and the cache‑tensor technique that dramatically improves online inference performance.

AI · Deep Learning · Flink

18 min read

DataFunTalk

Jul 22, 2020 · Big Data

Building a Real-Time Computing Platform with Apache Flink at iQIYI: Architecture, Improvements, and Business Cases

iQIYI’s senior data engineer shares the evolution of its big‑data services from Hadoop to a Flink‑based real‑time computing platform, detailing architecture, monitoring improvements, StreamingSQL capabilities, business use cases like recommendation and deep‑learning data generation, and future plans for unified stream‑batch processing.

Apache Flink · Data Platform · Flink

11 min read

Programmer DD

Jul 22, 2020 · Big Data

How to Sync Billions of MySQL Records to HBase: 3 Powerful Methods Using Hadoop, Kafka, and Flink

This comprehensive guide walks you through setting up a pseudo‑distributed Hadoop environment, loading massive MySQL data with LOAD DATA, Python scripts, and multithreading, and then synchronizing the data to HBase using three approaches—Sqoop, a Kafka‑Thrift pipeline, and a real‑time Kafka‑Flink pipeline—while also comparing query performance of HBase and Phoenix.

Flink · HBase · Kafka

28 min read

Architect

Jul 15, 2020 · Big Data

Understanding Flink Task Slots, Resource Allocation, and Slot Sharing Mechanisms

This article explains how Flink uses task slots to partition TaskManager resources, the benefits of slot sharing, the interaction between Scheduler, SlotPool, and ResourceManager, and the internal classes such as LogicalSlot, PhysicalSlot, and SlotSharingManager that enable resource isolation and sharing in stream processing jobs.

Big Data · Flink · Resource Management

6 min read

Big Data Technology & Architecture

Jul 13, 2020 · Big Data

Understanding and Optimizing Flink Checkpoint Mechanism for Large-Scale State

This article explains Flink's checkpoint mechanism, outlines key performance metrics, discusses interval configuration, external state storage choices, resource allocation, and task-local recovery strategies to improve checkpoint speed and reliability in large‑scale state scenarios.

Big Data · Checkpoint · Flink

5 min read

DataFunTalk

Jul 10, 2020 · Big Data

Apache Flink Practice at NetEase: Architecture, Scale, and Future Directions

This article details NetEase's evolution from Storm to Flink for real‑time computing, describing the Sloth platform's architecture, large‑scale deployment, diverse business scenarios, monitoring, alerting, and future development plans, illustrating how Flink powers data synchronization, real‑time warehousing, and e‑commerce analytics and recommendation.

Flink · NetEase · Real-time analytics

15 min read

Big Data Technology Architecture

Jul 8, 2020 · Big Data

Apache Flink 1.11.0 Release: New Features and Optimizations

Apache Flink 1.11.0 introduces a suite of major enhancements—including unaligned checkpoints, a unified source interface, CDC support in Table API/SQL, performance‑boosted PyFlink, a new application deployment mode, and numerous UI, Docker, and catalog improvements—aimed at increasing usability, scalability, and integration across streaming and batch workloads.

Flink · Source Interface · checkpointing

18 min read

dbaplus Community

Jul 7, 2020 · Big Data

How Flink + ClickHouse Power Real‑Time Analytics at Scale

This article explains how QuTouTiao builds a high‑performance real‑time analytics pipeline using Flink, Hive, and ClickHouse, covering business scenarios, hour‑level and second‑level Flink‑to‑Hive architectures, streaming file sink mechanics, multi‑user permissions, ClickHouse performance tricks, and future roadmap for unified stream‑batch storage.

Big Data · Flink · Real-Time

18 min read

Programmer DD

Jul 7, 2020 · Big Data

How to Choose a Worthwhile Technology: Depth, Ecosystem, and Evolution

The article outlines a three‑dimensional framework—technical depth, ecosystem breadth, and evolution capability—to help engineers decide which big‑data or stream‑processing technology (such as Hadoop, Spark, or Flink) is worth investing time in, and provides practical tips like using Google Trends and GitHub awesome lists.

Big Data · Flink · Hadoop

12 min read

Architect

Jul 4, 2020 · Big Data

Kuaishou Flink Real‑Time Architecture and Spring Festival Gala Assurance Practices

This article details Kuaishou's Flink‑based real‑time computing architecture, its massive cluster scale, and the comprehensive strategies—including overload protection, system stability, pressure testing, and resource guarantees—implemented to ensure reliable streaming for the 2020 Spring Festival Gala and its real‑time dashboard.

Big Data · Flink · Kuaishou

12 min read

DataFunTalk

Jun 30, 2020 · Big Data

Flink Real‑Time Data Warehouse Practices at Shopee Singapore Data Team

This article details Shopee Singapore Data Team’s implementation of a Flink‑based real‑time data warehouse, covering background challenges, layered architecture integrating Kafka, HBase, Druid, Hive, streaming pipelines, job management, monitoring, and future plans to expand Flink SQL support.

Flink · Real-Time · Shopee

15 min read

Big Data Technology Architecture

Jun 29, 2020 · Big Data

Real‑time Data Warehouse Construction: Goals, Architecture, and Best Practices with Apache Flink

This article summarizes the objectives, design principles, application scenarios, layer‑by‑layer construction methods, quality assurance mechanisms, and supporting tools for building a real‑time data warehouse using Apache Flink, providing practical guidance for data engineers and architects.

Apache Flink · Data Quality · Flink

24 min read

Big Data Technology & Architecture

Jun 19, 2020 · Big Data

Comparison of Flink and Spark in Standalone and YARN Deployment Modes

This article compares Apache Flink and Apache Spark in both standalone and YARN deployment modes, detailing their architecture, job scheduling differences, and specific configurations such as Flink’s yarn‑cluster and yarn‑session modes versus Spark’s yarn‑client and yarn‑cluster modes.

Big Data · Comparison · Flink

4 min read

DataFunTalk

Jun 18, 2020 · Big Data

Real-time Data Processing at QuTouTiao: Flink + ClickHouse Architecture and Practices

QuTouTiao leverages Flink and ClickHouse to build a high‑performance real‑time analytics platform that supports hourly Hive pipelines and sub‑second ClickHouse queries, achieving sub‑second response for 80% of requests through streaming ingestion, exactly‑once semantics, multi‑cluster coordination, and optimized ClickHouse storage and connector designs.

Big Data · Flink · Real-time analytics

16 min read

Big Data Technology Architecture

Jun 18, 2020 · Big Data

Understanding Data Lakes, Data Warehouses, and Real-Time Analytics with Hologres

This article analyzes the challenges of traditional data lake and warehouse architectures, explains why unified storage and compute are needed for real‑time and batch workloads, and introduces Hologres as a cloud‑native, high‑performance engine that combines PostgreSQL compatibility with Flink‑driven analytics to deliver a true real‑time data warehouse solution.

Flink · Hologres · Real-time analytics

13 min read

Big Data Technology Architecture

Jun 16, 2020 · Big Data

Real-time Multi-dimensional Analytics and SlimBase State Backend at Kuaishou: Flink Applications and Optimizations

This article describes how Kuaishou leverages Apache Flink for large‑scale real‑time multi‑dimensional analytics, details the architecture of its analytics platform using Kudu storage and KwaiBI, and introduces SlimBase—a lightweight, embedded shared state backend that replaces RocksDB to reduce I/O, latency, and CPU overhead.

Flink · Kuaishou · Kudu

17 min read

Big Data Technology & Architecture

Jun 13, 2020 · Big Data

Hot Goods Top‑N Calculation with Flink Event‑Time Sliding Windows

This article explains how to compute the top‑N hot products or brands within a time window using Apache Flink, covering data modeling, event‑time handling, sliding windows, custom aggregation functions, and result sorting with complete Java code examples.

EventTime · Flink · Streaming

14 min read

Beike Product & Technology

Jun 12, 2020 · Big Data

Design and Implementation of SQL on Streaming (SQL 1.0 → SQL 2.0) in a Real‑Time Computing Platform

This article describes the evolution of a real‑time computing platform from SQL 1.0 built on Spark Structured Streaming to SQL 2.0 powered by Flink‑SQL, covering dynamic tables, continuous queries, dimension‑table joins, cache optimization, DDL extensions, platformization, operational challenges and future roadmap.

Big Data · Dimension Table · Flink

19 min read

DataFunTalk

Jun 11, 2020 · Big Data

Real-time Multi-dimensional Analytics and SlimBase State Backend at Kuaishou: Flink Applications and Optimizations

This article presents Kuaishou's extensive use of Apache Flink for real-time multi-dimensional analytics, detailing the platform's architecture, cluster scale, data processing pipelines, the design of a shared state storage engine called SlimBase, and performance improvements achieved through replacing RocksDB with a customized HBase‑based solution.

Big Data · Flink · Kuaishou

15 min read

Architect

Jun 10, 2020 · Big Data

Understanding Flink Time Notions: ProcessTime, EventTime, IngestionTime and Watermarks with Code Examples

This article explains the three time notions supported by Apache Flink—ProcessTime, EventTime, and IngestionTime—detailing their semantics, how Watermarks enable event‑time processing, and provides Scala code samples for configuring time characteristics, assigning timestamps, and generating Watermarks in a streaming job.

EventTime · Flink · Scala

16 min read

58 Tech

Jun 10, 2020 · Big Data

Real‑time Data Warehouse Practices at 58 Tongcheng Bao: From Spark Streaming 1.0 to Flink‑based 2.0

This article details the evolution of 58 Tongcheng Bao's real‑time data warehouse, describing the initial Spark‑Streaming architecture, its limitations, and the redesign using Flink with a layered ODS‑DWD‑DWS‑APP model, data‑quality monitoring, join techniques, and the resulting improvements in latency and accuracy.

Big Data · Data Quality · Flink

9 min read

Big Data Technology & Architecture

Jun 4, 2020 · Big Data

Understanding Flink StreamingFileSink: File States, Rolling Policies, and Example Code

This article explains Flink's StreamingFileSink in version 1.10.0, covering how files transition through In‑progress, Pending, and Finished states, the bucket assignment and rolling policies, and provides a complete Java example for writing string data to files.

Big Data · File Rolling · Flink

6 min read

dbaplus Community

Jun 2, 2020 · Big Data

How Cainiao Built a Scalable Real‑Time Data Warehouse with Flink

Facing growing order volumes and strict timeliness demands, Cainiao’s tech team overhauled its real‑time data warehouse by redesigning data models, adopting Flink for streaming computation, upgrading data services, and exploring innovative tools, sharing practical lessons and future directions for large‑scale logistics analytics.

Big Data · Flink · Logistics

18 min read

Architect

May 30, 2020 · Big Data

Understanding Flink’s Unified Programming API for Batch and Streaming Jobs

This article examines Apache Flink’s programming model, comparing its batch DataSet API with the streaming DataStream API, detailing class hierarchies, key code examples such as groupBy and job submission, and explaining how both paradigms are unified into a common JobGraph representation.

Batch Processing · Big Data · Flink

9 min read

Architect

May 29, 2020 · Artificial Intelligence

Integrating Flink with TensorFlow for End-to-End Machine Learning Pipelines

This article explains how to combine the Flink data‑processing engine with TensorFlow to create a unified, end‑to‑end machine‑learning workflow, covering background, challenges, the Flink‑AI‑extended architecture, ML framework and operator abstractions, and both batch and streaming training and prediction modes.

AI integration · Distributed Training · Flink

9 min read

Huolala Tech

May 28, 2020 · Big Data

How Flink Powers Real‑Time Risk Control at HuoLaLa: Architecture and Insights

This article explains Flink's role in HuoLaLa's risk‑control system, covering its background, the Lambda‑style architecture that combines batch and streaming, the real‑time data pipeline, machine‑learning models, and operational safeguards that together enable proactive fraud detection.

Big Data Architecture · Flink · Lambda architecture

0 likes · 16 min read

How Flink Powers Real‑Time Risk Control at HuoLaLa: Architecture and Insights

DataFunTalk

May 14, 2020 · Big Data

Building a Real-Time Data Warehouse at Cainiao: Architecture, Model Upgrades, Engine Enhancements, and Service Innovations

This article shares Cainiao's practical experience in constructing a real-time data warehouse, covering the shortcomings of the previous architecture, the evolution of data models, the migration to Flink with advanced features like retraction and timer services, and the modernization of data services and tooling to support high‑throughput logistics scenarios.

Big Data · Data Service · Flink

0 likes · 16 min read

Building a Real-Time Data Warehouse at Cainiao: Architecture, Model Upgrades, Engine Enhancements, and Service Innovations

Big Data Technology & Architecture

May 14, 2020 · Big Data

Understanding Flink 1.10 TaskManager Memory Model and Configuration Parameters

This article explains the new unified TaskManager memory model introduced in Flink 1.10, detailing each memory component, its configuration parameters, how they map to JVM settings, and practical guidance for both standalone and containerized deployments, including a concrete YARN example.

Batch · Big Data · Flink

0 likes · 10 min read

Understanding Flink 1.10 TaskManager Memory Model and Configuration Parameters

DataFunTalk

May 11, 2020 · Big Data

Designing a Real-Time Data System with Flink: Architecture, Data Modeling, and UV Metric Computation

This article outlines a comprehensive real‑time data system built on Apache Flink, covering its application scenarios, layered architecture, data model stratification, construction methods, and a concrete Flink SQL example for calculating UV metrics from Kafka‑sourced page‑view data.

Data Architecture · Flink · Kafka

0 likes · 24 min read

Designing a Real-Time Data System with Flink: Architecture, Data Modeling, and UV Metric Computation

Big Data Technology & Architecture

May 8, 2020 · Big Data

Understanding ProcessFunction and CoProcessFunction in Apache Flink

This article explains Apache Flink's ProcessFunction and CoProcessFunction, detailing their use of events, state, and timers, compares event‑time and processing‑time semantics, and provides a complete Java example illustrating timer registration, onTimer handling, and debugging observations.

CoProcessFunction · EventTime · Flink

0 likes · 11 min read

Understanding ProcessFunction and CoProcessFunction in Apache Flink

21CTO

Apr 30, 2020 · Big Data

How to Choose a Worthwhile Technology: A Big Data Engineer’s 3‑Step Framework

The article outlines a three‑dimensional framework—technical depth, ecosystem breadth, and evolution capability—to help professionals evaluate whether a technology is worth investing time in, illustrated with real‑world examples from Hadoop, Spark, and Flink.

Big Data · Flink · Hadoop

0 likes · 10 min read

How to Choose a Worthwhile Technology: A Big Data Engineer’s 3‑Step Framework

Big Data Technology Architecture

Apr 15, 2020 · Big Data

Real-Time Data Warehouse Practices: Case Studies from Meituan, NetEase, Zhihu, and OPPO

This article reviews the evolution of data warehouses from traditional offline models to modern real‑time architectures, presenting detailed case studies of Meituan, NetEase, Zhihu, and OPPO, and discusses layer designs, technology choices such as Flink, Kafka, and storage options, and key lessons for building scalable real‑time warehouses.

Big Data · Flink · Kafka

0 likes · 13 min read

Real-Time Data Warehouse Practices: Case Studies from Meituan, NetEase, Zhihu, and OPPO

Dada Group Technology

Apr 15, 2020 · Big Data

Practice Experience of Dada Group's Real-Time Computation SQLization Using Dada Flink SQL

This article details Dada Group's development of the Dada Flink SQL engine, describing its background, architecture, parser design, dimension‑table join strategies, numerous enhancements such as HA support, Kafka keyword handling, metadata integration, Redis and ClickHouse sinks, BINLOG simplification, and future migration plans toward Flink 1.10.

Flink · Real‑Time Computing · SQL Engine

0 likes · 12 min read

Practice Experience of Dada Group's Real-Time Computation SQLization Using Dada Flink SQL

Big Data Technology & Architecture

Apr 12, 2020 · Big Data

Understanding Spark and Flink RPC Implementations: A Code Reading Guide

This article explains how to read and compare the RPC implementations of Spark and Flink, covering Actor Model concepts, Akka integration, message handling, threading models, and practical code‑reading techniques while providing detailed code excerpts and architectural analysis.

Distributed Systems · Flink · RPC

0 likes · 32 min read

Understanding Spark and Flink RPC Implementations: A Code Reading Guide

Big Data Technology & Architecture

Apr 8, 2020 · Big Data

Common Apache Flink Exceptions and How to Resolve Them

This article enumerates typical Apache Flink deployment, job, and checkpoint errors—such as JDK version issues, resource shortages, task manager timeouts, and state migration problems—and provides practical troubleshooting steps and configuration tips to help engineers quickly diagnose and fix these failures.

Big Data · Checkpoint · Exception

0 likes · 8 min read

Common Apache Flink Exceptions and How to Resolve Them

DataFunTalk

Mar 28, 2020 · Big Data

Applying Flink State Management for Real-Time Recommendation Scenarios

This article explains how Apache Flink's flexible state management can be leveraged to solve data correlation challenges in real‑time recommendation platforms, compares Flink with Spark and Storm, describes the underlying broadcast and managed state mechanisms, and provides a step‑by‑step implementation using Kafka, Druid, and custom broadcast functions.

Big Data · Flink · Real-Time

0 likes · 14 min read

Applying Flink State Management for Real-Time Recommendation Scenarios

Alibaba Cloud Developer

Mar 19, 2020 · Big Data

Can Flink Unify Real‑Time and Offline Data Warehouses? A Deep Dive

This article examines the challenges of maintaining separate offline and real‑time data warehouses, explains the three‑layer ODS‑DW‑ADS model, evaluates the traditional Lambda architecture, and explores how a unified Flink stack with Kafka, HiveCatalog and streaming sinks can simplify metadata, SQL development, data import/export, and stateful processing for both batch and streaming workloads.

Flink · Lambda architecture · Real-Time

0 likes · 12 min read

Can Flink Unify Real‑Time and Offline Data Warehouses? A Deep Dive

Top Architect

Mar 13, 2020 · Big Data

Three Billion‑Scale MySQL‑to‑HBase Synchronization Solutions and Practical Implementation

This article presents a comprehensive guide for synchronizing massive MySQL datasets to HBase, covering environment preparation, fast MySQL data loading techniques, and three practical pipelines—Sqoop, Kafka‑Thrift, and Kafka‑Flink—along with performance comparisons and optimization tips for large‑scale data processing.

Big Data · Flink · HBase

0 likes · 24 min read

Three Billion‑Scale MySQL‑to‑HBase Synchronization Solutions and Practical Implementation

Big Data Technology & Architecture

Mar 13, 2020 · Operations

Configuring Logback as the Logging Framework for Apache Flink

This article explains how to replace Flink's default Log4j logger with Logback by adding Maven dependencies, excluding transitive Log4j artifacts, updating Flink's lib directory, and customizing Logback XML configurations including a rolling file appender and optional email alerts.

Configuration · Flink · java

0 likes · 8 min read

Configuring Logback as the Logging Framework for Apache Flink

DataFunTalk

Mar 8, 2020 · Big Data

Real-Time Log Monitoring and Alerting System for iQIYI Membership Services

This article describes how iQIYI built a real‑time, multi‑dimensional log monitoring platform using Spark Streaming, Flink, Kafka and Druid to handle billions of logs, improve alerting accuracy, reduce incident response time, and outline future intelligent monitoring enhancements.

Druid · Flink · Log Analytics

0 likes · 10 min read

Real-Time Log Monitoring and Alerting System for iQIYI Membership Services

iQIYI Technical Product Team

Mar 6, 2020 · Big Data

Real-Time Log Monitoring and Alerting for iQIYI Membership Services

To support over 100 million iQIYI members, the team rebuilt a real‑time log monitoring platform that gathers access, exception, Nginx and front‑end logs via a Venus‑Agent, streams them through Kafka to Spark Streaming and Flink, stores metrics in Druid, and provides minute‑level host and business alerts, achieving 80 % faster incident investigation, detecting 90 % of member complaints early, and generating more than 4,800 actionable alerts.

Big Data · Flink · Log Analytics

0 likes · 11 min read

Real-Time Log Monitoring and Alerting for iQIYI Membership Services

58 Tech

Mar 4, 2020 · Big Data

Applying Flink State Management to Real‑Time Recommendation Scenarios

This article explains how Flink's flexible state management, including Broadcast, Keyed, and Operator states, can be used to solve real‑time recommendation challenges such as per‑minute UV, click, and exposure counting, while addressing locality mapping and data‑delay issues with Druid as the downstream store.

Broadcast State · Druid · Flink

0 likes · 13 min read

Applying Flink State Management to Real‑Time Recommendation Scenarios

Big Data Technology & Architecture

Feb 22, 2020 · Big Data

Understanding Flink's Asynchronous Barrier Snapshot (ABS) Algorithm for Checkpointing

This article explains how Apache Flink implements fault‑tolerant checkpointing using the Asynchronous Barrier Snapshot (ABS) algorithm, a localized version of the Chandy‑Lamport distributed snapshot, covering barriers, snapshot alignment, exactly‑once versus at‑least‑once semantics, and handling of cyclic dataflow graphs.

Asynchronous Barrier Snapshot · Distributed Systems · Flink

0 likes · 9 min read

Understanding Flink's Asynchronous Barrier Snapshot (ABS) Algorithm for Checkpointing

Ctrip Technology

Feb 20, 2020 · Big Data

Ctrip Flight Ticket Data Warehouse: Architecture, Technology Stack, and Practical Practices

This article outlines Ctrip's flight ticket data warehouse evolution, current big‑data technology stack, data synchronization methods, layered architecture, quality monitoring system, and a real‑time price anomaly detection case, providing practical insights for building scalable, reliable data warehousing solutions.

Ctrip · Data Quality · ETL

0 likes · 20 min read

Ctrip Flight Ticket Data Warehouse: Architecture, Technology Stack, and Practical Practices

DataFunTalk

Feb 19, 2020 · Big Data

Design and Integration of Flink Batch Processing with Hive: Architecture, Features, and Performance Evaluation

This article presents the design of Flink's batch processing architecture, its integration with Hive through a unified Catalog API, details the enhancements in Flink 1.10, outlines future work, and reports a performance test showing roughly seven‑fold speedup over Hive on MapReduce.

Batch Processing · Big Data · Catalog API

0 likes · 9 min read

Design and Integration of Flink Batch Processing with Hive: Architecture, Features, and Performance Evaluation

Alibaba Cloud Developer

Feb 19, 2020 · Artificial Intelligence

How Flink Is Powering Real‑Time AI: From Lambda Architecture to Stream‑Batch Unification

This article examines how Apache Flink embraces AI by leveraging the Lambda architecture and stream‑batch unification to enable real‑time data processing across preprocessing, model training, and inference, discusses the challenges of model updates and code maintenance, and outlines ongoing Flink initiatives that support real‑time AI.

AI · Flink

0 likes · 15 min read

How Flink Is Powering Real‑Time AI: From Lambda Architecture to Stream‑Batch Unification

dbaplus Community

Feb 18, 2020 · Big Data

Building RAP: iQIYI’s Real‑Time Big Data Analytics Platform with Druid, Spark & Flink

The article details iQIYI’s RAP platform, describing its real‑time analytics requirements, architectural evolution from RAP 1.x to 2.x, core design steps, integration of Druid, Spark, Flink, and KIS, and showcases business use cases such as membership monitoring, recommendation evaluation, and smart‑TV alerting.

Druid · Flink · OLAP

0 likes · 14 min read

Building RAP: iQIYI’s Real‑Time Big Data Analytics Platform with Druid, Spark & Flink

Big Data Technology & Architecture

Feb 18, 2020 · Big Data

Understanding Flink Event‑Time Windows, Watermarks, and Allowed Lateness with Scala Examples

This article explains how Apache Flink uses event‑time windows, watermarks, and allowed lateness to handle out‑of‑order and late data, and provides complete Scala code examples that demonstrate timestamp assignment, watermark generation, window triggering, and side‑output of late records.

AllowedLateness · EventTime · Flink

0 likes · 17 min read

Understanding Flink Event‑Time Windows, Watermarks, and Allowed Lateness with Scala Examples

Big Data Technology & Architecture

Feb 16, 2020 · Big Data

Implementing User Purchase Behavior Tracking with Flink Broadcast State

This article explains how to use Flink's Broadcast State to track user purchase paths in real time, detailing the design, required Kafka streams, Java APIs, state management, dynamic configuration, code implementation, deployment steps, and example results for a big‑data streaming application.

Big Data · Broadcast State · Flink

0 likes · 19 min read

Implementing User Purchase Behavior Tracking with Flink Broadcast State

Big Data Technology & Architecture

Feb 15, 2020 · Big Data

Understanding Event Time and Watermarks in Apache Flink

This article explains how Apache Flink uses event‑time timestamps and watermarks to handle out‑of‑order and late data, describes the assignTimestampsAndWatermarks API with periodic and punctuated watermark assigners, and provides practical code examples for window lateness and side‑output handling.

Apache Flink · Event Time · Flink

0 likes · 10 min read

Understanding Event Time and Watermarks in Apache Flink

Big Data Technology Architecture

Feb 13, 2020 · Big Data

Evolution of Cainiao's Real-Time Data Warehouse Architecture: Model, Compute Engine, and Data Service Upgrades

The talk details Cainiao’s evolution of its real‑time data warehouse architecture, covering the original 2016 model, compute and service challenges, the 2017 multi‑layer data model redesign, migration to Flink, practical cases of state retraction, timeout statistics, smart optimizations, and the unified data service platform.

Data Service · Flink · Streaming

0 likes · 16 min read

Evolution of Cainiao's Real-Time Data Warehouse Architecture: Model, Compute Engine, and Data Service Upgrades

Xianyu Technology

Feb 11, 2020 · Big Data

Client-side Complex Event Processing with Flink CEP and Python

The article describes how Xianyu’s recommendation system shifts complex event processing from server‑side Blink to client‑side Python using Flink CEP concepts. It details the NFA‑based state and transition model, the pattern‑building API, and aggregation support, reports sub‑second execution with modest memory use, and outlines future optimizations such as NFA persistence, windowing, DSL script generation, and C++/TensorFlow Lite acceleration.

CEP · ClientSide · Flink

0 likes · 13 min read

Client-side Complex Event Processing with Flink CEP and Python

DataFunTalk

Feb 10, 2020 · Artificial Intelligence

Real‑Time Intelligent Anomaly Detection Platform at Ctrip: Integrating Flink and TensorFlow (Prophet)

The article describes Ctrip's Prophet platform, which combines Flink real‑time stream processing with TensorFlow deep‑learning models to provide intelligent, low‑latency anomaly detection, replacing traditional rule‑based alerts and addressing challenges such as holiday traffic and model scalability.

AI · Deep Learning · Flink

0 likes · 13 min read

Real‑Time Intelligent Anomaly Detection Platform at Ctrip: Integrating Flink and TensorFlow (Prophet)

Big Data Technology Architecture

Feb 8, 2020 · Big Data

Meituan-Dianping Real-Time Data Warehouse Platform Built on Apache Flink: Architecture, Practices, and Future Directions

Meituan-Dianping’s senior technical expert shares the evolution, architecture, and implementation of their Apache Flink‑based real‑time data warehouse platform, covering platform evolution, layered design, job and resource management, business warehouse use cases, and future development considerations.

Flink · Meituan-Dianping · Streaming

0 likes · 16 min read

Meituan-Dianping Real-Time Data Warehouse Platform Built on Apache Flink: Architecture, Practices, and Future Directions

Big Data Technology & Architecture

Feb 5, 2020 · Big Data

Resolving Oozie Shell Scheduling Issues for Flink Jobs on CDH 6.3 with Kerberos Authentication

The article describes how to troubleshoot and fix Oozie shell‑action failures when submitting Flink jobs on a CDH 6.3 cluster with Kerberos, detailing environment‑variable conflicts, error messages, and the final solution using a clean environment and custom FLINK_CONF_DIR settings.

Big Data · CDH · Flink

0 likes · 7 min read

Resolving Oozie Shell Scheduling Issues for Flink Jobs on CDH 6.3 with Kerberos Authentication

Big Data Technology Architecture

Jan 29, 2020 · Big Data

Xiaomi Streaming Platform: Evolution, Architecture, and Flink‑Based Real‑Time Data Warehouse

The article details Xiaomi's unified streaming data platform, its three‑generation evolution from Scribe/Kafka/Storm to Talos and Flink, the current architecture supporting billions of records daily, and future plans to unify offline and real‑time warehousing with Flink SQL.

Flink · Real-time Processing · Streaming Platform

0 likes · 15 min read

Xiaomi Streaming Platform: Evolution, Architecture, and Flink‑Based Real‑Time Data Warehouse

DataFunTalk

Jan 22, 2020 · Big Data

Real-Time Data Engineering Practices for Alibaba 1688 Business

This article explains how Alibaba 1688 achieves real‑time recommendation, advertising, and product statistics through a robust middle‑platform foundation, streaming engines like Blink, data synchronization tools, and scalable storage, illustrating three concrete engineering cases and the end‑to‑end real‑time data service pipeline.

Alibaba · Flink · stream processing

0 likes · 8 min read

Real-Time Data Engineering Practices for Alibaba 1688 Business

Alibaba Cloud Developer

Jan 20, 2020 · Big Data

Alibaba’s Secrets to High‑Throughput Full‑Load and Low‑Latency Search Processing

This article details how Alibaba migrated its massive Taobao‑Tmall search workload to the search offline platform, tackling challenges of massive data volume, one‑to‑many joins, and hotspot sellers through a series of performance optimizations—including local joins, salt‑based data sharding, dynamic aggregation jobs, and asynchronous processing—to achieve high‑throughput full loads and low‑latency incremental updates.

Alibaba · Big Data · Flink

0 likes · 15 min read

Alibaba’s Secrets to High‑Throughput Full‑Load and Low‑Latency Search Processing

DataFunTalk

Jan 19, 2020 · Big Data

Xiaomi Streaming Platform: Evolution, Architecture, and Flink‑Based Real‑Time Data Warehouse

The article presents a comprehensive overview of Xiaomi's streaming platform, detailing its three‑stage evolution from a Scribe‑Kafka‑Storm stack to a Flink‑driven real‑time data warehouse, describing its architecture, components, challenges, migration strategies, job and SQL management, and future roadmap.

Flink · Xiaomi

0 likes · 15 min read

360 Tech Engineering

Jan 16, 2020 · Big Data

Real-Time and Offline Integrated Solution for Channel Analysis Data Processing

This article presents a comprehensive real‑time and offline integrated solution for a channel analysis system, detailing challenges, architecture, implementation using Flink, Spark Streaming, Kafka, Elasticsearch, and Hive, and demonstrating minute‑level latency and high accuracy through performance evaluations.

Big Data · Elasticsearch · Flink

0 likes · 10 min read

Real-Time and Offline Integrated Solution for Channel Analysis Data Processing

dbaplus Community

Jan 14, 2020 · Big Data

How OPPO Built a Real‑Time Data Warehouse with Flink SQL

This article details OPPO's evolution from an offline data warehouse to a real‑time platform, describing the business scale, data middle platform architecture, migration strategy using Flink SQL, extensions like AthenaX, and practical use cases such as real‑time ETL, CTR calculation, and tag import.

ETL · Flink · Streaming

0 likes · 18 min read

How OPPO Built a Real‑Time Data Warehouse with Flink SQL

Big Data Technology & Architecture

Jan 13, 2020 · Big Data

130 Essential Big Data and Distributed Systems Interview Questions

This article compiles 130 interview questions spanning big data technologies, distributed systems, and core computer science concepts to help candidates prepare for technical interviews, offering a comprehensive resource for self‑study and review.

Flink · Hadoop · Kafka

0 likes · 12 min read

130 Essential Big Data and Distributed Systems Interview Questions

DataFunTalk

Jan 10, 2020 · Big Data

Design and Evolution of iQIYI's Real-Time Analytics Platform (RAP)

The article details iQIYI's Real-Time Analysis Platform (RAP), describing its motivation, architecture evolution from RAP 1.x to 2.x, OLAP engine selection, product design workflow, integration of Druid KIS and Flink, enhanced diagnostics, and real-world applications in membership monitoring, recommendation evaluation, and smart TV alerting.

Druid · Flink · OLAP

0 likes · 12 min read

Design and Evolution of iQIYI's Real-Time Analytics Platform (RAP)

Big Data Technology & Architecture

Jan 10, 2020 · Big Data

Async I/O for Dimension Table Joins in Apache Flink

This article explains how to handle dimension table joins in Apache Flink streaming by leveraging Async I/O to perform non‑blocking external lookups, provides detailed code examples for both synchronous and asynchronous functions, discusses configuration parameters, and outlines best practices and pitfalls.

Big Data · Dimension Table Join · Flink

0 likes · 16 min read

Async I/O for Dimension Table Joins in Apache Flink

iQIYI Technical Product Team

Jan 9, 2020 · Big Data

Design and Evolution of iQIYI Real-Time Analysis Platform (RAP)

iQIYI’s Real‑Time Analysis Platform (RAP) combines Apache Druid with Spark/Flink to deliver minute‑level, low‑latency multidimensional analytics via a web wizard, supporting hundreds of streaming tasks and thousands of reports across membership, recommendation, and TV monitoring, while simplifying development and maintenance.

Apache Druid · Big Data · Flink

0 likes · 13 min read

Design and Evolution of iQIYI Real-Time Analysis Platform (RAP)

Big Data Technology & Architecture

Jan 8, 2020 · Big Data

Real-Time Data Warehouse Architecture and Challenges Using Flink, Kafka, and HBase

This article examines the design of a real-time data warehouse built on Flink, Kafka, and HBase, compares it with traditional offline warehouses, and discusses key challenges such as data accuracy, latency, and the complexity of maintaining real-time dimension tables.

Big Data · Flink · HBase

0 likes · 10 min read

Real-Time Data Warehouse Architecture and Challenges Using Flink, Kafka, and HBase

Tongcheng Travel Technology Center

Jan 7, 2020 · Big Data

Design and Implementation of XFlink: A Flink‑Based Data Migration System on Yarn

The article describes the evolution from the legacy XDATA tool to the new XFlink system, detailing its architecture, core plugins, parser and deployment modules, resource management with Yarn, monitoring via Prometheus and Grafana, and planned enhancements such as Flink SQL configuration and modular plugins.

Big Data · Data Migration · Distributed Systems

0 likes · 10 min read

Design and Implementation of XFlink: A Flink‑Based Data Migration System on Yarn

dbaplus Community

Jan 6, 2020 · Big Data

How 58.com Built a Scalable Flink‑Based Real‑Time Data Platform (Wstream)

The article details how 58.com designed and evolved its one‑stop real‑time computation platform Wstream, migrating from Storm and Spark Streaming to Apache Flink, and describes the architecture, task isolation, stream‑SQL features, monitoring, and ongoing optimizations that enable processing of over 600 billion records daily.

Big Data · Flink · Real-time Streaming

0 likes · 12 min read

How 58.com Built a Scalable Flink‑Based Real‑Time Data Platform (Wstream)

Big Data Technology & Architecture

Dec 25, 2019 · Big Data

Understanding Flink StreamPartitioner and Its Implementations

Flink’s StreamPartitioner abstracts data routing in DataStream, offering eight built‑in partitioners—including Global, Shuffle, Rebalance, KeyGroup, Broadcast, Rescale, Forward, and Custom—each with distinct channel selection logic, illustrated with source code snippets and explanations of their runtime behavior.

Big Data · DataStream · Flink

0 likes · 8 min read

Understanding Flink StreamPartitioner and Its Implementations

Qunar Tech Salon

Dec 20, 2019 · Big Data

Understanding Flink Cluster Startup and Job Execution Process

This article explains the architecture of a Flink cluster, detailing the startup procedures for JobManager and TaskManager, the three deployment modes, and the end‑to‑end flow of a Flink job from client code through StreamGraph, JobGraph, ExecutionGraph to the physical execution on TaskManagers.

Big Data · Cluster Architecture · Flink

0 likes · 10 min read

Understanding Flink Cluster Startup and Job Execution Process