Tagged articles

946 articles

Aug 29, 2022 · Big Data

Migrating from Lambda Architecture to an Iceberg‑Based Unified Batch‑Stream Architecture at NetEase Yanxuan

This article details how NetEase Yanxuan upgraded its legacy Lambda data pipeline to a unified batch‑stream architecture built on Apache Iceberg, covering the original challenges, the evaluation of Iceberg versus Hudi and Delta Lake, implementation specifics, table‑governance techniques, and the future roadmap.

Batch-Stream · Data Lake · Flink

14 min read

DataFunSummit

Aug 25, 2022 · Big Data

Managing the Full Lifecycle of Risk Features: Pitfalls, Solutions, and Future Directions

This talk by Tang Gengyang of Citic Baixin Bank details the challenges faced in risk feature engineering, presents two solution frameworks (1.0 and 2.0) for accelerating deployment, improving reuse, and handling offline/online consistency, and outlines future enhancements for a more efficient, automated feature pipeline.

Flink · asynchronous processing · data pipelines

14 min read

Alibaba Cloud Big Data AI Platform

Aug 25, 2022 · Big Data

How Alibaba Cloud Flink + Hologres Power Real‑Time Data Warehouses

This article explains how Alibaba Cloud Flink and Hologres combine to deliver a one‑stop, cloud‑native real‑time data‑warehouse solution that supports low‑latency ingestion, full‑incremental CDC, automatic schema evolution, high‑performance OLAP and online serving, and simplifies ETL/ELT pipelines for enterprise analytics.

Flink · Hologres · cloud computing

25 min read

Zhuanzhuan Tech

Aug 24, 2022 · Big Data

Real-Time Data Warehouse Architecture Using Flink: Design, Implementation, and Challenges

This article details the design and implementation of a real‑time data warehouse for an advertising platform, covering business background, challenges, a Lambda‑based architecture, Flink stream processing setup, ETL logic, sink handling, and performance results, concluding with future improvement directions.

ETL · Flink · Lambda architecture

11 min read

Big Data Technology & Architecture

Aug 23, 2022 · Big Data

Using Flink Broadcast State for Dynamic Configuration Updates and Real‑Time Data Enrichment

This article explains how Flink's Broadcast State feature can be used to dynamically update processing rules and enrich streaming events with user information from MySQL, showing configuration, code examples, key considerations, and runtime results that demonstrate real‑time adaptability without restarting the job.

Broadcast State · Dynamic Configuration · Flink

15 min read

Big Data Technology Architecture

Aug 23, 2022 · Big Data

Apache Hudi 0.12.0 Release Highlights: Presto Connector, Archive Beyond Savepoint, File‑System Locks, Deltastreamer Termination, Spark & Flink Support, Performance Improvements, and Configuration Updates

The Apache Hudi 0.12.0 release introduces a native Presto connector, archive‑beyond‑savepoint capability, file‑system based locking, new deltastreamer termination strategies, expanded Spark and Flink support, numerous performance enhancements, and a series of configuration and API updates for better data‑lake management.

Apache Hudi · Flink · Presto

12 min read

Big Data Technology & Architecture

Aug 22, 2022 · Big Data

Apache DolphinScheduler 3.0.0 Release Highlights and New Features

The Apache DolphinScheduler 3.0.0 release on August 10, 2022 introduces a faster UI, stronger data‑quality guarantees, modernized design, easier maintenance, AWS support, service splitting, and native Flink task support, accompanied by detailed code examples and download links.

Apache DolphinScheduler · Big Data · Data Quality

11 min read

Volcano Engine Developer Services

Aug 15, 2022 · Big Data

How ByteDance Scales Event Tracking: Inside a Billion‑Events‑Per‑Second Data Pipeline

This article explains how ByteDance’s event‑tracking data flow handles billions of events per second using Flink‑based real‑time ETL, dynamic rule engines, data sharding, and multi‑datacenter disaster recovery to ensure stability, low latency, and cost‑effective processing for diverse downstream services.

Big Data · Flink · Scalability

16 min read

Big Data Technology & Architecture

Aug 15, 2022 · Big Data

Comprehensive Guide to Flink Partitioners and Their Implementations

This article explains the eight built‑in Flink partitioners, their distribution strategies, key implementation details, and provides Java code examples illustrating how each partitioner selects downstream channels and determines pointwise or all‑to‑all distribution.

Big Data · Flink · Partitioner

9 min read

ITPUB

Aug 13, 2022 · Big Data

How Alibaba Uses Flink to Power Massive Real‑Time Risk Control

This article explains how Alibaba leverages Flink to handle over 40 billion events per second across all business units, detailing risk‑control concepts, rule types, architectural stages, resource tuning, dynamic CEP, shared computing, and the FY23 roadmap for large‑scale streaming risk management.

Alibaba · Big Data · CEP

16 min read

DaTaobao Tech

Aug 11, 2022 · Big Data

Unify SQL Engine: Integrating Stream, Batch, and Online Computing for Data Warehousing

The article describes how fragmented real‑time, batch, and online data‑warehouse pipelines suffer from low productivity and inconsistent data quality, and introduces a unified SQL engine built on Apache Calcite that parses, optimizes, and compiles a single SQL statement into executable plans for ODPS, Flink, or Java, leveraging Janino code generation, multi‑backend state storage, and snapshot‑join semantics to boost performance and simplify development.

Batch Processing · Calcite · Flink

16 min read

DataFunTalk

Aug 6, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform engineering, describing how Apache Iceberg is leveraged for real‑time data lake ingestion, CDC pipelines, multi‑cloud storage, small‑file mitigation, schema evolution, and future plans across storage, compute, and management within a big‑data ecosystem.

Apache Iceberg · CDC · Flink

16 min read

Alibaba Cloud Big Data AI Platform

Aug 4, 2022 · Big Data

Boost Real‑Time Data Warehouses with Integrated Analytics & Service

Alibaba Cloud’s Hologres unifies analytical and service workloads in a real‑time data warehouse, simplifying data exchange, reducing development and operational costs, and delivering high‑performance, low‑latency online services through innovations like row‑column hybrid storage, hot upgrades, and elastic cloud‑native scaling, as demonstrated in a logistics case study.

Flink · Hologres · Real-Time

13 min read

dbaplus Community

Jul 27, 2022 · Operations

Visualizing Full‑Link Log Tracing: From Design to Meituan Content Platform

This article presents a visual full‑link log tracing solution that organizes business logs by execution chain, enabling efficient log collection, dynamic linking, and real‑time visualization to pinpoint issues in complex distributed systems, with a detailed case study from Meituan's content platform.

DSL · Flink · HBase

24 min read

Big Data Technology & Architecture

Jul 27, 2022 · Big Data

Step-by-Step Guide to Installing and Using Flink with Iceberg for Real-Time Data Lake

This article provides a comprehensive tutorial on setting up Flink 1.11 with Iceberg 0.11.1, creating Hive catalogs, building databases and tables, inserting data, and exploring Iceberg components, file structures, partitioned tables, execution plans, and programmatic access via Scala.

Big Data · Data Lake · Flink

10 min read

DataFunTalk

Jul 26, 2022 · Big Data

Feature Platform Architecture and Stream‑Batch Integrated Solutions

This talk presents Shuhe Technology’s feature platform, detailing its four‑layer architecture, feature storage services, stream‑batch integrated processing, event‑center design, consistency models, and four model‑strategy invocation schemes, illustrating data flows from MySQL through Sqoop, Kafka, Flink, HBase and ClickHouse.

Big Data · Flink · HBase

17 min read

Alibaba Cloud Developer

Jul 26, 2022 · Big Data

How NetEase Game Built StreamflySQL: From Client‑Side to Server‑Side Flink SQL

This article recounts NetEase Game's evolution of its real‑time computation platform Streamfly, detailing the transition from a client‑side Flink SQL solution (StreamflySQL v1) to a server‑side architecture using SQL Gateway (StreamflySQL v2), the challenges faced, and future work.

Flink · Job Management · Server-side Compilation

21 min read

JavaEdge

Jul 25, 2022 · Big Data

Choosing Between Lambda and Kappa: Real‑Time Data Warehouse Strategies

The article uses an acorn‑moving analogy to highlight latency and traceability challenges in enterprise data warehouses, then explains offline versus real‑time approaches, compares Lambda and Kappa architectures, discusses Iceberg integration, and shares a detailed e‑commerce real‑time warehouse case study with optimization tips.

Big Data · Flink · Iceberg

15 min read

Big Data Technology & Architecture

Jul 25, 2022 · Big Data

Understanding Flink Join Types, Optimizations, and Physical Plan Translation

This article explains the different join types supported by Apache Flink—including regular, interval, temporal, and lookup joins—provides SQL examples, details how the Flink optimizer transforms logical plans into efficient physical plans, and describes the underlying code generation and execution mechanisms.

Big Data · Flink · JOIN

14 min read

ITPUB

Jul 22, 2022 · Big Data

From Client‑Side to Server‑Side: How NetEase Built StreamflySQL on Flink SQL

This article chronicles NetEase Games' evolution of its real‑time StreamflySQL platform, detailing the transition from a client‑side Flink SQL implementation to a server‑side architecture powered by SQL Gateway, and discusses the motivations, design choices, challenges, and performance improvements achieved.

Big Data · Flink · SQL Gateway

19 min read

HomeTech

Jul 20, 2022 · Big Data

Design and Implementation of a Real-Time Advertising Data Warehouse Using Flink and StarRocks

This article presents a case study of building a real‑time advertising data warehouse at Autohome, detailing the evaluation of streaming engines and storage solutions, the layered architecture design, implementation steps with Flink and StarRocks, monitoring practices, issues encountered, and the future roadmap, and showing how second‑level data freshness and high accuracy were achieved.

Flink · StarRocks · Streaming

10 min read

StarRocks

Jul 18, 2022 · Big Data

How Songguo Mobility Built a Real‑Time OLAP Platform with StarRocks: From 1.0 to 3.0

Songguo Mobility’s data‑center team migrated from a fragmented Impala‑Kudu‑ClickHouse stack to a unified StarRocks‑based real‑time OLAP architecture, iterating through three versions to solve scalability, latency, and maintenance challenges while supporting minute‑level dashboards for orders and vehicle analytics.

Flink · Kafka · Real-time OLAP

19 min read

DataFunSummit

Jul 17, 2022 · Big Data

Elasticsearch and Big Data: Architecture, Use Cases, and Advantages

This article explains what Elasticsearch is, how it solves database acceleration, log observability, and data analysis problems, details its core components and underlying engine features, compares its strengths and weaknesses, and presents classic application scenarios and a real‑world case study integrating Elasticsearch with Flink for large‑scale log analytics.

Big Data · Elasticsearch · Flink

13 min read

DataFunTalk

Jul 17, 2022 · Big Data

Redesigning Apache SeaTunnel: Decoupling Source and Sink APIs for Multi‑Engine Support

The presentation details the motivations, goals, and architectural redesign of Apache SeaTunnel (Incubating) to decouple its Source and Sink APIs from underlying engines, introducing unified APIs, version‑agnostic connectors, and enhanced support for Spark and Flink in both batch and streaming scenarios.

Apache SeaTunnel · Big Data · Data Integration

12 min read

Big Data Technology Architecture

Jul 15, 2022 · Big Data

Using and Designing the Apache SeaTunnel Examples Module

This article introduces Apache SeaTunnel's Examples module, compares SeaTunnel with DataX, explains its multi‑engine design, demonstrates Flink and Spark example implementations, and shares the speaker's experiences contributing to the open‑source community, providing practical guidance for big‑data integration projects.

Apache SeaTunnel · Data Integration · Flink

10 min read

DataFunTalk

Jul 14, 2022 · Big Data

Real‑Time Data Lake Practices at ByteDance and Alibaba: Architecture, Challenges, and Solutions

This article presents detailed case studies of ByteDance and Alibaba implementing real‑time data lake solutions with Hudi and Flink, describing the business drivers, architectural challenges, and the specific technical strategies such as unified metadata layers, optimistic locking, scalable hash indexing, and CDC‑based incremental ETL to achieve low‑latency, high‑throughput data processing.

Flink · Hudi · Real-time Data Lake

9 min read

DataFunSummit

Jul 12, 2022 · Big Data

Practical Use of Apache Iceberg in Microvision's Data Warehouse: Architecture, Real‑time Integration, and Table Maintenance

This article details why Microvision adopted Apache Iceberg, how it replaces parts of their Lambda‑architecture data pipeline, the real‑time and offline use cases, table‑maintenance practices such as snapshot cleanup and small‑file merging, and lessons learned from the implementation.

Big Data · Data Lake · Flink

17 min read

Big Data Technology & Architecture

Jul 7, 2022 · Big Data

Deep Dive into Apache Iceberg Core Features and Flink Integration

This article explains Apache Iceberg’s architecture, core capabilities such as time‑travel, fast scans, delete handling, and schema evolution, and provides a step‑by‑step guide for configuring Flink to use Iceberg with Hive and Hadoop catalogs, including DDL commands and streaming queries.

Apache Iceberg · Big Data · Data Lake

22 min read

Hulu Beijing

Jul 7, 2022 · Big Data

How Hulu Upgraded Hadoop 2.6 to 3.0: Lessons in Compatibility and Migration

This article details Hulu's five‑year journey from Hadoop 2.6 to 3.3.2, covering major feature evolutions, the original cluster architecture, a comprehensive upgrade plan, compatibility challenges across HDFS, YARN, Hive, Spark and Flink, and the testing and rollout strategies that ensured a smooth migration.

Big Data · Cluster Upgrade · Compatibility

17 min read

Big Data Technology & Architecture

Jul 6, 2022 · Big Data

Understanding Apache Iceberg File Storage Format and Write Processes in Spark and Flink

This article explains the Apache Iceberg file storage format, its metadata hierarchy, and demonstrates how Spark and Flink write data to Iceberg tables, including detailed code examples, manifest handling, snapshot management, and commit processes for efficient data lake operations.

Apache Iceberg · Big Data · Data Lake

31 min read

HelloTech

Jul 6, 2022 · Big Data

Investigation and Resolution of Elasticsearch Write Timeout Issues in a Real-Time Flink Data Sync Pipeline

The team diagnosed intermittent Elasticsearch write‑timeout failures in their real‑time Flink‑to‑Elasticsearch pipeline as lock contention from frequent duplicate updates to the same document IDs, and eliminated the issue by aggregating binlog events in a 5‑second sliding window to deduplicate writes, adjusting refresh intervals, using async translog durability, and disabling non‑essential fields.

Big Data · Elasticsearch · Flink

7 min read

High Availability Architecture

Jun 29, 2022 · Big Data

Interview with Shopee Data Engineer Deng Lin on Lakehouse Architecture and Big Data Trends

During a pre‑GIAC interview, Shopee data engineer Deng Lin discusses the evolution of data lakes and warehouses, lakehouse integration, big‑data technology choices, real‑time processing with Flink and Kafka, and offers career advice for newcomers to the big‑data field.

Big Data · Flink · Kafka

10 min read

Alibaba Cloud Developer

Jun 28, 2022 · Big Data

How Kuaishou Guarantees Real‑Time Data Warehouse Reliability During Billion‑Scale Events

This article details Kuaishou’s real‑time data warehouse architecture and its comprehensive assurance framework—including forward lifecycle standards, reverse fault‑injection testing, and Spring Festival event practices—highlighting challenges of massive traffic, high timeliness, accuracy, and stability, and outlining future plans for automation, batch‑stream integration, and cost reduction.

Flink · Real-time Streaming · SLA

23 min read

DataFunTalk

Jun 28, 2022 · Big Data

JD Retail Traffic Data Warehouse Architecture and Processing Practices

This article presents a comprehensive technical overview of JD.com’s retail traffic data processing pipeline, detailing the multi‑layer data warehouse architecture, real‑time and offline data flows, a large‑scale back‑fill case using Iceberg and OLAP, data‑skew detection and mitigation techniques, and future directions involving unified Flink‑Spark streaming‑batch solutions.

Data Skew · Flink · Iceberg

12 min read

Zuoyebang Tech Team

Jun 17, 2022 · Big Data

How FlinkSQL Auto‑Tuning Saves Resources and Guarantees SLA

This article describes the design and implementation of an automated FlinkSQL tuning system that monitors metrics, evaluates task health with rule‑based logic, calculates optimal resource adjustments, and performs fast scaling to reduce cluster waste, lower operational costs, and maintain SLA compliance.

Akka · Auto Scaling · Flink

15 min read

Alibaba Cloud Developer

Jun 14, 2022 · Big Data

Can a Streaming Data Warehouse Balance Freshness, Latency, and Cost?

This article examines the core trade‑offs of data warehouses—freshness, query latency, and cost—compares offline and real‑time architectures, introduces the concept of a streaming data warehouse, and details how Apache Flink Table Store aims to provide a unified, low‑cost solution.

Big Data · Flink · Real-time analytics

19 min read

JD Retail Technology

Jun 10, 2022 · Big Data

Design and Implementation of an International Business Data Platform for JD.com's 618 Promotion

The article details JD International's challenges and solutions in building a unified, real‑time data platform for its multi‑regional 618 promotion, covering business characteristics, data distribution, team organization, dashboard architecture, integration strategies, and short‑ and long‑term technical plans.

Data Integration · Data Platform · Flink

8 min read

Bilibili Tech

Jun 10, 2022 · Big Data

Incremental Data Lake Design and Hudi Core Optimizations with Flink

The article describes how combining Apache Flink with Hudi enables an incremental data lake that delivers near‑real‑time analytics by switching to merge‑on‑read, fixing log handling bugs, improving compaction planning, and refactoring table‑service scheduling, while showcasing use cases such as CDC ingestion, data quality control, and real‑time materialized views, and outlines future enhancements like optimistic concurrency and unified schema evolution.

Apache Hudi · CDC · Compaction Optimization

21 min read

Zuoyebang Tech Team

Jun 7, 2022 · Big Data

How Doris Powered Zuoyebang’s Real‑Time Data Warehouse for Faster Insights

Zuoyebang’s data team replaced fragmented, slow query solutions with Apache Doris, building a unified real‑time data warehouse that dramatically cut query latency from hours to seconds, streamlined data modeling, and improved reliability across diverse business scenarios, while integrating with Flink, Kafka, and ES via a unified API.

Apache Doris · Elasticsearch · Flink

20 min read

DataFunTalk

Jun 6, 2022 · Big Data

Understanding Flink's Exactly-Once Guarantees: Checkpoint, Two‑Phase Commit, and Kafka Integration

This article explains how Apache Flink achieves end‑to‑end exactly‑once semantics by using source replay support, checkpoint‑based snapshots, asynchronous incremental checkpoints, and two‑phase commit sinks, and describes the interaction with external systems such as Kafka to ensure transactional writes.

Big Data · Checkpoint · Exactly-Once

7 min read

dbaplus Community

May 24, 2022 · Big Data

How Vipshop Replaced ELK with ClickHouse for a Scalable, Low‑Cost Log System

Vipshop’s Dragonfly log platform evolved from a costly 260‑node Elasticsearch cluster to a ClickHouse‑based architecture that uses a unified JSON format, vfilebeat ingestion, Flink parsing, and MergeTree storage to achieve high‑throughput writes, fast vectorized queries, flexible TTL management, and dramatically lower operational expenses.

EFK · Flink · Kafka

20 min read

DataFunTalk

May 24, 2022 · Big Data

Integrating Apache Flink with Apache Hudi: From Data Warehouse to Data Lake

This article explains how Apache Flink integrates with Apache Hudi to enable real‑time data lake ingestion, covering the evolution from traditional data warehouses to data lakes, Hudi’s core concepts such as timeline and file grouping, copy‑on‑write vs merge‑on‑read modes, and Flink’s CDC‑based ETL pipeline.

Big Data · CDC · Data Lake

18 min read

DataFunSummit

May 21, 2022 · Big Data

Tencent News Massive Log Processing Architecture and Data Applications

The article presents Tencent News' comprehensive massive log processing solution, covering background, overall architecture, data collection, real-time and offline computation layers, data quality assurance, and practical examples such as Flink CDC for database synchronization, illustrating how large‑scale data is managed and applied.

Flink · Log Processing · Tencent

10 min read

Architect

May 17, 2022 · Big Data

Design and Architecture of an Integrated BI Platform Using Apache Kylin for Large‑Scale OLAP

The article explains the challenges of big‑data analytics, introduces pre‑computation OLAP concepts, and details how Apache Kylin together with Spark, Flink, Presto and other components can be integrated into a BI platform to achieve near‑real‑time query performance on massive datasets.

Apache Kylin · BI · Flink

11 min read

Big Data Technology & Architecture

May 15, 2022 · Big Data

Understanding Flink Window Table-Valued Functions (TVF) and Incremental Optimization

This article explains the concept of window table-valued functions in Flink, compares the old grouped‑window syntax with the new TVF syntax, details the physical and execution plans, introduces sliced windows for state reduction, and presents a small incremental‑output improvement with code examples.

Big Data · Flink · Incremental Aggregation

12 min read

Zuoyebang Tech Team

May 9, 2022 · Big Data

How Flink SQL Powered Real‑Time Learning Analytics at Zuoyebang

Zuoyebang’s big‑data team shares how they evolved from Spark Streaming to a Flink‑SQL‑centric real‑time platform, detailing three development stages, challenges in DAG optimization, Redis‑based table design, and platform features for unified deployment, ease of use, and operational governance.

Flink · Real-Time · Streaming

14 min read

58 Tech

May 5, 2022 · Big Data

Low-Code Real-Time Data Warehouse Construction System Using Flink

This article describes a low‑code, Flink‑based real‑time data‑warehouse construction system that abstracts the warehouse building process into ODS, DWD, DWS, and ADS layers, leverages a domain‑specific language and plugin engine to reduce development effort, and details its architecture, DSL design, plugin extensibility, dimension‑table completion, stream merging, aggregation, and storage strategies.

Big Data · DSL · Flink

11 min read

Big Data Technology & Architecture

May 4, 2022 · Big Data

Apache Hudi 0.11.0 Release Highlights: Multi‑Mode Index, Data Skipping, Async Index, Spark & Flink Integration, and New Utilities

The Apache Hudi 0.11.0 release introduces multi‑mode metadata indexing, enhanced data‑skipping, asynchronous indexing, extensive Spark and Flink integration improvements, new bundle utilities, and expanded metadata synchronization with BigQuery, AWS Glue, and DataHub, while also adding bucket indexing and encryption support.

Apache Hudi · Async Index · Big Data

13 min read

Bilibili Tech

May 3, 2022 · Artificial Intelligence

Bilibili AI Collaboration Platform Based on AIFlow: Architecture, Evolution, and Stream‑Batch Fusion

Bilibili built an AI collaboration platform based on AIFlow to simplify real-time machine-learning workflows, evolving through three stages that added event-driven scheduling, UI-driven parameter management, version snapshots, and a stateless client-server design, while enabling stream-batch fusion for feature back-filling; future work targets high availability, Airflow 2.0 compatibility, and richer streaming ML operators.

AIFlow · Bilibili · Flink

17 min read

Big Data Technology & Architecture

Apr 27, 2022 · Big Data

Understanding Window Table-Valued Functions (TVF) in Flink and Their Optimizations

This article explains Flink's window table-valued functions (TVF), shows how they replace the old grouped‑window syntax with concrete SQL examples, describes the physical planning rules, introduces sliced windows for state efficiency, and presents a small incremental‑output improvement for cumulative windows.

Big Data · Flink · Streaming

11 min read

HomeTech

Apr 27, 2022 · Big Data

AutoStream Real‑Time Computing Platform: Architecture, Resource Management, Scaling, Lakehouse Integration, and PyFlink Practices

This article details the evolution of Autohome's AutoStream platform from Storm to Flink‑based versions, covering real‑time application scenarios, strict budget‑controlled resource management, automatic scaling, lake‑house architecture with Iceberg, PyFlink integration, and future plans for resource optimisation and batch‑stream unification.

AutoStream · Flink · Lakehouse

15 min read

DataFunTalk

Apr 25, 2022 · Big Data

Comprehensive Guide to Flink Deployment, State Programming, Checkpointing, and Performance Tuning

This article provides an extensive overview of Apache Flink, covering deployment modes, cluster sizing, job submission workflows, state programming concepts, checkpoint mechanisms, backpressure handling, comparison with Spark, and practical code snippets for configuration and optimization.

Big Data · Checkpoint · Flink

48 min read

DataFunSummit

Apr 22, 2022 · Big Data

Huya Real-Time Computing SLA Practice: Platform Evolution, Core SLA Definition, Capability Building, and Future Outlook

The talk details the evolution of Huya’s real‑time computing platform from chaotic early stages to a unified, containerized system, defines core SLA metrics focused on latency compliance, describes capability enhancements such as demand monitoring, task analysis, and dynamic scaling, and outlines future goals for usability, stability, openness, and unified stream‑batch processing.

Flink · Real‑Time Computing · SLA

12 min read

ITPUB

Apr 19, 2022 · Big Data

Which Real-Time Data Warehouse Architecture Fits Your Needs? A Deep Dive

This article explains why modern enterprises need real‑time data‑warehouse architectures, breaks down traditional layered warehouse concepts, compares Lambda and Kappa models, evaluates five practical real‑time solutions—including Iceberg‑based lakehouse and MPP databases—provides code snippets, and offers selection guidance with real‑world company examples.

Big Data · Flink · Iceberg

0 likes · 19 min read

Big Data Technology & Architecture

Apr 19, 2022 · Big Data

Understanding Flink Checkpoint and Unaligned Checkpoint Mechanisms

This article explains Flink's fundamental checkpoint mechanism, its coupling with backpressure, and how the introduction of Unaligned Checkpoint in Flink 1.11 decouples checkpointing from backpressure to improve latency and resource utilization in high‑backpressure streaming jobs.

Big Data · Checkpoint · Flink

0 likes · 14 min read

Big Data Technology & Architecture

Apr 15, 2022 · Big Data

Configuring Flink SQL Client with Iceberg: Catalogs, DDL, Data Insertion and Query

This guide explains how to set up the Flink SQL client to work with Apache Iceberg, covering Scala version requirements, downloading and deploying Iceberg jars, configuring Hive and HDFS catalogs, creating databases and tables, performing insert and overwrite operations, and querying data in both batch and streaming modes.

Big Data · Catalog · Flink

0 likes · 18 min read

DataFunTalk

Apr 15, 2022 · Big Data

Huya Real-Time Computing SLA Practices: Platform Evolution, Core SLA Definition, Capability Building, and Future Outlook

This article details Huya's real‑time computing platform evolution, core SLA definitions focused on latency compliance, capability enhancements such as demand management, task analysis, dynamic resource scaling, and outlines future directions emphasizing usability, stability, openness, and unified batch‑stream processing.

Flink · Real‑Time Computing · SLA

0 likes · 13 min read

Big Data Technology & Architecture

Apr 14, 2022 · Big Data

Practical Guide to Monitoring Flink Performance, Detecting Backpressure, and Configuring Alerts

This article explains how to use Flink's Web UI, Kafka metrics, and YARN monitoring to observe performance, diagnose backpressure, and set alert thresholds, providing code examples and practical tips for reliable stream processing in production environments.

Big Data · Flink · Kafka

0 likes · 9 min read

Shopee Tech Team

Apr 14, 2022 · Big Data

URL Normalization and Statistical Analysis in MDAP Using Probabilistic and Machine Learning Techniques

MDAP normalizes URLs by automatically learning pattern‑tree rule models using entropy‑based splits, gibberish and numeric detection, and scalable Flink processing, which groups millions of raw URLs into concise patterns for accurate statistical monitoring, dramatically reducing data noise while still facing latency and model‑iteration challenges.

Flink · machine learning · pattern tree

0 likes · 20 min read

dbaplus Community

Apr 13, 2022 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse with Flink

This article explains Meituan's real‑time data warehouse architecture, covering typical business scenarios, the evolution of its streaming platform, key design challenges, solutions such as unified data models, SQL‑based development, UDF hosting, operator optimizations, and future plans for incremental processing and unified batch‑stream semantics.

Flink · Meituan · real-time data

0 likes · 18 min read

Big Data Technology & Architecture

Apr 11, 2022 · Big Data

Real-Time Data Warehouse Construction: Background, Objectives, Architecture, and Case Studies

This article explains the growing demand for real‑time data warehouses, outlines their objectives and layered architecture, and presents detailed case studies from Didi, Kuaishou, Tencent, Youzan and others, illustrating design choices, implementation challenges, and best practices for building scalable streaming data platforms.

Flink · Kafka · big-data

0 likes · 48 min read

DataFunSummit

Apr 6, 2022 · Big Data

Real-time Dimension Modeling with Flink SQL: Challenges and Solutions

This article presents a JD.com case study on applying Flink SQL for real‑time dimension modeling, detailing two complex streaming scenarios—full‑join of multiple streams and full‑group aggregation—along with the associated challenges of historical data handling, state management, and performance optimization, and proposes component‑based architectural solutions.

Big Data · Flink · Real-Time

0 likes · 14 min read

Big Data Technology & Architecture

Apr 5, 2022 · Big Data

Using ElasticsearchSink with Apache Flink: Configuration, Retry Strategies, and Failure Handling

This article introduces the ElasticsearchSink for Apache Flink, explains how to add Maven dependencies, implement the sink with configuration and retry settings, details failure handlers, and highlights important considerations such as exception handling and checkpoint requirements for reliable streaming pipelines.

Big Data · Elasticsearch · Failure Handling

0 likes · 9 min read

NetEase Yanxuan Technology Product Team

Mar 30, 2022 · Big Data

Data Lake Construction and Practice at NetEase Yanxuan

NetEase Yanxuan replaced its cumbersome data‑warehouse with a flexible Delta‑Lake/Iceberg data lake, creating a unified metadata layer and real‑time ingestion pipelines that cut latency from nightly batches to seconds, slashed compute and storage costs, supported diverse business scenarios and machine‑learning feature engineering, and set the stage for broader future expansion.

Data Integration · Data Lake · Delta Lake

0 likes · 16 min read

Efficient Ops

Mar 29, 2022 · Big Data

How Tencent Cloud Boosted APM Metric Computation Speed 2‑3× with Flink Optimizations

This article explains how Tencent Cloud's APM metric calculation, which transforms massive Span data into aggregated metrics using Flink, faced performance bottlenecks and was optimized through job splitting, batch merging, and dimension pruning, ultimately achieving a 2‑3× speed increase and cutting resource usage to about 30% of the original.

APM · Big Data · Flink

0 likes · 10 min read

58 Tech

Mar 29, 2022 · Big Data

Design and Implementation of the 58 Group Penalty Data Center

This article presents the design, architecture, and implementation of a unified penalty data center for 58 Group, detailing the challenges of heterogeneous data sources, the selection of Flink for real‑time ETL, the use of a DSL and LRU aggregation, and the adoption of MVEL for feature recognition to achieve standardized, high‑performance penalty data processing.

Big Data · ETL · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Mar 28, 2022 · Big Data

Real-time Dimension Modeling with Flink SQL: Problems, Challenges, and Solutions

This article presents JD's real-time dimension modeling case using Flink SQL, detailing two complex streaming scenarios, the difficulties of handling historical data and state management, and a component‑based solution that leverages external KV stores and optimized Flink operators to improve performance and scalability.

Big Data · Flink · Real-Time

0 likes · 13 min read

StarRocks

Mar 28, 2022 · Backend Development

Scaling Microservice Tracing with Zipkin and StarRocks: A Practical Guide

This article explains how Sohu Smart Media built a high‑performance tracing system for microservices by integrating Zipkin for data collection with StarRocks for storage and analytics, covering architecture, data models, SQL queries, Flink processing, and real‑world results that boost observability and engineering efficiency.

Flink · Microservices · StarRocks

0 likes · 31 min read

Alibaba Cloud Developer

Mar 25, 2022 · Big Data

How Douyu Built a Scalable Real‑Time Flink Platform on Kubernetes

Douyu’s journey from early Spark and Storm streaming to a Kubernetes‑native Flink platform illustrates the architectural design, challenges, and solutions for large‑scale real‑time computing, data warehousing, and future scalability in a high‑traffic live‑streaming environment.

Flink · Iceberg · Kubernetes

0 likes · 12 min read

Alibaba Cloud Developer

Mar 24, 2022 · Big Data

How Flink Powers Real‑Time Process Operations in China Construction Bank

This article details how China Construction Bank's fintech subsidiary leveraged Apache Flink to ingest, join, and analyze massive front‑end, request, and response logs in real time, overcoming data silos, latency challenges, and state‑management issues to enable end‑to‑end process visibility and operational optimization.

Banking · Flink · process mining

0 likes · 17 min read

DataFunTalk

Mar 24, 2022 · Big Data

Real‑time Dimension Modeling with Flink SQL: Problems, Challenges, and Solutions

This article presents a JD.com BI engineer's case study on applying Flink SQL to real‑time dimension modeling, detailing two complex streaming scenarios, the technical difficulties of handling historical data and performance, and a component‑based solution architecture with future roadmap considerations.

Big Data · Flink · Real-Time

0 likes · 13 min read

StarRocks

Mar 23, 2022 · Databases

Accelerating Zepp Health’s Analytics with StarRocks: An OLAP Case Study

Facing inflexible point‑lookup limits and slow query times on HBase, Zepp Health redesigned its massive event‑tracking data pipeline—migrating ingestion through Kafka, Flink, and Hudi to a StarRocks‑based OLAP layer—achieving sub‑100 ms average query latency, 20 % storage savings, and dramatically faster multi‑dimensional analytics.

Big Data · Flink · Hudi

0 likes · 9 min read

DataFunTalk

Mar 23, 2022 · Big Data

Iceberg Data Lake Query Optimization Practices and Governance

This talk by Tencent senior engineer Chen Liang covers Iceberg table format fundamentals, data lake ingestion, query processing, hidden partitioning, time‑travel, major features, optimization techniques such as compaction, bin‑packing, sorting and Z‑ordering, and outlines a future roadmap for improving performance and governance in big‑data environments.

Big Data · Data Lake · Flink

0 likes · 12 min read

DeWu Technology

Mar 21, 2022 · Big Data

Real-time Customer Service Dashboard: Architecture and Implementation with Flink and ClickHouse

The article describes a real‑time customer‑service dashboard built on Flink for streaming MySQL changes captured via Kafka, which cleans and aggregates ~60 operational metrics before writing them to ClickHouse’s MergeTree/ReplacingMergeTree tables, enabling sub‑second queries and exactly‑once guarantees while separating offline and live pipelines.

Dashboard · Flink · clickhouse

0 likes · 18 min read

Alibaba Cloud Developer

Mar 17, 2022 · Big Data

How AutoStream Scales Real‑Time Data Processing with Flink, Iceberg, and PyFlink

This article details AutoStream's evolution from a Java‑only Storm platform to a Flink‑based, Kubernetes‑native streaming system that integrates budgeting controls, automatic scaling, lakehouse architecture with Iceberg, and PyFlink support, highlighting the technical challenges, solutions, and future roadmap for real‑time analytics.

Flink · Iceberg · Lakehouse

0 likes · 23 min read

Big Data Technology & Architecture

Mar 16, 2022 · Big Data

End‑to‑End Streaming Data Pipeline with Kafka, Flink, and Apache Griffin

This tutorial demonstrates how to build a complete streaming data pipeline by configuring JDK, MySQL, Hadoop, Hive, Spark, Kafka, and Griffin, generating test data with shell scripts, processing it with Flink, and validating data quality using Apache Griffin in a Spark‑based deployment.

Apache Griffin · Big Data · Data Quality

0 likes · 13 min read

Big Data Technology & Architecture

Mar 15, 2022 · Big Data

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

This article introduces Change Data Capture (CDC), compares query‑based and log‑based CDC, explains Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including full Java code—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · Data Streaming

0 likes · 17 min read

Yiche Technology

Mar 9, 2022 · Cloud Native

Design and Implementation of the Yunji Logging System Using Flink and ClickHouse

The article presents the Yunji logging system, a Flink+ClickHouse-based cloud-native platform for real-time ingestion, storage, querying, analysis, and monitoring of massive heterogeneous logs, covering its architecture, configuration center, storage design, processing flow, monitoring features, and future enhancements.

Cloud Native · Flink · Janino

0 likes · 21 min read

Alibaba Cloud Developer

Mar 7, 2022 · Big Data

How China Mobile’s Real‑Time Computing Platform Scales Billions of Events with Flink

This article details China Mobile (Suzhou) Software Technology's evolution from Storm to Flink for real‑time computing, its multi‑version engine and log‑retrieval designs, signal‑business data pipeline optimizations, stability practices around ZooKeeper, and future directions in resource scaling and data‑lake integration.

Flink · Kafka · Real-Time

0 likes · 12 min read

dbaplus Community

Mar 2, 2022 · Big Data

How Real‑Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explores the growing demand for real‑time data warehouses, compares them with traditional offline warehouses, and presents detailed architectures, layer designs, naming conventions, and case studies from companies like Didi, Kuaishou, Tencent, and Youzan, highlighting challenges, solutions, and performance optimizations.

Big Data Architecture · Flink · Iceberg

0 likes · 47 min read

DataFunTalk

Mar 1, 2022 · Cloud Native

Alibaba Cloud Native Data Lake with Apache Iceberg: Architecture, Challenges, and Solutions

The presentation outlines Alibaba Cloud's native data lake solution built on Apache Iceberg, covering data lake fundamentals, cloud migration challenges, Iceberg's architecture and features, real‑time ingestion with Flink, unified metadata management, security guarantees, and testing practices to ensure reliable, scalable big‑data analytics.

Apache Iceberg · Big Data · Data Lake

0 likes · 16 min read

DataFunTalk

Feb 25, 2022 · Big Data

Tencent's Application of Apache Iceberg for Real‑Time Data Lake Ingestion, Governance, and Query Optimization

This article explains how Tencent leverages Apache Iceberg together with Flink to build a real‑time data lake pipeline, covering data ingestion, Iceberg's snapshot‑based read/write model, compaction and governance services, Z‑order based query optimization, performance results, and future roadmap.

Apache Iceberg · Big Data · Data Lake

0 likes · 24 min read

Big Data Technology & Architecture

Feb 24, 2022 · Big Data

Understanding Async I/O in Apache Flink: Usage, Implementation, and Fault Tolerance

This article explains how to use Async I/O in Flink, describes the ordered and unordered output modes, details the internal AsyncWaitOperator implementation with its producer‑consumer model, and discusses fault‑tolerance mechanisms including state snapshot and recovery.

FaultTolerance · Flink · StreamProcessing

0 likes · 17 min read

dbaplus Community

Feb 23, 2022 · Big Data

Inside OPPO’s Real‑Time Computing Platform: Architecture, Practices, and Future Roadmap

This article details OPPO’s real‑time computing platform, covering its business scope, big‑data architecture built on Flink, Spark and Trino, the end‑to‑end job development lifecycle, SQL IDE features, diagnostic and monitoring mechanisms, link latency tracking, SLA guarantees, practical use cases, and upcoming lakehouse and cloud‑native evolution.

Flink · Real‑Time Computing · big data platform

0 likes · 23 min read

vivo Internet Technology

Feb 23, 2022 · Big Data

Kafka-based Real-Time Data Warehouse: Architecture and Practice for Search

The article explains how Kafka serves as the core of a real‑time data warehouse for search, detailing its advantages over traditional databases, integration with Flink for low‑latency stream processing, architectural patterns such as Lambda/Kappa, scaling challenges, and comprehensive monitoring using Kafka Eagle.

Apache Kafka · Data Integration · Flink

0 likes · 15 min read

Big Data Technology & Architecture

Feb 23, 2022 · Big Data

Understanding Mini‑Batch Streaming Aggregation in Flink SQL

This article explains Flink SQL’s streaming aggregation Mini‑Batch feature, covering its purpose, configuration parameters, underlying optimizer rules, operator implementations, watermark handling, buffer processing, and the optional Local‑Global two‑phase aggregation optimization for improving throughput and reducing state overhead in large‑scale data pipelines.

Big Data · Flink · Mini-Batch

0 likes · 10 min read

Volcano Engine Developer Services

Feb 16, 2022 · Big Data

ByteDance’s Journey to a Unified Data Lake with Flink and Hudi

This article recounts ByteDance’s evolution from batch‑only Flink pipelines to a unified data‑lake integration platform, detailing the three integration modes, challenges with Spark‑based CDC, the decision to adopt Hudi over Iceberg, and how Hudi’s indexing and Merge‑On‑Read formats enable near‑real‑time analytics at massive scale.

CDC · Flink · Hudi

0 likes · 10 min read

Big Data Technology & Architecture

Feb 16, 2022 · Big Data

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

This article introduces Change Data Capture (CDC), compares query‑based and log‑based approaches, explains Debezium and ClickHouse, and provides detailed Flink CDC and Flink SQL CDC examples—including Java source code, custom deserialization schema, ClickHouse sink implementation, and required Maven dependencies—to synchronize MySQL data into ClickHouse in real time.

Big Data · CDC · Data Streaming

0 likes · 17 min read

Big Data Technology & Architecture

Feb 15, 2022 · Big Data

Understanding Flink TaskManager Memory Model (Post‑1.10)

This article explains the official Flink memory model diagram, shows real‑world TaskManager memory parameters, and breaks down the major memory components, including total process memory, total Flink memory, JVM heap, off‑heap, Metaspace, and overhead, with configuration guidance for optimal resource allocation.

Big Data · Flink · Memory

0 likes · 8 min read

Alibaba Cloud Developer

Feb 14, 2022 · Backend Development

How Kuaishou Boosted Flink SQL Performance with Window Extensions and State Optimizations

Kuaishou dramatically increased Flink SQL adoption, introduced Group Window Aggregate and Window TVF extensions, applied aggregation state reuse and mini‑batch techniques, and enhanced stability through data‑skew mitigation and aggregate‑state compatibility, outlining future plans for streaming and batch SQL improvements.

Flink · State Optimization · sql

0 likes · 19 min read

Big Data Technology & Architecture

Feb 14, 2022 · Big Data

Real-Time Advertising Data Warehouse Architecture Based on Flink

This article presents a comprehensive design of a real-time advertising data warehouse powered by Flink, covering construction background, technical and data‑warehouse architecture, real‑time OLAP, stability and data‑quality guarantees, future plans, and the integration of Hologres for simplified processing.

Big Data · Data Quality · Flink

0 likes · 10 min read

DataFunTalk

Feb 12, 2022 · Big Data

NetEase Internal Data Lake Project Arctic: Architecture, Requirements, and Future Roadmap

This article introduces NetEase's internally incubated data lake project Arctic, explains the concept of data lakes, outlines NetEase's specific requirements for a unified streaming‑batch platform, details Arctic's core architecture, storage strategy, data‑merge mechanisms, current achievements, and future development plans.

Apache Iceberg · Arctic · Big Data

0 likes · 10 min read

DataFunTalk

Feb 3, 2022 · Big Data

Improving Data Processing Efficiency at Kuaishou with Apache Hudi

This article explains how Kuaishou tackled latency and efficiency problems in large‑scale data pipelines by adopting Apache Hudi, detailing the pain points, reasons for choosing Hudi, its architecture, model design, handling of bursty updates, back‑fill scenarios, and operational safeguards.

Big Data · Data Lake · Flink

0 likes · 13 min read

DataFunSummit

Jan 30, 2022 · Big Data

Real‑time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

This article presents Meituan's real‑time data warehouse platform, describing typical streaming use cases, the evolution of its architecture from Storm and Spark Streaming to Flink, the challenges of development, operations and data quality, and the engineering solutions—including unified SQL, web IDE, UDF hosting, pipeline testing, and operator performance optimizations—implemented to support large‑scale, low‑latency analytics.

Flink · platform architecture · real-time data

0 likes · 17 min read

Baidu Geek Talk

Jan 26, 2022 · Big Data

How a Real‑Time CDP Solves Data Silos: Architecture, Tech Choices & Lessons

This article examines the design and implementation of a tenant‑level real‑time Customer Data Platform, detailing CDP fundamentals, business and technical challenges, key architectural components, technology selections such as graph databases, stream processing, storage engines, and the operational practices that enable high‑throughput, low‑latency data integration and analytics.

CDP · Data Integration · Flink

0 likes · 42 min read

HomeTech

Jan 26, 2022 · Operations

Design and Practice of Autohome's Performance Testing Platform PTS

The article details the architecture, key components, testing types, and operational results of Autohome's PTS platform, which uses Docker Swarm, gRPC, JMeter, Flume‑Kafka, and Flink to conduct large‑scale distributed load testing for the 818 event and outlines future improvements toward Kubernetes and direct Kafka logging.

Docker Swarm · Flink · JMeter

0 likes · 8 min read

Architecture Digest

Jan 21, 2022 · Big Data

Building a Real-Time Data Warehouse with Flink: Architecture, Core Concepts, and Practical Implementation

This article explains how to build a unified stream‑batch real‑time data warehouse using Flink SQL, covering prerequisite knowledge, five core concepts, two implementation approaches, a comparison of traditional versus real‑time architectures, and a comprehensive hands‑on example, illustrated with diagrams.

Batch Processing · Data Architecture · Flink

0 likes · 6 min read

Big Data Technology & Architecture

Jan 19, 2022 · Big Data

Understanding Flink End-to-End Latency Measurement with LatencyMarker

This article explains the background, source‑code analysis, implementation details, metric granularity, and practical considerations of Flink's LatencyMarker feature for measuring full‑link job latency in streaming applications.

Big Data · Flink · LatencyMarker

0 likes · 12 min read

DataFunTalk

Jan 13, 2022 · Big Data

Advanced Features of the Pravega Flink Connector Table API: Schema Registry, Catalog Integration, and Debezium Support

This article summarizes the Pravega Schema Registry project, its integration with Flink's Catalog API, the addition of Debezium CDC support, and the related implementation challenges, providing detailed DDL examples, code snippets, and architectural diagrams for building real‑time data pipelines.

CDC · Catalog API · Debezium

0 likes · 15 min read

StarRocks

Jan 12, 2022 · Big Data

How Flink + StarRocks Deliver Lightning‑Fast Real‑Time Data Warehousing

This article explains the evolution, challenges, and technical solutions for building an end‑to‑end real‑time data warehouse by combining Apache Flink's stream processing with StarRocks' ultra‑fast OLAP engine, covering architecture, data models, integration methods, best‑practice cases, and future roadmap.

Big Data · Flink · OLAP

0 likes · 21 min read
