Tagged articles

946 articles

May 20, 2021 · Big Data

Flink 1.13 Release Highlights: Passive Scaling and Performance Analysis Features

Flink 1.13 introduces passive scaling (Reactive Mode in Flink's own terminology), which lets users resize jobs by adjusting parallelism; adds visual tools such as load/back‑pressure charts, CPU flame graphs, and state‑backend metrics for deeper performance insight; and includes numerous community optimizations for easier upgrades and operation.

Flink · State Backend · passive scaling

0 likes · 5 min read

Architect

May 19, 2021 · Big Data

Flink-Based Real-Time Recommendation System Architecture and Deployment Guide

This article presents a comprehensive overview of a Flink-powered real-time recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms, front‑end and back‑end interfaces, Docker‑based deployment of MySQL, Redis, HBase, and Kafka, and step‑by‑step startup procedures.

Docker · Flink · HBase

0 likes · 9 min read

DataFunTalk

May 18, 2021 · Big Data

Evolution and Architecture of Beike Real-Time Computing Platform

Beike's real-time computing platform, led by Liu Liyun, has evolved from early Spark Streaming to a Flink-based system with SQL 1.0, 2.0, and upcoming 3.0, supporting a large-scale data warehouse, event-driven processing, extensive monitoring, and diverse business scenarios across the company's operations.

Event-driven · Flink · Real-time Streaming

0 likes · 14 min read

DataFunTalk

May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

AI · Big Data · Flink

0 likes · 27 min read

Big Data Technology Architecture

May 13, 2021 · Big Data

Real-Time OLAP Evolution and Production Optimization at BTC.com

This article details BTC.com’s journey from a legacy batch‑oriented analytics stack to a modern real‑time OLAP architecture using Flink, ClickHouse, Kafka, and Kubernetes, highlighting the business drivers, technical choices, architectural evolution, optimizations, and future directions.

Blockchain · Flink · Real-time OLAP

0 likes · 9 min read

Big Data Technology Architecture

May 12, 2021 · Big Data

End-to-End Tutorial: Sync MySQL Binlog to Kafka and Consume with Flink Using TiDB

This article provides a step‑by‑step guide to build a data pipeline that captures MySQL binlog, streams it through Canal into Kafka, processes it with Flink, and finally writes the results into TiDB, covering environment setup, component deployment, configuration, and verification.

Canal · Flink · TiDB

0 likes · 31 min read

DataFunTalk

May 11, 2021 · Big Data

Design and Practice of Baixin Bank's Flink‑Based Real‑Time Computing Platform and Hudi‑Powered Real‑Time Data Lake

This article details Baixin Bank's construction of a Flink‑driven real‑time computing platform integrated with Hudi as a real‑time data lake, covering background, architecture, data collection, transformation, storage layers, technical challenges, future roadmap, and practical lessons for similar big‑data initiatives.

Big Data · Flink · Hudi

0 likes · 12 min read

Big Data Technology & Architecture

May 10, 2021 · Big Data

Understanding Flink TaskManager Memory Allocation on YARN (Per‑Job Mode)

This article explains how Flink on YARN allocates TaskManager memory, breaks down the JVM heap, network buffers, and Flink Managed Memory, and shows how to calculate each component using configuration parameters and source‑code analysis.

Flink · Memory Management · TaskManager

0 likes · 11 min read

DataFunTalk

May 4, 2021 · Big Data

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

This article presents the background, requirements, architectural design, component interaction, and implementation details of AutoHome's real‑time data transmission platform built on Apache Flink, highlighting its high availability, exactly‑once semantics, scalability, DDL handling, and integration with existing streaming services.

Apache Flink · Big Data · Data Streaming

0 likes · 18 min read

DataFunTalk

May 2, 2021 · Big Data

Continuous Optimization and Practice of Flink at Kuaishou

This article presents Kuaishou's comprehensive engineering practices for improving Flink's stability, task startup latency, and SQL performance, including high‑availability Kafka connectors, fault‑recovery mechanisms, I/O reductions, asynchronous job upgrades, aggregation optimizations, and future resource‑utilization plans.

Big Data · Flink · Kafka

0 likes · 10 min read

DataFunTalk

Apr 27, 2021 · Big Data

Implementing CDC‑to‑Hudi for Real‑Time Mutable Data in a Big Data System

This article describes how Linkflow migrated mutable customer data from MySQL to an Apache Hudi data lake using Debezium‑in‑Flink CDC, addressing challenges such as snapshot resumability, partial updates, row‑key merging, schema evolution, indexing, and concurrent writes to achieve minute‑level data freshness and improved offline processing performance.

Apache Hudi · Big Data · CDC

0 likes · 21 min read

Big Data Technology & Architecture

Apr 24, 2021 · Big Data

Integrating Apache Flink 1.12.2 with Apache Hudi: Batch and Streaming Modes

This article walks through downloading the required Flink and Hudi components, building Hudi for Scala 2.12, and demonstrates step‑by‑step how to create, populate, query, and update Hudi tables in both batch and streaming modes using Flink SQL, complete with code snippets and result screenshots.

Apache · Batch · Data Lake

0 likes · 8 min read

DataFunTalk

Apr 23, 2021 · Big Data

Building and Evolving Zhihu’s Flink‑Based Data Integration Platform

This article details Zhihu’s transition from a Sqoop‑driven data integration system to a Flink‑centric platform, covering business scenarios, historical architecture, design goals, technology choices, performance optimizations, and future plans for unified streaming‑batch processing across diverse storage systems.

Batch Processing · Big Data · Data Integration

0 likes · 14 min read

Big Data Technology & Architecture

Apr 23, 2021 · Big Data

Reading HBase with Flink 1.12 – Environment Setup, Code Samples, and Result

This article demonstrates how to configure Flink 1.12 to read data from HBase, covering the required environment components, HBase table creation, Maven dependencies, Java POJO and Flink‑SQL code, and showing the query results with and without printing the TableResult.

Flink · HBase · Streaming

0 likes · 11 min read

dbaplus Community

Apr 20, 2021 · Big Data

10 Common Pitfalls When Migrating Spark Jobs to Flink (And How to Avoid Them)

This article shares ten practical pitfalls encountered when moving hourly Spark session jobs to Flink, covering parallelism load imbalance, state TTL, checkpointing strategies, logging, JMX debugging, state migration risks, reduce vs process choices, input data validation, event‑time handling, and external storage considerations, along with concrete configuration snippets and performance tips.

Flink · Spark migration · State Management

0 likes · 20 min read

IT Architects Alliance

Apr 20, 2021 · Big Data

Real-time Log Processing System Based on Flink and Drools

This article describes a real-time log processing platform that integrates Kafka, Flink, Drools rule engine, Redis, and Elasticsearch to unify heterogeneous log formats, extract business metrics, and provide configurable, dynamic data processing for large‑scale logging scenarios.

Drools · Elasticsearch · Flink

0 likes · 6 min read

dbaplus Community

Apr 17, 2021 · Big Data

How a Traditional Finance Firm Tackles Real‑Time Analytics with Flink

This article details a financial company's exploration of Apache Flink for real‑time processing, covering its unique business constraints, end‑to‑end data pipeline, single‑table and multi‑table use cases, implementation challenges, code snippets, data initialization, testing strategies, and lessons learned.

Financial · Flink · HBase

0 likes · 13 min read

Huolala Tech

Apr 16, 2021 · Cloud Native

How to Build a Scalable Kubernetes Logging Pipeline with EFK and Fluentd

This article explains how to collect, process, and visualize Flink job logs on Kubernetes using an EFK stack with Fluentd, covering logging architectures, deployment of Elasticsearch, Kibana, and Fluentd, and the backend logic for querying and displaying logs in a feature platform.

EFK · Elasticsearch · Flink

0 likes · 20 min read

DataFunTalk

Apr 15, 2021 · Big Data

Technical Evolution and Production Optimization of Real‑Time OLAP at BTC.com

This article details BTC.com’s journey in building a real‑time OLAP platform for blockchain data, covering business background, challenges, architectural evolution, technology choices such as Flink and ClickHouse, optimization techniques, monitoring, and future directions.

Flink · Kubernetes · Real-time OLAP

0 likes · 10 min read

Huolala Tech

Apr 8, 2021 · Big Data

Mastering PyFlink on Kubernetes: Practical Deployment Strategies and Lessons

This article explains how to deploy a PyFlink feature platform on Kubernetes, covering basic K8s concepts, Flink execution graphs, various deployment modes, preparation steps, detailed Standalone and Native deployment procedures, and practical tips for efficient big‑data processing.

Cloud Native · Deployment · Flink

0 likes · 16 min read

Big Data Technology & Architecture

Apr 6, 2021 · Big Data

Real-Time Computing and Data Warehouse Solutions with Apache Flink: Architecture, Technology Selection, and Implementation

This article explores the evolution of real-time computing in the big data domain, detailing Apache Flink's capabilities, architectural designs, and technology selections such as Kafka, Canal, HBase, and ClickHouse, and provides practical implementation guides and case studies from Alibaba, Tencent, and other enterprises.

Flink · Real‑Time Computing · data-warehouse

0 likes · 33 min read

DataFunTalk

Apr 5, 2021 · Big Data

Bigo Real‑Time Computing Platform: Architecture, Features, and Performance Improvements

This article presents the evolution, architecture, and key innovations of Bigo's real‑time computing platform—covering its migration from Spark Streaming to Flink, unified platform design, development tools, operational enhancements, and the efficiency gains achieved in business scenarios such as ETL and AB‑testing.

AB testing · Bigo · Flink

0 likes · 13 min read

Big Data Technology Architecture

Apr 5, 2021 · Big Data

Evolution of Real‑Time Data Warehouses: From 1.0 to 3.0 and the Road to Batch‑Stream Unified Architecture

The article reviews the current state of offline Hive‑based data warehouses, explains the emergence of real‑time data warehouses (1.0) built on Kafka and Flink, discusses their limitations, and outlines the progression toward batch‑stream unified architectures (2.0 and 3.0) leveraging data‑lake technologies such as Iceberg.

Batch-Stream Integration · Big Data · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Apr 4, 2021 · Big Data

Flink Performance Tuning Guide: Memory Configuration, Parallelism, Checkpoint Optimization, and Common Issues

This guide details comprehensive Flink performance tuning techniques, covering memory configuration, GC settings, parallelism adjustments, process parameters, partitioning strategies, Netty network tuning, checkpoint optimization, and common issues such as data skew and resource bottlenecks.

Checkpoint · Flink · Memory Management

0 likes · 18 min read

DataFunTalk

Apr 3, 2021 · Big Data

Building a Real-Time Data Computing Platform for Tencent Games: Practices and Architecture

This article describes Tencent Games' end‑to‑end real‑time data platform, covering its construction background, the unified OneData development framework, the OneFun data‑service API layer, micro‑service and ServiceMesh management, and the operational benefits achieved through automation, standardization, and scalability.

Flink · Game Analytics · Microservices

0 likes · 14 min read

Big Data Technology & Architecture

Mar 29, 2021 · Big Data

Understanding Flink AggregateFunction, Session Windows, and Timer Mechanisms

This article explains how Flink's DataStream API uses AggregateFunction and session windows, details the MergingWindowAssigner and MergingWindowSet implementations, and demonstrates timer registration and processing with KeyedProcessFunction, providing full code examples and internal workflow analysis.

AggregateFunction · Flink · KeyedProcessFunction

0 likes · 25 min read

HelloTech

Mar 26, 2021 · Big Data

Data Quality and Interface Semantic Monitoring for Algorithm Testing Platform

The article describes how algorithm testing teams tackled data‑quality and interface‑semantic monitoring problems by building a unified business monitoring platform that checks table, storage and service consistency, validates response semantics, and, through dashboards, alerts and correction tools, quickly identified dozens of offline and online issues, guiding future reliability enhancements.

AI · Big Data · Data Quality

0 likes · 26 min read

iQIYI Technical Product Team

Mar 26, 2021 · Big Data

Evolution of iQIYI's Real-Time Big Data Ecosystem

iQIYI transformed its data infrastructure from a traditional offline T+1 model to a comprehensive real‑time ecosystem—leveraging Kafka, Flink, a three‑layer Stream Data Service Platform, the Talos drag‑and‑drop pipeline, and a Druid‑based analytics platform—to enable low‑latency monitoring, personalized recommendations, ad targeting, and continuous machine‑learning workflows while planning future stream‑batch integration and lake‑warehouse convergence.

Analytics · Big Data · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Mar 23, 2021 · Big Data

Practical Implementations of Data Lakes: Huawei Production Scenario, Real-Time Financial Data Lake, and Soul's Delta Lake

This article presents a comprehensive overview of data lake implementations, detailing Huawei's production‑scene platform, a real‑time financial data lake architecture using Kafka, Flink and Iceberg, and Soul's Delta Lake practice with Spark, Hive, and custom ETL tools, highlighting design choices, processing flows, and operational considerations.

Data Lake · Delta Lake · Flink

0 likes · 8 min read

DataFunTalk

Mar 21, 2021 · Big Data

Single‑Point Recovery and Regional Checkpoint in Flink: Design, Implementation, and Optimizations

This article presents ByteDance's recent Flink enhancements, detailing a single‑point recovery mechanism for the network layer and a regional checkpoint strategy that together improve failover latency, reduce output loss, and enable scalable, high‑throughput stream processing for large‑scale real‑time recommendation workloads.

Big Data · Checkpoint · Flink

0 likes · 12 min read

Big Data Technology & Architecture

Mar 18, 2021 · Big Data

Flink Job Troubleshooting and Performance Optimization: Data Skew, Kafka Configuration, Resource Management, and Checkpoint Issues

This article details common Flink streaming problems such as data skew causing task back‑pressure, oversized Kafka messages, high‑throughput ack settings, slot removal errors, checkpoint timeouts, and resource constraints, and provides concrete configuration changes and architectural adjustments to resolve them.

Checkpoint · Data Skew · Flink

0 likes · 18 min read

Big Data Technology & Architecture

Mar 16, 2021 · Big Data

Using Flink Upsert‑Kafka Connector for Real‑Time Data Aggregation and TiDB Synchronization

This article explains the upsert‑kafka connector in Flink, its configuration parameters, step‑by‑step usage with SQL examples, and demonstrates a complete pipeline that reads Kafka streams, aggregates page view metrics, and synchronizes the results to TiDB in real time.

Flink · Streaming · TiDB

0 likes · 13 min read

DataFunTalk

Mar 15, 2021 · Big Data

Ten Gotchas When Migrating Spark Jobs to Flink

This article shares ten practical pitfalls encountered while moving hour‑level Spark session processing jobs to Apache Flink, covering parallelism skew, state TTL, checkpoint handling, logging, debugging, state migration, Reduce vs Process, input validation, event‑time handling, and the trade‑offs of storing data inside Flink.

Big Data · Flink · State Management

0 likes · 19 min read

Big Data Technology & Architecture

Mar 15, 2021 · Big Data

Implementation and Usage of Flink FileSystem, JDBC, and Kafka Connectors

The article provides a comprehensive technical guide on Flink's FileSystem, JDBC, and Kafka connectors, detailing their source and sink implementations, core code logic, checkpoint handling, partition commit strategies, and complete SQL usage examples for streaming applications.

Connector · Filesystem · Flink

0 likes · 25 min read

DataFunTalk

Mar 7, 2021 · Big Data

Building Stream‑Batch Integrated ETL with Flink SQL: Data Warehouse and Data Integration

This article explains how Flink SQL can be used to construct a unified stream‑batch ETL pipeline for data warehouses and data lakes, covering data integration, CDC support, streaming writes to Hive and Iceberg, and various join techniques such as regular, interval, and temporal joins.

CDC · Data Integration · ETL

0 likes · 20 min read

360 Smart Cloud

Mar 4, 2021 · Information Security

Improving Large-Scale Regex Matching Performance with Hyperscan and Flink

This article explains how to boost the efficiency of massive regular‑expression matching by using Intel's Hyperscan library, integrating it with Apache Flink for streaming processing, and providing deployment guidelines for both private and internal environments.

Flink · Streaming · hyperscan

0 likes · 10 min read

DataFunTalk

Mar 1, 2021 · Artificial Intelligence

Online Learning and Real‑Time Model Updating in JD Retail Search Using Flink

The article describes JD's end‑to‑end online learning pipeline for retail search, covering the background, system architecture, real‑time feature collection, sample stitching, Flink‑based incremental training, parameter updates, and full‑link monitoring to achieve low‑latency, high‑accuracy model serving.

Flink · Model Serving · Online Learning

0 likes · 9 min read

360 Tech Engineering

Feb 26, 2021 · Big Data

Improving Large-Scale Regex Matching Performance with Hyperscan and Flink Integration

This article explains how to boost massive regular‑expression matching speed by using Intel's Hyperscan engine together with Apache Flink for streaming, covering security scenarios, architectural challenges, deployment options, usage examples, performance results, and future enhancements.

Flink · big-data · hyperscan

0 likes · 9 min read

dbaplus Community

Feb 23, 2021 · Big Data

How NetEase Game Teams Built a Scalable Flink‑Based Streaming ETL Platform

This article explains how NetEase games collect heterogeneous logs, design a Flink‑driven streaming ETL pipeline, handle schema‑free sources, implement Python UDFs with Jython, optimize HDFS writes, manage real‑time and offline warehouses, and share practical tuning and fault‑tolerance techniques.

ETL · Flink · Kafka

0 likes · 22 min read

DataFunTalk

Feb 22, 2021 · Big Data

Optimizing Flink Real-Time Task Resources: Memory and Message Processing Perspectives

This article explores practical methods for optimizing Flink real‑time task resources on Kubernetes, focusing on memory usage analysis via GC logs and message‑processing capacity assessment, proposing automated detection of over‑provisioned memory and CPU, and outlining a workflow for resource adjustment to reduce costs.

Big Data · Flink · GC Analysis

0 likes · 18 min read

dbaplus Community

Feb 18, 2021 · Big Data

How JD Search Scaled Real‑Time Analytics with Flink and Doris

This article details JD Search's journey from a Storm‑based pipeline to a Flink‑driven architecture backed by Apache Doris, covering business requirements, technical challenges, design trade‑offs, performance optimizations for massive traffic spikes, and future plans for their real‑time OLAP data warehouse.

Big Data · Flink · OLAP

0 likes · 12 min read

DataFunTalk

Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements—including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, hash‑based write distribution to reduce small files, and upcoming 0.12.0 roadmap—while providing practical SQL and API examples for data‑lake practitioners.

Apache Iceberg · Big Data · CDC

0 likes · 13 min read

DataFunTalk

Feb 15, 2021 · Big Data

Flink-Driven Incremental Data Warehouse Production at Meituan: Architecture, Streaming Integration, and Future Plans

This article presents Meituan's use of Flink to enable incremental data warehouse production, covering the warehouse architecture, streaming data integration evolution, real-time OLAP applications, platform design, and future directions for unified stream‑batch processing.

Big Data · Flink · Incremental Processing

0 likes · 11 min read

Big Data Technology & Architecture

Feb 7, 2021 · Big Data

Building a Flink SQL Platform on Zeppelin: Installation, Configuration, and Advanced Use Cases

This guide walks through setting up Apache Zeppelin as a low‑cost, SQL‑centric development platform for Flink, covering environment preparation, installation, interpreter configuration, execution modes, verification, common pitfalls, dimension‑table joins, custom UDFs, Redis integration, and dual‑stream join techniques.

Flink · Streaming · UDF

0 likes · 24 min read

DataFunTalk

Feb 7, 2021 · Big Data

Optimizations and Extensions for Flink SQL in Tencent Real‑Time Computing Platform

This article, presented by Tencent senior engineer Du Li, details the current state of Flink SQL, compares Jar, Canvas, and SQL modes, introduces window‑function extensions, retract‑stream optimizations, and outlines future roadmap plans for cost‑based optimization and new features in the real‑time computing platform.

Big Data · Flink · Retract Stream

0 likes · 19 min read

Big Data Technology & Architecture

Feb 2, 2021 · Big Data

An Introduction to Apache Iceberg: Features, Spark & Flink Integration, and Real‑World Use Cases

This article provides a comprehensive overview of Apache Iceberg, covering its origins, key features, practical Spark and Flink code examples, notable deployments at Alibaba and Tencent, and its future role as a universal table format for big‑data analytics.

Apache Iceberg · Data Lake · Flink

0 likes · 9 min read

Big Data Technology & Architecture

Feb 1, 2021 · Big Data

Flink 1.12 Enhancements: Full SQL Support, Hive Integration, and Streaming Write to Hive

The article reviews Flink 1.12's major enhancements, including comprehensive SQL capabilities, deep integration with Hive via catalog and streaming support, and a practical code example that demonstrates how to write streaming data into Hive tables while handling partition commits and small‑file merging.

Data Integration · Flink · Streaming

0 likes · 7 min read

TAL Education Technology

Jan 28, 2021 · Big Data

Batch-Stream Fusion in Education: TAL’s Real-Time Data Platform Practices

This article, presented by senior data platform engineer Mao Xiangyi of TAL Education, details the design and implementation of the company’s real‑time T‑Streaming platform, covering its three‑layer data architecture, batch‑stream integration techniques, making the ODS layer real‑time, the Flink SQL development workflow, hybrid‑cloud deployment, and a case study of K‑12 renewal reporting.

Batch-Stream Integration · Education Analytics · Flink

0 likes · 18 min read

DataFunTalk

Jan 28, 2021 · Big Data

Real-Time Financial Data Lake: Architecture, Practices, and Applications at Zhongyuan Bank

This talk by Ba Xueyu, a senior big data platform engineer at Zhongyuan Bank, outlines the background, architecture, and engineering practices of a real‑time financial data lake, highlighting its open, timely, and integrated design, streaming platform implementation, and use cases such as anti‑fraud and real‑time BI.

Flink · anti-fraud · financial analytics

0 likes · 15 min read

dbaplus Community

Jan 27, 2021 · Big Data

How We Upgraded a 1500-Node Flink Cluster to 1.10: Challenges and Solutions

We migrated a massive 1500‑node Flink 1.4.2 cluster, which handles over 12,000 tasks and 30 trillion daily events, to Flink 1.10. This article details the new DDL/Catalog support, SQL enhancements, memory tuning, compatibility patches, extensive testing, and engine optimizations such as task‑load metrics and balanced sub‑task scheduling.

Big Data · Flink · Version Upgrade

0 likes · 13 min read

Big Data Technology & Architecture

Jan 22, 2021 · Big Data

Flink Performance Tuning: Principles, Metrics, and JVM Configuration

This article explains how to diagnose and optimize Flink jobs by first examining metrics, then checking resource allocation, analyzing throughput and back‑pressure, and finally tuning JVM settings, while providing concrete configuration examples and practical tips for big‑data practitioners.

Flink · JVM · Metrics

0 likes · 7 min read

Big Data Technology & Architecture

Jan 17, 2021 · Operations

Building an Enterprise‑Level Flink Monitoring System with Prometheus, Grafana and Pushgateway

This article explains how to use the Cloud Native Prometheus ecosystem—including Prometheus Server, exporters, Pushgateway, Alertmanager and Grafana—to collect, store, query and visualize Flink job metrics, providing a complete monitoring solution for production clusters.

Cloud Native · Flink · Grafana

0 likes · 13 min read

DataFunTalk

Jan 16, 2021 · Big Data

Practical Application of Flink + Kafka at NetEase Cloud Music: Architecture, Platform Design, and Lessons Learned

This article presents a detailed case study of NetEase Cloud Music’s real‑time analytics platform built on Kafka and Flink, covering background, architectural choices, platform‑level design, operational challenges, solutions such as the Magina framework, and a Q&A on reliability and monitoring.

Big Data · Flink · Kafka

0 likes · 11 min read

Youzan Coder

Jan 13, 2021 · Big Data

Flink Real-time Task Resource Optimization Practice at Youzan

At Youzan, Flink real‑time tasks running on Kubernetes are optimized by daily GC‑log memory analysis and Kafka‑throughput monitoring, which compute recommended heap sizes and parallelism adjustments to eliminate over‑provisioned CPU and memory, automate alerts, and pave the way for fully automated resource tuning.

Flink · GC tuning · Kubernetes

0 likes · 16 min read

Didi Tech

Jan 12, 2021 · Big Data

Upgrading DiDi Real‑time Computing Engine from Flink 1.4 to Flink 1.10: Challenges, Optimizations, and Lessons Learned

DiDi upgraded its massive real‑time computing engine from Flink 1.4.2 to Flink 1.10, implementing a transparent migration across 1500 machines, adding native DDL, binary rows, MiniBatch, improved scheduling and window functions, and establishing a rigorous testing pipeline that achieved 99.9% compatibility while preventing OOM issues.

Flink · PerformanceOptimization · RealTimeComputing

0 likes · 11 min read

Big Data Technology & Architecture

Jan 11, 2021 · Big Data

Evolution of a Real‑Time Data Warehouse Architecture and Practical Lessons

This article recounts the author’s journey building a real‑time data warehouse using Flink, Kafka, Redis, and ClickHouse, describing the initial batch‑oriented setup, successive architectural evolutions, challenges with wide tables and dimension data, and the final OLAP‑centric solution with secondary caching.

Big Data · Flink · OLAP

0 likes · 9 min read

Big Data Technology & Architecture

Jan 10, 2021 · Big Data

Integrating Apache Flink 1.12 with Hive: Configuration, Catalog, Planner, and UDF Usage

This guide explains how to integrate Flink 1.12 with Hive using HiveCatalog, covering required dependencies, Blink planner configuration, SQL dialect switching, Hive UDF support, temporal table joins, and provides complete code snippets for a streaming‑batch unified data warehouse solution.

Blink Planner · Flink · Streaming

0 likes · 16 min read

Big Data Technology & Architecture

Jan 9, 2021 · Big Data

Comprehensive 2021 Flink Interview Questions and Answers

This article presents a detailed collection of 2021 Flink interview questions covering checkpoint mechanisms, watermarks, state backends, join types, fault tolerance, resource configuration, and recent Flink 1.10 features, providing concise explanations and code examples for each topic.

Checkpoint · Flink · State Backend

0 likes · 23 min read

NetEase Game Operations Platform

Jan 9, 2021 · Operations

Real-Time Log Intelligent Classification Practice

This article describes how NetEase built a real‑time log intelligent classification system using Flink and AI algorithms, detailing the challenges of massive log volumes, the Drain template‑extraction method, algorithm workflow, performance results, and a practical case study that demonstrates reduced alert storms and faster issue diagnosis.

AI · Drain algorithm · Flink

0 likes · 15 min read

58 Tech

Jan 4, 2021 · Big Data

Building a Real‑Time Data Warehouse with Flink: Architecture, Implementation and Lessons Learned

This article describes how a fast‑growing company built a layered real‑time data warehouse on Flink, detailing the evolution from a simple 1.0 pipeline to a 2.0 architecture with ODS, DWD and ADS layers, dimension joins, exactly‑once sinks, HDFS partitioning, monitoring, and future improvements.

Big Data · ETL · Flink

0 likes · 14 min read

Big Data Technology & Architecture

Dec 29, 2020 · Big Data

Real-time MySQL Binlog Capture with Oracle GoldenGate and Kafka Integration

This article provides a step‑by‑step guide on configuring MySQL binlog, installing and deploying Oracle GoldenGate, extracting changes, converting them to JSON, and streaming the data into Kafka for real‑time processing, complete with code snippets and verification procedures.

Binlog · Flink · Oracle GoldenGate

0 likes · 11 min read

Big Data Technology & Architecture

Dec 29, 2020 · Databases

Setting Up and Using the MySQL CDC Connector with Apache Flink

This article provides a step‑by‑step guide on configuring the MySQL CDC connector for Flink, covering Maven and SQL client dependencies, MySQL user setup, connector options, table creation via SQL and Stream API, key features, common issues, and practical troubleshooting tips.

CDC · Connector · Flink

0 likes · 10 min read

Big Data Technology & Architecture

Dec 27, 2020 · Big Data

Understanding and Solving the Small File Problem in Big Data Systems

This article examines the pervasive small‑file issue in big‑data environments, explains its impact on storage and processing performance, and presents a comprehensive set of solutions—including file merging, Hadoop archives, SequenceFiles, HBase, CombineFileInputFormat, and Spark/Flink strategies—to mitigate metadata overhead and improve I/O efficiency.

Flink · Hadoop · NameNode

0 likes · 41 min read

Big Data Technology & Architecture

Dec 25, 2020 · Big Data

Implementing Custom Source and Sink in Flink Streaming with RocketMQ and HBase

This article details how to migrate Spark Streaming jobs to Flink Streaming by creating custom SourceFunction and SinkFunction implementations, including a RocketMQ source connector and an HBase sink, with code examples, configuration tips, and discussion of checkpointing and watermark handling.

Flink · HBase · RocketMQ

0 likes · 20 min read

Youzan Coder

Dec 21, 2020 · Big Data

Youzan Big Data Technology Salon – Cost Governance, Apache Iceberg, Flink, and Data‑Driven Growth

At Youzan’s Big Data Technology Salon, over 100 attendees heard leaders from Youzan, NetEase Yishu, and Didi discuss cost governance, Apache Iceberg data lakes, large‑scale Flink real‑time computing, and data‑driven growth strategies, highlighting practical implementations, savings of millions and tools for merchant empowerment.

Apache Iceberg · Data Growth · Flink

0 likes · 5 min read

Big Data Technology & Architecture

Dec 20, 2020 · Big Data

Getting Started with Apache Zeppelin: Installation, Core Features, and Integration with JDBC, Spark, and Flink

This tutorial introduces Apache Zeppelin, explains REPL and Jupyter concepts, outlines its core features and project structure, and provides step‑by‑step instructions for installing Zeppelin, creating notebooks, and connecting to databases, Spark, and Flink with practical code examples.

Apache Zeppelin · Flink · Installation

0 likes · 11 min read

Full-Stack Internet Architecture

Dec 20, 2020 · Big Data

Using Flinkx for Data Synchronization in Sharded MySQL Environments

This article explains how to leverage Flinkx and Flink Stream API to create a unified data‑sync task that extracts data from sharded MySQL tables, splits the workload, and pushes it to an MQ cluster, while detailing the underlying InputFormat and Reader architecture.

Big Data · Flink · FlinkX

0 likes · 8 min read

ITFLY8 Architecture Home

Dec 18, 2020 · Big Data

Unlocking the Data Middle Platform: From Ingestion to Real‑Time Analytics

This article provides a comprehensive overview of data middle platform concepts, covering data aggregation, collection tools, development modules, job scheduling, baseline control, heterogeneous storage, permission management, real‑time and offline processing, governance, services, and implementation details for building robust big‑data solutions.

Data Governance · Data Platform · ETL

0 likes · 25 min read

Big Data Technology & Architecture

Dec 17, 2020 · Big Data

Running Flink on Kerberos-secured YARN: Authentication and Configuration Guide

This article explains why Kerberos is needed for Hadoop clusters, details the Kerberos authentication workflow, and provides step‑by‑step instructions for configuring Flink to run on a Kerberos‑protected YARN environment using delegation tokens or keytab files, along with proxy‑user settings.

Delegation Token · Flink · Kerberos

0 likes · 12 min read

Big Data Technology & Architecture

Dec 16, 2020 · Big Data

Designing a Real‑Time Data Processing Platform with Flink: Architecture, Deployment, and Operations

This article explains how to build a real‑time data processing platform using Flink, covering the Lambda architecture, design approaches, SQL and custom‑Jar task definitions, UI drag‑and‑drop, cluster resource management on Yarn and Kubernetes, submission modes, scheduling, permission and metadata handling, logging, and monitoring with Prometheus and Grafana.

Cluster Management · Flink · Lambda architecture

0 likes · 19 min read

Big Data Technology & Architecture

Dec 15, 2020 · Big Data

Flink on Kubernetes: Architecture, Deployment Modes, and Operational Guide

This article explains Flink’s architecture and details how to run Flink on Kubernetes using both Standalone and native modes, covering Session and Per‑Job deployment, required ConfigMaps, Deployments, Services, and command‑line steps for creating, submitting, and deleting Flink jobs.

Deployment · Flink · Kubernetes

0 likes · 13 min read

Youzan Coder

Dec 9, 2020 · Big Data

Youzan Big Data Technology Salon: Practices in Data Cost Governance, Apache Iceberg, Flink, and Data-Driven Growth

The Youzan Big Data Technology Salon brought together Youzan, NetEase and Didi to share practical approaches for cutting data‑infrastructure costs, building an Apache Iceberg‑based data lake, scaling Flink real‑time workloads, and creating a data‑driven growth platform that leverages tracking, A/B testing and analytics.

Apache Iceberg · Big Data · Data Cost Governance

0 likes · 5 min read

DataFunTalk

Dec 7, 2020 · Big Data

Jingdong's Flink Real‑Time Computing Platform: Containerization, Optimizations, and Future Roadmap

This article details Jingdong's evolution from Storm to Flink, the architecture of its Kubernetes‑based real‑time computing platform, extensive containerization practices, performance and stability optimizations, and the future plan to unify batch‑stream processing while expanding SQL support and intelligent operations.

Batch-Stream Integration · Flink · Kubernetes

0 likes · 16 min read

DataFunTalk

Dec 6, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: Overview of AI Flow and Its Architecture

This article explains how Flink enables end‑to‑end machine‑learning workflows through AI Flow, covering the background of Lambda architecture, AI task stages, the advantages of Flink, AI Flow components, AI Graph concepts, integration with Python and TensorFlow, and a real‑world advertising recommendation use case.

AI Flow · Flink · Real-Time

0 likes · 14 min read

DataFunTalk

Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache Iceberg · Data Lake · Flink

0 likes · 13 min read

DataFunSummit

Dec 1, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: AI Flow Architecture, Components, and Applications

This article explains how Flink enables end‑to‑end AI workflows through the AI Flow platform, covering the Lambda architecture background, AI task pipeline stages, the reasons for choosing Flink, AI Flow’s graph model, core services, integration with ML pipelines, and real‑world advertising recommendation use cases.

AI Flow · AI Pipeline · Big Data

0 likes · 12 min read

Alibaba Cloud Developer

Nov 23, 2020 · Big Data

How Alibaba’s CCO Built a Cloud‑Native Real‑Time Data Warehouse with Hologres

Alibaba’s Customer Experience (CCO) team transformed its real‑time data platform by evolving from a Lambda‑style database architecture to a cloud‑native real‑time data warehouse powered by Hologres and Flink, achieving higher throughput, lower latency, reduced costs, and self‑service analytics for massive Double‑11 traffic.

Alibaba · Big Data · Flink

0 likes · 15 min read

Big Data Technology & Architecture

Nov 23, 2020 · Big Data

Introduction to Flink CDC: Concepts, Use Cases, and Implementation

This article explains Change Data Capture (CDC) and how Flink CDC can be used for incremental data synchronization, real‑time materialized views, audit logging, and CDC‑based joins, providing code examples, Maven dependencies, and SQL/Java snippets for MySQL and Kafka integrations.

CDC · Change Data Capture · Flink

0 likes · 9 min read

Alibaba Cloud Developer

Nov 22, 2020 · Big Data

How Flink’s Stream‑Batch Integration Powered Alibaba’s Record‑Breaking Double‑11

Alibaba’s 2020 Double‑11 achieved unprecedented real‑time processing of 4 billion records per second and 7 TB of data per second using Flink, showcasing the stability, performance and efficiency of its stream‑batch unified architecture across diverse business scenarios.

Alibaba · Batch Processing · Big Data

0 likes · 15 min read

DataFunTalk

Nov 17, 2020 · Artificial Intelligence

Alink: A Flink‑Based Machine Learning Platform – Overview, Features, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, explains its core algorithms, performance comparison with Spark ML, version‑wise feature evolution, and provides practical quick‑start instructions for both Java (Maven) and Python (PyAlink) users, including data source handling, type conversion components, unified file‑system operations, and an overview of its FM algorithm implementation.

Alink · Batch Processing · Data Integration

0 likes · 13 min read

DataFunSummit

Nov 15, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Using Hadoop, Flume, Kafka, Spark, and Flink

This article details the three‑stage evolution of 58.com’s commercial data warehouse, describing its massive scale, four‑layer architecture, technical challenges, migrations from MapReduce to Hive and Flink, real‑time streaming upgrades, and the resulting improvements in stability, accuracy, and timeliness.

Big Data · Data Architecture · Flink

0 likes · 10 min read

dbaplus Community

Nov 15, 2020 · Big Data

Mastering Real‑Time Stream Processing with Flink: From Fundamentals to Kuaishou Production

This article walks through the evolution of big‑data systems to modern stream processing, explains core Flink concepts such as state, checkpoints, event‑time and windowing, and details Kuaishou’s real‑time UV calculation and fast‑failover techniques for high‑availability streaming jobs.

Big Data · Flink · Kafka

0 likes · 21 min read

Big Data Technology & Architecture

Nov 13, 2020 · Big Data

Understanding Flink Operator Chaining Mechanism

This article explains the Flink operator chaining mechanism, detailing how logical plans are transformed into JobGraph and ExecutionGraph, the conditions for chaining, code implementations, and how the runtime constructs OperatorChain to improve execution efficiency.

Flink · JobGraph · java

0 likes · 12 min read

Big Data Technology & Architecture

Nov 12, 2020 · Backend Development

API vs SPI: Concepts, Implementation, and Real‑World Java Examples

This article explains the difference between APIs and Java SPI, describes how SPI enables plug‑in development through ServiceLoader, and illustrates its practical use with JDBC driver loading and Flink table factories, providing code snippets and implementation steps for backend developers.

API · Flink · JDBC

0 likes · 12 min read

Architect

Nov 11, 2020 · Big Data

Real-time Click Stream Data Warehouse with Flink and ClickHouse: Architecture, Layered Design, and Practical Tips

This article explains how to build a real‑time click‑stream data warehouse using Flink for stream processing and ClickHouse for near‑real‑time OLAP, covering click‑stream characteristics, dimensional modeling, layered warehouse design, async dimension joins, sink implementation, and data rebalancing strategies.

Big Data · Click Stream · Flink

0 likes · 7 min read

DataFunTalk

Nov 11, 2020 · Big Data

Evolution and Practices of Cainiao's Real‑Time Data Warehouse for International Import Business

This article details the high‑complexity logistics scenario of Cainiao's international import business, explains the evolution from offline to real‑time data warehouses (versions 1.0 and 2.0), describes the layered architecture, enumerates technical challenges such as multi‑source joins, state explosion, out‑of‑order processing, and presents concrete solutions using Flink features, logical middle‑layers, union‑all joins, deduplication, timer services, and batch‑stream hybrid processing.

Big Data · Flink · State Management

0 likes · 21 min read

DataFunSummit

Nov 10, 2020 · Artificial Intelligence

Alink: An Open‑Source Machine Learning Platform on Flink – Features, Performance, and Quick‑Start Guide

This article introduces Alink, Alibaba's open‑source machine‑learning platform built on Flink, detailing its core algorithms, performance advantages over Spark ML, version evolution, Maven and PyAlink installation steps, data‑source integrations, FM algorithm support, and unified file‑system operations for both batch and streaming workloads.

Alink · Flink · PyAlink

0 likes · 11 min read

Tencent Cloud Developer

Nov 10, 2020 · Big Data

Design and Optimization of a Real-Time Video Recommendation Indexing System

The article describes a real‑time video recommendation indexing system that replaces 30‑minute batch builds with an Elasticsearch‑based service, integrates prior and posterior data pipelines, ensures consistency via locking and version checks, enables zero‑downtime upgrades, smooths write spikes, and boosts recall performance through multi‑level caching and ES tuning, delivering sub‑40 ms latency and significant business growth.

Elasticsearch · Flink · caching

0 likes · 13 min read

Big Data Technology & Architecture

Nov 9, 2020 · Big Data

Understanding the Actor Model and Akka in Big Data RPC Systems

This article introduces the Actor model, its fundamental rules, and how it underpins Flink and Spark RPC mechanisms, then explains the Akka framework, its actor hierarchy, supervision, lifecycle, and dispatcher, providing a concise foundation for distributed big‑data processing.

Akka · Flink · Spark

0 likes · 10 min read

360 Tech Engineering

Nov 6, 2020 · Big Data

Guide to Flink SQL: Features, Scenarios, and Productization

Flink SQL, the high‑level SQL interface for Apache Flink, offers language‑independent, dependency‑free, easy‑to‑use stream processing with advanced features such as DDL, UDFs, time semantics, windowing, pattern matching, and built‑in connectors, supporting data synchronization, batch‑stream fusion, Hive integration, and various product enhancements.

Data Integration · Flink · Real-Time

0 likes · 11 min read

Amap Tech

Nov 6, 2020 · Operations

Full-Link Load Testing Platform TestPG: Architecture, Corpus Production, and Intelligent Features

Gaode’s TestPG platform solves full‑link load‑testing bottlenecks by unifying traffic capture with Iflow, converting logs into standardized corpora via a Flink pipeline, and applying corpus‑intelligence that extracts seasonal feature statistics and predicts distributions for precise, feature‑level throttling, enabling faster, more reliable testing and future autonomous optimization.

Flink · Load Testing · Traffic Capture

0 likes · 16 min read

Big Data Technology & Architecture

Nov 6, 2020 · Big Data

Integrating Flink SQL with Apache Zeppelin: Installation, Configuration, and Usage

This guide explains how to set up Apache Zeppelin as an interactive notebook for Flink SQL, covering download, environment configuration, Zeppelin and Flink interpreter settings on YARN, Hive integration, and step‑by‑step testing of streaming SQL queries.

Configuration · Flink · YARN

0 likes · 11 min read

DataFunTalk

Nov 1, 2020 · Big Data

Flink 1.11 Integration with Hive: New Features and Real‑time Data Warehouse

The article explains how Flink 1.11 deepens its integration with Hive, covering background, new connector features, simplified dependency management, enhanced Hive dialect, streaming writes and reads, temporal table joins, and how these capabilities enable a unified batch‑streaming data warehouse.

Batch‑Streaming Integration · Flink · Streaming

0 likes · 16 min read

Big Data Technology Architecture

Nov 1, 2020 · Big Data

Practical Application of Flink + Kafka in NetEase Cloud Music Real‑Time Computing Platform

This article presents NetEase Cloud Music's real‑time computing platform built on Flink and Kafka, covering background, architectural design, Kafka and Flink selection reasons, platformization, warehouse usage, encountered challenges, and the solutions implemented to improve reliability and performance.

Flink · Kafka · Real-time Streaming

0 likes · 11 min read

dbaplus Community

Oct 29, 2020 · Big Data

Inside Didi’s Real-Time Data Warehouse for Ride-Sharing: Architecture & Lessons

This article details Didi’s end‑to‑end construction of a real‑time data warehouse for the Ride‑Sharing (Shunfengche) business, covering motivations, layer‑by‑layer architecture, naming conventions, StreamSQL capabilities, operational tooling, achieved results, challenges, and future batch‑stream integration plans.

Didi · Flink · real-time data warehouse

0 likes · 21 min read

DataFunTalk

Oct 29, 2020 · Big Data

Building a Large-Scale Near Real-Time Data Analytics Platform at Lyft Using Apache Flink

Lyft transformed its legacy data pipeline by designing a cloud‑native, Flink‑based near real‑time analytics platform that ingests billions of events, writes Parquet files to S3, leverages Presto for interactive queries, and implements multi‑stage non‑blocking ETL, fault‑tolerant back‑fill, and extensive performance optimizations.

AWS · Data Lake · ETL

0 likes · 12 min read

Big Data Technology & Architecture

Oct 23, 2020 · Big Data

Overview of Real-Time Big Data Processing: Spark Structured Streaming, CarbonData, Flink, and Cloud Stream

This article provides a comprehensive overview of modern real‑time big‑data solutions, detailing Spark Structured Streaming capabilities, CarbonData’s storage architecture, Meituan’s Flink deployments, and Huawei Cloud Stream’s unified streaming service, highlighting their features, challenges, and future directions.

CarbonData · Flink · Real-time analytics

0 likes · 17 min read

ITPUB

Oct 16, 2020 · Big Data

How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details NetEase Cloud Music's evolution of a real‑time data warehouse built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as AB‑testing, live reporting, and feature serving.

Big Data · Calcite · Flink

0 likes · 8 min read

DataFunTalk

Oct 15, 2020 · Big Data

Real‑Time Computing for Online Education: Architecture, Data Platform and Automation at VIPKID

This article explains how VIPKID leverages real‑time streaming with Flink to build a unified data platform, automatically tag and process help requests during 1‑v‑1 live classes, and achieve significant reductions in manual monitoring while improving course quality and user experience.

Big Data · Flink · Real-time Streaming

0 likes · 14 min read

dbaplus Community

Oct 13, 2020 · Big Data

How to Build a Real‑Time Data Warehouse with Flink: Principles, Architecture, and Best Practices

This article explains why real‑time data warehouses are needed, outlines their core principles, compares them with offline warehouses, describes typical use cases such as real‑time OLAP, dashboards, feature generation and monitoring, and provides a step‑by‑step guide to designing, implementing, and operating a Flink‑based streaming warehouse with Kafka, HBase, and metadata management.

Flink · Kafka · OLAP

0 likes · 29 min read
