Tagged articles
946 articles
DataFunTalk
Jan 10, 2022 · Big Data

Real‑Time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

The talk by Tang Chuxi of Meituan explains typical real‑time data scenarios, the challenges faced when building a streaming data warehouse, and the design, development, operation, and performance‑optimisation solutions implemented on a Flink‑based platform to support massive, low‑latency business applications.

Flink · Meituan · data-warehouse
17 min read
dbaplus Community
Jan 5, 2022 · Big Data

How ByteDance Optimized Flink SQL for Real‑World Streaming at Scale

This article details ByteDance's practical experience with Apache Flink, covering SQL extensions, a visual SQL platform, performance tweaks such as window mini‑batching and custom windows, join and checkpoint recovery improvements, stream‑batch integration experiments, and future roadmap plans.

Batch Integration · Checkpoint · Flink
16 min read
DataFunTalk
Jan 1, 2022 · Big Data

JD's Flink Journey: Evolution, Optimizations, and Future Directions

This article details JD's adoption of Flink for real‑time computing, covering its evolution from Storm to Flink on Kubernetes, the platform architecture, major optimization techniques such as preview topology, backpressure handling, dynamic rebalance, checkpoint‑as‑savepoint, and outlines future plans including stream‑batch integration, stability improvements, intelligent operations, and AI integration.

Big Data · Flink · JD
10 min read
Big Data Technology & Architecture
Dec 31, 2021 · Big Data

Apache SeaTunnel Joins the Apache Incubator: Overview, Features, and Real‑World Use Cases

SeaTunnel, a data‑integration platform that originated in China and is built on Spark and Flink, has been accepted into the Apache Incubator. This article introduces its history, architecture, plugin ecosystem, deployment requirements, and numerous enterprise deployments across batch and streaming big‑data scenarios.

Apache · Big Data · Data Integration
7 min read
Tencent Cloud Developer
Dec 28, 2021 · Industry Insights

How Flink and ClickHouse Combine to Build High‑Performance Real‑Time Data Warehouses

This article analyzes the challenges of massive data query efficiency, explains how Flink's stream processing and ClickHouse's OLAP engine complement each other, and presents a layered real‑time data‑warehouse architecture with practical guidance on data ingestion, write strategies, quality assurance, and evolving batch‑stream integration patterns.

Big Data · Flink · OLAP
19 min read
Big Data Technology & Architecture
Dec 22, 2021 · Big Data

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

This article explains Change Data Capture (CDC), compares query‑based and log‑based approaches, introduces Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including Java source, deserialization, sink code and required Maven dependencies—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · Data Streaming
14 min read
HelloTech
Dec 20, 2021 · Big Data

Building an ElasticSearch-based Search Platform for Ride-Hailing: Architecture, Data Synchronization, and Performance Optimization

Hello Mobility unified its fragmented ElasticSearch clusters into a single real‑time search platform, leveraging Kafka‑driven CDC, Flink stream processing, custom ES plugins, and extensive performance tuning, to deliver scalable matching, recommendation and voice services, ultimately raising completed orders by 49.8% and driver acceptance by 37%.

Big Data · Flink · Search Platform
19 min read
DataFunTalk
Dec 19, 2021 · Big Data

OPPO Real-Time Computing Platform Architecture and Practices

This article details OPPO’s real-time computing platform architecture, covering its background, open‑source and self‑developed components, job lifecycle, SQL IDE, diagnostic and monitoring mechanisms, SLA guarantees, practical applications such as real‑time warehousing and dashboards, and future plans for lakehouse integration and cloud‑native deployment.

Flink · cloud-native · job monitoring
20 min read
Youzan Coder
Dec 17, 2021 · Big Data

Upgrading Real-Time Computing Engine from Flink 1.10 to 1.13: Practices and Challenges

Youzan upgraded its real‑time computing engine from Flink 1.10 to 1.13 to meet rising SQL and containerization demands, gaining enhanced SQL syntax, time‑function handling, Window TVF standardization, Hive integration, K8s stability, elastic scaling, richer Kafka and format support, improved metrics and debugging tools, and successfully migrated all custom connectors, UDFs, and SQL jobs to the new Kubernetes‑based platform.

Flink · Real‑Time Computing · containerization
22 min read
Beike Product & Technology
Dec 17, 2021 · Operations

Practices for Monitoring, Resource Optimization, and Containerization of Large-Scale Flink Jobs at Beike

This article describes Beike's real‑time computing team's end‑to‑end practices for collecting and storing Flink metrics, building visual monitoring dashboards, implementing multi‑level alerting, analyzing logs, estimating CPU and memory resources, and deploying Flink on Kubernetes with containerization and storage separation to improve stability, resource utilization, and operational efficiency.

Flink · Kubernetes · Metrics
25 min read
dbaplus Community
Dec 13, 2021 · Backend Development

ElasticSearch Powers Ride‑Matching at Haro Mobility: Architecture & Lessons

This article details how Haro Mobility built a search‑driven ride‑matching platform using ElasticSearch and Flink, covering the business background, architectural evolution, data‑sync challenges, performance tuning, stability measures, and the resulting improvements in order completion and user engagement.

Backend Architecture · Flink · Ride Matching
21 min read
HelloTech
Dec 13, 2021 · Big Data

Smart Matching Engine for Ride-Sharing: Technical Implementation and Algorithms

The Smart Matching Engine for Hello’s ride‑sharing service ingests driver and passenger orders via Kafka‑Flink pipelines into Elasticsearch, then applies multi‑stage matching (nearby search, followed by itinerary‑based filtering using ETA, angle, distance, route‑similarity and shared‑mileage calculations), and finally ranks results with evolving pre‑sorting and algorithmic models, including PMML and deep‑learning models, to optimize driver‑passenger pairing.

Elasticsearch · Flink · Kafka
9 min read
DataFunTalk
Dec 10, 2021 · Big Data

Building and Evolving NetEase Yanxuan Real-Time Computing Platform: Architecture, SQLization, Serviceization, and Data Governance

This article details the development of NetEase Yanxuan's real-time computing platform from 2017 to the present, covering its architecture, Flink‑SQL development environment, service‑oriented deployment, resource optimization, cloud‑native migration, comprehensive data governance, and future plans for stream‑batch integration and intelligent job diagnostics.

Big Data · Cloud Native · Data Governance
14 min read
DataFunSummit
Dec 10, 2021 · Big Data

Real‑Time Platform Construction at NetEase Yanxuan: Architecture, SQL‑Based Streaming, Serviceization, and Data Governance

This article details the evolution of NetEase Yanxuan's real‑time data platform from 2017 to the present, covering background, current scale, layered architecture, Flink‑SQL development IDE, service‑oriented task execution, resource‑optimizing deployment modes, cloud‑native migration, comprehensive data governance, and future batch‑stream integration plans.

Big Data · Cloud Native · Data Governance
15 min read
Youzan Coder
Dec 8, 2021 · Big Data

How to Build a Real‑Time Data Quality Monitoring System with Flink

This article outlines a comprehensive approach to monitoring and ensuring the accuracy and timeliness of real‑time data streams, detailing background challenges, solution design, implementation steps using Flink and automated testing, alert handling procedures, and future improvement plans.

Alerting · Data Quality · Flink
10 min read
HomeTech
Dec 7, 2021 · Big Data

Flink Task Auto-scaling Design and Implementation

This article presents the design and implementation of Flink task auto‑scaling, covering background, manual and automatic scaling mechanisms, architecture with RescaleCoordinator, persistence via ZooKeeper and HDFS, scaling policies for parallelism, CPU and memory, and future plans for fine‑grained and time‑based resource adjustments.

Auto Scaling · Flink · HDFS
4 min read
StarRocks
Nov 26, 2021 · Big Data

How Autohome Achieved Sub‑Second Real‑Time Analytics with StarRocks

Autohome replaced Flink and Kylin with StarRocks to power sub‑second real‑time OLAP analytics, detailing data sources, pain points, benchmark comparisons against Apache Kylin, ClickHouse, Presto, Spark, and Doris, integration with Flink‑connector, broker‑load scripts, monitoring setup, and lessons learned from large‑scale deployments.

Flink · OLAP · StarRocks
12 min read
HomeTech
Nov 24, 2021 · Databases

Real‑Time Data Analysis at AutoHome: Evaluation and Adoption of StarRocks

This article describes Autohome's real‑time data analysis architecture, the challenges of existing OLAP solutions, the reasons for choosing StarRocks, detailed performance comparisons with Kylin, ClickHouse, Doris, Presto and Spark, and the practical integration of StarRocks with Flink, broker‑load scripts, and monitoring tools.

Flink · OLAP · Real-time analytics
9 min read
Big Data Technology Architecture
Nov 23, 2021 · Big Data

Step-by-Step Guide to Setting Up Flink CDC with MySQL, Hudi, and Hive Integration on a Hadoop Cluster

This comprehensive tutorial walks through configuring a Hadoop‑based environment (Flink 1.13.1, Scala 2.11, CDH 6.2.0, Hive 2.1.1, Hudi 0.10), compiling Hudi, setting up Flink and MySQL binlog, creating CDC source and Hudi sink tables, running Flink jobs, and synchronizing the results to Hive partitions for query via Hive and Presto.

CDC · Flink · Hudi
15 min read
Big Data Technology & Architecture
Nov 22, 2021 · Big Data

Comprehensive Big Data Learning Path and Resource Guide

This article presents a detailed learning roadmap for aspiring big‑data experts, covering foundational programming languages, data structures, Linux basics, databases, distributed system theory, and essential frameworks such as Hadoop, Spark, Flink, and Kafka, and provides curated Bilibili video links and reference materials.

Big Data · Flink · Hadoop
9 min read
Alibaba Cloud Developer
Nov 22, 2021 · Big Data

Achieving Exactly-Once Writes from Flink to ClickHouse: Architecture and Performance

This article explains how Flink and ClickHouse can be combined to build a real-time data warehouse with end-to-end Exactly-Once guarantees, detailing the underlying write mechanisms, transaction state machine, connector implementation, and performance test results, while also outlining future enhancements for distributed transactions.

Flink · Performance Testing · clickhouse
15 min read
Big Data Technology & Architecture
Nov 20, 2021 · Big Data

Comprehensive Overview of Apache Flink Concepts, Mechanisms, and Interview Questions

This article provides an extensive technical guide to Apache Flink, covering its exactly‑once consumption guarantees, checkpoint and two‑phase commit mechanisms, differences from Spark, state backends, watermark handling, time semantics, window joins, CEP, backpressure, architecture layers, deployment, resource management, and common operational issues.

Big Data · CEP · Checkpoint
77 min read
Tencent Cloud Developer
Nov 19, 2021 · Artificial Intelligence

End‑to‑End Breast Cancer Prediction Solution Using Decision Tree on Tencent Cloud AI Platform

This guide details an end‑to‑end breast‑cancer prediction pipeline on Tencent Cloud, covering offline decision‑tree training with TI‑ONE, model packaging as a PMML service, real‑time feature generation via Oceanus and CKafka, and live inference stored in ClickHouse, all within a secure VPC.

AI · Flink · Real-time Streaming
19 min read
Big Data Technology & Architecture
Nov 16, 2021 · Big Data

Flink Checkpoint, Backpressure, and Memory Tuning Guide

This article provides a comprehensive guide on optimizing Flink checkpoints, diagnosing and alleviating backpressure, and fine‑tuning memory configurations—including process, heap, off‑heap, managed, and network memory—to improve job stability and performance in large‑scale streaming applications.

Checkpoint · Flink · Memory Tuning
25 min read
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Why Choose Apache Iceberg? Tencent’s Optimizations and Real‑World Practices

This article examines the strengths and weaknesses of Apache Iceberg, explains why Tencent selected it over alternatives, details Tencent’s own enhancements and integration with Flink, Spark, and other engines, and shares multiple real‑world implementations for building enterprise‑grade real‑time data lakes.

Apache Iceberg · Data Lake · Flink
17 min read
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full‑load and incremental reading mechanisms, details the slice partitioning, snapshot correction, and binlog handling logic, and provides a complete Java example that demonstrates how to configure Flink SQL, MySQL source, and Kafka sink.

Big Data · CDC · Data Integration
29 min read
DataFunTalk
Nov 6, 2021 · Artificial Intelligence

Elastic Federated Learning Solution (EFLS): Project Overview, Architecture, and Technical Implementation

The article introduces Alibaba's Elastic Federated Learning Solution (EFLS), describing its business motivations, core functionalities, system architecture, sample‑set intersection, federated training pipeline, novel algorithms, product console, and future roadmap for privacy‑preserving advertising in large‑scale sparse scenarios.

Advertising · Distributed Systems · Federated Learning
18 min read
HomeTech
Nov 3, 2021 · Big Data

Real‑time Materialized View Practices with Apache Flink: System Analysis, Algorithm Design, and Implementation

This article presents Autohome's experience building a real‑time materialized view system on Apache Flink, detailing system analysis, problem decomposition, a global‑version‑based CDC algorithm, its implementation as a Flink connector, practical deployment results, and remaining challenges such as clock dependency and state size.

CDC · Flink · algorithm
17 min read
Big Data Technology & Architecture
Oct 29, 2021 · Big Data

Dimension Table Join Strategies in Apache Flink: Preload, Distributed Cache, Hot Storage, Broadcast, and Temporal Table Function

The article explains various dimension‑table join approaches in Apache Flink, including preloading tables into memory, using distributed cache, leveraging hot storage with async I/O, broadcasting state, and temporal table function joins, and compares their trade‑offs for different data volumes and update frequencies.

Dimension Table · Flink · JOIN
10 min read
Alimama Tech
Oct 27, 2021 · Artificial Intelligence

Elastic Federated Learning Solution (EFLS): Architecture, Core Functions, and Technical Details

The Elastic Federated Learning Solution (EFLS) is Alibaba’s open‑source platform that enables privacy‑preserving vertical and horizontal federated learning for large‑scale sparse advertising, offering data‑intersection, high‑performance C++ training, a visual console, novel aggregation algorithms, and a roadmap toward multi‑party scaling and advanced encryption.

Advertising · Elastic Federated Learning · Flink
16 min read
Big Data Technology & Architecture
Oct 26, 2021 · Big Data

Practical Experience Building a Real‑Time Clickstream Data Warehouse with Flink and ClickHouse

This article shares practical insights on designing and operating a real‑time clickstream data warehouse using Flink for streaming processing and ClickHouse for near‑real‑time OLAP, covering dimensional modeling, layered architecture, Flink‑ClickHouse sink implementation, and data rebalancing strategies.

Flink · Real-time analytics · Streaming
10 min read
Big Data Technology & Architecture
Oct 12, 2021 · Big Data

Data Lake Evolution and a Practical Flink + Iceberg Implementation Guide

This article explores the evolution of data lakes, compares major cloud providers' lake architectures, introduces the emerging lakehouse concept, and provides a step‑by‑step Flink‑Iceberg implementation—including dependencies, catalog setup, table creation, checkpointing, and Kafka ingestion—demonstrating practical big‑data streaming solutions.

Data Lake · Flink · Iceberg
14 min read
Java Architect Essentials
Sep 21, 2021 · Big Data

Interview on Kuaishou's Billion‑Scale Big Data Architecture Evolution and Practices

The interview with Kuaishou senior architect Zhao Jianbo details the three‑phase evolution of its trillion‑scale big data platform, covering foundational Hadoop services, real‑time and OLAP extensions, deep customizations, Spring Festival Gala challenges, scheduling innovations, Hadoop usage, and the relationship between big data and cloud architectures.

Big Data · Flink · Hadoop
19 min read
NetEase Game Operations Platform
Sep 18, 2021 · Big Data

StreamflySQL: NetEase Games’ Journey from Template JAR to SQL Gateway for Flink SQL Platformization

This article details NetEase Games’ evolution of its Flink SQL platform, from the early StreamflySQL v1 template‑JAR approach to the v2 SQL‑Gateway architecture, discussing design decisions, challenges such as metadata persistence, multi‑tenant security, horizontal scaling, and job state management.

Flink · Real-time analytics · platform engineering
17 min read
Big Data Technology Architecture
Sep 17, 2021 · Big Data

Real‑time Computing Platform Architecture, Flink Migration, and One‑stop Platform at 58.com

This article details the design and implementation of 58.com’s real‑time computing platform, covering its architecture, data ingestion, storage, Flink‑based stream processing, SQL extensions, performance optimizations, Storm‑to‑Flink migration tools, the Wstream management console, state handling, monitoring, and future roadmap.

Data Platform · Flink · Real‑Time Computing
16 min read
Big Data Technology & Architecture
Sep 13, 2021 · Big Data

Understanding Bytecode, Code Generation, Serialization, and Data Processing Techniques in Spark and Flink

This article explains how bytecode and code‑generation improve Spark SQL performance, compares Java I/O and MapReduce InputFormats, reviews serialization choices in Spark and Flink, and describes reflection‑based DataFrame creation, storage‑memory eviction, fail‑fast design, and ConcurrentHashMap usage in big‑data frameworks.

Flink · Spark · code-generation
11 min read
Big Data Technology & Architecture
Sep 11, 2021 · Big Data

Deep Dive into Flink Table & SQL Window Functions, UDFs, and Hive Integration

This article provides a comprehensive guide to Flink Table and SQL window semantics, including group windows (tumbling, sliding, and session) and over windows. It demonstrates how to define windows in SQL, explains built‑in functions, shows how to implement scalar, table, aggregate and table‑aggregate UDFs, and details Flink's integration with Hive, complete with Maven dependencies and runnable examples.

Flink · Hive Integration · Table API
27 min read
Xueersi Online School Tech Team
Sep 10, 2021 · Big Data

Real‑time OLAP with Flink and Hologres: Replacing Lambda/Kappa Architectures

This article analyzes the limitations of traditional Lambda and Kappa big‑data architectures for online‑school behavior‑feature pipelines and presents a Flink + Hologres solution that provides unified real‑time OLAP and high‑concurrency point‑query services, including design choices, implementation details, and performance results.

Flink · Hologres · Kappa architecture
12 min read
Big Data Technology & Architecture
Sep 10, 2021 · Big Data

Understanding Flink Table API and SQL: Dependencies, Planners, and Practical Usage

This article provides a comprehensive guide to Apache Flink's Table API and SQL, covering required dependencies, the differences between old and Blink planners, program structure, table environment creation, catalog registration, query execution, conversion between DataStream and Table, update modes, and time attribute handling, with Scala code examples throughout.

Flink · Scala · Streaming
26 min read
Big Data Technology & Architecture
Sep 6, 2021 · Big Data

Comprehensive Guide to Flink Join Operations: Interval Join, Window Join, Broadcast, and Temporal Table Function

This article explains Flink's various join mechanisms—including interval‑based joins, window‑based joins, streaming SQL joins, and dimension‑table joins such as preload, hot‑storage, broadcast, and temporal‑table function—provides detailed code examples in Java, discusses state management and performance considerations, and summarizes the four main dimension‑table join patterns.

Broadcast State · Flink · JOIN
32 min read
Big Data Technology & Architecture
Sep 2, 2021 · Big Data

Understanding Network Flow Control and Flink's Backpressure Mechanisms

This article explains the concepts and background of network flow control, compares static rate limiting with dynamic feedback backpressure, describes TCP's sliding‑window mechanism, and details how Flink implements both TCP‑based and credit‑based backpressure to handle mismatched upstream‑downstream speeds in streaming applications.

Credit-based · Flink · Network Flow Control
16 min read
ByteDance ADFE Team
Aug 31, 2021 · Big Data

Evolution of the Big Data Technology Stack Over the Past Five Years

This article reviews the evolution of big data technologies in the last five years, covering streaming and batch processing frameworks, column‑store NoSQL databases, programming language trends, the cloud‑native multi‑model database Lindorm, and practical Flink/Blink usage with code examples.

Big Data · Flink · Lindorm
24 min read
Meituan Technology Team
Aug 26, 2021 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse: Architecture & Lessons

Meituan Waimai’s data intelligence team outlines a universal real‑time data‑warehouse methodology that combines a production platform with an interactive analytics engine, detailing scenarios, technology choices, architectural designs, platformization, SLA management, and a practical Lambda‑style case study.

Flink · Kappa architecture · Lambda architecture
18 min read
Big Data Technology & Architecture
Aug 24, 2021 · Big Data

Comprehensive Overview of Data Lake Technologies: Iceberg, Hudi, and Delta Lake

This article provides an in-depth overview of data lake concepts, definitions, and essential features, followed by detailed case studies of enterprise data lake implementations and comparative analysis of leading data lake table formats—Iceberg, Hudi, and Delta Lake—highlighting their architectures, capabilities, and trade‑offs.

Data Lake · Delta Lake · Flink
19 min read
Python Crawling & Data Mining
Aug 21, 2021 · Big Data

Understanding Flink’s Architecture: From APIs to Cluster Deployment

This article explains Flink’s three‑layer architecture (APIs & Libraries, Core, Deploy), details its programming interfaces, runtime engine, deployment options, and core concepts such as stateful computation and time semantics, providing a comprehensive guide for building robust stream and batch applications.

Flink · Stateful Computing · Time Semantics
13 min read
ITFLY8 Architecture Home
Aug 3, 2021 · Big Data

How BIGO Scaled Real‑Time Messaging by Migrating from Kafka to Pulsar

BIGO replaced its Kafka‑based message‑flow platform with Apache Pulsar to overcome scaling, stability, and operational cost challenges, leveraging Pulsar’s storage‑compute separation, seamless horizontal expansion, low latency, and tight integration with Flink for real‑time ETL and A/B‑test pipelines, resulting in billions of messages processed daily at half the hardware cost.

Apache Pulsar · ETL · Flink
17 min read
Big Data Technology & Architecture
Aug 2, 2021 · Big Data

Comprehensive Big Data Interview Question Guide for Major Tech Companies

This article compiles extensive interview questions and topics covering Hadoop, Spark, Flink, Hive, Kafka, MySQL, Redis, Java fundamentals, and algorithms, organized by companies such as Xiaomi, ByteDance, Alibaba, Shopee, Tencent, Meituan, NetEase, and Baidu, to help candidates prepare effectively for big‑data engineering roles.

Big Data · Flink · Hadoop
22 min read
DataFunTalk
Jul 29, 2021 · Big Data

Real-Time Data Warehouse Construction at TAL Using DorisDB

This article details TAL's transition from offline to real-time data warehousing, describing business drivers, pain points, architectural evolution through Hive, Flink+Kudu, and DorisDB, and outlining the system design, data flow, scheduling, monitoring, and the resulting business and cost benefits.

Airflow · Big Data · DorisDB
14 min read
DataFunTalk
Jul 27, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Doris at Shuhai Supply Chain

This article describes how Shuhai Supply Chain upgraded its data warehouse from a complex, high‑cost 1.0 architecture to a streamlined, real‑time solution built around Apache Doris, detailing the motivations, design choices, zero‑code ingestion, metadata management, Flink connector, and the resulting performance gains.

Apache Doris · Big Data · Flink
13 min read
DataFunTalk
Jul 26, 2021 · Big Data

Accelerating Hive Daily Tables with Flink: A SmartNews Case Study

This article describes how SmartNews integrated Flink into its Airflow‑driven Hive batch pipeline to cut the actions table generation latency from three hours to about thirty‑four minutes, detailing the technical challenges, design decisions, and production results.

AWS · Big Data · Flink
12 min read
Big Data Technology & Architecture
Jul 20, 2021 · Big Data

Common Issues and Solutions for Flink CDC with MySQL

This article summarizes frequent problems encountered when using Flink CDC with MySQL—including Kafka version conflicts, checkpoint timeouts, permission errors, global lock issues, and DDL parsing failures—and provides practical configuration tweaks and code examples to resolve them.

CDC · Checkpoint · Debezium
0 likes · 11 min read
Big Data Technology Architecture
Jul 20, 2021 · Big Data

PB‑Level Ad‑hoc Query Practice with Flink: Threat Hunting Platform Architecture and IO‑Reducing Optimizations

This article details 360's Threat Hunting platform built on Flink, covering its evolution, architecture, block‑index design, Hilbert‑curve data ordering, LIKE pushdown, join optimizations, Alluxio caching, and future plans for BI and multi‑user concurrency, all aimed at efficient PB‑scale data querying.

Alluxio · Block Index · Flink
0 likes · 18 min read
Big Data Technology Architecture
Jul 15, 2021 · Big Data

Building Data Lake Solutions with Iceberg and Object Storage: Architecture, Write/Read Processes, and Storage Optimization

This article presents a comprehensive overview of using Apache Iceberg with object storage to construct scalable data lake solutions, covering lake architecture, Iceberg table organization, Flink‑based write and read workflows, catalog abstractions, object storage versus HDFS comparisons, append‑upload and atomic‑commit challenges, a demonstration setup, and ideas for storage optimization.

Catalog · Flink · Iceberg
0 likes · 16 min read
Big Data Technology & Architecture
Jul 10, 2021 · Big Data

Comprehensive Big Data Learning Path and Interview Knowledge Map

This extensive guide outlines a modern big‑data learning roadmap, covering essential programming languages, Linux, databases, distributed system theory, networking, offline and real‑time computation, message queues, data warehouses, algorithms, backend skills, interview preparation, and practical advice for building a personal knowledge system.

Flink · Hadoop · Learning Path
0 likes · 24 min read
JD Retail Technology
Jul 5, 2021 · Backend Development

Design and Implementation of JD's Real-Time Browsing Record System

This article describes the architecture of JD's real-time browsing record system and its four modules (storage, query, real-time reporting, and offline reporting), along with hot‑cold data separation and the use of Jimdb, HBase, Kafka, and Flink to achieve millisecond‑level latency and high throughput for billions of user records.

Browsing · Flink · HBase
0 likes · 12 min read
Youzan Coder
Jun 30, 2021 · Big Data

Online Monitoring Practices for Offline and Real-Time Data at Youzan

Youzan's Data Report Center monitors offline batch and real‑time data pipelines using accuracy and timeliness rules, cross‑table checks, upstream‑downstream comparisons, and scheduled alerts to detect anomalies early. Since 2021 it has raised more than 25 alerts, and a unified data‑quality dashboard is planned.

Big Data · Data Quality · Flink
0 likes · 12 min read
DataFunTalk
Jun 29, 2021 · Big Data

In-depth Analysis of Flink SQL 1.13 Features and Improvements

This article provides a comprehensive overview of Apache Flink SQL 1.13, detailing new Window TVF support, cumulate windows, performance optimizations, time‑zone handling, enhanced Hive compatibility, SQL client upgrades, DataStream‑Table conversion improvements, and outlines the roadmap for the upcoming 1.14 release.

DataStream · Flink · Hive Integration
0 likes · 15 min read
DataFunTalk
Jun 21, 2021 · Big Data

Flink + Iceberg 0.11 Practices in Qunar Data Platform

This article shares Qunar's experience using Flink together with Apache Iceberg 0.11 to address real‑time data warehouse challenges, covering background pain points, Iceberg architecture, solutions for Kafka data loss and Hive latency, and optimization practices such as small‑file handling, sorting, and checkpoint management.

Big Data · Data Lake · Flink
0 likes · 13 min read
DataFunTalk
Jun 20, 2021 · Databases

Xiaohongshu’s OLAP Architecture Evolution and DorisDB Adoption

This article details Xiaohongshu’s multi‑stage evolution of its OLAP infrastructure—from Redshift to Presto, ClickHouse, and finally DorisDB—describing the data pipeline, tool comparisons, advertising use‑case implementation, and the resulting performance and operational benefits.

Big Data · DorisDB · Flink
0 likes · 12 min read
Big Data Technology & Architecture
Jun 16, 2021 · Big Data

Practical Experience and Optimizations of Apache Iceberg in Tencent’s Big Data Ecosystem

This article reviews the advantages of Apache Iceberg for data lake storage, details Tencent’s custom optimizations and integration with Flink and Spark, and shares multiple real‑world implementations that demonstrate how Iceberg improves data consistency, reduces small‑file overhead, and enables near‑real‑time analytics in large‑scale big‑data environments.

Apache Iceberg · Data Lake · Flink
0 likes · 18 min read
Java High-Performance Architecture
Jun 14, 2021 · Big Data

How NetEase Games Built a Scalable Flink‑Based Streaming ETL Platform

This article explains how NetEase Games engineers designed and operated a Flink‑driven streaming ETL system, covering business background, log classification, dedicated and generic ETL services, architecture evolution, Python UDF integration, runtime optimizations, tuning practices, fault‑tolerance mechanisms, and future roadmap.

Flink · Game Analytics · data pipeline
0 likes · 22 min read
Architecture Digest
Jun 10, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's streaming ETL solution built on Flink, covering business background, log characteristics, specialized and generic ETL services, architectural evolution, Python UDF integration, runtime optimizations, fault‑tolerance mechanisms, and future roadmap for unified real‑time and offline data warehouses.

Big Data · Flink · Log Processing
0 likes · 19 min read
Sohu Tech Products
Jun 9, 2021 · Big Data

Real-time UV Counting with Flink, Hologres, and RoaringBitmap

This article explains how to implement both offline (T+1) and real‑time UV counting using Hologres with RoaringBitmap for high‑cardinality aggregation, and demonstrates a complete Flink‑Hologres pipeline—including table creation, streaming joins, windowed aggregation, and query examples—for fine‑grained user metric analysis.

Flink · Hologres · RoaringBitmap
0 likes · 11 min read
DataFunTalk
Jun 6, 2021 · Big Data

Understanding Apache Pulsar: Cloud‑Native Messaging, Storage‑Compute Separation, and Batch‑Stream Fusion with Flink

This article explains Apache Pulsar’s cloud‑native, storage‑compute separated architecture, its data model and scalability features, and how it integrates with Flink to provide a unified platform for both real‑time streaming and batch processing in big‑data applications.

Apache Pulsar · Batch-Stream Integration · Big Data
0 likes · 17 min read
IT Architects Alliance
Jun 5, 2021 · Big Data

How to Build a Real‑Time Recommendation System with Flink, HBase, and Docker

This article walks through a complete real‑time recommendation system built on Apache Flink, detailing its v2.0 architecture, modules for user behavior, interest, and product profiling, the recommendation algorithms (hot‑list, collaborative filtering, item similarity), and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka.

Docker · Flink · HBase
0 likes · 11 min read
Big Data Technology & Architecture
Jun 1, 2021 · Big Data

Understanding Idle State Retention Time in Flink SQL

Flink SQL's idle state retention time feature prevents state explosion by automatically clearing state for keys that have been inactive longer than a configured period. It requires both a minimum and a maximum retention time, and its implementation relies on CleanupState, timers, and KeyedProcessFunctionWithCleanupState.

Flink · Idle State Retention · State Management
0 likes · 8 min read
Big Data Technology Architecture
May 31, 2021 · Big Data

Practical Experience of Using Flink + Iceberg 0.11 on Qunar Data Platform

This article presents Qunar's practical experience with Flink and Iceberg 0.11, covering background challenges such as Kafka data loss and Hive metadata pressure, explaining Iceberg architecture, query planning, and detailed solutions including real‑time ingestion, small‑file handling, sorting, and code examples for seamless migration.

Flink · Iceberg · Real-time Processing
0 likes · 12 min read
IT Architects Alliance
May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL
0 likes · 19 min read
dbaplus Community
May 27, 2021 · Big Data

How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink

This article details Vipshop's OLAP evolution, describing how Presto, Kylin, and ClickHouse are integrated, the deployment architecture with HAProxy and chproxy, containerization on Kubernetes, and the Flink‑ClickHouse pipeline that enables self‑service analysis of hundred‑billion‑row datasets while addressing performance challenges and future roadmap.

Big Data · Flink · OLAP
0 likes · 28 min read
IT Architects Alliance
May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink
0 likes · 11 min read