Tagged articles

946 articles

Jan 10, 2022 · Big Data

Real‑Time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

The talk by Tang Chuxi of Meituan explains typical real‑time data scenarios, the challenges faced when building a streaming data warehouse, and the design, development, operation, and performance‑optimisation solutions implemented on a Flink‑based platform to support massive, low‑latency business applications.

Flink · Meituan · data-warehouse

0 likes · 17 min read

Big Data Technology & Architecture

Jan 10, 2022 · Big Data

Key Takeaways from Flink Forward 2021: Real‑Time Computing, Flink SQL, ML, and Streaming Warehouse

The article reviews highlights from Flink Forward 2021, describing how real‑time computing is spreading across traditional industries, the unstoppable move toward Flink SQL, the emergence of Flink ML, and the vision of a streaming warehouse built on Flink Dynamic Table technology.

Big Data · Flink · Real‑Time Computing

0 likes · 8 min read

dbaplus Community

Jan 5, 2022 · Big Data

How ByteDance Optimized Flink SQL for Real‑World Streaming at Scale

This article details ByteDance's practical experience with Apache Flink, covering SQL extensions, a visual SQL platform, performance tweaks such as window mini‑batching and custom windows, join and checkpoint recovery improvements, stream‑batch integration experiments, and future roadmap plans.

Batch Integration · Checkpoint · Flink

0 likes · 16 min read

DataFunTalk

Jan 1, 2022 · Big Data

JD's Flink Journey: Evolution, Optimizations, and Future Directions

This article details JD's adoption of Flink for real‑time computing, covering its evolution from Storm to Flink on Kubernetes, the platform architecture, and major optimization techniques such as preview topology, backpressure handling, dynamic rebalance, and checkpoint‑as‑savepoint. It also outlines future plans, including stream‑batch integration, stability improvements, intelligent operations, and AI integration.

Big Data · Flink · JD

0 likes · 10 min read

Big Data Technology & Architecture

Dec 31, 2021 · Big Data

Apache SeaTunnel Joins the Apache Incubator: Overview, Features, and Real‑World Use Cases

SeaTunnel, a data‑integration platform that originated in China and is built on Spark and Flink, has been accepted into the Apache Incubator. This article introduces its history, architecture, plugin ecosystem, deployment requirements, and numerous enterprise deployments across batch and streaming big‑data scenarios.

Apache · Big Data · Data Integration

0 likes · 7 min read

Tencent Cloud Developer

Dec 28, 2021 · Industry Insights

How Flink and ClickHouse Combine to Build High‑Performance Real‑Time Data Warehouses

This article analyzes the challenges of massive data query efficiency, explains how Flink's stream processing and ClickHouse's OLAP engine complement each other, and presents a layered real‑time data‑warehouse architecture with practical guidance on data ingestion, write strategies, quality assurance, and evolving batch‑stream integration patterns.

Big Data · Flink · OLAP

0 likes · 19 min read

Big Data Technology & Architecture

Dec 22, 2021 · Big Data

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

This article explains Change Data Capture (CDC), compares query‑based and log‑based approaches, introduces Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including Java source, deserialization, sink code and required Maven dependencies—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · Data Streaming

0 likes · 14 min read

HelloTech

Dec 20, 2021 · Big Data

Building an ElasticSearch-based Search Platform for Ride-Hailing: Architecture, Data Synchronization, and Performance Optimization

Hello Mobility unified its fragmented ElasticSearch clusters into a single, real‑time search platform that combines Kafka‑driven CDC, Flink stream processing, custom ES plugins, and extensive performance tuning. The platform delivers scalable matching, recommendation, and voice services, and ultimately raised completed orders by 49.8% and driver acceptance by 37%.

Big Data · Flink · Search Platform

0 likes · 19 min read

DataFunTalk

Dec 19, 2021 · Big Data

OPPO Real-Time Computing Platform Architecture and Practices

This article details OPPO’s real-time computing platform architecture, covering its background, open‑source and self‑developed components, job lifecycle, SQL IDE, diagnostic and monitoring mechanisms, SLA guarantees, practical applications such as real‑time warehousing and dashboards, and future plans for lakehouse integration and cloud‑native deployment.

Flink · cloud-native · job monitoring

0 likes · 20 min read

Youzan Coder

Dec 17, 2021 · Big Data

Upgrading Real-Time Computing Engine from Flink 1.10 to 1.13: Practices and Challenges

Youzan upgraded its real‑time computing engine from Flink 1.10 to 1.13 to meet rising SQL and containerization demands. The upgrade brought enhanced SQL syntax, better time‑function handling, Window TVF standardization, Hive integration, K8s stability, elastic scaling, richer Kafka and format support, and improved metrics and debugging tools. All custom connectors, UDFs, and SQL jobs were successfully migrated to the new Kubernetes‑based platform.

Flink · Real‑Time Computing · containerization

0 likes · 22 min read

Beike Product & Technology

Dec 17, 2021 · Operations

Practices for Monitoring, Resource Optimization, and Containerization of Large-Scale Flink Jobs at Beike

This article describes Beike's real‑time computing team's end‑to‑end practices for collecting and storing Flink metrics, building visual monitoring dashboards, implementing multi‑level alerting, analyzing logs, estimating CPU and memory resources, and deploying Flink on Kubernetes with containerization and storage separation to improve stability, resource utilization, and operational efficiency.

Flink · Kubernetes · Metrics

0 likes · 25 min read

dbaplus Community

Dec 13, 2021 · Backend Development

ElasticSearch Powers Ride‑Matching at Hello Mobility: Architecture & Lessons

This article details how Hello Mobility built a search‑driven ride‑matching platform using ElasticSearch and Flink, covering the business background, architectural evolution, data‑sync challenges, performance tuning, stability measures, and the resulting improvements in order completion and user engagement.

Backend Architecture · Flink · Ride Matching

0 likes · 21 min read

HelloTech

Dec 13, 2021 · Big Data

Smart Matching Engine for Ride-Sharing: Technical Implementation and Algorithms

The Smart Matching Engine for Hello Mobility's ride‑sharing service ingests driver and passenger orders via Kafka‑Flink pipelines into Elasticsearch, then applies multi‑stage matching: nearby search, followed by itinerary‑based filtering using ETA, angle, distance, route‑similarity, and shared‑mileage calculations. Finally, it ranks results with pre‑sorting and algorithmic models that have evolved to include PMML and deep learning, to optimize driver‑passenger pairing.

Elasticsearch · Flink · Kafka

0 likes · 9 min read

DataFunTalk

Dec 10, 2021 · Big Data

Building and Evolving NetEase Yanxuan Real-Time Computing Platform: Architecture, SQLization, Serviceization, and Data Governance

This article details the development of NetEase Yanxuan's real-time computing platform from 2017 to the present, covering its architecture, Flink‑SQL development environment, service‑oriented deployment, resource optimization, cloud‑native migration, comprehensive data governance, and future plans for stream‑batch integration and intelligent job diagnostics.

Big Data · Cloud Native · Data Governance

0 likes · 14 min read

DataFunSummit

Dec 10, 2021 · Big Data

Real‑Time Platform Construction at NetEase Yanxuan: Architecture, SQL‑Based Streaming, Serviceization, and Data Governance

This article details how NetEase Yanxuan's real‑time data platform has evolved from 2017 to the present, covering background, current scale, layered architecture, Flink‑SQL development IDE, service‑oriented task execution, resource‑optimizing deployment modes, cloud‑native migration, comprehensive data governance, and future batch‑stream integration plans.

Big Data · Cloud Native · Data Governance

0 likes · 15 min read

Youzan Coder

Dec 8, 2021 · Big Data

How to Build a Real‑Time Data Quality Monitoring System with Flink

This article outlines a comprehensive approach to monitoring and ensuring the accuracy and timeliness of real‑time data streams, detailing background challenges, solution design, implementation steps using Flink and automated testing, alert handling procedures, and future improvement plans.

Alerting · Data Quality · Flink

0 likes · 10 min read

HomeTech

Dec 7, 2021 · Big Data

Flink Task Auto-scaling Design and Implementation

This article presents the design and implementation of Flink task auto‑scaling, covering background, manual and automatic scaling mechanisms, architecture with RescaleCoordinator, persistence via Zookeeper and HDFS, scaling policies for parallelism, CPU and memory, and future plans for fine‑grained and time‑based resource adjustments.

Auto Scaling · Flink · HDFS

0 likes · 4 min read

DataFunSummit

Dec 4, 2021 · Big Data

Building a Real-Time Data Warehouse with Flink: Hive Integration, Upsert‑Kafka, and CDC Connectors

This tutorial explains how to use Apache Flink 1.12 to construct a unified streaming‑batch data warehouse by integrating Hive via HiveCatalog and HiveDialect, performing read/write operations, configuring upsert‑Kafka sinks, and leveraging Flink CDC connectors for change data capture from MySQL and other sources.

CDC · Flink · Streaming

0 likes · 46 min read

Big Data Technology Architecture

Nov 30, 2021 · Big Data

Building a Real-Time MySQL and PostgreSQL Streaming ETL with Flink CDC

This tutorial shows how to quickly construct a streaming ETL pipeline that captures changes from MySQL and PostgreSQL using Flink CDC, enriches order data with product and shipment information, and writes the results into Elasticsearch for real‑time visualization in Kibana.

CDC · Docker · Elasticsearch

0 likes · 11 min read

Big Data Technology & Architecture

Nov 30, 2021 · Big Data

Curated Learning Resources for Big Data and Data Engineering

This article compiles a comprehensive list of Chinese-language articles and tutorials covering big‑data technologies such as Flink, Spark, Hive, ClickHouse, data governance, and related interview preparation resources, providing a structured learning path for aspiring data engineers.

Big Data · Data Governance · Flink

0 likes · 4 min read

Alibaba Cloud Developer

Nov 30, 2021 · Big Data

Scaling Real‑Time Data Warehousing for Double‑11: Flink + Hologres in Action

During the 2021 Double‑11 shopping festival, logistics provider DiSiFang upgraded its real‑time data warehouse with Flink and Hologres, enabling multi‑billion‑row joins, cutting costs by 50%, and delivering stable, low‑latency analytics that powered high‑frequency dashboards and improved overall delivery speed.

Big Data · Flink · Hologres

0 likes · 13 min read

StarRocks

Nov 26, 2021 · Big Data

How Autohome Achieved Sub‑Second Real‑Time Analytics with StarRocks

This article describes how Autohome replaced Flink and Kylin with StarRocks to power sub‑second real‑time OLAP analytics. It covers data sources, pain points, benchmark comparisons against Apache Kylin, ClickHouse, Presto, Spark, and Doris, integration with the Flink connector, broker‑load scripts, monitoring setup, and lessons learned from large‑scale deployments.

Flink · OLAP · StarRocks

0 likes · 12 min read

HomeTech

Nov 24, 2021 · Databases

Real‑Time Data Analysis at Autohome: Evaluation and Adoption of StarRocks

This article describes Autohome's real‑time data analysis architecture, the challenges of existing OLAP solutions, the reasons for choosing StarRocks, detailed performance comparisons with Kylin, ClickHouse, Doris, Presto and Spark, and the practical integration of StarRocks with Flink, broker‑load scripts, and monitoring tools.

Flink · OLAP · Real-time analytics

0 likes · 9 min read

Big Data Technology Architecture

Nov 23, 2021 · Big Data

Step-by-Step Guide to Setting Up Flink CDC with MySQL, Hudi, and Hive Integration on a Hadoop Cluster

This comprehensive tutorial walks through configuring a Hadoop‑based environment (Flink 1.13.1, Scala 2.11, CDH 6.2.0, Hive 2.1.1, Hudi 0.10), compiling Hudi, setting up Flink and MySQL binlog, creating CDC source and Hudi sink tables, running Flink jobs, and synchronizing the results to Hive partitions for query via Hive and Presto.

CDC · Flink · Hudi

0 likes · 15 min read

Alibaba Cloud Developer

Nov 22, 2021 · Big Data

How Flink’s Sort‑Shuffle Boosts Large‑Scale Batch Processing Performance

This article explains how Flink’s new Sort‑Shuffle mechanism improves stability and performance for massive batch jobs by reducing file counts, optimizing I/O, and minimizing memory usage. It also covers implementation details, test results, tuning tips, and future enhancements.

Batch Processing · Data Shuffle · Flink

0 likes · 17 min read

Big Data Technology & Architecture

Nov 22, 2021 · Big Data

Comprehensive Big Data Learning Path and Resource Guide

This article presents a detailed learning roadmap for aspiring big‑data experts, covering foundational programming languages, data structures, Linux basics, databases, distributed system theory, and essential frameworks such as Hadoop, Spark, Flink, and Kafka, and provides curated Bilibili video links and reference materials.

Big Data · Flink · Hadoop

0 likes · 9 min read

Alibaba Cloud Developer

Nov 22, 2021 · Big Data

Achieving Exactly-Once Writes from Flink to ClickHouse: Architecture and Performance

This article explains how Flink and ClickHouse can be combined to build a real-time data warehouse with end-to-end Exactly-Once guarantees, detailing the underlying write mechanisms, transaction state machine, connector implementation, and performance test results, while also outlining future enhancements for distributed transactions.

Flink · Performance Testing · clickhouse

0 likes · 15 min read

Big Data Technology & Architecture

Nov 20, 2021 · Big Data

Comprehensive Overview of Apache Flink Concepts, Mechanisms, and Interview Questions

This article provides an extensive technical guide to Apache Flink, covering its exactly‑once consumption guarantees, checkpoint and two‑phase commit mechanisms, differences from Spark, state backends, watermark handling, time semantics, window joins, CEP, backpressure, architecture layers, deployment, resource management, and common operational issues.

Big Data · CEP · Checkpoint

0 likes · 77 min read

Tencent Cloud Developer

Nov 19, 2021 · Artificial Intelligence

End‑to‑End Breast Cancer Prediction Solution Using Decision Tree on Tencent Cloud AI Platform

This guide details an end‑to‑end breast‑cancer prediction pipeline on Tencent Cloud, covering offline decision‑tree training with TI‑ONE, model packaging as a PMML service, real‑time feature generation via Oceanus and CKafka, and live inference stored in ClickHouse, all within a secure VPC.

AI · Flink · Real-time Streaming

0 likes · 19 min read

HomeTech

Nov 17, 2021 · Big Data

Lakehouse Architecture Practice with Flink and Iceberg: Real‑time Data Ingestion and Management

This article details a lakehouse architecture built on Flink and Iceberg that addresses Hive‑based warehouse limitations by enabling ACID transactions, incremental snapshots, stream‑batch unification, CDC support, and various operational optimizations, ultimately achieving near real‑time data ingestion and analytics.

CDC · Flink · Iceberg

0 likes · 10 min read

Yiche Technology

Nov 17, 2021 · Databases

TiDB Architecture and Performance Optimization for Yiche’s 818 Car Carnival Data Dashboard

This article presents a technical case study of Yiche’s 818 Car Carnival data dashboard, detailing the background, business requirements, TiDB selection and architecture, encountered issues with TiDB, TiCDC and query performance, and the solutions and performance results achieved.

Database Architecture · Flink · TiCDC

0 likes · 13 min read

Big Data Technology & Architecture

Nov 16, 2021 · Big Data

Flink Checkpoint, Backpressure, and Memory Tuning Guide

This article provides a comprehensive guide on optimizing Flink checkpoints, diagnosing and alleviating backpressure, and fine‑tuning memory configurations—including process, heap, off‑heap, managed, and network memory—to improve job stability and performance in large‑scale streaming applications.

Checkpoint · Flink · Memory Tuning

0 likes · 25 min read

Big Data Technology Architecture

Nov 15, 2021 · Big Data

Flink Sort‑Shuffle: Design, Implementation, and Performance Evaluation

This article explains how Flink's new sort‑shuffle mechanism improves large‑scale batch processing by reducing file counts, optimizing I/O, and lowering memory usage, delivering up to tenfold speedups. It also details configuration tips and future enhancements.

Batch Processing · Data Shuffle · Flink

0 likes · 16 min read

Big Data Technology & Architecture

Nov 8, 2021 · Big Data

Why Choose Apache Iceberg? Tencent’s Optimizations and Real‑World Practices

This article examines the strengths and weaknesses of Apache Iceberg, explains why Tencent selected it over alternatives, details Tencent’s own enhancements and integration with Flink, Spark, and other engines, and shares multiple real‑world implementations for building enterprise‑grade real‑time data lakes.

Apache Iceberg · Data Lake · Flink

0 likes · 17 min read

Big Data Technology & Architecture

Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full‑load and incremental reading mechanisms, details the slice partitioning, snapshot correction, and binlog handling logic, and provides a complete Java example that demonstrates how to configure Flink SQL, MySQL source, and Kafka sink.

Big Data · CDC · Data Integration

0 likes · 29 min read

DataFunTalk

Nov 6, 2021 · Big Data

Evolution and Practices of OLAP at Vipshop: Presto, ClickHouse, and Kylin

This article details Vipshop's OLAP evolution, covering the deployment, optimization, and containerization of Presto, ClickHouse, and Kylin, the challenges faced, self‑developed tooling, and future directions for intelligent scaling and resource management.

Big Data · Flink · Kubernetes

0 likes · 27 min read

DataFunTalk

Nov 6, 2021 · Artificial Intelligence

Elastic Federated Learning Solution (EFLS): Project Overview, Architecture, and Technical Implementation

The article introduces Alibaba's Elastic Federated Learning Solution (EFLS), describing its business motivations, core functionalities, system architecture, sample‑set intersection, federated training pipeline, novel algorithms, product console, and future roadmap for privacy‑preserving advertising in large‑scale sparse scenarios.

Advertising · Distributed Systems · Federated Learning

0 likes · 18 min read

Big Data Technology & Architecture

Nov 4, 2021 · Big Data

Understanding Flink State, Checkpoints, Savepoints, and Fault Tolerance

This article explains Flink's state concepts, the distinction between keyed and operator state, available state backends, TTL configuration, the mechanics of checkpoints and savepoints, and the two‑phase commit protocol for ensuring exactly‑once processing in streaming applications.

Checkpoints · Flink · Savepoints

0 likes · 21 min read

HomeTech

Nov 3, 2021 · Big Data

Real‑time Materialized View Practices with Apache Flink: System Analysis, Algorithm Design, and Implementation

This article presents Autohome's experience building a real‑time materialized view system on Apache Flink, detailing system analysis, problem decomposition, a global‑version‑based CDC algorithm, its implementation as a Flink connector, practical deployment results, and remaining challenges such as clock dependency and state size.

CDC · Flink · algorithm

0 likes · 17 min read

Big Data Technology & Architecture

Oct 29, 2021 · Big Data

Dimension Table Join Strategies in Apache Flink: Preload, Distributed Cache, Hot Storage, Broadcast, and Temporal Table Function

The article explains various dimension‑table join approaches in Apache Flink, including preloading tables into memory, using distributed cache, leveraging hot storage with async I/O, broadcasting state, and temporal table function joins, and compares their trade‑offs for different data volumes and update frequencies.

Dimension Table · Flink · JOIN

0 likes · 10 min read

Alimama Tech

Oct 27, 2021 · Artificial Intelligence

Elastic Federated Learning Solution (EFLS): Architecture, Core Functions, and Technical Details

The Elastic Federated Learning Solution (EFLS) is Alibaba’s open‑source platform that enables privacy‑preserving vertical and horizontal federated learning for large‑scale sparse advertising, offering data‑intersection, high‑performance C++ training, a visual console, novel aggregation algorithms, and a roadmap toward multi‑party scaling and advanced encryption.

Advertising · Elastic Federated Learning · Flink

0 likes · 16 min read

Big Data Technology & Architecture

Oct 26, 2021 · Big Data

Practical Experience Building a Real‑Time Clickstream Data Warehouse with Flink and ClickHouse

This article shares practical insights on designing and operating a real‑time clickstream data warehouse using Flink for streaming processing and ClickHouse for near‑real‑time OLAP, covering dimensional modeling, layered architecture, Flink‑ClickHouse sink implementation, and data rebalancing strategies.

Flink · Real-time analytics · Streaming

0 likes · 10 min read

Kuaishou Big Data

Oct 21, 2021 · Big Data

How Kuaishou Boosted Data Efficiency with Apache Hudi: Real‑Time + Offline Solutions

This article explains how Kuaishou tackled late data scheduling, costly synchronization, and inefficient back‑fills by adopting Apache Hudi, detailing the pain points, reasons for choosing Hudi, and step‑by‑step implementation to achieve fast, fresh, and scalable data processing.

Data Lake · Flink · Hudi

0 likes · 13 min read

Tencent Cloud Developer

Oct 21, 2021 · Big Data

Real-Time UV and PV Analytics with Flink SQL on Tencent Cloud Oceanus

This guide shows how to build a real‑time UV and PV analytics pipeline on Tencent Cloud Oceanus by connecting a self‑hosted Kafka cluster to Flink SQL, using Redis for deduplicated visitor counts, page view logs, and conversion‑rate calculations via hop windows.

Flink · Kafka · Oceanus

0 likes · 11 min read

Big Data Technology & Architecture

Oct 19, 2021 · Big Data

Understanding Top‑N Optimization in Flink SQL: Logical and Physical Plans

This article explains how Flink SQL implements Top‑N queries, shows the standard SQL syntax, analyzes the logical and physical execution plans generated by the optimizer, and details the internal Rank node, optimization rules, state handling, and configuration options for efficient stream processing.

Flink · Logical Plan · Physical Plan

0 likes · 8 min read

DataFunTalk

Oct 18, 2021 · Big Data

Building an Intelligent Data Warehouse at Yixin Group: A Big Data Platform Case Study

The article describes how Yixin Group’s product team created an in‑house intelligent data warehouse using Hadoop, Flink/Spark, and standardized data services to transform scattered automotive‑finance data into a secure, scalable platform that supports real‑time analytics and drives business growth.

Big Data · Flink · Hadoop

0 likes · 10 min read

Big Data Technology & Architecture

Oct 12, 2021 · Big Data

Data Lake Evolution and a Practical Flink + Iceberg Implementation Guide

This article explores the evolution of data lakes, compares major cloud providers' lake architectures, introduces the emerging lakehouse concept, and provides a step‑by‑step Flink‑Iceberg implementation—including dependencies, catalog setup, table creation, checkpointing, and Kafka ingestion—demonstrating practical big‑data streaming solutions.

Data Lake · Flink · Iceberg

0 likes · 14 min read

DataFunTalk

Oct 6, 2021 · Big Data

Optimizing Flink Real‑Time Computing at Bilibili: Connector Stability, SQL, Runtime, and Future Outlook

This article details Bilibili's comprehensive optimization of Flink real‑time computing, covering connector stability improvements, SQL interval‑join enhancements, runtime state and checkpoint refinements, a diagnostic tool, and future directions for high‑throughput streaming workloads.

Big Data · Checkpoint · Flink

0 likes · 18 min read

GrowingIO Tech Team

Sep 23, 2021 · Big Data

How to Build a Real‑Time Flink Metrics Dashboard with Prometheus & Grafana

This article explains how to monitor Flink jobs running on YARN by leveraging Flink metrics, configuring reporters, defining custom metrics, and visualizing the data in real time with Prometheus, Grafana, and Graphite‑exporter, complete with deployment diagrams and code examples.

Big Data · Flink · Grafana

0 likes · 9 min read

Java Architect Essentials

Sep 21, 2021 · Big Data

Interview on Kuaishou's Billion‑Scale Big Data Architecture Evolution and Practices

The interview with Kuaishou senior architect Zhao Jianbo details the three‑phase evolution of its trillion‑scale big data platform, covering foundational Hadoop services, real‑time and OLAP extensions, deep customizations, Spring Festival Gala challenges, scheduling innovations, Hadoop usage, and the relationship between big data and cloud architectures.

Big Data · Flink · Hadoop

0 likes · 19 min read

NetEase Game Operations Platform

Sep 18, 2021 · Big Data

StreamflySQL: NetEase Games’ Journey from Template JAR to SQL Gateway for Flink SQL Platformization

This article details NetEase Games’ evolution of its Flink SQL platform, from the early StreamflySQL v1 template‑JAR approach to the v2 SQL‑Gateway architecture, discussing design decisions, challenges such as metadata persistence, multi‑tenant security, horizontal scaling, and job state management.

Flink · Real-time analytics · platform engineering

0 likes · 17 min read

Big Data Technology Architecture

Sep 17, 2021 · Big Data

Real‑time Computing Platform Architecture, Flink Migration, and One‑stop Platform at 58.com

This article details the design and implementation of 58.com’s real‑time computing platform, covering its architecture, data ingestion, storage, Flink‑based stream processing, SQL extensions, performance optimizations, Storm‑to‑Flink migration tools, the Wstream management console, state handling, monitoring, and future roadmap.

Data Platform · Flink · Real‑Time Computing

0 likes · 16 min read

Big Data Technology & Architecture

Sep 13, 2021 · Big Data

Understanding Bytecode, Code Generation, Serialization, and Data Processing Techniques in Spark and Flink

This article explains how bytecode and code‑generation improve Spark SQL performance, compares Java I/O and MapReduce InputFormats, reviews serialization choices in Spark and Flink, and describes reflection‑based DataFrame creation, storage‑memory eviction, fail‑fast design, and ConcurrentHashMap usage in big‑data frameworks.

Flink · Spark · code-generation

0 likes · 11 min read

Big Data Technology & Architecture

Sep 11, 2021 · Big Data

Deep Dive into Flink Table & SQL Window Functions, UDFs, and Hive Integration

This article provides a comprehensive guide to Flink Table and SQL window semantics, including group, tumbling, sliding, session, and over windows. It demonstrates how to define windows in SQL, explains built‑in functions, shows how to implement scalar, table, aggregate, and table‑aggregate UDFs, and details Flink's integration with Hive, complete with Maven dependencies and runnable examples.

Flink · Hive Integration · Table API

0 likes · 27 min read

Xueersi Online School Tech Team

Sep 10, 2021 · Big Data

Real‑time OLAP with Flink and Hologres: Replacing Lambda/Kappa Architectures

This article analyzes the limitations of traditional Lambda and Kappa big‑data architectures for online‑school behavior‑feature pipelines and presents a Flink + Hologres solution that provides unified real‑time OLAP and high‑concurrency point‑query services, including design choices, implementation details, and performance results.

Flink · Hologres · Kappa architecture

0 likes · 12 min read

Big Data Technology & Architecture

Sep 10, 2021 · Big Data

Understanding Flink Table API and SQL: Dependencies, Planners, and Practical Usage

This article provides a comprehensive guide to Apache Flink's Table API and SQL, covering required dependencies, the differences between old and Blink planners, program structure, table environment creation, catalog registration, query execution, conversion between DataStream and Table, update modes, and time attribute handling, with Scala code examples throughout.

Flink · Scala · Streaming

0 likes · 26 min read

Big Data Technology & Architecture

Sep 8, 2021 · Big Data

Understanding Flink's Memory Model: On‑Heap, Off‑Heap, and Memory Management

This article explains Flink's memory architecture, covering on‑heap and off‑heap memory concepts, garbage collection, allocation strategies, memory segments, buffers, the memory manager, and how network transmission and back‑pressure are handled to achieve efficient streaming processing.

Flink · Memory Management · Off-Heap

0 likes · 20 min read

Big Data Technology & Architecture

Sep 6, 2021 · Big Data

Comprehensive Guide to Flink Join Operations: Interval Join, Window Join, Broadcast, and Temporal Table Function

This article explains Flink's various join mechanisms—including interval‑based joins, window‑based joins, streaming SQL joins, and dimension‑table joins such as preload, hot‑storage, broadcast, and temporal‑table function—provides detailed code examples in Java, discusses state management and performance considerations, and summarizes the four main dimension‑table join patterns.

Broadcast State · Flink · JOIN

0 likes · 32 min read

Big Data Technology & Architecture

Sep 2, 2021 · Big Data

Understanding Network Flow Control and Flink's Backpressure Mechanisms

This article explains the concepts and background of network flow control, compares static rate limiting with dynamic feedback backpressure, describes TCP's sliding‑window mechanism, and details how Flink implements both TCP‑based and credit‑based backpressure to handle mismatched upstream‑downstream speeds in streaming applications.

Credit-based · Flink · Network Flow Control

0 likes · 16 min read

ByteDance ADFE Team

Aug 31, 2021 · Big Data

Evolution of the Big Data Technology Stack Over the Past Five Years

This article reviews the evolution of big data technologies in the last five years, covering streaming and batch processing frameworks, column‑store NoSQL databases, programming language trends, the cloud‑native multi‑model database Lindorm, and practical Flink/Blink usage with code examples.

Big Data · Flink · Lindorm

0 likes · 24 min read

Big Data Technology Architecture

Aug 31, 2021 · Big Data

Real-time CDC Data Read/Write Solutions in Data Lake Architecture with Flink and Iceberg

This article, compiled by community volunteers, examines real‑time read/write solutions for CDC data in data lake architectures, comparing offline HBase, Apache Kudu, Hive, and Spark + Delta, and ultimately advocating Flink + Iceberg for efficient, correct, and scalable streaming ingestion and analytics.

CDC · Flink · Iceberg

0 likes · 18 min read

Meituan Technology Team

Aug 26, 2021 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse: Architecture & Lessons

Meituan Waimai’s data intelligence team outlines a universal real‑time data‑warehouse methodology that combines a production platform with an interactive analytics engine, detailing scenarios, technology choices, architectural designs, platformization, SLA management, and a practical Lambda‑style case study.

Flink · Kappa architecture · Lambda architecture

0 likes · 18 min read

Big Data Technology & Architecture

Aug 24, 2021 · Big Data

Comprehensive Overview of Data Lake Technologies: Iceberg, Hudi, and Delta Lake

This article provides an in-depth overview of data lake concepts, definitions, and essential features, followed by detailed case studies of enterprise data lake implementations and comparative analysis of leading data lake table formats—Iceberg, Hudi, and Delta Lake—highlighting their architectures, capabilities, and trade‑offs.

Data Lake · Delta Lake · Flink

0 likes · 19 min read

Python Crawling & Data Mining

Aug 21, 2021 · Big Data

Understanding Flink’s Architecture: From APIs to Cluster Deployment

This article explains Flink’s three‑layer architecture (APIs & Libraries, Core, Deploy), details its programming interfaces, runtime engine, deployment options, and core concepts such as stateful computation and time semantics, providing a comprehensive guide for building robust stream and batch applications.

Flink · Stateful Computing · Time Semantics

0 likes · 13 min read

DataFunSummit

Aug 15, 2021 · Big Data

Building a General Real-Time Data Warehouse: Methods and Practices at Meituan Waimai

This article introduces a universal method for building a real-time data warehouse at Meituan Waimai, covering streaming technologies, architecture choices such as Lambda and Kappa, component design, feature production, SLA management, and practical OLAP solutions using Flink, Storm, and Doris.

Flink · Kappa architecture · Lambda architecture

0 likes · 15 min read

ITFLY8 Architecture Home

Aug 3, 2021 · Big Data

How BIGO Scaled Real‑Time Messaging by Migrating from Kafka to Pulsar

BIGO replaced its Kafka‑based message‑flow platform with Apache Pulsar to overcome scaling, stability, and operational cost challenges, leveraging Pulsar’s storage‑compute separation, seamless horizontal expansion, low latency, and tight integration with Flink for real‑time ETL and A/B‑test pipelines, resulting in billions of messages processed daily at half the hardware cost.

Apache Pulsar · ETL · Flink

0 likes · 17 min read

Big Data Technology & Architecture

Aug 2, 2021 · Big Data

Comprehensive Big Data Interview Question Guide for Major Tech Companies

This article compiles extensive interview questions and topics covering Hadoop, Spark, Flink, Hive, Kafka, MySQL, Redis, Java fundamentals, and algorithms, organized by companies such as Xiaomi, ByteDance, Alibaba, Shopee, Tencent, Meituan, NetEase, and Baidu, to help candidates prepare effectively for big‑data engineering roles.

Big Data · Flink · Hadoop

0 likes · 22 min read

DataFunTalk

Jul 29, 2021 · Big Data

Real-Time Data Warehouse Construction at TAL Using DorisDB

This article details TAL's transition from offline to real-time data warehousing, describing business drivers, pain points, architectural evolution through Hive, Flink+Kudu, and DorisDB, and outlining the system design, data flow, scheduling, monitoring, and the resulting business and cost benefits.

Airflow · Big Data · DorisDB

0 likes · 14 min read

DataFunTalk

Jul 28, 2021 · Big Data

Pravega Flink Connector: Past, Present, and Future – Architecture, Checkpoint Integration, and Upcoming Features

This article reviews the Pravega project and its Flink connector, covering Pravega's design for large‑scale streaming, the connector's evolution and exactly‑once semantics, Flink 1.11 integration challenges, checkpoint mechanisms, and future plans such as schema‑registry support and adoption of new Flink features.

Big Data · Checkpoint · Connector

0 likes · 10 min read

dbaplus Community

Jul 27, 2021 · Big Data

How JD Built a Millisecond‑Scale Real‑Time Browsing Record System for 500M Users

This article details JD's end‑to‑end design of a real‑time browsing record platform that captures, stores, and queries up to 200 recent items per user with millisecond latency, covering architecture, hot‑cold data separation, microservice APIs, and streaming pipelines using Kafka, Flink, Jimdb, and HBase.

Flink · HBase · Jimdb

0 likes · 13 min read

DataFunTalk

Jul 27, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Doris at Shuhai Supply Chain

This article describes how Shuhai Supply Chain upgraded its data warehouse from a complex, high‑cost 1.0 architecture to a streamlined, real‑time solution built around Apache Doris, detailing the motivations, design choices, zero‑code ingestion, metadata management, Flink connector, and the resulting performance gains.

Apache Doris · Big Data · Flink

0 likes · 13 min read

DataFunTalk

Jul 26, 2021 · Big Data

Accelerating Hive Daily Tables with Flink: A SmartNews Case Study

This article describes how SmartNews integrated Flink into its Airflow‑driven Hive batch pipeline to cut the actions table generation latency from three hours to about thirty‑four minutes, detailing the technical challenges, design decisions, and production results.

AWS · Big Data · Flink

0 likes · 12 min read

Big Data Technology & Architecture

Jul 20, 2021 · Big Data

Common Issues and Solutions for Flink CDC with MySQL

This article summarizes frequent problems encountered when using Flink CDC with MySQL—including Kafka version conflicts, checkpoint timeouts, permission errors, global lock issues, and DDL parsing failures—and provides practical configuration tweaks and code examples to resolve them.

CDC · Checkpoint · Debezium

0 likes · 11 min read

Big Data Technology Architecture

Jul 20, 2021 · Big Data

PB‑Level Ad‑hoc Query Practice with Flink: Threat Hunting Platform Architecture and IO‑Reducing Optimizations

This article details 360's Threat Hunting platform built on Flink, covering its evolution, architecture, block‑index design, Hilbert‑curve data ordering, like‑pushdown, join optimizations, Alluxio caching, and future plans for BI and multi‑user concurrency, all aimed at efficient PB‑scale data querying.

Alluxio · Block Index · Flink

0 likes · 18 min read

Big Data Technology Architecture

Jul 15, 2021 · Big Data

Building Data Lake Solutions with Iceberg and Object Storage: Architecture, Write/Read Processes, and Storage Optimization

This article presents a comprehensive overview of using Apache Iceberg with object storage to construct scalable data lake solutions, covering lake architecture, Iceberg table organization, Flink‑based write and read workflows, catalog abstractions, object storage versus HDFS comparisons, append‑upload and atomic‑commit challenges, a demonstration setup, and ideas for storage optimization.

Catalog · Flink · Iceberg

0 likes · 16 min read

Big Data Technology & Architecture

Jul 12, 2021 · Big Data

Common Production Issues and Troubleshooting Guide for Apache Flink

This article compiles classic production problems encountered with Apache Flink, covering cluster sizing, checkpoint failures, backpressure diagnosis, client submission errors, resource allocation on YARN, and PyFlink UDF definitions, providing step‑by‑step troubleshooting methods and practical recommendations.

Checkpoint · Flink · backpressure

0 likes · 18 min read

DataFunTalk

Jul 10, 2021 · Big Data

Building a Lakehouse Architecture with Apache Iceberg and Flink: Practices and Insights

This article explains how to construct a lakehouse architecture using Apache Iceberg, detailing the migration from Hive, Flink‑SQL integration, proxy user support, CDC handling, copy‑on‑write sinks, and the resulting benefits for near‑real‑time data visibility and unified batch‑stream processing.

Apache Iceberg · CDC · Flink

0 likes · 10 min read

Big Data Technology & Architecture

Jul 10, 2021 · Big Data

Comprehensive Big Data Learning Path and Interview Knowledge Map

This extensive guide outlines a modern big‑data learning roadmap, covering essential programming languages, Linux, databases, distributed system theory, networking, offline and real‑time computation, message queues, data warehouses, algorithms, backend skills, interview preparation, and practical advice for building a personal knowledge system.

Flink · Hadoop · Learning Path

0 likes · 24 min read

Big Data Technology & Architecture

Jul 8, 2021 · Big Data

Using Flink CDC to Write Data into Apache Hudi and Query with Hive and Spark SQL

This guide walks through preparing the environment, creating a MySQL source table, configuring Flink CDC to ingest data into an Apache Hudi table, and then querying the Hudi data using both Hive and Spark‑SQL, including handling of partitions, realtime input formats, and required configuration settings.

CDC · DataPipeline · Flink

0 likes · 10 min read

JD Retail Technology

Jul 5, 2021 · Backend Development

Design and Implementation of JD's Real-Time Browsing Record System

The article describes JD's real-time browsing record system architecture, detailing its four modules—storage, query, real-time reporting, and offline reporting—along with hot‑cold data separation, use of Jimdb, HBase, Kafka, and Flink to achieve millisecond‑level latency and high throughput for billions of user records.

Browsing · Flink · HBase

0 likes · 12 min read

37 Mobile Game Tech Team

Jul 2, 2021 · Operations

How to Build a Flink Monitoring System with Prometheus, Pushgateway, and Grafana

This guide walks you through configuring Flink metrics, installing and linking Pushgateway, node_exporter, Prometheus, and Grafana, and finally visualizing and alerting on Flink metrics, providing a complete end‑to‑end monitoring solution for Flink clusters.

Flink · Grafana · Metrics

0 likes · 7 min read

37 Mobile Game Tech Team

Jul 2, 2021 · Big Data

Inside Flink Metrics: Adding, Retrieving, and Exposing Metrics in TaskManager

This article walks through Flink's metric system by explaining the core interfaces such as MetricReporter and MetricRegistry, showing how metrics are added, registered, and queried during TaskManager startup, and detailing both REST and Prometheus approaches for retrieving metric values.

Big Data · Flink · Metrics

0 likes · 16 min read

Youzan Coder

Jun 30, 2021 · Big Data

Online Monitoring Practices for Offline and Real-Time Data at Youzan

Youzan Data Report Center monitors offline batch and real‑time data pipelines using accuracy and timeliness rules, cross‑table checks, upstream‑downstream comparisons, and scheduled alerts to detect anomalies early; since 2021 it has generated over 25 alerts, and plans a unified data‑quality dashboard.

Big Data · Data Quality · Flink

0 likes · 12 min read

DataFunTalk

Jun 29, 2021 · Big Data

In-depth Analysis of Flink SQL 1.13 Features and Improvements

This article provides a comprehensive overview of Apache Flink SQL 1.13, detailing new Window TVF support, cumulate windows, performance optimizations, time‑zone handling, enhanced Hive compatibility, SQL client upgrades, DataStream‑Table conversion improvements, and outlines the roadmap for the upcoming 1.14 release.

DataStream · Flink · Hive Integration

0 likes · 15 min read

DataFunTalk

Jun 21, 2021 · Big Data

Flink + Iceberg 0.11 Practices in Qunar Data Platform

This article shares Qunar's experience using Flink together with Apache Iceberg 0.11 to address real‑time data warehouse challenges, covering background pain points, Iceberg architecture, solutions for Kafka data loss and Hive latency, and optimization practices such as small‑file handling, sorting, and checkpoint management.

Big Data · Data Lake · Flink

0 likes · 13 min read

Qunar Tech Salon

Jun 21, 2021 · Big Data

Using Apache Iceberg 0.11 with Flink for Real‑time Data Lake: Architecture, Pain Points, and Solutions

This article examines the challenges of using Kafka, Flink, and Hive for real‑time data warehousing, introduces Apache Iceberg 0.11 as a solution, details its architecture, query planning, Flink integration, code examples, optimization techniques, and summarizes the benefits for large‑scale data processing.

Big Data · Data Lake · Flink

0 likes · 12 min read

DataFunTalk

Jun 20, 2021 · Databases

Xiaohongshu’s OLAP Architecture Evolution and DorisDB Adoption

This article details Xiaohongshu’s multi‑stage evolution of its OLAP infrastructure—from Redshift to Presto, ClickHouse, and finally DorisDB—describing the data pipeline, tool comparisons, advertising use‑case implementation, and the resulting performance and operational benefits.

Big Data · DorisDB · Flink

0 likes · 12 min read

NetEase Smart Enterprise Tech+

Jun 17, 2021 · Big Data

Building a Real‑Time Service Monitoring Framework with Flink at NetEase Cloud

This article explains how NetEase Cloud Communication designed and implemented a Flink‑based streaming aggregation framework that processes massive heartbeat logs in real time, handles data skew with two‑stage aggregation, and outputs metrics to Kafka and InfluxDB for monitoring and alerting.

Data Skew · Flink · Metric Computation

0 likes · 11 min read

Big Data Technology & Architecture

Jun 16, 2021 · Big Data

Practical Experience and Optimizations of Apache Iceberg in Tencent’s Big Data Ecosystem

This article reviews the advantages of Apache Iceberg for data lake storage, details Tencent’s custom optimizations and integration with Flink and Spark, and shares multiple real‑world implementations that demonstrate how Iceberg improves data consistency, reduces small‑file overhead, and enables near‑real‑time analytics in large‑scale big‑data environments.

Apache Iceberg · Data Lake · Flink

0 likes · 18 min read

Java High-Performance Architecture

Jun 14, 2021 · Big Data

How NetEase Games Built a Scalable Flink‑Based Streaming ETL Platform

This article explains how NetEase Games engineers designed and operated a Flink‑driven streaming ETL system, covering business background, log classification, dedicated and generic ETL services, architecture evolution, Python UDF integration, runtime optimizations, tuning practices, fault‑tolerance mechanisms, and future roadmap.

Flink · Game Analytics · data pipeline

0 likes · 22 min read

Architecture Digest

Jun 10, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's streaming ETL solution built on Flink, covering business background, log characteristics, specialized and generic ETL services, architectural evolution, Python UDF integration, runtime optimizations, fault‑tolerance mechanisms, and future roadmap for unified real‑time and offline data warehouses.

Big Data · Flink · Log Processing

0 likes · 19 min read

Sohu Tech Products

Jun 9, 2021 · Big Data

Real-time UV Counting with Flink, Hologres, and RoaringBitmap

This article explains how to implement both offline (T+1) and real‑time UV counting using Hologres with RoaringBitmap for high‑cardinality aggregation, and demonstrates a complete Flink‑Hologres pipeline—including table creation, streaming joins, windowed aggregation, and query examples—for fine‑grained user metric analysis.

Flink · Hologres · RoaringBitmap

0 likes · 11 min read

DataFunTalk

Jun 6, 2021 · Big Data

Understanding Apache Pulsar: Cloud‑Native Messaging, Storage‑Compute Separation, and Batch‑Stream Fusion with Flink

This article explains Apache Pulsar’s cloud‑native, storage‑compute separated architecture, its data model and scalability features, and how it integrates with Flink to provide a unified platform for both real‑time streaming and batch processing in big‑data applications.

Apache Pulsar · Batch-Stream Integration · Big Data

0 likes · 17 min read

IT Architects Alliance

Jun 5, 2021 · Big Data

How to Build a Real‑Time Recommendation System with Flink, HBase, and Docker

This article walks through a complete real‑time recommendation system built on Apache Flink, detailing its v2.0 architecture, modules for user behavior, interest, and product profiling, the recommendation algorithms (hot‑list, collaborative filtering, item similarity), and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka.

Docker · Flink · HBase

0 likes · 11 min read

ITFLY8 Architecture Home

Jun 3, 2021 · Big Data

Building a Real‑Time Flink Recommendation System: Architecture, Code & Deployment

This article walks through a complete Flink‑based recommendation system, detailing its v2.0 architecture, recommendation algorithms, front‑end and back‑end components, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink

0 likes · 10 min read

Big Data Technology & Architecture

Jun 1, 2021 · Big Data

Understanding Idle State Retention Time in Flink SQL

Flink SQL's idle state retention time feature prevents state explosion by automatically cleaning up state for keys that remain inactive beyond a configurable time window, requiring both minimum and maximum retention settings, with implementation details involving CleanupState, timers, and KeyedProcessFunctionWithCleanupState.

Flink · Idle State Retention · State Management

0 likes · 8 min read

Big Data Technology Architecture

May 31, 2021 · Big Data

Practical Experience of Using Flink + Iceberg 0.11 on Qunar Data Platform

This article presents Qunar's practical experience with Flink and Iceberg 0.11, covering background challenges such as Kafka data loss and Hive metadata pressure, explaining Iceberg architecture, query planning, and detailed solutions including real‑time ingestion, small‑file handling, sorting, and code examples for seamless migration.

Flink · Iceberg · Real-time Processing

0 likes · 12 min read

IT Architects Alliance

May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL

0 likes · 19 min read

dbaplus Community

May 27, 2021 · Big Data

How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink

This article details Vipshop's OLAP evolution, describing how Presto, Kylin, and ClickHouse are integrated, the deployment architecture with HAProxy and chproxy, containerization on Kubernetes, and the Flink‑ClickHouse pipeline that enables self‑service analysis of hundred‑billion‑row datasets while addressing performance challenges and future roadmap.

Big Data · Flink · OLAP

0 likes · 28 min read

IT Architects Alliance

May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink

0 likes · 11 min read
