Tagged articles
946 articles
Big Data Technology & Architecture
Mar 30, 2023 · Big Data

Apache Paimon (Incubating): A Streaming Lakehouse Storage Project Overview

Apache Paimon, newly incubated by the Apache Software Foundation, combines Flink's real‑time streaming capabilities with open lakehouse storage formats, offering high‑throughput, low‑latency data ingestion, partial‑update merges, and seamless integration with engines like Flink, Spark, and Trino for unified batch and streaming analytics.

Apache Paimon, Big Data, Data Lake
0 likes · 7 min read
ITPUB
Mar 28, 2023 · Big Data

How We Turned a Hive Data Warehouse into a Real‑Time Lakehouse with Apache Hudi

This article details the migration from a traditional Hive‑based data warehouse to a lakehouse architecture using Apache Hudi, covering the original Lambda setup, its pain points, lake‑vs‑warehouse differences, Hudi features, integration challenges, practical solutions, and future roadmap.

Apache Hudi, Big Data, Flink
0 likes · 11 min read
DataFunTalk
Mar 28, 2023 · Artificial Intelligence

FeatHub: An Open‑Source Feature Store for Real‑Time and Offline Feature Engineering

This article introduces FeatHub, an open‑source feature‑store project from Alibaba Cloud that provides a Python SDK, flexible architecture, and execution engines such as Flink and Spark to simplify the development, deployment, monitoring, and sharing of real‑time and offline machine‑learning features across multi‑cloud environments.

Feature Store, Flink, Python SDK
0 likes · 21 min read
DataFunTalk
Mar 25, 2023 · Artificial Intelligence

ZhongAn Financial Real‑Time Feature Platform: MLOps Practices, Architecture and Anti‑Fraud Applications

This article presents ZhongAn Financial’s end‑to‑end MLOps workflow and real‑time feature platform architecture, detailing team roles, data pipelines, Flink‑based processing, TableStore storage, anti‑fraud feature design, and answers to common implementation questions, offering a comprehensive guide for building scalable, low‑latency ML services in finance.

Flink, MLOps, Tablestore
0 likes · 25 min read
DeWu Technology
Mar 22, 2023 · Big Data

Analysis of Flink Scheduling Components and Slot Allocation

The article explains Flink’s post‑submission scheduling pipeline—from Dispatcher creating SchedulerNG and building the ExecutionGraph, through pipelined region construction and the PipelinedRegionSchedulingStrategy, to slot sharing allocation—identifying why slot and TaskManager overloads occur and proposing randomization or fine‑grained resource strategies to balance load.

DistributedSystems, ExecutionGraph, Flink
0 likes · 14 min read
ITPUB
Mar 13, 2023 · Big Data

What’s New in Apache Kyuubi 1.6.0? Server, Client, and Engine Enhancements

Apache Kyuubi 1.6.0 introduces major server‑side upgrades such as batch JAR task submission with RESTful APIs and a metadata store for HA, client‑side improvements including a unified JDBC driver and enhanced Beeline, plus mature Spark, Flink, Trino, and Hive engine plugins, while outlining the community’s roadmap.

Big Data, Engine Plugins, Flink
0 likes · 13 min read
DataFunTalk
Mar 12, 2023 · Big Data

Apache Kyuubi 1.6.0 Feature Overview and Enhancements

The article provides a comprehensive walkthrough of Apache Kyuubi 1.6.0, detailing server‑side enhancements such as batch (JAR) task submission, metadata store and unified API/authentication, client‑side improvements to the built‑in JDBC driver and Beeline, as well as engine plugins for Spark, Flink, Trino and Hive, and concludes with the community’s roadmap and statistics.

Apache Kyuubi, Batch Processing, Big Data
0 likes · 12 min read
DataFunTalk
Mar 9, 2023 · Big Data

Real‑Time Data Platform Architecture and Cloud‑Native Flink Migration at Manbang

This article presents a comprehensive case study of Manbang's real‑time data platform, detailing its business background, cloud‑native Flink + Hologres architecture, migration from self‑built clusters, real‑time product features, decision‑making workflows, and future roadmap, highlighting performance and cost benefits.

Flink, Logistics, Streaming
0 likes · 16 min read
dbaplus Community
Mar 7, 2023 · Operations

How We Rescued a ClickHouse Logging Cluster After Zookeeper‑Induced Read‑Only Failure

A production logging system became unavailable due to Kafka backlog alerts, prompting an investigation that uncovered read‑only ClickHouse tables caused by mismatched Zookeeper metadata after a TTL policy change, leading to a step‑by‑step recovery involving Zookeeper restarts, metadata fixes, and table reconstruction.

Cluster Recovery, Flink, Kafka
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Mar 3, 2023 · Big Data

How Alibaba Cloud EMR Evolved from Open‑Source Compatibility to Enterprise‑Grade Performance

This article outlines Alibaba Cloud EMR's three‑stage evolution—compatibility, contribution, and beyond open source—detailing its early Hadoop adoption, Flink and Spark innovations, cloud‑native optimizations, and enterprise‑grade features such as Remote Shuffle Service, performance benchmarks, and integrated diagnostics.

Alibaba Cloud, Big Data, Cloud Native
0 likes · 13 min read
DataFunTalk
Mar 1, 2023 · Databases

Evolution and Optimization of Tencent Music Content Library Data Platform: From Architecture 1.0 to 4.0

This article details the evolution of Tencent Music's content library data platform from version 1.0 to 4.0, describing business requirements, architectural redesigns—including migration from ClickHouse to Apache Doris, introduction of a semantic layer, and extensive write, query, and cost optimizations—while sharing practical lessons and future directions.

Apache Doris, Big Data, Flink
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Mar 1, 2023 · Big Data

How We Built a Scalable Real‑Time Data Architecture for a Complex Supply Chain

This article describes the challenges of a highly complex supply‑chain system, the evolution from early MySQL‑based reporting to a modern real‑time data platform using Flink, Kafka, ClickHouse, Hologres and other cloud services, and the tools and lessons learned to achieve low‑latency, high‑throughput analytics.

Flink, Kafka, Streaming
0 likes · 11 min read
DataFunSummit
Feb 28, 2023 · Big Data

Iceberg Technology Overview and Its Application at Xiaomi: Practices, Stream‑Batch Integration, and Future Plans

This article introduces the Iceberg table format, explains its core architecture and advantages such as transactionality, implicit partitioning and row‑level updates, details Xiaomi's practical deployments—including CDC pipelines, partition strategies, compaction services, and stream‑batch integration—and outlines future development directions.

Data Lake, Flink, Iceberg
0 likes · 20 min read
macrozheng
Feb 28, 2023 · Big Data

How Tencent Music Scaled Its Content Data Platform with Apache Doris: From ClickHouse to 4.0 Architecture

This article details the evolution of Tencent Music's content data platform from version 1.0 to 4.0, describing the migration from ClickHouse to Apache Doris, the introduction of a semantic layer, optimization of data ingestion, query performance, and cost reduction strategies that dramatically improved data timeliness, operational efficiency, and storage costs.

Apache Doris, Big Data, Data Architecture
0 likes · 23 min read
DeWu Technology
Feb 24, 2023 · Big Data

Real-Time Data Architecture Evolution for a Complex Supply Chain

The article traces Dewu’s supply‑chain data platform from slow MySQL reporting through early CDC‑based wide tables to a Flink‑Kafka‑ClickHouse 1.0 design, then to a more scalable Flink‑Kafka‑Hologres 2.0 architecture that solves upsert and compute‑storage separation, while detailing key operational tricks, code‑generation tools, and future plans for lake‑house integration.

Big Data, Flink, Hologres
0 likes · 10 min read
Big Data Technology & Architecture
Feb 24, 2023 · Big Data

Common Flink Task Submission Issues and Solutions on YARN

This article compiles frequent Flink job submission problems on YARN, including WordCount jar errors, HBase dependency conflicts, MySQL timeouts, checkpoint restoration failures, parallelism limits, and unexpected container termination, and provides root‑cause analysis and step‑by‑step remediation instructions.

Big Data, Checkpoint, Flink
0 likes · 21 min read
dbaplus Community
Feb 15, 2023 · Big Data

How Bilibili Scaled User Behavior Analytics with ClickHouse, Flink, and Iceberg

This article details Bilibili's PolarStar (北极星) user behavior analysis platform, tracing its evolution from early Spark‑Jar models to Flink‑ClickHouse pipelines and Iceberg‑based full aggregation, and explains the technical solutions for event, retention, funnel, and path analysis, data ingestion, cluster rebalancing, and performance optimizations that enable massive real‑time analytics on billions of daily events.

Flink, Iceberg, Real-time Processing
0 likes · 32 min read
ITPUB
Feb 7, 2023 · Big Data

How Kuaigou Built a Scalable Real‑Time Data Warehouse with Spark, Flink, and Cloud

Facing massive, multi‑source traffic and the need for instant analytics, Kuaigou’s real‑time data warehouse evolved from Spark on‑premise to a cloud‑native stack using Alibaba Blink, Flink, and layered OLAP models, streamlining development, cutting costs, and enabling diverse real‑time applications.

Flink, OLAP, Spark
0 likes · 11 min read
Big Data Technology & Architecture
Feb 6, 2023 · Big Data

Real-Time Data Warehouse Solutions with Hudi: Scenarios, Challenges, and Optimizations

This article presents an in‑depth overview of real‑time data‑warehouse scenarios, discusses challenges such as timeliness, update efficiency, and resource consumption, and details practical solutions using Apache Hudi, Flink, Presto, and related optimizations for ingestion, indexing, compaction, and query performance.

Big Data, Data Lake, Flink
0 likes · 17 min read
Bilibili Tech
Jan 31, 2023 · Big Data

Design and Optimization of Real-Time Data Quality Control (DQC) Platform on Bilibili's Big Data System

Bilibili redesigned its real-time data-quality control platform by replacing per-rule Flink jobs with a unified, dynamically-configured architecture that classifies Kafka topics, aggregates via InfluxDB full-table and continuous queries, mitigates data inflation, adds a high-performance proxy, and implements robust monitoring and recovery to ensure scalable, reliable data quality for its big-data services.

Big Data, DQC, Flink
0 likes · 22 min read
ITPUB
Jan 26, 2023 · Big Data

How NetEase’s Arctic Unifies Streaming and Batch with Iceberg for Real‑Time Lakehouse

This article explains the challenges of a Lambda‑architecture data pipeline, introduces NetEase’s Arctic lakehouse built on Apache Iceberg, details its table‑store design, optimization cycles, consistency mechanisms, real‑time features, practical use cases, and future roadmap, highlighting its advantages over similar solutions.

Arctic, Data Integration, Flink
0 likes · 14 min read
ITPUB
Jan 22, 2023 · Big Data

How Flink Table Store Powers Real‑Time Financial Data Warehousing

This article details a banking‑focused real‑time data‑warehouse solution that leverages Flink Table Store to handle both incremental fact‑table updates and full‑table dimension calculations, compares three traditional approaches, and explains data ingestion, query modes, export options, and future streaming‑warehouse directions.

Banking, ELT, Flink
0 likes · 20 min read
Ctrip Technology
Jan 12, 2023 · Big Data

Real-Time Data Warehouse Architecture and Practice at Ctrip Hotel

The article explains why enterprises need real-time data warehouses, compares Lambda and Kappa architectures, describes Ctrip Hotel's Lambda‑plus‑OLAP variant built with Flink and StarRocks, and details practical solutions for ordering, wide‑table generation, and data validation that enable billion‑row, low‑latency analytics.

Ctrip, Flink, Lambda architecture
0 likes · 10 min read
DataFunSummit
Jan 10, 2023 · Big Data

Exploring Iceberg in Huawei Terminal Cloud: Architecture, Features, and Future Plans

This article presents a comprehensive overview of Iceberg's adoption in Huawei Terminal Cloud, covering its architecture, key features such as Git‑style data management, real‑time processing, and acceleration layers, and future development directions, along with a Q&A session addressing performance and implementation details.

Big Data, Data Lake, Flink
0 likes · 15 min read
Bilibili Tech
Jan 10, 2023 · Big Data

Technical Evolution of Bilibili's PolarStar User Behavior Analysis Platform

Bilibili’s PolarStar platform evolved from Spark‑based batch jobs to a Flink‑driven real‑time pipeline and finally to a unified Iceberg‑on‑ClickHouse model, cutting query latency to seconds, saving thousands of CPU cores and hundreds of gigabytes of Redis memory while enabling complex, near‑real‑time user‑behavior analyses and scalable data‑import, rebalancing, and compression optimizations.

Flink, Iceberg, clickhouse
0 likes · 30 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2023 · Big Data

How Alibaba’s Dolphin Engine Uses Flink + Hologres for Real‑Time Big Data

The Dolphin engine, built by Alibaba’s Data Engine team, combines Flink and Hologres to deliver ultra‑large‑scale OLAP, streaming, batch, and AI capabilities for real‑time advertising analytics, offering smart materialization, intelligent indexing, and vector recall while supporting millions of advertisers and petabyte‑level data.

Big Data, Flink, Hologres
0 likes · 13 min read
DataFunTalk
Jan 6, 2023 · Big Data

ZhongAn's Hundred‑Billion‑Scale Data Integration Service: Architecture, Business Support, and Evolution

This article presents the architecture and practical experience of ZhongAn's hundred‑billion‑scale data integration service, covering common integration technologies, business support scenarios for offline and real‑time data, technical challenges, evolution from single‑machine to service‑oriented designs, and future directions using Flink and DataX.

Data Integration, Data Platform, DataX
0 likes · 31 min read
Big Data Technology & Architecture
Jan 3, 2023 · Big Data

Migrating Hive SQL Jobs to Flink Using the SQL Gateway

This article explains how to use Apache Flink 1.16's SQL Gateway to migrate Hive SQL tasks to Flink, covering the underlying Hive‑on‑Flink architecture, dialect compatibility, streaming and batch demos, configuration details, and practical tips for developers and platform engineers.

Batch Processing, Big Data, Flink
0 likes · 19 min read
DataFunTalk
Jan 1, 2023 · Big Data

Zhihu's Real-Time Computing Platform: From Skytree 1.0 to Mipha 2.0

Zhihu’s real‑time computing platform, initially built as Skytree 1.0 on Kubernetes and later re‑engineered as Mipha 2.0 with Flink SQL, unified metadata management, dynamic jar loading, UDF support, Protobuf format, CDC integration, and extensive operational optimizations, now processes petabyte‑scale data with high reliability.

Flink, Kubernetes, Real‑Time Computing
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Dec 30, 2022 · Big Data

How Manbang Built a Cloud‑Native Real‑Time Data Platform with Flink & Hologres

Manbang's logistics platform leverages a cloud‑native architecture built on Alibaba Cloud Flink and Hologres to deliver minute‑level real‑time data, feature computation, and decision‑making that dramatically improves SLA, reduces operational costs, and powers intelligent driver‑cargo matching across the ecosystem.

Flink, Hologres, Logistics
0 likes · 16 min read
Big Data Technology & Architecture
Dec 28, 2022 · Big Data

Flink 1.16 Highlights: Adaptive Batch Scheduling, Speculative Execution, Hybrid Shuffle, Dynamic Partition Pruning, Hive SQL Migration, Checkpoint Enhancements, CDC Integration, and Table Store

Flink 1.16 introduces adaptive batch scheduling, speculative execution, hybrid shuffle, dynamic partition pruning, improved Hive SQL compatibility, advanced checkpoint mechanisms including changelog backend, and integrates CDC with Kafka and Table Store, offering faster, more stable, and easier-to-use stream‑batch processing capabilities.

Big Data, CDC, Checkpoint
0 likes · 8 min read
DataFunTalk
Dec 27, 2022 · Big Data

Multi‑Stream Join and Concurrency Control in Apache Hudi: Design, Implementation, and Usage

This article presents a comprehensive solution for multi‑stream joins in Apache Hudi, detailing the challenges of dimension and multi‑stream joins, the novel storage‑layer join approach, timeline‑based concurrency control, marker mechanisms, early conflict detection, payload customization, and practical usage with Flink and Spark, along with performance benefits and future directions.

Apache Hudi, Data Lake, Flink
0 likes · 31 min read
Tencent Advertising Technology
Dec 27, 2022 · Big Data

Design and Optimization of Tencent Advertising Log Data Lake Using Iceberg, Spark, and Flink

The article details how Tencent Advertising re‑architected its massive log pipeline by consolidating heterogeneous real‑time and offline logs into an Iceberg‑based data lake, introducing multi‑level partitioning, Spark and Flink ingestion, and numerous performance and cost optimizations for scalable big‑data analytics.

Big Data, Data Lake, Flink
0 likes · 20 min read
Data Thinking Notes
Dec 23, 2022 · Big Data

How Real-Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explains why real‑time data warehouses are becoming essential, outlines their goals, compares them with traditional offline warehouses, and presents detailed design patterns, naming conventions, and case studies from Didi, Kuaishou, Tencent, Youzan and other enterprises, highlighting challenges and solutions for streaming, storage, and query layers.

Big Data Architecture, Data Lake, ETL
0 likes · 49 min read
DataFunTalk
Dec 23, 2022 · Big Data

Building a Lakehouse on Alibaba Cloud AnalyticDB (ADB) with Apache Hudi: Architecture, Challenges, and Practices

This article presents a comprehensive technical overview of Alibaba Cloud AnalyticDB's Lakehouse edition, detailing its unified architecture, key advantages, the challenges of ingesting billions of records with Apache Hudi, and the engineering solutions—including Flink integration, hotspot mitigation, memory optimization, OSS throttling handling, concurrent write support, lifecycle management, and TableService—that enable a cost‑effective, high‑performance lake‑to‑warehouse platform.

Apache Hudi, Flink, Lakehouse
0 likes · 19 min read
ITPUB
Dec 21, 2022 · Big Data

How Bilibili Optimized Flink Runtime for Massive Real‑Time Jobs

This article details Bilibili's extensive enhancements to the Flink runtime—including checkpoint recoverability, max‑parallelism calculations, State Processor API extensions, Full and Regional Checkpoints, hybrid HA, task‑level recovery, load‑balanced partitioners, and large‑scale cluster maintenance—to improve reliability and performance of its billion‑scale streaming workloads.

Big Data, Checkpoint, Flink
0 likes · 33 min read
DataFunTalk
Dec 20, 2022 · Big Data

ByteDance's Practices for Tracking Data Governance and Pipeline Management

This article explains ByteDance's end‑to‑end tracking data lifecycle management, including pre‑report validation, the rationale for using BMQ over Kafka, quality governance examples, and how Flink‑based pipelines ensure data accuracy through SLA monitoring and checkpoint strategies.

Data Governance, Data Tracking, Flink
0 likes · 5 min read
ITPUB
Dec 18, 2022 · Big Data

How to Build a Real‑Time Data Warehouse with EasyData: A Step‑by‑Step Guide

Learn how to design and implement a real‑time data warehouse for an app’s AB‑test monitoring using EasyData, covering data flow layers, CDC task creation, stream table registration, Flink SQL processing, and BI reporting, with detailed steps, code snippets, and practical tips.

CDC, EasyData, Flink
0 likes · 13 min read
Big Data Technology & Architecture
Dec 15, 2022 · Big Data

Migrating Hive SQL to Flink SQL: Motivation, Challenges, Practice, Demo, and Future Plans

This technical article presents a comprehensive overview of migrating Hive SQL to Flink SQL, covering the motivations behind the migration, key challenges such as compatibility, stability and performance, practical implementation steps, a detailed demo, future development directions, and a Q&A session addressing common concerns.

Batch Processing, Big Data, Data Lake
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Dec 9, 2022 · Operations

How Alibaba’s Flink Cluster Inspector Eliminates Hotspot Machines in Real‑Time Streaming

This article details Alibaba Cloud's Flink Cluster Inspector, explaining the business challenges of hotspot machines, the analysis of resource over‑use, and the four‑stage solution—pre‑profiling, in‑process self‑healing, post‑recovery, and observability—that reduces latency, cuts costs, and improves operational efficiency.

Cluster, Flink, HotSpot
0 likes · 19 min read
DataFunTalk
Dec 8, 2022 · Big Data

Arctic: NetEase’s Real-Time Lakehouse System Built on Apache Iceberg

This article introduces NetEase’s Arctic, a real‑time lakehouse system built on Apache Iceberg that unifies streaming and batch processing. It explains the challenges of the Lambda architecture, details Arctic’s features (change/base stores, hidden queue, and transaction handling), and shares internal practice cases and the future roadmap.

Apache Iceberg, Arctic, Data Lake
0 likes · 12 min read
DataFunSummit
Dec 2, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, ByteDance’s open‑source data integration engine, unifies batch, streaming, and incremental data synchronization across heterogeneous sources, detailing its evolution from early Flink‑based prototypes to a mature, plugin‑driven architecture with multi‑engine support, low‑cost co‑development, and robust CDC lakehouse capabilities.

Big Data, CDC, Flink
0 likes · 19 min read
Tencent Cloud Developer
Dec 2, 2022 · Big Data

Design and Implementation of a Hundred‑Billion‑Scale Real‑Time Monitoring System

The paper presents the design and deployment of a hundred‑billion‑scale real‑time monitoring platform that meets stringent data‑collection, analysis, storage, alerting and visualization requirements, compares Oceanus + Elastic Stack against a Zabbix‑Prometheus‑Grafana stack, selects the former, and details the performance and cost optimizations that enable massive, low‑latency monitoring while maintaining high availability.

Elasticsearch, Flink, Oceanus
0 likes · 20 min read
Bilibili Tech
Nov 29, 2022 · Big Data

How Bilibili Supercharged Flink: Checkpoint, HA, and Runtime Optimizations

This article details Bilibili's extensive enhancements to Flink's runtime—including checkpoint recoverability, operator ID stability, state processor extensions, hybrid high‑availability, regional checkpointing, and load‑based channel selection—to improve scalability, reliability, and operational efficiency of large‑scale streaming jobs.

Big Data, Checkpoint, Flink
0 likes · 32 min read
DaTaobao Tech
Nov 23, 2022 · Big Data

Real-time Log Aggregation and Monitoring with Blink (Flink) on Mobile Endpoints

The article explains how Blink, Alibaba’s optimized Flink variant, uses dynamic tables and streaming‑SQL to ingest mobile telemetry via source tables, compute per‑minute metrics such as API success rates with tumbling windows, and write results to Alibaba Cloud Log Service, enabling real‑time dashboards and extensible use cases like fraud detection.

Flink, Real-time Streaming, blink
0 likes · 10 min read
21CTO
Nov 20, 2022 · Big Data

How Meituan’s Logan Real‑Time Log System Boosts Debugging Across Mobile, Web, and IoT

This article details the design, architecture, and implementation of Meituan's Logan real‑time logging platform, covering its workflow, multi‑terminal collection SDK, ingestion, Flink‑based processing, consumption layers, stability measures, and future roadmap, illustrating how it improves fault diagnosis and system reliability.

Elasticsearch, Flink, Kafka
0 likes · 18 min read
ITPUB
Nov 18, 2022 · Big Data

How Xiaomi Uses Iceberg for Real‑Time Streaming and Batch Data Lakes

This article introduces Iceberg’s table‑format fundamentals, details Xiaomi’s large‑scale deployment of Iceberg for CDC and log ingestion, explores their streaming‑batch integration experiments, outlines future roadmap items, and provides a comprehensive Q&A covering practical challenges and solutions.

Batch Processing, Big Data, Data Lake
0 likes · 23 min read
Liulishuo Tech Team
Nov 17, 2022 · Big Data

Real‑time Data Warehouse Architecture and Technical Solution at Liulishuo

This article describes Liulishuo's migration to a Flink‑based real‑time data warehouse, covering background, benefits, technology selection (storage, Flink platform, dimension table connectors), overall architecture, concrete Hudi and Elasticsearch ingestion examples, processing SQL, and future outlook for unified batch‑streaming storage.

Elasticsearch, Flink, Hudi
0 likes · 15 min read
DataFunTalk
Nov 15, 2022 · Artificial Intelligence

Flink ML: Iterative Execution Engine, Design, API, and Efficient Algorithm Library

This article introduces Flink ML, a DataStream‑based iterative engine and machine‑learning algorithm library, covering its overview, iterative execution engine design and API, performance comparisons with Spark ML, online logistic regression and K‑Means demos, and future development roadmap.

Flink, Iterative Engine, KMeans
0 likes · 22 min read
AsiaInfo Technology: New Tech Exploration
Nov 11, 2022 · Industry Insights

How Real-Time Data Middle Platforms are Transforming the Telecom Industry

This article analyzes why telecom operators need a real‑time data middle platform, outlines its layered architecture and model design, examines the shift from Lambda to Kappa and lakehouse approaches, and highlights how these innovations enable faster, scenario‑driven insights and competitive advantage.

Big Data Architecture, Data Middle Platform, Flink
0 likes · 15 min read
DataFunTalk
Nov 9, 2022 · Artificial Intelligence

Design and Usage of Flink ML Java and Python APIs, Ecosystem Construction, and Future Directions

This article introduces the Flink Machine Learning Library, detailing the design and usage of its Java and Python APIs, core interfaces such as WithParams, Stage, Estimator, and AlgoOperator, workflow for training and inference, pipeline/graph construction, ecosystem initiatives, and upcoming development plans.

Flink, Java API, Python API
0 likes · 12 min read
High Availability Architecture
Nov 7, 2022 · Backend Development

Design and Implementation of Meituan's Logan Real-Time Log System

This article describes how Meituan built Logan, a high‑performance, end‑to‑end real‑time logging platform for mobile, web, mini‑programs and IoT, covering its background, architecture, data collection, processing, consumption, monitoring, deployment strategies, achieved results and future roadmap.

Backend Architecture · Elasticsearch · Flink
0 likes · 15 min read
DataFunTalk
Nov 6, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, an open‑source data integration engine from ByteDance, provides a unified solution for batch, streaming, full‑load, and incremental data synchronization across heterogeneous sources, detailing its background, technical evolution, architecture, low‑cost co‑building features, compatibility strategies, and future roadmap.

CDC · Data Integration · Flink
0 likes · 18 min read
Meituan Technology Team
Nov 3, 2022 · Backend Development

Design and Implementation of Logan Real-Time Log System at Meituan

The article details Meituan’s end‑to‑end design and implementation of Logan, a high‑performance real‑time logging service for mobile apps, web, mini‑programs and IoT, covering background challenges, architecture layers, technology choices such as Flink and Elasticsearch, stability measures, deployment practices, achieved results and future plans.

Blue‑Green deployment · Elasticsearch · Flink
0 likes · 21 min read
DataFunSummit
Oct 21, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform architecture and three real‑time lake initiatives—log ingestion, CDC ingestion, and lake analysis—showcasing how Apache Iceberg, Flink, and custom shuffling algorithms solve small‑file and cross‑cloud challenges while enabling schema evolution and future multi‑cloud optimizations.

Apache Iceberg · Big Data · CDC
0 likes · 16 min read
DataFunTalk
Oct 19, 2022 · Big Data

Understanding Flink Table Store: Design, Usage, and Roadmap

Flink Table Store, an Apache Flink subproject, provides a unified stream‑batch storage layer with SQL‑based table APIs, addressing real‑time and offline data needs, detailing its design goals, usage patterns, architectural layers, implementation choices, and upcoming roadmap.

Flink · LSM‑Tree · Streaming
0 likes · 14 min read
DataFunSummit
Oct 18, 2022 · Big Data

Feature Overview of Apache Kyuubi (Incubating) v1.5.0

The article presents a detailed technical walkthrough of Apache Kyuubi 1.5.0, covering its service‑oriented architecture, high‑availability design, multi‑engine extensions for Spark, Flink, Trino and Hive, enhanced engine‑sharing policies, POOL mode configuration, and the project’s future roadmap.

Apache Kyuubi · Big Data · Engine Architecture
0 likes · 13 min read
ITPUB
Oct 15, 2022 · Big Data

Flink & Apache Hudi: Design, Practices, and Roadmap for Streaming Data Lakes

This talk introduces the evolution of data lakes, outlines Apache Hudi’s core features, details the Flink‑Hudi integration architecture—including write pipelines, small‑file handling, and read strategies—covers real‑world use cases such as near‑real‑time DB ingestion, OLAP, and ETL, and previews upcoming Hudi roadmap items.

Apache Hudi · Big Data · Data Lake
0 likes · 21 min read
Shopee Tech Team
Oct 13, 2022 · Big Data

Improving Flink Unaligned Checkpoint: Problems, Principles, Optimizations, and Production Practices at Shopee

Shopee tackled frequent Flink checkpoint failures caused by back‑pressure by adopting and extending the community’s Unaligned Checkpoint mechanism—adding overdraft buffers, improving legacy sources, introducing an aligned‑checkpoint timeout, enabling output‑buffer switching, merging small HDFS files, and fixing network‑buffer deadlocks—now running hundreds of jobs with stable UC deployment and plans to enable it universally.

Big Data · Checkpoint Optimization · Flink
0 likes · 18 min read
DataFunSummit
Oct 10, 2022 · Big Data

Stability Optimization Practices for Flink Jobs at Tencent

This article presents Tencent's practical experience in improving Flink job stability, covering the Oceanus platform, stability challenges, and concrete optimization techniques such as reducing failures, minimizing impact, accelerating recovery, and proactive issue detection, followed by a summary and future outlook.

Big Data · Flink · Real‑Time Computing
0 likes · 12 min read
DeWu Technology
Oct 10, 2022 · Big Data

Offline and Real-Time User Profile Fusion Architecture

The architecture combines a nightly batch job that generates offline user profiles stored in HBase with a Flink‑based stream layer that lazily loads those profiles on app start and creates real‑time updates, then fuses both streams into a unified, timestamp‑ordered profile in Redis, forming a Lambda‑style pipeline.

Batch Processing · Flink · HBase
0 likes · 10 min read
MaGe Linux Operations
Oct 9, 2022 · Big Data

Master Flink on Kubernetes: Step‑by‑Step Deployment Guide

This guide walks you through deploying Apache Flink on Kubernetes, covering runtime modes, building Docker images, creating ConfigMaps and Services, launching session and application clusters, submitting jobs, monitoring the Web UI, and cleaning up resources, all with practical code snippets and commands.

Big Data · Docker · Flink
0 likes · 26 min read
vivo Internet Technology
Oct 9, 2022 · Big Data

Design and Implementation of a Real-Time Marketing Automation Engine at vivo

This fifth installment explains vivo’s real‑time marketing automation engine, detailing its business need, layered architecture (access, processing, output, management, warehouse), scalable event‑queue design, dynamic configuration, unified dispatch, Flink‑based metric enrichment, and rule‑engine integration to achieve low‑latency, high‑throughput personalized targeting.

Event-Driven Architecture · Flink · Message Queue
0 likes · 13 min read
ITPUB
Sep 24, 2022 · Big Data

How ByteDance Scales Real‑Time Data Warehouses with Hudi and Flink

This article details ByteDance's practical experience building real‑time data warehouses on a data lake using Hudi, Flink, and related optimizations, covering scenario analysis, architecture, performance challenges, and future roadmap for scalable, low‑latency analytics.

Flink · Hudi
0 likes · 19 min read
ITPUB
Sep 22, 2022 · Big Data

What Is a Real‑Time Data Warehouse? Product, Solution, and Use Cases Explained

The article explains the concept of real‑time data warehouses, traces their evolution from early relational databases to modern streaming‑batch engines, discusses whether they are products or solutions, outlines typical application scenarios, selection criteria, and future trends in the big‑data ecosystem.

Flink · Spark · cloud
0 likes · 10 min read
DataFunTalk
Sep 11, 2022 · Big Data

Flink Table Store v0.2: Application Scenarios, Core Features, and Future Roadmap

This article introduces Flink Table Store v0.2, explains its four primary application scenarios—offline warehouse acceleration, partial update, pre‑aggregation rollup, and real‑time warehouse enhancement—details the core lake‑storage architecture, bucket management, append‑only mode, and outlines the project’s future roadmap and trade‑off considerations.

Batch · Flink · Lake Storage
0 likes · 16 min read
Bilibili Tech
Sep 6, 2022 · Big Data

Lancer: Evolution of Bilibili's Real-Time Streaming Architecture

Lancer, Bilibili’s real‑time streaming backbone, has evolved from a monolithic Flume pipeline to a log‑id‑isolated, Kubernetes‑native architecture where Go edge agents feed synchronous Kafka‑proxied gateways into per‑logid topics processed by dedicated Flink‑SQL jobs, delivering exactly‑once, back‑pressured, highly scalable data ingestion for billions of daily requests.

Big Data · Flink · Kafka
0 likes · 29 min read
Alibaba Cloud Big Data AI Platform
Sep 5, 2022 · Big Data

Scaling Alibaba TCC to Millions of RPS with a High‑Availability Real‑Time Data Warehouse

This article details how Alibaba's TCC platform evolved its architecture over multiple phases—from a legacy database to a high‑availability real‑time data warehouse built on Flink and Hologres—highlighting the challenges, solutions, and cost‑saving measures that enabled millions of RPS, terabytes of storage, and sub‑second query latency.

Flink · Hologres · Real-Time
0 likes · 21 min read
Meituan Technology Team
Sep 1, 2022 · Databases

AI-Powered Database Anomaly Detection Service: Feature Analysis, Algorithm Selection, and Real-Time Monitoring

The article details Meituan's database platform team's end‑to‑end design of an AI‑driven anomaly detection service, covering feature analysis of time‑series patterns, algorithm selection (MAD, boxplot, EVT), model training, real‑time detection with Flink, operational metrics, and future enhancements.

AI Algorithms · Boxplot · Database Anomaly Detection
0 likes · 19 min read
Huolala Tech
Sep 1, 2022 · Big Data

How HuoLala Built a Real‑Time Metrics Monitoring Platform for Flink

This article explains how HuoLala’s real‑time R&D platform redesigns Flink metric collection, routing, and alerting using a custom Kafka‑based pipeline, flexible dashboards, and multi‑level metric governance to improve observability, reduce latency, and ensure data quality.

Flink · Kafka · Real-Time
0 likes · 22 min read