Tagged articles
252 articles
Page 1 of 3
ByteDance Data Platform
Feb 2, 2026 · Big Data

How StreamShield Powers Production‑Grade Resilience for Apache Flink at Massive Scale

ByteDance’s StreamShield delivers a three‑layer resiliency framework—engine self‑healing, hybrid replication at the cluster level, and chaos‑tested releases—that enables over 70,000 concurrent Flink jobs on 11 million CPU cores to meet strict SLAs with second‑level startup and robust fault tolerance.

Apache Flink · ByteDance · Real‑Time Computing
6 min read
iQIYI Technical Product Team
Jan 8, 2026 · Big Data

How iQIYI Cut Stream Data Costs by 70%: From Private‑Cloud Kafka to AutoMQ

This article details iQIYI's evolution from a tightly coupled private‑cloud Kafka setup to a cloud‑native AutoMQ architecture, describing the challenges of scaling, the development of the Stream platform and Stream‑SDK, the migration to hybrid and public‑cloud Kafka, and the resulting cost and elasticity improvements.

AutoMQ · Data Architecture · Kafka
12 min read
Baidu Geek Talk
Sep 24, 2025 · Big Data

How Feed Real‑Time Data Warehouse Was Re‑Engineered for Speed and Cost Savings

This article explains how Baidu’s Feed real‑time data warehouse was rebuilt using a pure streaming architecture, detailing the limitations of the previous stream‑batch design, the technical solutions—including core/non‑core data separation, metric calculation in streaming, and Parquet storage with Apache Arrow—and the resulting cost reductions, latency improvements, and future roadmap.

Apache Arrow · Batch Processing · Parquet
17 min read
Baidu Geek Talk
Sep 1, 2025 · Big Data

How Baidu Netdisk Built a High‑Performance Real‑Time Engine with Flink

This article explains how Baidu Netdisk transitioned from Spark Streaming to a Flink‑based Tiangong real‑time computing engine, detailing the evolution, reasons for choosing Flink, architecture, configuration examples, business use cases, technical challenges, and future platform plans.

Baidu Netdisk · Big Data · Flink
16 min read
DataFunSummit
Jun 18, 2025 · Big Data

How Real‑Time Lakehouse and Apache Paimon Transform Modern Data Architecture

This article explains the concept of a real‑time lakehouse, compares it with traditional batch warehouses, introduces Apache Paimon and its innovations such as native upserts, LSM storage, tags and branches, and showcases multiple enterprise use cases that demonstrate its low‑cost, low‑latency stream‑batch integration.

Apache Paimon · Data Lake · real-time lakehouse
17 min read
DataFunSummit
Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Lakehouse
13 min read
Big Data Tech Team
Jun 2, 2025 · Big Data

Master Apache Flink: A Complete Learning Roadmap from Basics to Advanced Projects

This guide outlines a comprehensive Apache Flink learning path, covering prerequisite knowledge, core concepts, APIs, state management, performance tuning, hands‑on projects, advanced topics like SQL optimization and Kubernetes deployment, plus curated resources and community tips to help beginners and intermediate users become proficient.

Apache Flink · Flink Tutorial · learning roadmap
8 min read
Full-Stack Internet Architecture
May 27, 2025 · Big Data

Understanding Event Streaming in Kafka: Core Concepts, Architecture, and Use Cases

This article explains Kafka's event‑streaming concept, detailing events and streams and the core components: producers, topics, partitions, consumers, and persistence. It then covers typical use cases, including real‑time data pipelines, event‑driven architecture, stream processing, and log aggregation, highlighting Kafka's role as foundational big‑data infrastructure.

Event Streaming · Kafka · Real-time Processing
7 min read
Tencent Cloud Developer
May 8, 2025 · Big Data

How Setats Unifies Stream, Batch, and Incremental Processing for Real‑Time Data Lakes

At the 2025 DA Data+AI Conference in Shanghai, Tencent Cloud unveiled Setats—a unified stream‑batch‑incremental engine that cuts system costs, delivers second‑level data visibility and real‑time changelog generation, and demonstrates measurable performance gains in automotive IoT analytics while integrating tightly with the WeData platform.

Batch Processing · Big Data Architecture · Data Lake
5 min read
ByteDance Data Platform
Apr 25, 2025 · Databases

How ByteDance’s AQETuner Cuts Query Latency by 23% and Boosts Reliability

ByteDance Data Platform’s recent breakthroughs in database research—spanning query‑level Bayesian tuning, adaptive stream‑processing parallelism, and learned cardinality estimation—were highlighted by two papers accepted at VLDB 2025 and ICDE 2025, showcasing significant performance gains and real‑world deployments.

AI · Parameter Tuning · cardinality estimation
5 min read
Big Data Technology Architecture
Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals, details Flink CDC's architecture and advantages, provides setup steps, code examples for SQL and DataStream APIs, discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium
7 min read
DaTaobao Tech
Dec 18, 2024 · Big Data

Incremental Computation in Big Data: Flink Materialized Table and Paimon

The article explains how Flink 1.20’s Materialized Table combined with Paimon’s changelog storage enables incremental computation that unifies batch and streaming workloads, delivering minute‑level latency at lower cost, illustrated by a materialized‑table example while noting current streaming‑only support and future batch extensions.

Big Data · Flink · Incremental Computation
13 min read
Big Data Technology & Architecture
Dec 18, 2024 · Big Data

Key Trends of Flink 2.0: Compute‑Storage Separation, Unified Batch‑Stream, and Streaming Warehouse

The article reviews the major directions of Flink 2.0—including compute‑storage separation, a new Materialized Table for unified batch‑stream processing, and deeper integration with Paimon for streaming warehouses—while offering a cautious perspective on their practical impact and migration challenges.

Batch-Stream Integration · Big Data · Compute-Storage Separation
5 min read
Alibaba Cloud Big Data AI Platform
Nov 29, 2024 · Big Data

How Fluss Redefines Real‑Time Stream Storage for Flink

Fluss, an open‑source real‑time stream storage project from Alibaba, integrates columnar formats and low‑latency updates with Apache Flink to address the limitations of traditional Kafka‑Flink pipelines, offering high throughput, low cost, and seamless lakehouse support for modern data analytics.

Apache Flink · Fluss · real-time storage
6 min read
DaTaobao Tech
Oct 25, 2024 · Big Data

Using Temporal Table JOIN in Flink SQL for Real-Time Stream Enrichment

The article explains how to use Flink SQL’s temporal table join to enrich a real‑time traffic‑log stream with versioned tag data, detailing the required DDL, the time‑versioned join syntax, and the watermark and idle‑timeout settings needed to prevent stalls and boundary‑delay issues.

Flink · SQL · Temporal Join
7 min read
JD Retail Technology
Sep 25, 2024 · Big Data

From a Personal Journey to Data Platform Architecture: Insights on Big Data, Cloud Computing, and System Design

The article narrates the author’s 30‑year programming career and shares technical reflections on building business‑agnostic, configurable data platforms, covering batch, streaming, interactive computing, big‑data sharding, Spark, Flink, cloud migration, and the philosophy of software architecture.

Batch Processing · Software Architecture · System Design
23 min read
ZhongAn Tech Team
Sep 3, 2024 · Big Data

Real-Time Log Clustering Architecture and Continuous Clustering Algorithm

This article presents a comprehensive overview of a log clustering system, detailing its background, architecture based on Filebeat, Kafka, Flink, Elasticsearch, and Grafana, and introduces a continuous clustering algorithm using SimHash and Hamming distance for real‑time log governance and anomaly detection.

Flink · Log Clustering · Real-time analytics
14 min read
Mike Chen's Internet Architecture
Aug 16, 2024 · Big Data

Understanding the Lambda Architecture for Big Data Processing

This article explains the Lambda architecture, a three‑layer model that combines batch and real‑time processing for large‑scale data. It outlines the architecture's components, advantages, disadvantages, and common tools, and compares it with the Kappa alternative, offering practical insights for data engineers.

Batch Processing · Big Data · Lambda architecture
5 min read
DataFunSummit
Aug 7, 2024 · Big Data

Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

Data Quality · Flink · real-time data
15 min read
JD Cloud Developers
Aug 6, 2024 · Big Data

Master Real-Time Stream Processing with Flink: Windows & Watermarks

This article provides a comprehensive overview of real-time stream processing, covering data streams, window types, event and processing time, Flink's operator model, watermark mechanisms, and strategies for handling out-of-order and late data to ensure accurate, timely analytics.

Flink · Real-time analytics · Watermarks
15 min read
Big Data Technology & Architecture
Aug 5, 2024 · Big Data

Key Features of Apache Flink 1.20: Materialized Tables, DISTRIBUTED BY, and State/Checkpoint Optimizations

The article reviews Apache Flink 1.20, highlighting the new Materialized Table concept, the DISTRIBUTED BY support for load‑balanced storage and join performance, and state/checkpoint file merging improvements, while providing code examples and practical insights for users.

Apache Flink · Big Data · Checkpoint Optimization
7 min read
DataFunTalk
Jul 18, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent exploration of real-time data warehouse architecture, covering its six-module design, data quality assurance mechanisms, stream‑batch unified processing with Flink and ODPS, and a forward‑looking data lake solution built on Paimon, offering practical insights for large‑scale streaming analytics.

Flink · stream processing
15 min read
Baidu Tech Salon
Jun 18, 2024 · Big Data

Scalable, High‑Accuracy Event Logging Monitoring for Baidu's Log Platform

Baidu’s log platform processes billions of daily page‑view events and, to monitor them accurately with minute‑level latency, implements a downstream streaming‑task architecture that maps limited custom dimensions, uses watermarks for completeness, trims raw data, aggregates into 5‑minute windows, and outputs concise metrics to Elasticsearch, achieving high accuracy, configurability, and low cost.

Log Monitoring · Real-time analytics · UBC
11 min read
Alibaba Cloud Native
Mar 24, 2024 · Cloud Native

How RocketMQ 5.0 Enables Lightweight Cloud‑Native Stream Processing with RStreams and RSQLDB

This article explains the evolution of message middleware, introduces core concepts of stream processing, and details RocketMQ 5.0's native lightweight stream engine RStreams and its stream database RSQLDB, showing how they simplify real‑time data integration, computation, and scaling in cloud‑native environments.

RSQLDB · RStreams · Real-time analytics
14 min read
DataFunTalk
Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint
11 min read
dbaplus Community
Dec 14, 2023 · Big Data

How Flink Powers Unified Stream‑Batch Processing at Scale: Production Lessons

This article explains why Flink was chosen as a unified stream‑batch engine, details the migration from Lambda architecture, outlines the Flink Batch production workflow, and shares key optimizations such as Hive dialect support, CTAS, adaptive scheduling, speculative execution, and future roadmap for large‑scale data processing.

Adaptive Scheduler · Batch Processing · Big Data
31 min read
dbaplus Community
Dec 10, 2023 · Big Data

How Bilibili Built a Remote State Backend for Flink Using Taishan KV Store

This article explains Bilibili's design and implementation of a remote state backend for Flink, detailing the motivations, pain points of the existing RocksDBStateBackend, the architecture of TaishanStateBackend, and the performance optimizations applied to achieve storage‑compute separation and faster rescaling.

Big Data · Flink · Remote Storage
21 min read
Data Thinking Notes
Oct 11, 2023 · Big Data

How ByteDance Optimized Its E‑Commerce Data Lake to Cut Costs and Boost Real‑Time Accuracy

ByteDance revamped its traditional Lambda architecture for e‑commerce traffic data by introducing a new lake‑ingestion solution that reduces development and operational costs and keeps data timely and stable. The article covers the business background, ODS lake design, archiving tags, delayed‑data handling, and real‑time stability, and outlines future plans.

Big Data · Data Lake · Flink
7 min read
Efficient Ops
Sep 24, 2023 · Information Security

How China Postal Savings Bank Built an Enterprise‑Level AI‑Powered Anti‑Fraud Platform

The 2023 China International Service Trade Fair’s Digital Transformation Forum showcased the Postal Savings Bank’s enterprise‑grade intelligent anti‑fraud platform, detailing its stream‑batch integration, graph‑based AI models, and multi‑layer risk‑control architecture that safeguards millions of daily transactions across retail, agricultural, and credit services.

China Postal Savings Bank · anti-fraud · stream processing
8 min read
WeiLi Technology Team
Aug 2, 2023 · Big Data

How to Build a Real-Time Data Warehouse: Architectures, Challenges, and Industry Practices

This article examines the growing demand for real‑time data warehouses, compares mature streaming frameworks, evaluates Lambda, Kappa and hybrid architectures, reviews industry implementations from Didi and OPPO, and proposes a standard‑layer + stream + data‑lake solution with Apache Paimon, Hudi, and Iceberg.

Apache Flink · Kappa architecture · Lambda architecture
27 min read
Alibaba Cloud Developer
Jul 31, 2023 · Big Data

From BI to Kappa: How Data Architecture Evolved in the Big Data Era

This article traces the evolution of data architecture from early BI systems through traditional big‑data stacks, streaming, Lambda and Kappa designs, and explains how a unified stream‑batch model simplifies development while keeping logic consistent across data‑analysis and pipeline applications.

BI systems · Big Data · Data Architecture
16 min read
Big Data Technology & Architecture
Jul 24, 2023 · Big Data

Real-time Data Warehouse Governance: Optimization Practices and Technical Enhancements

This article presents a comprehensive overview of the current challenges, platform architecture, governance planning, and technical optimizations—including Flink SQL, Kafka batch processing, and partitioned stream tables—used to improve resource efficiency, stability, and scalability of a large‑scale real‑time data warehouse.

Flink optimization · Kafka batch · Resource Efficiency
25 min read
Didi Tech
Jun 14, 2023 · Big Data

Real-Time Data Development Practices and Component Selection at Didi

Didi’s unified real‑time data stack outlines best‑practice component choices for four key scenarios—metric monitoring, BI analysis, online services, and feature/tag systems—detailing pipelines from source to sink, resource‑usage guidelines, and a one‑stop development platform to build stable, high‑performance streaming solutions.

ClickHouse · Druid · Flink
17 min read
Programmer DD
Jun 7, 2023 · Cloud Native

Why Apache Pulsar Is the Next‑Gen Cloud‑Native Streaming Platform

This article explains how Apache Pulsar combines messaging, storage, and lightweight function computing into a cloud‑native streaming platform, detailing its architecture, storage‑compute separation, tiered storage, pluggable protocols, reliability guarantees, and rich ecosystem compared with traditional queues and Kafka.

Apache Pulsar · Cloud Native · Data Reliability
10 min read
WeChat Backend Team
Jun 1, 2023 · Big Data

How WeChat Boosted Flink Stability with TaskManager Recovery and Load Balancing

This article details WeChat’s Gemini‑2.0 real‑time streaming platform built on Flink, explaining two key stability enhancements: a TaskManager‑level partial failure recovery that avoids data loss during node crashes, and a load‑balancing scheduler that evenly distributes tasks across TaskManagers to improve resource utilization and reduce latency.

Big Data · Flink · Kubernetes
16 min read
Architects Research Society
Apr 18, 2023 · Backend Development

Event Sourcing, CQRS, and Stream Processing with Apache Kafka

Event sourcing models state changes as immutable logs, and when combined with CQRS and Kafka Streams, it enables scalable, fault‑tolerant architectures where write and read paths are decoupled, supporting local or external state stores, interactive queries, and zero‑downtime upgrades.

Backend Architecture · CQRS · kafka streams
21 min read
Baidu Geek Talk
Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9‑99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data
14 min read
ITPUB
Mar 24, 2023 · Big Data

What’s New in Apache Flink 1.17? Key Features, Performance Gains, and Streaming Warehouse Advances

Apache Flink 1.17 introduces a suite of batch and streaming enhancements—including a new Streaming Warehouse API, significant TPC‑DS performance boosts, adaptive batch scheduling, improved checkpointing, expanded SQL capabilities, Hive connector upgrades, and broader filesystem support—while also delivering upgrades to FRocksDB, Calcite, and the token framework to strengthen its position as a leading unified data‑processing engine.

Apache Flink · Batch Processing · Checkpoint
23 min read
Architects Research Society
Mar 15, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Considerations

This article explains why exactly‑once semantics are needed for stream‑processing applications, describes Kafka's transactional model and semantics, details the Java transaction API and its usage, and discusses the internal components, performance trade‑offs, and practical guidelines for building reliable Kafka‑based pipelines.

Distributed Systems · Exactly-Once · Java
17 min read
DataFunSummit
Feb 16, 2023 · Big Data

JD Real-Time Data Product Practice: Overview, Low‑Code Platform, Stream‑Batch Integration, and Operations

This article summarizes JD's real‑time data product practice, covering product overview, low‑code real‑time platform construction, stream‑batch integrated architecture, and the three‑layer operational defense model, while highlighting challenges, evolution, user distribution, and future directions.

Big Data · Low‑code platform · real-time data
13 min read
Ctrip Technology
Jan 12, 2023 · Big Data

Real-Time Data Warehouse Architecture and Practice at Ctrip Hotel

The article explains why enterprises need real-time data warehouses, compares Lambda and Kappa architectures, describes Ctrip Hotel's Lambda‑plus‑OLAP variant built with Flink and StarRocks, and details practical solutions for ordering, wide‑table generation, and data validation that enable billion‑row, low‑latency analytics.

Ctrip · Flink · Lambda architecture
10 min read
Bilibili Tech
Jan 6, 2023 · Backend Development

Hotspot Detection and Local Cache Framework for High‑Traffic Applications

The presented hotspot detection and local‑cache framework leverages the HeavyKeeper streaming top‑k algorithm with decay‑based burst detection, integrates zero‑code SDK support and a whitelist‑enabled LRU cache, enabling a few megabytes of memory to achieve up to 85% hit rates and dramatically reduce Redis load in high‑traffic applications.

distributed caching · heavykeeper · hotspot detection
21 min read
vivo Internet Technology
Dec 28, 2022 · Big Data

Vivo Real-Time Computing Platform: Architecture, Practices, and Applications

The Vivo Real‑Time Computing Platform, built on Apache Flink, delivers a one‑stop data construction and governance solution that processes up to 5 PB daily, offering high‑availability submission and control services, robust stability, rich SQL usability, efficient Kubernetes deployment, strong security, and supports real‑time warehouses and short‑video recommendation, while targeting future elastic scaling and lake‑house unification.

Apache Flink · Data Platform · Real‑Time Computing
18 min read
Data Thinking Notes
Dec 23, 2022 · Big Data

How Real-Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explains why real‑time data warehouses are becoming essential, outlines their goals, compares them with traditional offline warehouses, and presents detailed design patterns, naming conventions, and case studies from Didi, Kuaishou, Tencent, Youzan and other enterprises, highlighting challenges and solutions for streaming, storage, and query layers.

Big Data Architecture · Data Lake · ETL
49 min read
Alibaba Cloud Big Data AI Platform
Nov 29, 2022 · Big Data

How Flink’s Stream‑Batch Fusion Is Transforming Real‑Time Big Data

The article explores Apache Flink’s eight years as a top‑level Apache project, Alibaba’s extensive contributions, the rise of stream‑batch unified computing and its impact on real‑time data integration and cloud‑native deployment, and the emerging Flink‑based data‑warehouse and serverless solutions.

Apache Flink · Big Data · Cloud Native
15 min read
ByteDance Terminal Technology
Nov 18, 2022 · Big Data

Practices and Techniques for Large‑Scale Distributed Trace Data Analysis at ByteDance

This article presents ByteDance’s experience building a massive trace‑data analysis platform, covering observability fundamentals, the evolution of its distributed tracing system, various aggregation computation models, technical architecture choices, and concrete use‑cases such as precise topology, traffic estimation, dependency analysis, performance anti‑patterns, bottleneck detection, and error propagation.

Big Data · Distributed Tracing · Microservices
21 min read
AsiaInfo Technology: New Tech Exploration
Nov 11, 2022 · Industry Insights

How Real-Time Data Middle Platforms are Transforming the Telecom Industry

This article analyzes why telecom operators need a real‑time data middle platform, outlines its layered architecture and model design, examines the shift from Lambda to Kappa and lakehouse approaches, and highlights how these innovations enable faster, scenario‑driven insights and competitive advantage.

Big Data Architecture · Data Middle Platform · Flink
15 min read
Shopee Tech Team
Oct 13, 2022 · Big Data

Improving Flink Unaligned Checkpoint: Problems, Principles, Optimizations, and Production Practices at Shopee

Shopee tackled frequent Flink checkpoint failures caused by back‑pressure by adopting and extending the community’s Unaligned Checkpoint mechanism—adding overdraft buffers, improving legacy sources, introducing an aligned‑checkpoint timeout, enabling output‑buffer switching, merging small HDFS files, and fixing network‑buffer deadlocks—now running hundreds of jobs with stable UC deployment and plans to enable it universally.

Big Data · Checkpoint Optimization · Flink
18 min read
DeWu Technology
Oct 10, 2022 · Big Data

Offline and Real-Time User Profile Fusion Architecture

The architecture combines a nightly batch job that generates offline user profiles stored in HBase with a Flink‑based stream layer that lazily loads those profiles on app start and creates real‑time updates, then fuses both streams into a unified, timestamp‑ordered profile in Redis, forming a Lambda‑style pipeline.

Batch Processing · Flink · HBase
10 min read
DataFunTalk
Oct 2, 2022 · Big Data

Real-time Data Warehouse Architecture and Hologres Technology Overview

This article explains the evolving requirements of real‑time data warehouses, analyzes Alibaba's Hologres technology principles, presents recommended architectures for various latency scenarios, and discusses practical case studies, performance, security, and cost‑optimization strategies for modern big‑data platforms.

Big Data · Hologres · cloud computing
24 min read
ITPUB
Sep 22, 2022 · Big Data

What Is a Real‑Time Data Warehouse? Product, Solution, and Use Cases Explained

The article explains the concept of real‑time data warehouses, traces their evolution from early relational databases to modern streaming‑batch engines, discusses whether they are products or solutions, outlines typical application scenarios, selection criteria, and future trends in the big‑data ecosystem.

Flink · Spark · cloud
10 min read
37 Interactive Technology Team
Aug 23, 2022 · Big Data

Optimizing Game Event Reporting with Stream Processing to Overcome ClickHouse Performance Bottlenecks

Faced with ClickHouse query times ballooning to over an hour for massive game‑event data, the team replaced the DB‑pull model with a stream‑processing pipeline that evaluates trigger rules in real time, cuts batch queries by 60%, and brings reporting latency down to minutes.

ClickHouse · Game Analytics · Performance Optimization
0 likes · 6 min read
DaTaobao Tech
Aug 11, 2022 · Big Data

Unify SQL Engine: Integrating Stream, Batch, and Online Computing for Data Warehousing

The article describes how fragmented real‑time, batch, and online data‑warehouse pipelines lead to low productivity and inconsistent data quality. It then introduces a unified SQL engine built on Apache Calcite that parses, optimizes, and compiles a single SQL statement into executable plans for ODPS, Flink, or Java, using Janino code generation, multiple state‑storage backends, and snapshot‑join semantics to improve performance and simplify development.

Batch Processing · Calcite · Code Generation
0 likes · 16 min read
Baidu Geek Talk
Aug 9, 2022 · Big Data

How to Build a Real-Time Data Warehouse with Unified Stream‑Batch Architecture

This article examines the evolution of big‑data architectures, identifies the latency and maintenance issues of classic Lambda designs, and presents a hybrid Lambda‑Kappa solution that unifies streaming and batch processing to achieve minute‑level data freshness and second‑level query latency while reducing development cost.

Big Data · Kappa architecture · Lambda architecture
0 likes · 13 min read
ITPUB
Aug 8, 2022 · Big Data

Why Real‑Time Data Warehouses Are the New Competitive Edge for Enterprises

As markets become increasingly dynamic, companies must build real‑time infrastructure to gain timely insights, and this article explains the three real‑time analytics scenarios, the limitations of traditional stream engines, and how Skylab’s integrated cloud‑native platform and Omega architecture address those challenges.

cloud-native · real-time data · stream processing
0 likes · 9 min read
DataFunTalk
Jul 26, 2022 · Big Data

Feature Platform Architecture and Stream‑Batch Integrated Solutions

This talk presents Shuhe Technology’s feature platform, detailing its four‑layer architecture, feature storage services, stream‑batch integrated processing, event‑center design, consistency models, and four model‑strategy invocation schemes, illustrating data flows from MySQL through Sqoop, Kafka, Flink, HBase and ClickHouse.

Big Data · ClickHouse · Flink
0 likes · 17 min read
DataFunSummit
Jul 9, 2022 · Big Data

Alibaba's One‑Stop Real‑Time Data Warehouse: Hologres Architecture and CCO Implementation Experience

The article reviews the shift of big‑data computing from batch to real‑time, outlines the evolution of one‑stop real‑time data warehouses, introduces Alibaba's Hologres solution and its technical advantages, and shares the CCO department’s three‑generation architecture upgrades and practical use cases.

Alibaba · Hologres · data engineering
0 likes · 16 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update merge‑on‑read (MOR) tables for multi‑stream joins and copy‑on‑write (COW) tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.
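The partial‑update idea behind a multi‑stream join can be illustrated without Hudi at all. This is a conceptual sketch, not Hudi's API or Shopee's code: it assumes each upstream stream fills a different subset of columns for the same primary key, that `None` means "this stream does not own the column", and that a hypothetical per‑record ordering value decides which write to a column wins.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, Optional[object]]
# (primary key, ordering value, partial row) as emitted by one upstream stream
Record = Tuple[str, int, Row]


def partial_update_merge(records: Iterable[Record]) -> Dict[str, Dict[str, object]]:
    """Fold partial rows from several streams into one wide row per primary key.

    A None column never overwrites a stored value. A non-None column wins if its
    ordering value is greater than or equal to the one stored for that column,
    so late-arriving older data is ignored.
    """
    table: Dict[str, Dict[str, Tuple[int, object]]] = {}
    for key, version, row in records:
        stored = table.setdefault(key, {})
        for column, value in row.items():
            if value is None:
                continue
            previous = stored.get(column)
            if previous is None or version >= previous[0]:
                stored[column] = (version, value)
    return {k: {col: val for col, (_, val) in cols.items()} for k, cols in table.items()}


if __name__ == "__main__":
    stream = [
        ("order-1", 1, {"amount": 10, "status": None}),      # order stream
        ("order-1", 2, {"amount": None, "status": "paid"}),  # payment stream
        ("order-1", 0, {"amount": 8, "status": None}),       # late, older order event
    ]
    print(partial_update_merge(stream))  # {'order-1': {'amount': 10, 'status': 'paid'}}
```

In a merge‑on‑read table this folding is deferred until compaction or query time, which is why writers for each stream can append independently without first joining.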

Apache Flink · Apache Hudi · Batch Processing
0 likes · 20 min read
Top Architect
Apr 23, 2022 · Big Data

Ensuring No Duplicate and No Loss in Baidu Log Middle Platform: Architecture, Challenges, and Solutions

This article explains the design, implementation, and future plans of Baidu's log middle platform, detailing its lifecycle management, service architecture, data reliability goals of eliminating duplication and loss, and the technical measures taken across SDKs, servers, and streaming pipelines to achieve near‑100% data integrity.
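A common general pattern for getting both "no loss" and "no duplication" is at‑least‑once delivery (retry until acknowledged) combined with deduplication on a unique event ID downstream. The summary does not say Baidu uses exactly this; the sketch below shows only the dedup half, assuming every log record carries a source‑assigned unique ID, and the bounded in‑memory window (`capacity`) is a simplification of what a real pipeline would keep in durable state.

```python
from collections import OrderedDict
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (unique event ID assigned at the source, payload)


def deduplicate(events: Iterable[Event], capacity: int = 100_000) -> Iterator[Event]:
    """Yield each event ID once, remembering at most `capacity` recently seen IDs.

    Upstream retries give at-least-once delivery (no loss); dropping repeated
    IDs here removes the duplicates those retries create.
    """
    seen: "OrderedDict[str, None]" = OrderedDict()
    for event_id, payload in events:
        if event_id in seen:
            seen.move_to_end(event_id)  # refresh so active IDs stay in the window
            continue
        seen[event_id] = None
        if len(seen) > capacity:
            seen.popitem(last=False)  # evict the least recently seen ID
        yield event_id, payload


if __name__ == "__main__":
    delivered = [("e1", "click"), ("e2", "view"), ("e1", "click")]  # e1 was retried
    print(list(deduplicate(delivered)))  # [('e1', 'click'), ('e2', 'view')]
```

The window size is the key trade‑off: a duplicate that arrives after its ID has been evicted slips through, so the window must cover the longest expected retry delay.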

Backend Architecture · Big Data · Data Reliability
0 likes · 15 min read
DataFunSummit
Apr 22, 2022 · Big Data

Huya Real-Time Computing SLA Practice: Platform Evolution, Core SLA Definition, Capability Building, and Future Outlook

The talk details Huya’s real‑time computing platform evolution from chaotic early stages to a unified, containerized system, defines core SLA metrics focused on latency compliance, describes capability enhancements such as demand monitoring, task analysis, dynamic scaling, and outlines future goals for usability, stability, openness, and unified stream‑batch processing.

Flink · Real‑Time Computing · SLA
0 likes · 12 min read
Architect
Apr 18, 2022 · Big Data

Ensuring Data Accuracy and Reliability in Baidu's Log Middle Platform

This article describes Baidu's log middle platform architecture, its data lifecycle management, integration status, terminology, service overview, and the core challenges of ensuring data accuracy, along with the optimizations implemented for persistent storage, service decomposition, and SDK reporting to achieve near‑100% reliability with no duplication and no loss.

Backend Architecture · Big Data · Data Reliability
0 likes · 15 min read
dbaplus Community
Apr 13, 2022 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse with Flink

This article explains Meituan's real‑time data warehouse architecture, covering typical business scenarios, the evolution of its streaming platform, key design challenges, solutions such as unified data models, SQL‑based development, UDF hosting, operator optimizations, and future plans for incremental processing and unified batch‑stream semantics.

Flink · Meituan · real-time data
0 likes · 18 min read
High Availability Architecture
Apr 11, 2022 · Big Data

Ensuring Data Accuracy and Reliability in Baidu Log Platform: Architecture, Challenges, and Solutions

This article introduces the current state of Baidu's log platform, explains its lifecycle from data collection to downstream applications, analyzes the challenges of achieving near‑zero duplication and loss, and presents architectural optimizations and best‑practice recommendations to improve data stability and accuracy across the system.

Big Data · Data Reliability · System Architecture
0 likes · 19 min read
Alibaba Cloud Native
Feb 22, 2022 · Big Data

Why RocketMQ-Streams Delivers High‑Performance, Low‑Resource Stream Computing

RocketMQ-Streams targets workloads with massive data volumes, heavy filtering, and lightweight windowed computation. Its lightweight, high‑performance design runs on as little as 1 CPU core and 1 GB of RAM, offers 2–5× speed gains over traditional big‑data engines, and supports Flink‑compatible SQL, UDFs, and cloud‑native deployment.

Exactly-Once · Flink SQL · RocketMQ-Streams
0 likes · 10 min read
DataFunSummit
Jan 30, 2022 · Big Data

Real‑time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

This article presents Meituan's real‑time data warehouse platform, describing typical streaming use cases, the evolution of its architecture from Storm and Spark Streaming to Flink, the challenges of development, operations and data quality, and the engineering solutions—including unified SQL, web IDE, UDF hosting, pipeline testing, and operator performance optimizations—implemented to support large‑scale, low‑latency analytics.

Flink · platform architecture · real-time data
0 likes · 17 min read
Architecture Digest
Jan 21, 2022 · Big Data

Building a Real-Time Data Warehouse with Flink: Architecture, Core Concepts, and Practical Implementation

This article explains how to build a unified stream‑batch real‑time data warehouse using FlinkSQL, covering prerequisite knowledge, five core concepts, two implementation approaches, a comparison of traditional versus real‑time architectures, and a comprehensive hands‑on example, illustrated with diagrams.

Batch Processing · Data Architecture · Flink
0 likes · 6 min read
StarRocks
Jan 12, 2022 · Big Data

How Flink + StarRocks Deliver Lightning‑Fast Real‑Time Data Warehousing

This article explains the evolution, challenges, and technical solutions for building an end‑to‑end real‑time data warehouse by combining Apache Flink's stream processing with StarRocks' ultra‑fast OLAP engine, covering architecture, data models, integration methods, best‑practice cases, and future roadmap.

Big Data · Flink · OLAP
0 likes · 21 min read
DataFunTalk
Jan 11, 2022 · Big Data

Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (aka Mo Wen) outlines the evolution of Flink toward a Streaming Warehouse, detailing recent technical advances, use‑case scenarios, and the upcoming Dynamic Table storage that aim to unify stream and batch processing for real‑time data‑warehouse workloads.

Apache Flink · Big Data · Dynamic Table
0 likes · 16 min read
Tencent Cloud Developer
Jan 7, 2022 · Big Data

Design and Implementation of a Hundred‑Billion‑Scale Real‑Time Monitoring System

The paper details a hundred‑billion‑scale real‑time monitoring system, outlining a layered architecture from collection to alerting, comparing Oceanus + Elastic Stack and Zabbix + Prometheus + Grafana solutions, and showing how targeted optimizations in stream processing and Elasticsearch achieve scalability, low latency, and significant cost savings.

Elastic Stack · Oceanus · Performance Optimization
0 likes · 19 min read
Youzan Coder
Dec 8, 2021 · Big Data

How to Build a Real‑Time Data Quality Monitoring System with Flink

This article outlines a comprehensive approach to monitoring and ensuring the accuracy and timeliness of real‑time data streams, detailing background challenges, solution design, implementation steps using Flink and automated testing, alert handling procedures, and future improvement plans.

Alerting · Data Quality · Flink
0 likes · 10 min read
HomeTech
Dec 7, 2021 · Big Data

Flink Task Auto-scaling Design and Implementation

This article presents the design and implementation of Flink task auto‑scaling, covering background, manual and automatic scaling mechanisms, architecture with RescaleCoordinator, persistence via Zookeeper and HDFS, scaling policies for parallelism, CPU and memory, and future plans for fine‑grained and time‑based resource adjustments.
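A parallelism scaling policy of the kind the summary mentions might be a simple proportional rule with a dead band and clamping. Everything in this sketch is hypothetical (the `ParallelismPolicy` name, the 0.7 target utilization, the 0.1 tolerance, the 1 to 128 bounds); it illustrates the decision logic only and is not the article's RescaleCoordinator, which additionally persists state via ZooKeeper and HDFS.

```python
import math
from dataclasses import dataclass


@dataclass
class ParallelismPolicy:
    """Hypothetical proportional scaling rule for one Flink job's parallelism."""
    target_utilization: float = 0.7  # desired average busy ratio of operators (assumption)
    tolerance: float = 0.1           # dead band around the target to avoid flapping
    min_parallelism: int = 1
    max_parallelism: int = 128

    def recommend(self, current: int, observed_utilization: float) -> int:
        """Return the parallelism that would bring utilization back to the target."""
        if abs(observed_utilization - self.target_utilization) <= self.tolerance:
            return current  # close enough: no rescale
        desired = math.ceil(current * observed_utilization / self.target_utilization)
        return max(self.min_parallelism, min(self.max_parallelism, desired))


if __name__ == "__main__":
    policy = ParallelismPolicy()
    print(policy.recommend(10, 0.95))  # 14: scale out
    print(policy.recommend(10, 0.75))  # 10: inside the dead band
    print(policy.recommend(10, 0.20))  # 3: scale in
```

The dead band matters because every Flink rescale restarts the job from a checkpoint or savepoint, so oscillating between two parallelism values is costly.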

Auto Scaling · Flink · HDFS
0 likes · 4 min read
Alibaba Cloud Developer
Oct 13, 2021 · Big Data

Why “Exactly‑Once” Doesn’t Guarantee Consistency in Stream Processing

This article examines the true meaning of consistency in stream computing, clarifies common misconceptions about exactly‑once semantics, formalizes consistency challenges, and reviews how major stream engines such as Google MillWheel, Apache Flink, Kafka Streams, and Spark Streaming implement end‑to‑end consistency.

Big Data · Exactly-Once · fault tolerance
0 likes · 29 min read
NetEase Game Operations Platform
Sep 18, 2021 · Big Data

StreamflySQL: NetEase Games’ Journey from Template JAR to SQL Gateway for Flink SQL Platformization

This article details NetEase Games’ evolution of its Flink SQL platform, from the early StreamflySQL v1 template‑JAR approach to the v2 SQL‑Gateway architecture, discussing design decisions, challenges such as metadata persistence, multi‑tenant security, horizontal scaling, and job state management.

Flink · Real-time analytics · SQL
0 likes · 17 min read
Tencent Cloud Developer
Sep 3, 2021 · Big Data

Design and Implementation of a Real-Time Video Live Streaming Analytics System Using Tencent Cloud Big Data Services

The article details a cloud‑native architecture on Tencent Cloud that uses CKafka, Oceanus (Flink), MySQL, HBase and a BI service to ingest live‑streaming logs, aggregate gift‑reward metrics in real time, store results, and display them on a continuously refreshed dashboard.

Business Intelligence · Cloud Services · Video Streaming
0 likes · 15 min read
ByteDance ADFE Team
Aug 31, 2021 · Big Data

Evolution of the Big Data Technology Stack Over the Past Five Years

This article reviews how big‑data technologies have evolved over the past five years, covering stream and batch processing frameworks, column‑store NoSQL databases, programming‑language trends, the cloud‑native multi‑model database Lindorm, and practical Flink/Blink usage with code examples.

Big Data · Flink · Lindorm
0 likes · 24 min read
Meituan Technology Team
Aug 26, 2021 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse: Architecture & Lessons

Meituan Waimai’s data intelligence team outlines a universal real‑time data‑warehouse methodology that combines a production platform with an interactive analytics engine, detailing scenarios, technology choices, architectural designs, platformization, SLA management, and a practical Lambda‑style case study.

Flink · Kappa architecture · Lambda architecture
0 likes · 18 min read
Python Crawling & Data Mining
Aug 21, 2021 · Big Data

Understanding Flink’s Architecture: From APIs to Cluster Deployment

This article explains Flink’s three‑layer architecture (APIs & Libraries, Core, Deploy), details its programming interfaces, runtime engine, deployment options, and core concepts such as stateful computation and time semantics, providing a comprehensive guide for building robust stream and batch applications.

Flink · Stateful Computing · Time Semantics
0 likes · 13 min read
Spring Full-Stack Practical Cases
Aug 12, 2021 · Backend Development

Master Kafka Streams in Spring Boot: Real‑Time Data Processing with Code Samples

This guide walks through setting up Kafka Streams with Spring Boot 2.3, covering environment configuration, core concepts, topology design, and multiple practical examples—including message sending, listening, transformations, aggregations, filtering, branching, and multi‑field grouping—complete with full code snippets and execution results.

Java · Kafka · Spring Boot
0 likes · 13 min read
Volcano Engine Developer Services
Aug 11, 2021 · Big Data

How Volcengine Solves Big Data Quality Challenges with a Unified Stream‑Batch Platform

Volcengine’s Data Quality Platform bridges the gap between data validation and resource‑intensive computation in large‑scale environments, offering unified stream‑batch monitoring, data exploration, comparison, and alerting across Hive, ClickHouse, Kafka, and more, while addressing scalability, latency, and resource optimization challenges.

Big Data · Data Quality · monitoring
0 likes · 19 min read
ByteFE
Jul 29, 2021 · Frontend Development

Implementing a Large File Chunked Upload Library: A Full-Stack TypeScript Guide

This article provides a comprehensive guide to building a large file chunked upload library from scratch using TypeScript, detailing both server-side stream processing for memory efficiency and client-side MD5 calculation with retry mechanisms to ensure reliable and performant file transfers.
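The client‑side flow (split the file, hash it, retry failed chunks) can be sketched in Python, although the article itself uses TypeScript. This is a minimal sketch under stated assumptions: `send` is a hypothetical transport callback standing in for an HTTP request, the 4 MiB chunk size and three‑attempt exponential backoff are illustrative choices, and the MD5 is computed over the whole in‑memory file for simplicity.

```python
import hashlib
import time
from typing import Callable, List

SendChunk = Callable[[int, bytes], None]  # hypothetical transport: (chunk index, bytes)


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut the file into fixed-size slices; the last slice may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_chunk_with_retry(send: SendChunk, index: int, chunk: bytes,
                            max_attempts: int = 3, base_delay: float = 0.1) -> None:
    """Retry a failed chunk with exponential backoff, re-raising after the last attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(index, chunk)
            return
        except OSError:  # ConnectionError and TimeoutError are subclasses of OSError
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


def upload_file(data: bytes, send: SendChunk, chunk_size: int = 4 * 1024 * 1024,
                base_delay: float = 0.1) -> str:
    """Upload every chunk and return the whole-file MD5 for server-side verification."""
    digest = hashlib.md5(data).hexdigest()
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        upload_chunk_with_retry(send, index, chunk, base_delay=base_delay)
    return digest


if __name__ == "__main__":
    received = {}
    failures = {"left": 2}

    def flaky_send(index: int, chunk: bytes) -> None:
        if failures["left"] > 0:  # simulate two transient network errors
            failures["left"] -= 1
            raise ConnectionError("transient")
        received[index] = chunk

    digest = upload_file(b"hello world", flaky_send, chunk_size=4, base_delay=0.0)
    reassembled = b"".join(received[i] for i in sorted(received))
    print(reassembled == b"hello world", hashlib.md5(reassembled).hexdigest() == digest)  # True True
```

Retrying per chunk means a network error costs one chunk's worth of re‑transfer rather than the whole file, and sending the chunk index lets the receiver reassemble chunks that arrive out of order.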

MD5 verification · Node.js · React
0 likes · 22 min read
Xianyu Technology
Jul 13, 2021 · Big Data

Design and Implementation of Xianyu Real-Time Data Warehouse

To meet Xianyu’s billion‑event‑per‑day real‑time analysis needs, the team built a petabyte‑scale warehouse using Hologres for storage and Alibaba‑enhanced Flink (Blink) for streaming, organized into ODS, DWD, DWS, ADS and DIM layers, enabling minute‑level aggregations, rapid anomaly detection, and instant product‑team insights.

Big Data · Hologres · blink
0 likes · 12 min read