Tagged articles
946 articles
AntData
Dec 11, 2024 · Big Data

Flex: A Stream‑Batch Integrated Vectorized Engine for Flink

This article introduces Flex, a Flink‑compatible stream‑batch vectorized engine built on Velox and Gluten, explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, correctness and usability improvements, and presents performance results and future development plans.

Flink · SIMD · Velox
17 min read
Alibaba Cloud Big Data AI Platform
Dec 9, 2024 · Big Data

Why Kafka Falls Short for Real‑Time Analytics and How Fluss Changes the Game

Flink Forward Asia 2024 highlighted the limitations of Kafka for real‑time analytics—lack of updates, poor data exploration, costly back‑tracking, and high network overhead—while introducing Fluss, a columnar streaming storage that offers low‑latency reads, CDC, lake‑stream integration, and efficient Delta Join for scalable, fast analytics.

Big Data · Delta Join · Flink
15 min read
StarRocks
Dec 2, 2024 · Big Data

How Paimon Revamps Lakehouse Management and Supercharges Queries with StarRocks

This article details Tongcheng Travel's migration from Hive/Kudu/Hudi to Paimon for lakehouse integration, highlighting a 30% resource reduction, three‑fold write speed gains, significant query acceleration via StarRocks, the end‑to‑end architecture across ODS‑DWD‑DWS‑ADS layers, and future roadmap plans.

Big Data · Flink · Lakehouse
18 min read
Alibaba Cloud Developer
Nov 29, 2024 · Big Data

Introducing Fluss: The Next‑Gen Real‑Time Stream Storage for Flink

Alibaba unveiled the open‑source Fluss project, a next‑generation real‑time stream storage built for Apache Flink that tackles traditional Kafka‑Flink limitations with millisecond‑level reads, columnar pruning, CDC support, and seamless Lakehouse integration, aiming to boost low‑latency analytics at scale.

Big Data · Flink · open source
6 min read
Tongcheng Travel Technology Center
Nov 27, 2024 · Big Data

Highlights of Tongcheng Travel’s 8th Big Data Technology Salon

The 8th Tongcheng Travel Big Data Technology Salon in Suzhou featured four expert talks covering Tencent Cloud’s Meson Spark engine, near‑line computing for travel itineraries, a Flink‑based real‑time risk control system, and Apache Paimon’s latest lakehouse innovations, followed by a data‑driven business perspective session.

Apache Paimon · Big Data · Data Lake
7 min read
Bilibili Tech
Nov 26, 2024 · Big Data

Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices

Bilibili migrated its massive user‑behavior, commercial AI training, and database synchronization pipelines from Hive and Kafka to an Iceberg‑based streaming‑batch architecture, using Flink and the Magnus optimizer to achieve minute‑level freshness, reduce CPU and memory usage by about 20‑22%, save roughly 3.55 M CNY annually, and dramatically improve query latency and join performance.

Batch · Data Integration · Data Lake
20 min read
DataFunSummit
Nov 23, 2024 · Big Data

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

This article presents Bilibili's end‑to‑end exploration of a streaming‑batch unified data pipeline built on Apache Iceberg, detailing the original and iterated architectures for massive user behavior transmission, online AI training, DB synchronization, and dimension‑join, along with performance gains, cost savings, and future plans.

Batch Processing · Data Lake · Flink
20 min read
Rare Earth Juejin Tech Community
Nov 18, 2024 · Cloud Native

Developing a Custom Kubernetes Controller for Flink Task Scheduling

This article provides a step‑by‑step guide to building a custom Kubernetes controller in Go that uses Prometheus metrics to intelligently schedule Flink TaskManager Pods, covering the underlying scheduler concepts, code implementation, Docker image creation, RBAC setup, deployment, testing, and advanced considerations.

Cloud Native · Custom Scheduler · Flink
38 min read
Efficient Ops
Nov 7, 2024 · Operations

Automating Flink Task Deployment with Tekton, GitLab, and Serverless K8s

This guide details how to automate the full lifecycle of Flink tasks—including environment setup, integration, building, deployment, and task control—using GitLab, Tekton CI/CD, serverless containers on Alibaba Cloud, and Kubernetes, all orchestrated via Feishu cards.

Automation · Flink · Kubernetes
4 min read
JD Tech Talk
Nov 5, 2024 · Big Data

Low-Code Generation of Flink StreamGraph, JobGraph, and ExecutionGraph

This article explains how to generate Flink's StreamGraph, JobGraph, and ExecutionGraph using a low‑code canvas approach, detailing the underlying concepts, the transformation pipeline from DataStream to DAG, and providing Java code examples for building and assembling operators via drag‑and‑drop.

Big Data · ExecutionGraph · Flink
5 min read
Big Data Technology & Architecture
Nov 1, 2024 · Big Data

Real‑Time Lakehouse Architecture at Ximalaya Live: Leveraging Flink, Paimon, and StarRocks

This article details Ximalaya Live's transition from an offline‑centric data warehouse to a real‑time lakehouse using Flink, Paimon, and StarRocks, covering business background, architectural challenges, technology evaluation, implementation steps, encountered issues, performance gains, and future expansion plans.

Flink · Lakehouse · Paimon
12 min read
Big Data Technology & Architecture
Oct 28, 2024 · Big Data

Key Considerations for Using Paimon Primary Key Tables

This article explains the characteristics of Paimon primary key tables, covering bucket selection, cross‑partition update issues, recommended record‑level expiration settings, and two approaches to handle file compaction, including configuration tweaks and dedicated compaction tasks.

Big Data · Bucket · Flink
6 min read
DaTaobao Tech
Oct 25, 2024 · Big Data

Using Temporal Table JOIN in Flink SQL for Real-Time Stream Enrichment

The article explains how to use Flink SQL’s temporal table join to enrich a real‑time traffic‑log stream with versioned tag data, detailing the required DDL, the time‑versioned join syntax, and essential watermark and idle‑timeout settings that prevent stalls and boundary‑delay issues.

Flink · SQL · Temporal Join
7 min read
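The matching rule behind an event-time temporal join (in Flink SQL, a join written with `FOR SYSTEM_TIME AS OF` on the probe side's time attribute) can be modeled in a few lines: each event picks the tag version whose validity start is the latest one not after the event's timestamp. This is a hypothetical in-memory sketch, assuming a `dict` from key to time-sorted `(valid_from, tag)` versions; `lookup_as_of` and `temporal_join` are illustrative names, and the model ignores the watermark-driven buffering and idle-source handling the article discusses.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical versioned tag table: key -> [(valid_from_ms, tag), ...] sorted by valid_from_ms.
VersionedTable = Dict[str, List[Tuple[int, str]]]


def lookup_as_of(table: VersionedTable, key: str, event_time: int) -> Optional[str]:
    """Return the tag version valid at event_time, or None if no version existed yet."""
    versions = table.get(key, [])
    times = [valid_from for valid_from, _ in versions]
    # Index of the last version whose valid_from <= event_time (valid_from is inclusive).
    idx = bisect_right(times, event_time) - 1
    return versions[idx][1] if idx >= 0 else None


def temporal_join(
    events: Iterable[Tuple[str, int, str]], table: VersionedTable
) -> List[Tuple[str, int, str, Optional[str]]]:
    """Enrich (key, event_time, payload) events with the tag valid at each event's time."""
    return [(key, ts, payload, lookup_as_of(table, key, ts)) for key, ts, payload in events]
```

With versions `{"u1": [(100, "new"), (500, "vip")]}`, an event for `u1` at time 300 is enriched with `"new"`, one at 700 with `"vip"`, and one at 50 gets no tag because no version was valid yet.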
Alibaba Cloud Big Data AI Platform
Oct 25, 2024 · Big Data

How Real-Time Flink Powers Automotive Big Data: Architecture & Case Studies

This article, based on Alibaba Cloud expert Li Lubing’s presentation, examines the rapid growth of China’s new energy vehicle market, outlines typical automotive big‑data architectures, compares Lambda and real‑time lakehouse solutions built with Flink and Apache Paimon, and showcases real‑world customer deployments.

Big Data · Flink · Lakehouse
18 min read
DataFunSummit
Oct 24, 2024 · Big Data

Bilibili’s Large Language Model‑Based Intelligent Assistant for the Big Data Platform: Architecture, Principles, and Deployment

This article details Bilibili’s implementation of a large‑language‑model‑driven intelligent assistant for its massive big‑data platform, covering background, problem analysis, architectural design, knowledge‑base construction, precision and recall challenges, deployment across offline and real‑time Spark/Flink diagnostics, and future outlooks.

Agent · Big Data · Flink
23 min read
Big Data Technology & Architecture
Oct 22, 2024 · Big Data

Key Frameworks and Characteristics of Lakehouse Architecture: A Ground‑Level Perspective

This article reviews the emerging lakehouse architecture, outlines its core frameworks such as Hudi, Iceberg, Paimon, Flink, and Doris, discusses their storage‑compute separation, read‑write optimizations, and highlights how companies of different sizes adopt these technologies based on cost, efficiency, and specific business scenarios.

Data Architecture · Flink · Lakehouse
6 min read
Alibaba Cloud Big Data AI Platform
Sep 27, 2024 · Big Data

How Alibaba Cloud’s New Vectorized Engines Are Revolutionizing Real‑Time Big Data Processing

At the 2024 Apsara Conference, Alibaba Cloud unveiled a suite of vectorized big‑data solutions—including the Flash engine for Flink, EMR Serverless Spark with a 300% speed boost, upgraded lakehouse architecture, and real‑world case studies—showcasing massive performance gains, cost reductions, and broader serverless adoption.

Big Data · Data Lake · Flink
8 min read
dbaplus Community
Sep 23, 2024 · Operations

How Bilibili Scaled Monitoring: From Prometheus to a 2.0 VM‑Flink Architecture

Bilibili rebuilt its monitoring platform to handle explosive metric growth by separating collection, storage, and compute, adopting VictoriaMetrics, zone‑based scheduling, and Flink‑driven pre‑aggregation, which together improved stability, query performance, cloud data quality, and overall observability.

Flink · Observability · Prometheus
31 min read
Alibaba Cloud Big Data AI Platform
Sep 13, 2024 · Big Data

How Qimao Scales 20PB Data with StarRocks, Flink, and Real‑Time Analytics

Qimao, a Shanghai‑based cultural entertainment internet firm, details its 20 PB big‑data architecture built on StarRocks, Flink, Hive, and Redis, covering data ingestion, real‑time processing, audience selection, metric anomaly drill‑down, 730‑day aggregation, and future plans for metric acceleration and full‑link data governance.

Big Data · Data Governance · Data Warehouse
13 min read
Architect
Sep 12, 2024 · Operations

How Bilibili Scaled Its Monitoring: From Prometheus OOMs to VictoriaMetrics & Flink Pre‑Aggregation

The article details Bilibili's evolution of its monitoring platform, describing the stability and performance challenges of a Prometheus‑Thanos stack, the redesign using VictoriaMetrics, collection‑storage separation, unit‑level disaster recovery, query‑tree auto‑replacement, Flink‑based pre‑aggregation, Grafana upgrades, and future roadmap for observability.

Cloud Native · Flink · Observability
30 min read
DataFunSummit
Sep 9, 2024 · Big Data

Exploring Real-Time Lakehouse Architecture with Apache Paimon

This article presents Xiaomi's real-time lakehouse architecture, outlines its current challenges, introduces Apache Paimon and several use‑case scenarios—including stream join optimization, streaming upserts, and lookup joins—while discussing expected benefits and future directions for a more efficient, unified data platform.

Apache Paimon · Flink · Iceberg
12 min read
ZhongAn Tech Team
Sep 3, 2024 · Big Data

Real-Time Log Clustering Architecture and Continuous Clustering Algorithm

This article presents a comprehensive overview of a log clustering system, detailing its background, architecture based on Filebeat, Kafka, Flink, Elasticsearch, and Grafana, and introduces a continuous clustering algorithm using SimHash and Hamming distance for real‑time log governance and anomaly detection.

Flink · Log Clustering · Real-time analytics
14 min read
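As a rough illustration of the SimHash and Hamming-distance idea, here is a minimal Python sketch. Assumptions: tokens are whitespace-split with equal weights, token hashes come from MD5, and the greedy "join the first cluster whose representative is within `max_distance` bits" rule (threshold 3) is a hypothetical stand-in for the article's continuous clustering algorithm, not its actual implementation.

```python
import hashlib
from typing import List


def simhash(text: str, bits: int = 64) -> int:
    """SimHash fingerprint over whitespace-separated tokens with equal weights (bits <= 64)."""
    weights = [0] * bits
    for token in text.split():
        # Stable 64-bit token hash from the first 8 bytes of MD5 (any well-mixed hash works).
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when the vote for that bit is positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def cluster(lines: List[str], max_distance: int = 3) -> List[List[str]]:
    """Greedy streaming clustering: each line joins the first close-enough cluster or starts one."""
    reps: List[int] = []  # one representative fingerprint per cluster
    clusters: List[List[str]] = []
    for line in lines:
        fp = simhash(line)
        for idx, rep in enumerate(reps):
            if hamming(fp, rep) <= max_distance:
                clusters[idx].append(line)
                break
        else:  # no representative was close enough
            reps.append(fp)
            clusters.append([line])
    return clusters
```

Because the fingerprint depends only on the multiset of tokens, near-identical log lines (same template, a few differing tokens) tend to land within a few bits of each other, which is what makes a small Hamming threshold usable in a streaming setting.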
StarRocks
Aug 14, 2024 · Big Data

Mastering StarRocks & Apache Paimon: A Fast‑Track Lakehouse Guide

This guide provides a comprehensive overview of Apache Paimon’s architecture, key features, and advantages, explains how to integrate it with StarRocks for real‑time lakehouse analytics, and walks through a complete quick‑start setup including component installation, Flink and Kafka deployment, data ingestion, table creation, and query execution with time‑travel support.

Apache Paimon · Flink · Kafka
18 min read
DataFunSummit
Aug 11, 2024 · Big Data

Real‑time Business Data Anomaly Attribution with TuGraph‑Analytics at Huolala

This article describes how Huolala leveraged the open‑source high‑performance streaming graph engine TuGraph‑Analytics together with Flink to build a real‑time business data anomaly detection and attribution system, detailing the background, architectural evolution, technical choices, implementation details, benefits, and future plans.

Flink · TuGraph-Analytics · graph database
12 min read
ITPUB
Aug 11, 2024 · Operations

Scaling Bilibili’s Metrics Platform with VictoriaMetrics and Flink Pre‑aggregation

This article details how Bilibili redesigned its monitoring system to overcome explosive metric growth by separating collection and storage, adopting VictoriaMetrics, implementing zone‑based scheduling, automating PromQL query replacement, and using Flink for efficient pre‑aggregation, resulting in dramatically lower latency and higher stability.

Flink · Observability · PromQL
31 min read
Bilibili Tech
Aug 9, 2024 · Operations

Design and Optimization of Monitoring 2.0 Architecture with VictoriaMetrics and Flink

The new Monitoring 2.0 architecture separates collection, compute and storage, adopts VictoriaMetrics for compact time‑series storage and a zone‑based scheduler, introduces push‑based ingestion, uses Flink for real‑time pre‑aggregation and automatic PromQL rewrite, delivering ten‑fold query speedups, sub‑300 ms p90 latency, and dramatically higher write and query throughput.

Flink · Observability · Prometheus
29 min read
DataFunSummit
Aug 7, 2024 · Big Data

Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

Data Quality · Flink · real-time data
15 min read
JD Cloud Developers
Aug 6, 2024 · Big Data

Master Real-Time Stream Processing with Flink: Windows & Watermarks

This article provides a comprehensive overview of real-time stream processing, covering data streams, window types, event and processing time, Flink's operator model, watermark mechanisms, and strategies for handling out-of-order and late data to ensure accurate, timely analytics.

Flink · Real-time analytics · Watermarks
15 min read
JavaEdge
Aug 5, 2024 · Big Data

How to Handle Data Delay in Flink: Watermarks, Late Events, and Window Strategies

This article explains why out‑of‑order events cause delayed data in Flink, outlines their impact on computation accuracy and timeliness, identifies root causes such as network latency and watermark misconfiguration, and provides concrete watermark settings, allowed lateness, and step‑by‑step window‑triggering procedures with examples.

Data Delay · Flink · Window
8 min read
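To make the interplay of watermarks and allowed lateness concrete, the following simplified simulation counts events in event-time tumbling windows. It assumes a bounded-out-of-orderness watermark (largest timestamp seen minus `max_delay`) that advances after every event; `process` and its parameters are illustrative, not Flink APIs. A window fires once the watermark reaches its end, a late event still within `allowed_lateness` re-fires the window with an updated count, and anything later is dropped (in Flink it could instead go to a side output).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def process(
    events: List[int], window: int, max_delay: int, allowed_lateness: int
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Simulate tumbling-window counts over event timestamps given in arrival order.

    Returns (emissions, dropped): emissions are (window_start, count) results in firing
    order, including re-fired results for late-but-allowed events; dropped holds the
    timestamps that arrived after window_end + allowed_lateness.
    """
    counts: Dict[int, int] = defaultdict(int)
    fired: Set[int] = set()
    emissions: List[Tuple[int, int]] = []
    dropped: List[int] = []
    max_ts = float("-inf")
    watermark = float("-inf")
    for ts in events:
        start = ts - ts % window
        end = start + window
        if watermark >= end + allowed_lateness:
            dropped.append(ts)  # too late even for the lateness grace period
            continue
        counts[start] += 1
        if start in fired:
            emissions.append((start, counts[start]))  # late update re-fires the window
        # Bounded out-of-orderness: the watermark trails the max timestamp by max_delay.
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_delay
        # Fire every pending window whose end the watermark has reached.
        for s in sorted(counts):
            if s not in fired and watermark >= s + window:
                fired.add(s)
                emissions.append((s, counts[s]))
    return emissions, dropped
```

With `window=10, max_delay=2, allowed_lateness=5`, the arrival order `[1, 5, 12, 3, 14, 25, 8, 4]` fires window 0 with 2 events, re-fires it with 3 when the late `3` arrives, fires window 10 with 2, and drops `8` and `4` because the watermark (23) has already passed 10 + 5. Raising `max_delay` trades latency for fewer late events; raising `allowed_lateness` keeps window state around longer so late events still update results.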
Alibaba Cloud Big Data AI Platform
Aug 2, 2024 · Big Data

How Real-Time Computing Transforms Finance, Automotive, Logistics, and Retail

Businesses across finance, automotive, logistics, and retail are increasingly adopting real-time computing with Flink and Hologres to meet growing data volume and latency demands, enabling instant analytics, risk monitoring, dynamic recommendations, and efficient operations, while cloud architectures evolve to support massive, low‑latency data streams.

Flink · Hologres · Real‑Time Computing
19 min read
DataFunTalk
Jul 18, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent exploration of real-time data warehouse architecture, covering its six-module design, data quality assurance mechanisms, stream‑batch unified processing with Flink and ODPS, and a forward‑looking data lake solution built on Paimon, offering practical insights for large‑scale streaming analytics.

Flink · stream processing
15 min read
Alibaba Cloud Big Data AI Platform
Jul 12, 2024 · Big Data

How Flink + Hologres Power Real‑Time Streaming Warehouses

This article explains how combining Flink with Hologres creates a unified, real‑time streaming warehouse, detailing traditional layering approaches, the advantages of the Hologres‑based solution, core capabilities like Binlog and resource isolation, and a practical e‑commerce case study demonstrating performance gains.

Big Data · Flink · Hologres
21 min read
DeWu Technology
Jul 5, 2024 · Databases

StarRocks 2.5.13 Cross-Cluster Upgrade and Data Migration Practices

The article outlines a cross‑cluster upgrade to StarRocks 2.5.13, evaluating resource and stability costs, and presents two migration schemes—using external tables and a Flink connector—along with planning, parallel execution, validation steps, and results showing successful migration of over 10 TB at 2 Gb/s across ten nodes, while noting future automation and CDC enhancements.

Cluster Upgrade · Data Migration · External Table
15 min read
Volcano Engine Developer Services
Jul 3, 2024 · Backend Development

How We Scaled a Billion‑Item Search Engine with Elasticsearch: From Zero to One

This article details the practical journey of building and scaling an Elasticsearch‑based search system that supports tens of millions to billions of items, covering architecture design, capacity planning, multi‑data‑center deployment, data synchronization via RocketMQ and Flink, and multi‑layer reconciliation to ensure consistency and high QPS.

Consistency · Elasticsearch · Flink
15 min read
JD Cloud Developers
Jul 3, 2024 · Big Data

How to Build a High‑Availability Real‑Time Logistics Dashboard with Flink and ClickHouse

This article details the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard, covering Flink‑based data pipelines, ClickHouse OLAP storage, metric consistency, stability measures, extensible configuration, and comprehensive monitoring to ensure accurate, scalable performance during major promotions.

Big Data · ClickHouse · Dashboard
9 min read
JD Tech Talk
Jul 3, 2024 · Big Data

Real-time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Processing, and Stability Practices

This article describes the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard using Flink and ClickHouse, covering data processing pipelines, metric consistency, stability mechanisms, extensible configurations, and monitoring techniques to guide similar large‑screen projects.

ClickHouse · Flink · Real-time Dashboard
9 min read
JD Tech
Jul 2, 2024 · Big Data

Real‑Time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Modeling, and Stability Design

This article presents the design and implementation of a high‑availability, real‑time logistics supply‑chain monitoring dashboard, covering its data processing pipeline with Flink, storage choices between Elasticsearch and ClickHouse, multi‑layer architecture, metric consistency, stability mechanisms, extensibility configurations, and monitoring practices.

Big Data · ClickHouse · Dashboard
11 min read
DataFunSummit
Jul 1, 2024 · Big Data

Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline using Flink, Hudi, and StarRocks, addressing data completeness versus latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

Data Warehouse · Flink · Hudi
13 min read
WeiLi Technology Team
Jun 28, 2024 · Big Data

How to Build a Robust Big Data Monitoring and Alerting System

This article explains why high‑availability design and comprehensive monitoring are essential for modern big‑data platforms, outlines a layered architecture, and provides practical guidance on health checks, alerting, and data‑quality monitoring across storage, compute, scheduling, and service layers.

Flink · HDFS · architecture
14 min read
DaTaobao Tech
Jun 21, 2024 · Big Data

Flink Real-Time Data Development: Cases on Data Skew, Watermark Failure, and GroupBy Issues

The article walks through three Flink streaming pitfalls—data‑skew‑induced back‑pressure, lost watermarks after interval joins, and ineffective group‑by causing duplicate rows—and shows how to resolve them with two‑stage distinct aggregation, hash‑based key distribution, processing‑time windows or split jobs, and mini‑batch buffering.

Data Skew · Flink · Real-Time
14 min read
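The two-stage distinct aggregation used against skewed keys can be sketched as an in-memory Python model. This is not the article's job: `bucket_of` and `two_stage_count_distinct` are hypothetical names, and CRC32 stands in for whatever hash the real job uses. Stage 1 spreads a hot key across `num_buckets` partial aggregates; because a given user ID always hashes to the same bucket, simply summing the per-bucket distinct counts in stage 2 is exact. Flink SQL can apply a similar rewrite automatically through the `table.optimizer.distinct-agg.split.enabled` option.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def bucket_of(user_id: str, num_buckets: int) -> int:
    """Deterministic hash bucket, so the same user always lands in the same bucket."""
    return zlib.crc32(user_id.encode("utf-8")) % num_buckets


def two_stage_count_distinct(
    rows: Iterable[Tuple[str, str]], num_buckets: int = 4
) -> Dict[str, int]:
    """COUNT(DISTINCT user_id) GROUP BY key, computed as a bucketed stage plus a merge stage."""
    # Stage 1: partial distinct sets keyed by (key, bucket); in a real job each
    # (key, bucket) pair could be handled by a different parallel task.
    partial: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for key, user_id in rows:
        partial[(key, bucket_of(user_id, num_buckets))].add(user_id)
    # Stage 2: buckets partition the users, so per-bucket counts can simply be summed.
    totals: Dict[str, int] = defaultdict(int)
    for (key, _bucket), users in partial.items():
        totals[key] += len(users)
    return dict(totals)
```

The design choice is that the bucket is derived from the distinct column rather than chosen at random; random salting would spread one user over several buckets and make the stage 2 sum overcount.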
DataFunTalk
Jun 1, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, covering the system architecture, streaming data quality assurance, flow‑batch integrated applications, and future data lake integration, while sharing technical details and operational insights for large‑scale data processing.

Data Warehouse · Flink · real-time data
16 min read
DataFunSummit
May 27, 2024 · Big Data

Design and Optimization of Zhihu's Bridge Platform for DMP/CDP: Architecture, Challenges, and Solutions

This article presents a comprehensive case study of Zhihu's Bridge platform, detailing its background, five core modules, unified architecture built on Spark and Flink, bitmap‑based tagging, and performance optimizations that address query speed, write latency, and high‑QPS online checks while outlining future directions with Doris 2.0 and large language models.

CDP · DMP · Data Platform
27 min read
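Bitmap-based tagging turns audience selection into set algebra: each tag stores one bit per user, and combining tags becomes a bitwise operation. A minimal sketch, assuming small dense integer user IDs and using plain Python integers as uncompressed bitmaps (production systems typically use compressed formats such as Roaring bitmaps); `build_bitmaps` and `members` are hypothetical helpers, not part of the platform described in the article.

```python
from typing import Dict, Iterable, List


def build_bitmaps(user_tags: Dict[int, Iterable[str]]) -> Dict[str, int]:
    """Invert user -> tags into tag -> bitmap, where bit i is set if user i has the tag."""
    bitmaps: Dict[str, int] = {}
    for user_id, tags in user_tags.items():
        for tag in tags:
            bitmaps[tag] = bitmaps.get(tag, 0) | (1 << user_id)
    return bitmaps


def members(bitmap: int) -> List[int]:
    """Decode a non-negative bitmap back into sorted user IDs."""
    ids: List[int] = []
    while bitmap:
        low = bitmap & -bitmap  # isolate the lowest set bit
        ids.append(low.bit_length() - 1)
        bitmap ^= low  # clear it and continue
    return ids
```

For example, users tagged `sports` and `active` but not `churned` are `bitmaps["sports"] & bitmaps["active"] & ~bitmaps["churned"]`; ANDing with a non-negative bitmap keeps the result non-negative even though `~` alone yields a negative Python int.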
Big Data Technology & Architecture
May 27, 2024 · Big Data

Athena Data Factory: A One‑Stop Data Development and Governance Platform – Architecture, Features, and Impact

The Athena Data Factory, built by Spark Thinking, is a comprehensive one‑stop data development and governance platform that integrates data integration, development, analysis, and services, offering offline, real‑time, and AI pipelines, modular architecture, extensive monitoring, and cost‑optimisation to empower thousands of users across the company.

Airflow · Big Data · Data Platform
26 min read
DataFunTalk
May 26, 2024 · Big Data

Athena Data Factory: A One‑Stop Data Development and Governance Platform for Sparkle Thinking

The article details how Sparkle Thinking built the Athena Data Factory—a comprehensive, self‑service data development and governance platform that integrates data integration, ETL, real‑time processing, monitoring, and analytics, describing its architecture, key technologies, implementation timeline, operational practices, performance gains, and future directions.

Airflow · ETL · Flink
26 min read
DataFunTalk
May 16, 2024 · Big Data

Streaming Data Lake Warehouse Solution Based on USDP with Flink and Paimon

This article presents UCloud's USDP‑based streaming data lake warehouse solution that leverages Flink for real‑time processing and Paimon for lake storage, detailing its architecture, advantages, practical scenarios, and providing complete SQL and Flink CDC code snippets for end‑to‑end implementation.

CDC · Data Lake · Flink
27 min read
Big Data Technology & Architecture
May 13, 2024 · Big Data

Apache Paimon 0.8 Release: Deletion Vectors, File Index, Performance Boosts, and Flink/Spark Integration Enhancements

The article introduces Apache Paimon 0.8, highlighting new Deletion Vectors, a universal file index, memory and I/O optimizations, record‑level TTL, and integration improvements with Flink and Spark, while also discussing broader lake‑house performance trends and future directions.

Apache Paimon · Big Data · Deletion Vectors
8 min read
iQIYI Technical Product Team
Apr 26, 2024 · Big Data

iQIYI Real-time Lakehouse: Stream‑Batch Unified Architecture

iQIYI replaced its costly Lambda architecture with a unified Iceberg‑based lakehouse that combines Flink streaming and batch processing, cutting data latency from hours to minutes, supporting thousands of tables via a multi‑table sink, guaranteeing completeness, and saving millions of RMB in operational costs.

Data Lake · Flink · Iceberg
18 min read
DataFunSummit
Apr 25, 2024 · Big Data

Paimon Project Overview: Recent Developments, Core Capabilities, and Future Roadmap

This article presents a comprehensive overview of the Apache‑incubated Paimon project, covering its evolution from Flink Table Store, the current features of primary‑key and log tables, management tools such as snapshots, tags and branches, performance optimizations for Flink and Spark, and a detailed roadmap of upcoming functionalities.

Big Data · Data Management · Flink
23 min read
21CTO
Apr 22, 2024 · Big Data

Inside Uber’s Real‑Time Data Infrastructure: How They Scale Streaming at Massive Scale

This article explores Uber’s sophisticated real‑time data infrastructure, detailing how the company leverages open‑source technologies such as Apache Kafka, Flink, Pinot, and Presto, and describing the architectural components, scaling challenges, multi‑region resilience, data back‑filling, and operational practices that enable low‑latency analytics for millions of daily rides and deliveries.

Big Data · Flink · Kafka
25 min read
Bilibili Tech
Apr 9, 2024 · Big Data

Optimizing Flink State Performance with RocksDB KV Separation and BlobDB

In large‑scale Flink double‑stream joins, terabyte‑sized RocksDB state caused severe compaction latency and CPU spikes, but enabling RocksDB BlobDB KV‑separation (and an inner‑compaction patch) dramatically shrank SST files, reduced read/write latencies to sub‑millisecond levels, and cut CPU spikes by about half.

Flink · KV Separation · Performance Optimization
12 min read
DataFunSummit
Apr 7, 2024 · Big Data

Li Auto’s Flink on Kubernetes Data Integration Practice

This article presents Li Auto’s end‑to‑end data integration journey, detailing the evolution of its data platform, the challenges of heterogeneous sources, and how a unified Flink‑on‑K8s solution with cloud‑native architecture, operator management, monitoring, and checkpointing addresses batch‑stream convergence and future scalability.

Batch Processing · Big Data · Data Integration
12 min read
DataFunTalk
Mar 26, 2024 · Big Data

Building an Enterprise Real-Time Data Warehouse with Hologres and Flink at Cao Cao Mobility

This article presents a comprehensive case study of Cao Cao Mobility's transition from a traditional Lambda architecture to an enterprise‑grade real‑time data warehouse built on Hologres and Flink, detailing business background, pain points, architectural design, performance optimizations, metadata management, and future development directions.

Big Data · Flink · Hologres
20 min read
HelloTech
Mar 21, 2024 · Big Data

Streaming Prediction System Construction and Real‑time Feature Templatization

The article describes how a Flink‑based streaming prediction platform was built to smooth peak request loads, reduce latency and memory usage, and improve stability through deduplicated SDK calls, incremental loading of Hive features, partitioned caching, and comprehensive monitoring. A templating system automates feature definition, SQL generation, and stress testing, enabling real‑time supply‑demand forecasting that outperforms offline methods.

AI · Flink · Kafka
0 likes · 8 min read
Didi Tech
Mar 12, 2024 · Big Data

Understanding Flink Metrics System: Core Concepts, Elastic Design, and Practical Usage

The article explains Flink’s metrics architecture—core concepts, reporter interfaces, built‑in and custom metric types, elastic plugin design, and scheduled reporting—illustrated with a consumption‑latency example, and shows how Didi uses these metrics for real‑time UI curves, alerts, and intelligent task diagnosis.

Big Data · Flink · Streaming
0 likes · 11 min read
Linux Code Review Hub
Mar 11, 2024 · Databases

How Didi Built a Next‑Gen Log Storage System with ClickHouse

Didi migrated its massive PB‑scale log data from Elasticsearch to ClickHouse, redesigning storage with separate Log and Trace clusters, optimizing partition and sorting keys, introducing native TCP connectors, and revamping HDFS cold‑hot separation, achieving up to four‑fold query speed gains and 30% lower hardware costs.

ClickHouse · Distributed Systems · Flink
0 likes · 15 min read
Open Source Linux
Mar 11, 2024 · Big Data

Step‑by‑Step Guide to Deploying Flink on Standalone, Yarn, and Kubernetes

This tutorial explains how to install and configure Apache Flink in three deployment modes—Standalone, Hadoop YARN, and Kubernetes—covering node preparation, configuration files, package distribution, job submission, and monitoring through the Flink Web UI, with full command‑line examples and code snippets.

Big Data · Flink · Kubernetes
0 likes · 12 min read
DataFunSummit
Mar 4, 2024 · Big Data

Near Real-Time Metric System Architecture for Dongchedi Used Car Business

This article introduces Dongchedi's near real‑time metric system architecture, covering business background, technical challenges, the unified storage‑compute and query service design using the Las lakehouse built on Apache Hudi, solutions to consistency issues, achieved results, and future plans for further real‑time improvements.

Apache Hudi · Flink · Real-time analytics
0 likes · 13 min read
Big Data Technology & Architecture
Mar 4, 2024 · Big Data

Evolution of Flink State Storage and Compute‑Storage Separation Architecture

This article examines the evolution of Flink's state storage, discusses challenges posed by cloud‑native deployments, reviews recent community and Alibaba enhancements such as unaligned checkpoints, incremental snapshots, and the Gemini layered storage system, and proposes future directions for a compute‑storage separation architecture.

Distributed Checkpoint · Flink · Gemini
0 likes · 18 min read
Xiaohongshu Tech REDtech
Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data‑warehouse team integrated Apache Iceberg‑based data‑lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z‑order, and upsert‑enabled tables, which cut query latency by up to 90%, boosted data freshness by 50%, slashed storage costs by 83% and saved tens of thousands of GB‑hours of compute daily.

Apache Iceberg · Data Lake · Data Warehouse
0 likes · 19 min read
Alibaba Cloud Big Data AI Platform
Mar 1, 2024 · Big Data

Scaling U‑App Analytics to Billions of Events with Flink, MaxCompute & Hologres

UMeng+’s U‑App analytics platform processes nearly a trillion daily logs by combining real‑time Flink streams, offline MaxCompute batches, and Alibaba Cloud Hologres OLAP, employing multi‑engine architecture, smart sampling, and Roaring Bitmap techniques to deliver fast, cost‑effective, high‑concurrency user behavior and profiling analysis.

Flink · Hologres · MaxCompute
0 likes · 19 min read
DataFunTalk
Feb 27, 2024 · Big Data

Best Practices of Cloud‑Native OLAP Architecture and Logistics Warning at Jushuitan

This article presents Jushuitan's cloud‑native OLAP architecture, detailing its evolution, current big‑data stack—including DataWorks, MaxCompute, Flink, Hologres, and Aerospike—along with logistics warning workflows, rule‑matching mechanisms, real‑time processing challenges, and future scalability plans.

Big Data · Cloud Native · Data Warehouse
0 likes · 20 min read
Alibaba Cloud Native
Feb 26, 2024 · Cloud Native

How to Structure Weakly Structured Logs with Flink SLS SPL

This guide explains how to use Alibaba Cloud's SLS connector and SPL expressions within Flink to clean, parse, and transform weakly structured log data into a structured table suitable for real‑time SQL analysis, covering sample logs, field extraction rules, and step‑by‑step configuration.

Data Cleansing · Flink · SLS
0 likes · 14 min read
DataFunTalk
Feb 22, 2024 · Big Data

Flink on Kubernetes: Kuaishou’s Practice, Migration, and Future Refactoring

This article details Kuaishou’s five‑year evolution of Flink, covering its background, production refactoring to Kubernetes, migration practices, and future improvements, highlighting architecture layers, resource management, observability, and testing strategies for large‑scale stream processing.

Big Data · Cloud Native · Flink
0 likes · 12 min read
DataFunSummit
Feb 20, 2024 · Big Data

BitSail Open‑Source Data Integration Engine: Architecture, New Features, CDC Solutions and Future Outlook

This article introduces ByteDance's open‑source data integration engine BitSail, covering its background, layered architecture, recent feature enhancements, automated testing framework, CDC‑based full‑library synchronization solutions, and future development plans for connectors and real‑time data consistency.

Big Data · CDC · Data Integration
0 likes · 12 min read
Big Data Technology & Architecture
Feb 20, 2024 · Big Data

Understanding Stream‑Batch Integration in Modern Data Engineering

The article explains the rise, challenges, and practical approaches of the stream‑batch integration concept—originally popularized by the Flink community—highlighting why it struggles at large scale, how companies adopt Kappa‑style real‑time pipelines or unified storage‑compute engines, and its relevance in technical interviews.

Flink · Kappa architecture
0 likes · 6 min read
Alibaba Cloud Big Data AI Platform
Feb 20, 2024 · Big Data

Feishu ShenNuo's Real-Time Data Warehouse with Flink, Hudi, and Hologres

Feishu ShenNuo redesigned its data architecture by integrating Flink, Hudi, and Hologres to create a cloud‑native real‑time data warehouse that supports both millisecond‑level ad monitoring and minute‑level game operations, offering scalable storage, low‑latency queries, and comprehensive monitoring and capacity planning.

Flink · Hologres · Hudi
0 likes · 16 min read
Big Data Technology & Architecture
Feb 18, 2024 · Big Data

Understanding Apache Paimon Table Modes and Their Use Cases

Apache Paimon provides multiple table modes, including primary‑key tables with fixed or dynamic buckets and append tables in scalable or queue mode, each with its own configuration, compaction behavior, and suitable scenarios. The article explains their structures, performance considerations, and how to use them with Flink.

Apache Paimon · Append Table · Big Data
0 likes · 12 min read
DataFunSummit
Jan 25, 2024 · Big Data

Best Practices of Jushuitan Cloud‑Native OLAP Architecture and Logistics Warning

This article presents Jushuitan's cloud‑native OLAP architecture, covering business background, data‑warehouse evolution, real‑time processing with Flink, Hologres, and Aerospike, and detailed logistics‑warning use cases, followed by technical challenges, future outlook, and a Q&A on implementation details.

Big Data · Data Warehouse · Flink
0 likes · 20 min read
NetEase Cloud Music Tech Team
Jan 10, 2024 · Operations

Building Cloud Music's APM Metric Monitoring System Based on VictoriaMetrics

Cloud Music’s middleware team built the Pylon APM monitoring system on VictoriaMetrics, combining exporters, vmagent, Nacos, Flink‑based pre‑aggregation recording rules and vminsert for collection with Grafana, a custom Proxy and vmselect for querying, achieving millisecond‑level latency, metric‑trace correlation, stability improvements, and cost‑effective storage for nearly 700 million active time series.

APM monitoring · Flink · Metric Pre-aggregation
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2024 · Big Data

CaoCao Mobility's Real‑Time Data Warehouse: Hologres + Flink

This article details how CaoCao Mobility transformed its ride‑hailing platform by replacing a traditional Lambda architecture with an enterprise‑grade real‑time data warehouse built on Hologres and Flink, covering business motivations, architectural design, component capabilities, performance optimizations, operational safeguards, and future roadmap.

Data Architecture · Flink · Hologres
0 likes · 19 min read
Sohu Tech Products
Dec 27, 2023 · Big Data

Practical Implementation of Data Integration with Flink on Kubernetes at Li Auto

Li Auto built a cloud‑native data‑integration platform by deploying Flink on Kubernetes, unifying batch and streaming workloads with a storage layer (JuiceFS + BOS) and Flink Operator, enabling simple source‑sink pipelines, elastic scaling, automated checkpointing, and centralized monitoring while addressing earlier fragmentation and resource inefficiencies.

Big Data · Cloud Native · Data Integration
0 likes · 11 min read
DataFunTalk
Dec 27, 2023 · Big Data

Amoro Mixed Hive: A Unified Lakehouse Solution for Real‑Time and Batch Data Processing

This article describes how NetEase Youdao replaced its Doris‑based real‑time data warehouse with Amoro Mixed Hive, detailing the architectural challenges, the Mixed Hive design, implementation steps, performance optimizations, community contributions, and future roadmap to achieve a unified lakehouse with minute‑level freshness and reduced development and operational costs.

Amoro · Big Data · Flink
0 likes · 12 min read
DataFunTalk
Dec 22, 2023 · Big Data

Practical Implementation of Flink on Kubernetes for Data Integration at Li Auto

This article details Li Auto's end‑to‑end data integration practice using Flink on Kubernetes, covering the evolution of their integration platform, architectural design, cloud‑native deployment, operational challenges, and future roadmap, while highlighting unified batch‑stream processing and resource elasticity.

Batch Processing · Big Data · Cloud Native
0 likes · 12 min read