Tagged articles

Flink

Jan 14, 2025 · Big Data

Tencent Real-Time Lakehouse Intelligent Optimization Practice

This presentation covers Tencent's real‑time lakehouse in four parts: lakehouse design, intelligent optimization services, scenario‑driven capabilities, and future outlook. Components discussed include Spark, Flink, Iceberg, the Auto‑Optimize Service, indexing, clustering, AutoEngine, and PyIceberg.

Auto Optimize · Big Data · Flink

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Jan 14, 2025 · Big Data

How Fluss Unifies Lake and Stream for Real‑Time Analytics: Architecture, Benefits, and Future Roadmap

This article summarizes a talk by Alibaba Cloud senior engineer and Flink Committer Luo Yuxia on the challenges of separating lake and stream storage, introduces the Fluss lake‑stream unified architecture, explains its technical benefits such as second‑level data freshness, unified metadata, efficient changelog generation, and outlines future plans for broader ecosystem integration.

Data Lake · Flink · Fluss

0 likes · 13 min read

ITPUB

Jan 7, 2025 · Databases

Cut Costs 25% and Boost Performance 70%: Retail Giant’s OceanBase Migration

The article details how WanJia Shuke, the tech arm of China Resources Vanguard, tackled retail system fragmentation, user‑experience degradation, complex linkages and scalability limits by migrating dozens of projects to the distributed OceanBase database, achieving up to 70% performance improvement, 25% cost reduction and streamlined operations.

Flink · OceanBase · Retail

0 likes · 15 min read

Big Data Technology & Architecture

Jan 6, 2025 · Big Data

Ensuring Timeliness and Consistency in Apache Paimon: Snapshots, Expiration, and Optimization Strategies

This article explains how Apache Paimon guarantees data timeliness and consistency through snapshot files, two‑phase commit, and configurable expiration policies, and it outlines practical optimization and cleanup techniques for maintaining efficient storage and query performance.

Apache Paimon · Flink · Snapshot

0 likes · 7 min read

DataFunSummit

Jan 3, 2025 · Big Data

Tencent Real‑Time Lakehouse Intelligent Optimization Practices

This article presents Tencent's end‑to‑end real‑time lakehouse architecture, detailing its three‑layer design, the Auto Optimize Service modules such as compaction, indexing, clustering and engine acceleration, as well as scenario‑driven capabilities like multi‑stream joins, primary‑key tables, in‑place migration and PyIceberg support, and concludes with future optimization directions.

Big Data · Flink · Iceberg

0 likes · 11 min read

Bilibili Tech

Jan 3, 2025 · Big Data

Evolution and Production Practices of Apache Celeborn Remote Shuffle Service at Bilibili

Bilibili replaced Spark’s unstable External Shuffle Service with a push‑based approach, then deployed Apache Celeborn’s remote shuffle on Kubernetes using HA masters, tiered workers, extensive monitoring, history‑based routing, chaos testing, and seamless Spark, Flink, and MapReduce integration, while planning self‑healing, elastic scaling, and priority‑aware I/O enhancements.

Apache Celeborn · Big Data · Flink

0 likes · 28 min read

Big Data Technology & Architecture

Jan 2, 2025 · Big Data

Apache Paimon: Core Capabilities, Table Types, LSM Tree, Buckets, Merge Engines, and Operational Details

This article provides a comprehensive overview of Apache Paimon, covering its real‑time lake ingestion, unified stream‑batch processing, table types (primary‑key and append‑only), LSM‑tree storage, bucket mechanisms, merge‑engine options, compaction strategies, concurrency control, consumption methods, tag management, data cleanup, and system tables for big‑data workloads.

Apache Paimon · Big Data · Flink

0 likes · 25 min read

DataFunSummit

Dec 27, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

This presentation describes Tencent's real-time lakehouse architecture, including data lake compute, management, and storage layers, and details the intelligent optimization services—such as compaction, indexing, clustering, and auto-engine—designed to improve query performance, storage cost, and operational efficiency for large-scale data processing.

AutoEngine · Compaction · Flink

0 likes · 11 min read

Bilibili Tech

Dec 27, 2024 · Big Data

Consistency Architecture for Bilibili Recommendation Model Data Flow

The article outlines Bilibili’s revamped recommendation data‑flow architecture that eliminates timing and calculation inconsistencies by snapshotting online features, unifying feature computation in a single C++ library accessed via JNI, and orchestrating label‑join and sample extraction through near‑line Kafka/Flink pipelines, with further performance gains and Iceberg‑based future extensions.

Data Consistency · Flink · Iceberg

0 likes · 12 min read

DaTaobao Tech

Dec 18, 2024 · Big Data

Incremental Computation in Big Data: Flink Materialized Table and Paimon

The article explains how Flink 1.20’s Materialized Table combined with Paimon’s changelog storage enables incremental computation that unifies batch and streaming workloads, delivering minute‑level latency at lower cost, illustrated by a materialized‑table example while noting current streaming‑only support and future batch extensions.

Big Data · Flink · Incremental Computation

0 likes · 13 min read

Big Data Technology & Architecture

Dec 18, 2024 · Big Data

Key Trends of Flink 2.0: Compute‑Storage Separation, Unified Batch‑Stream, and Streaming Warehouse

The article reviews the major directions of Flink 2.0—including compute‑storage separation, a new Materialized Table for unified batch‑stream processing, and deeper integration with Paimon for streaming warehouses—while offering a cautious perspective on their practical impact and migration challenges.

Batch-Stream Integration · Big Data · Compute-Storage Separation

0 likes · 5 min read

AntData

Dec 11, 2024 · Big Data

Flex: A Stream‑Batch Integrated Vectorized Engine for Flink

This article introduces Flex, a Flink‑compatible stream‑batch vectorized engine built on Velox and Gluten, explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, correctness and usability improvements, and presents performance results and future development plans.

Distributed Computing · Flink · SIMD

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Dec 9, 2024 · Big Data

Why Kafka Falls Short for Real‑Time Analytics and How Fluss Changes the Game

Flink Forward Asia 2024 highlighted the limitations of Kafka for real‑time analytics—lack of updates, poor data exploration, costly back‑tracking, and high network overhead—while introducing Fluss, a columnar streaming storage that offers low‑latency reads, CDC, lake‑stream integration, and efficient Delta Join for scalable, fast analytics.

Big Data · Delta Join · Flink

0 likes · 15 min read

Big Data Technology & Architecture

Dec 9, 2024 · Big Data

Understanding Flink’s Exactly-Once Semantics and Its Relation to Deduplication

This article explains what Flink's Exactly‑Once semantics actually guarantee, why they do not mean each event is processed only once, how checkpointing and two‑phase‑commit sinks enable end‑to‑end exactly‑once delivery, and the three safeguards needed for true exactly‑once computation.

Big Data · Deduplication · Exactly-once

0 likes · 5 min read

DaTaobao Tech

Dec 6, 2024 · Big Data

How Paimon + Flink Enables Low‑Cost Real‑Time State Storage for Complex Streaming Jobs

This article explains how Apache Paimon can be used as a real‑time state store for Flink, detailing its low‑cost, scalable storage, lookup‑join design, table schema, bucket configuration, memory tuning, and practical use cases such as handling refund‑adjusted order tags and cumulative metrics.

Apache Paimon · Big Data · Flink

0 likes · 16 min read

StarRocks

Dec 2, 2024 · Big Data

How Paimon Revamps Lakehouse Management and Supercharges Queries with StarRocks

This article details Tongcheng Travel's migration from Hive/Kudu/Hudi to Paimon for lakehouse integration, highlighting a 30% resource reduction, three‑fold write speed gains, significant query acceleration via StarRocks, the end‑to‑end architecture across ODS‑DWD‑DWS‑ADS layers, and future roadmap plans.

Big Data · Flink · Lakehouse

0 likes · 18 min read

Big Data Technology & Architecture

Dec 2, 2024 · Big Data

Optimizing Primary‑Key and Append‑Scalable Tables in Paimon with Flink

This guide explains how to optimize Paimon primary‑key and Append‑Scalable tables in Flink by adjusting sink and source parallelism, checkpoint intervals, making small‑file merges fully asynchronous, changing file formats, and applying ordering strategies to improve both write and read performance.

Batch · Big Data · Flink

0 likes · 6 min read

Alibaba Cloud Developer

Nov 29, 2024 · Big Data

Introducing Fluss: The Next‑Gen Real‑Time Stream Storage for Flink

Alibaba unveiled the open‑source Fluss project, a next‑generation real‑time stream storage built for Apache Flink that tackles traditional Kafka‑Flink limitations with millisecond‑level reads, columnar pruning, CDC support, and seamless Lakehouse integration, aiming to boost low‑latency analytics at scale.

Big Data · Flink · open source

0 likes · 6 min read

Tongcheng Travel Technology Center

Nov 27, 2024 · Big Data

Highlights of Tongcheng Travel’s 8th Big Data Technology Salon

The 8th Tongcheng Travel Big Data Technology Salon in Suzhou featured four expert talks covering Tencent Cloud’s Meson Spark engine, near‑line computing for travel itineraries, a Flink‑based real‑time risk control system, and Apache Paimon’s latest lake‑warehouse innovations, followed by a data‑driven business perspective session.

Apache Paimon · Big Data · Data Lake

0 likes · 7 min read

Bilibili Tech

Nov 26, 2024 · Big Data

Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices

Bilibili migrated its massive user‑behavior, commercial AI training, and database synchronization pipelines from Hive and Kafka to an Iceberg‑based streaming‑batch architecture, using Flink and the Magnus optimizer to achieve minute‑level freshness, reduce CPU and memory usage by about 20‑22 %, save roughly 3.55 M CNY annually, and dramatically improve query latency and join performance.

Batch · Data Integration · Data Lake

0 likes · 20 min read

Big Data Technology & Architecture

Nov 26, 2024 · Big Data

Understanding Full GC, Data Skew, and Parallelism in Flink Tasks

This article explains how to monitor and interpret Full GC in Flink TaskManagers, detect and address data skew through proper data distribution and parallelism settings, and recommends aligning consumer parallelism with Kafka partitions, while also providing practical tips for using tools like Prometheus and Arthas.

Data Skew · Flink · TaskManager

0 likes · 6 min read

Aikesheng Open Source Community

Nov 25, 2024 · Big Data

Real-time Data Synchronization from OceanBase to Kafka Using ActionOMS and Flink

This article demonstrates how to use ActionOMS to capture incremental changes from OceanBase, stream them to Kafka in various formats, and employ Flink to deduplicate and aggregate transaction data into a daily summary, illustrating a complete real-time data pipeline for financial use cases.

ActionOMS · Data synchronization · Flink

0 likes · 10 min read

DataFunSummit

Nov 23, 2024 · Big Data

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

This article presents Bilibili's end‑to‑end exploration of a streaming‑batch unified data pipeline built on Apache Iceberg, detailing the original and iterated architectures for massive user behavior transmission, online AI training, DB synchronization, and dimension‑join, along with performance gains, cost savings, and future plans.

Batch Processing · Data Lake · Flink

0 likes · 20 min read

Rare Earth Juejin Tech Community

Nov 18, 2024 · Cloud Native

Developing a Custom Kubernetes Controller for Flink Task Scheduling

This article provides a step‑by‑step guide to building a custom Kubernetes controller in Go that uses Prometheus metrics to intelligently schedule Flink TaskManager Pods, covering the underlying scheduler concepts, code implementation, Docker image creation, RBAC setup, deployment, testing, and advanced considerations.

Cloud Native · Custom Scheduler · Flink

0 likes · 38 min read

Efficient Ops

Nov 7, 2024 · Operations

Automating Flink Task Deployment with Tekton, GitLab, and Serverless K8s

This guide details how to automate the full lifecycle of Flink tasks—including environment setup, integration, building, deployment, and task control—using GitLab, Tekton CI/CD, serverless containers on Alibaba Cloud, and Kubernetes, all orchestrated via Feishu cards.

Automation · CI/CD · Flink

0 likes · 4 min read

JD Tech Talk

Nov 5, 2024 · Big Data

Low-Code Generation of Flink StreamGraph, JobGraph, and ExecutionGraph

This article explains how to generate Flink's StreamGraph, JobGraph, and ExecutionGraph using a low‑code canvas approach, detailing the underlying concepts, the transformation pipeline from DataStream to DAG, and providing Java code examples for building and assembling operators via drag‑and‑drop.

Big Data · ExecutionGraph · Flink

0 likes · 5 min read

JD Cloud Developers

Nov 5, 2024 · Big Data

Zero‑Code Flink: Build StreamGraph, JobGraph & ExecutionGraph via Canvas DAG

This article explains how Flink applications are transformed through StreamGraph, JobGraph, and ExecutionGraph stages, and presents a low‑code canvas approach that lets users assemble DAGs, persist them in a MySQL adjacency list, and generate zero‑code Flink programs using BFS traversal.
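
As a rough illustration of the idea in this summary, the sketch below rebuilds a DAG from adjacency-list rows and walks it breadth-first. It is a minimal Python sketch under stated assumptions, not the article's implementation: the `(parent, child)` row shape, the operator names, and the `bfs_order` helper are all hypothetical.

```python
from collections import deque


def bfs_order(edges, source):
    """Return node names in breadth-first order, starting from `source`.

    `edges` is a list of (parent, child) pairs, the shape an
    adjacency-list table might return (assumed, not from the article).
    """
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, []).append(child)

    order = []
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in adjacency.get(node, []):
            if child not in seen:  # a node with two parents is visited once
                seen.add(child)
                queue.append(child)
    return order


# Hypothetical canvas DAG: one source fanning out to two operators, then a sink.
edges = [
    ("source", "map"),
    ("source", "filter"),
    ("map", "sink"),
    ("filter", "sink"),
]
print(bfs_order(edges, "source"))  # ['source', 'map', 'filter', 'sink']
```

A generator built along these lines could visit operators in this order and wire each one to its upstream nodes. Note that plain BFS does not by itself guarantee that every parent is visited before its child in arbitrary DAGs.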

DAG · Flink · Low-Code Development

0 likes · 6 min read

Big Data Technology & Architecture

Nov 1, 2024 · Big Data

Real‑Time Lakehouse Architecture at Ximalaya Live: Leveraging Flink, Paimon, and StarRocks

This article details Ximalaya Live's transition from an offline‑centric data warehouse to a real‑time lakehouse using Flink, Paimon, and StarRocks, covering business background, architectural challenges, technology evaluation, implementation steps, encountered issues, performance gains, and future expansion plans.

Flink · Lakehouse · Paimon

0 likes · 12 min read

Big Data Technology & Architecture

Oct 28, 2024 · Big Data

Key Considerations for Using Paimon Primary Key Tables

This article explains the characteristics of Paimon primary key tables, covering bucket selection, cross‑partition update issues, recommended record‑level expiration settings, and two approaches to handle file compaction, including configuration tweaks and dedicated compaction tasks.

Big Data · Bucket · Compaction

0 likes · 6 min read

DaTaobao Tech

Oct 25, 2024 · Big Data

Using Temporal Table JOIN in Flink SQL for Real-Time Stream Enrichment

The article explains how to use Flink SQL’s temporal table join to enrich a real‑time traffic‑log stream with versioned tag data, detailing the required DDL, the time‑versioned join syntax, and the watermark and idle‑timeout settings that prevent stalls and boundary‑delay issues.

Flink · SQL · Temporary Join

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Oct 25, 2024 · Big Data

How Real-Time Flink Powers Automotive Big Data: Architecture & Case Studies

This article, based on Alibaba Cloud expert Li Lubing’s presentation, examines the rapid growth of China’s new energy vehicle market, outlines typical automotive big‑data architectures, compares Lambda and real‑time lakehouse solutions built with Flink and Apache Paimon, and showcases real‑world customer deployments.

Automotive · Big Data · Cloud Computing

0 likes · 18 min read

DataFunSummit

Oct 24, 2024 · Big Data

Bilibili’s Large Language Model‑Based Intelligent Assistant for the Big Data Platform: Architecture, Principles, and Deployment

This article details Bilibili’s implementation of a large‑language‑model‑driven intelligent assistant for its massive big‑data platform, covering background, problem analysis, architectural design, knowledge‑base construction, precision and recall challenges, deployment across offline and real‑time Spark/Flink diagnostics, and future outlook.

Agent · Big Data · Flink

0 likes · 23 min read

Big Data Technology & Architecture

Oct 22, 2024 · Big Data

Key Frameworks and Characteristics of Lakehouse Architecture: A Ground‑Level Perspective

This article reviews the emerging lakehouse architecture, outlines its core frameworks such as Hudi, Iceberg, Paimon, Flink, and Doris, discusses their storage‑compute separation, read‑write optimizations, and highlights how companies of different sizes adopt these technologies based on cost, efficiency, and specific business scenarios.

Data Architecture · Doris · Flink

0 likes · 6 min read

JD Retail Technology

Oct 11, 2024 · Big Data

JD Retail Data Lake Architecture: Challenges, Optimizations, and Future Plans

This article presents JD Retail's data lake architecture overhaul, detailing the shortcomings of the Lambda model, the migration to Flink‑Hudi‑Spark pipelines, performance gains, storage savings, unified APIs, and upcoming improvements for resilience and automation.

Big Data · Data Lake · Flink

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Sep 27, 2024 · Big Data

How Alibaba Cloud’s New Vectorized Engines Are Revolutionizing Real‑Time Big Data Processing

At the 2024 Apsara Conference (Yunqi Conference), Alibaba Cloud unveiled a suite of vectorized big‑data solutions, including the Flash engine for Flink, EMR Serverless Spark with a 300% speed boost, and an upgraded lakehouse architecture, alongside real‑world case studies showing large performance gains, cost reductions, and broader serverless adoption.

Big Data · Data Lake · Flink

0 likes · 8 min read

dbaplus Community

Sep 23, 2024 · Operations

How Bilibili Scaled Monitoring: From Prometheus to a 2.0 VM‑Flink Architecture

Bilibili rebuilt its monitoring platform to handle explosive metric growth by separating collection, storage, and compute, adopting VictoriaMetrics, zone‑based scheduling, and Flink‑driven pre‑aggregation, which together improved stability, query performance, cloud data quality, and overall observability.

Flink · Monitoring · Observability

0 likes · 31 min read

StarRocks

Sep 19, 2024 · Big Data

How Ele.me Built a Real‑Time Lakehouse: From 1.0 to 3.0 with Flink, Paimon & StarRocks

This article details Ele.me's journey in evolving its real‑time data warehouse, covering the original 1.0 architecture, the 2.0 lakehouse redesign with Paimon and StarRocks, performance evaluations of lake formats and query engines, and the roadmap toward a 3.0 streaming lakehouse solution.

Big Data · Flink · Lakehouse

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Sep 13, 2024 · Big Data

How Qimao Scales 20PB Data with StarRocks, Flink, and Real‑Time Analytics

Qimao, a Shanghai‑based cultural entertainment internet firm, details its 20 PB big‑data architecture built on StarRocks, Flink, Hive, and Redis, covering data ingestion, real‑time processing, audience selection, metric anomaly drill‑down, 730‑day aggregation, and future plans for metric acceleration and full‑link data governance.

Big Data · Data Governance · Data Warehouse

0 likes · 13 min read

Architect

Sep 12, 2024 · Operations

How Bilibili Scaled Its Monitoring: From Prometheus OOMs to VictoriaMetrics & Flink Pre‑Aggregation

The article details Bilibili's evolution of its monitoring platform, describing the stability and performance challenges of a Prometheus‑Thanos stack, the redesign using VictoriaMetrics, collection‑storage separation, unit‑level disaster recovery, query‑tree auto‑replacement, Flink‑based pre‑aggregation, Grafana upgrades, and future roadmap for observability.

Cloud Native · Flink · Metrics

0 likes · 30 min read

DataFunSummit

Sep 9, 2024 · Big Data

Exploring Real-Time Lakehouse Architecture with Apache Paimon

This article presents Xiaomi's real-time lakehouse architecture, outlines its current challenges, introduces Apache Paimon and several use‑case scenarios—including stream join optimization, streaming upserts, and lookup joins—while discussing expected benefits and future directions for a more efficient, unified data platform.

Apache Paimon · Flink · Iceberg

0 likes · 12 min read

ZhongAn Tech Team

Sep 3, 2024 · Big Data

Real-Time Log Clustering Architecture and Continuous Clustering Algorithm

This article presents a comprehensive overview of a log clustering system, detailing its background, architecture based on Filebeat, Kafka, Flink, Elasticsearch, and Grafana, and introduces a continuous clustering algorithm using SimHash and Hamming distance for real‑time log governance and anomaly detection.
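
To make the SimHash and Hamming-distance idea concrete, here is a minimal Python sketch of fingerprinting log lines and assigning each to the first cluster within a distance threshold. Everything specific in it is an assumption for illustration rather than the article's design: whitespace tokenization, MD5 as the per-token hash, the 64-bit width, the threshold of 3, and the `assign` helper.

```python
import hashlib


def simhash(text, bits=64):
    """Compute a SimHash fingerprint over whitespace-separated tokens."""
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def assign(fingerprint, centers, threshold):
    """Return the index of the first center within `threshold` bits.

    If no center is close enough, the fingerprint becomes a new center,
    which is what lets clusters grow continuously as logs stream in.
    """
    for index, center in enumerate(centers):
        if hamming(fingerprint, center) <= threshold:
            return index
    centers.append(fingerprint)
    return len(centers) - 1


# Hypothetical log lines; cluster ids depend on the hash, so none are claimed here.
centers = []
for line in ["db connection timeout host=a", "db connection timeout host=b", "user login ok"]:
    print(assign(simhash(line), centers, threshold=3), line)
```

In a streaming job the `centers` list would live in keyed or broadcast state; a plain list is used here only to keep the sketch self-contained.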

Flink · Log Clustering · SimHash

0 likes · 14 min read

Big Data Technology & Architecture

Aug 26, 2024 · Big Data

Understanding Flink 1.11 JobManager and TaskManager Memory Configuration

This article details the major memory model changes in Flink 1.11 for JobManager and TaskManager, compares them with Flink 1.9, provides concrete JVM command examples, explains the relationship between memory settings and parallelism, and introduces fine‑grained resource management for streaming workloads.

Big Data · Flink · JobManager

0 likes · 9 min read

StarRocks

Aug 14, 2024 · Big Data

Mastering StarRocks & Apache Paimon: A Fast‑Track Lakehouse Guide

This guide provides a comprehensive overview of Apache Paimon’s architecture, key features, and advantages, explains how to integrate it with StarRocks for real‑time lakehouse analytics, and walks through a complete quick‑start setup including component installation, Flink and Kafka deployment, data ingestion, table creation, and query execution with time‑travel support.

Apache Paimon · Data Engineering · Flink

0 likes · 18 min read

DataFunSummit

Aug 11, 2024 · Big Data

Real‑time Business Data Anomaly Attribution with TuGraph‑Analytics at Huolala

This article describes how Huolala used the open‑source, high‑performance streaming graph engine TuGraph‑Analytics together with Flink to build a real‑time business data anomaly detection and attribution system, detailing the background, architectural evolution, technical choices, implementation details, benefits, and future plans.

Flink · TuGraph-Analytics · graph database

0 likes · 12 min read

ITPUB

Aug 11, 2024 · Operations

Scaling Bilibili’s Metrics Platform with VictoriaMetrics and Flink Pre‑aggregation

This article details how Bilibili redesigned its monitoring system to overcome explosive metric growth by separating collection and storage, adopting VictoriaMetrics, implementing zone‑based scheduling, automating PromQL query replacement, and using Flink for efficient pre‑aggregation, resulting in dramatically lower latency and higher stability.

Flink · Monitoring · Observability

0 likes · 31 min read

DataFunTalk

Aug 10, 2024 · Big Data

Xiaomi Sales Data Warehouse: Construction Practices, Architecture, and Capability Evolution

This article presents a comprehensive overview of Xiaomi's sales data warehouse, detailing its development history, dimensional modeling theory, multi‑layer architecture, Lambda design with batch and streaming processing, capability layers, security measures, and answers to common technical questions.

Big Data · Data Warehouse · Flink

0 likes · 15 min read

Bilibili Tech

Aug 9, 2024 · Operations

Design and Optimization of Monitoring 2.0 Architecture with VictoriaMetrics and Flink

The new Monitoring 2.0 architecture separates collection, compute and storage, adopts VictoriaMetrics for compact time‑series storage and a zone‑based scheduler, introduces push‑based ingestion, uses Flink for real‑time pre‑aggregation and automatic PromQL rewrite, delivering ten‑fold query speedups, sub‑300 ms p90 latency, and dramatically higher write and query throughput.

Flink · Metrics · Monitoring

0 likes · 29 min read

DataFunSummit

Aug 7, 2024 · Big Data

Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

Data Quality · Flink · Real-time Data

0 likes · 15 min read

JD Tech Talk

Aug 6, 2024 · Big Data

Real-Time Stream Computation in Monitoring Systems: Data Streams, Windows, and Watermarks with Apache Flink

This article explains the role of monitoring systems, introduces real-time data stream computation, describes data stream characteristics, details Flink’s event time and processing time concepts, various window types, watermark mechanisms, and strategies for handling out-of-order and late data.

Flink · Real-time · Window

0 likes · 18 min read

JD Cloud Developers

Aug 6, 2024 · Big Data

Master Real-Time Stream Processing with Flink: Windows & Watermarks

This article provides a comprehensive overview of real-time stream processing, covering data streams, window types, event and processing time, Flink's operator model, watermark mechanisms, and strategies for handling out-of-order and late data to ensure accurate, timely analytics.

Flink · Watermarks · Windowing

0 likes · 15 min read

JavaEdge

Aug 5, 2024 · Big Data

How to Handle Data Delay in Flink: Watermarks, Late Events, and Window Strategies

This article explains why out‑of‑order events cause delayed data in Flink, outlines their impact on computation accuracy and timeliness, identifies root causes such as network latency and watermark misconfiguration, and provides concrete watermark settings, allowed lateness, and step‑by‑step window‑triggering procedures with examples.
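
The interplay of watermarks, allowed lateness, and window firing can be sketched with a small pure-Python simulation. This is a simplified model, not Flink code and not taken from the article: it uses tumbling windows with integer timestamps, advances the watermark after every event (Flink emits watermarks periodically), and the `classify_events` helper and its parameters are hypothetical. The arithmetic follows Flink's bounded-out-of-orderness convention, where the watermark trails the largest seen timestamp by the delay plus one and a window fires once the watermark reaches its last timestamp.

```python
def classify_events(event_times, window_size, max_delay, allowed_lateness):
    """Label each event as 'on_time', 'late_but_accepted', or 'dropped'.

    Each event is checked against the watermark produced by the events
    that arrived before it, then the watermark is advanced.
    """
    watermark = float("-inf")
    labels = []
    for ts in event_times:
        window_end = (ts // window_size + 1) * window_size
        window_max_ts = window_end - 1  # last timestamp belonging to the window
        if watermark < window_max_ts:
            labels.append("on_time")  # the window has not fired yet
        elif watermark < window_max_ts + allowed_lateness:
            labels.append("late_but_accepted")  # window fired, state still kept
        else:
            labels.append("dropped")  # window state already cleaned up
        watermark = max(watermark, ts - max_delay - 1)
    return labels


# Windows of 10 time units, 2 units of tolerated disorder, 5 units of allowed lateness.
events = [1, 4, 12, 13, 9, 25, 8]
print(classify_events(events, window_size=10, max_delay=2, allowed_lateness=5))
# ['on_time', 'on_time', 'on_time', 'on_time', 'late_but_accepted', 'on_time', 'dropped']
```

In this run the event at time 9 arrives after its window [0, 10) has fired but within the allowed lateness, so it would trigger an updated result. The event at time 8 arrives after the window's state has expired and would be dropped or routed to a side output.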

Data Delay · Flink · Window

0 likes · 8 min read

Big Data Technology & Architecture

Aug 3, 2024 · Big Data

Comprehensive Big Data Interview Questions and Topics

This article compiles a wide range of interview questions covering JVM garbage collection, Hadoop, Hive, Flink, HBase, data warehousing, real‑time processing, and HR topics, providing a thorough preparation guide for candidates targeting senior big‑data positions.

Flink · Hadoop · Hive

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Aug 2, 2024 · Big Data

How Real-Time Computing Transforms Finance, Automotive, Logistics, and Retail

Businesses across finance, automotive, logistics, and retail are increasingly adopting real-time computing with Flink and Hologres to meet growing data volume and latency demands, enabling instant analytics, risk monitoring, dynamic recommendations, and efficient operations, while cloud architectures evolve to support massive, low‑latency data streams.

Flink · Hologres · Real-Time Computing

0 likes · 19 min read

DataFunTalk

Jul 25, 2024 · Big Data

Real‑time Data Warehouse Evolution with Data Lake: Challenges, Solutions, and Future Outlook

This article presents a comprehensive overview of JD Tech's real‑time data warehouse evolution, detailing the legacy Lambda architecture, its shortcomings, the integration of a data‑lake‑based solution, iterative redesigns, technical trade‑offs, and future directions for real‑time analytics.

ClickHouse · Flink · Hudi

0 likes · 25 min read

DataFunTalk

Jul 18, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent exploration of real-time data warehouse architecture, covering its six-module design, data quality assurance mechanisms, stream‑batch unified processing with Flink and ODPS, and a forward‑looking data lake solution built on Paimon, offering practical insights for large‑scale streaming analytics.

Flink · stream processing

0 likes · 15 min read

Mike Chen's Internet Architecture

Jul 15, 2024 · Big Data

Master Distributed Computing: Hadoop, Spark, and Flink Explained

This article introduces the fundamentals of distributed computing, compares major frameworks such as Hadoop, Spark, and Flink, and outlines their key components, performance characteristics, and typical application scenarios including big‑data analytics, cloud services, real‑time streaming, and scientific computing.

Big Data · Distributed Computing · Flink

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Jul 12, 2024 · Big Data

How Flink + Hologres Power Real‑Time Streaming Warehouses

This article explains how combining Flink with Hologres creates a unified, real‑time streaming warehouse, detailing traditional layering approaches, the advantages of the Hologres‑based solution, core capabilities like Binlog and resource isolation, and a practical e‑commerce case study demonstrating performance gains.

Big Data · Flink · Hologres

0 likes · 21 min read

DeWu Technology

Jul 5, 2024 · Databases

StarRocks 2.5.13 Cross-Cluster Upgrade and Data Migration Practices

The article outlines a cross‑cluster upgrade to StarRocks 2.5.13, evaluating resource and stability costs, and presents two migration schemes—using external tables and a Flink connector—along with planning, parallel execution, validation steps, and results showing successful migration of over 10 TB at 2 Gb/s across ten nodes, while noting future automation and CDC enhancements.

Data Migration · External Table · Flink

0 likes · 15 min read

Volcano Engine Developer Services

Jul 3, 2024 · Backend Development

How We Scaled a Billion‑Item Search Engine with Elasticsearch: From Zero to One

This article details the practical journey of building and scaling an Elasticsearch‑based search system that supports tens of millions to billions of items, covering architecture design, capacity planning, multi‑data‑center deployment, data synchronization via RocketMQ and Flink, and multi‑layer reconciliation to ensure consistency and high QPS.

Elasticsearch · Flink · RocketMQ

0 likes · 15 min read

JD Cloud Developers

Jul 3, 2024 · Big Data

How to Build a High‑Availability Real‑Time Logistics Dashboard with Flink and ClickHouse

This article details the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard, covering Flink‑based data pipelines, ClickHouse OLAP storage, metric consistency, stability measures, extensible configuration, and comprehensive monitoring to ensure accurate, scalable performance during major promotions.

Big Data · ClickHouse · Flink

0 likes · 9 min read

JD Tech Talk

Jul 3, 2024 · Big Data

Real-time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Processing, and Stability Practices

This article describes the design and implementation of a high‑availability, real‑time logistics supply‑chain dashboard using Flink and ClickHouse, covering data processing pipelines, metric consistency, stability mechanisms, extensible configurations, and monitoring techniques to guide similar dashboard projects.

ClickHouse · Flink · Stability

0 likes · 9 min read

JD Tech

Jul 2, 2024 · Big Data

Real‑Time Monitoring Dashboard for Logistics Supply Chain: Architecture, Data Modeling, and Stability Design

This article presents the design and implementation of a high‑availability, real‑time logistics supply‑chain monitoring dashboard, covering its data processing pipeline with Flink, storage choices between Elasticsearch and ClickHouse, multi‑layer architecture, metric consistency, stability mechanisms, extensibility configurations, and monitoring practices.

Big Data · ClickHouse · Elasticsearch

0 likes · 11 min read

DataFunSummit

Jul 1, 2024 · Big Data

Optimizing JD Retail Data Architecture: From Lambda to Real‑time Unified Processing with Flink, Hudi, and StarRocks

This article details JD Retail's transition from a complex Lambda architecture to a unified real‑time data pipeline using Flink, Hudi, and StarRocks, addressing data completeness versus latency, reducing maintenance costs, improving storage efficiency, and delivering faster, more consistent analytics for business users.

Data Warehouse · Flink · Hudi

0 likes · 13 min read

WeiLi Technology Team

Jun 28, 2024 · Big Data

How to Build a Robust Big Data Monitoring and Alerting System

This article explains why high‑availability design and comprehensive monitoring are essential for modern big‑data platforms, outlines a layered architecture, and provides practical guidance on health checks, alerting, and data‑quality monitoring across storage, compute, scheduling, and service layers.

Flink · HDFS · architecture

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Jun 25, 2024 · Big Data

Build Real-Time Data Lake Analytics with Flink, Paimon, and EMR Serverless Spark

This guide demonstrates how to use Alibaba Cloud's EMR Serverless Spark and Flink Serverless services together with Apache Paimon to ingest streaming data, perform interactive queries, and schedule offline compaction jobs, creating a unified real‑time and batch data lake solution.

Big Data · Data Lake · EMR Serverless

0 likes · 6 min read

DaTaobao Tech

Jun 21, 2024 · Big Data

Flink Real-Time Data Development: Cases on Data Skew, Watermark Failure, and GroupBy Issues

The article walks through three Flink streaming pitfalls—data‑skew‑induced back‑pressure, lost watermarks after interval joins, and ineffective group‑by causing duplicate rows—and shows how to resolve them with two‑stage distinct aggregation, hash‑based key distribution, processing‑time windows or split jobs, and mini‑batch buffering.

Data Skew · Flink · Optimization

0 likes · 14 min read

DataFunTalk

Jun 1, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, covering the system architecture, streaming data quality assurance, stream‑batch integrated applications, and future data lake integration, while sharing technical details and operational insights for large‑scale data processing.

Data Warehouse · Flink · Real-time Data

0 likes · 16 min read

DataFunSummit

May 27, 2024 · Big Data

Design and Optimization of Zhihu's Bridge Platform for DMP/CDP: Architecture, Challenges, and Solutions

This article presents a comprehensive case study of Zhihu's Bridge platform, detailing its background, five core modules, unified architecture built on Spark and Flink, bitmap‑based tagging, and performance optimizations that address query speed, write latency, and high‑QPS online checks while outlining future directions with Doris 2.0 and large language models.

CDP · DMP · Data Platform

0 likes · 27 min read

Big Data Technology & Architecture

May 27, 2024 · Big Data

Athena Data Factory: A One‑Stop Data Development and Governance Platform – Architecture, Features, and Impact

The Athena Data Factory, built by Spark Thinking, is a comprehensive one‑stop data development and governance platform that integrates data integration, development, analysis, and services, offering offline, real‑time, and AI pipelines, modular architecture, extensive monitoring, and cost‑optimisation to empower thousands of users across the company.

Airflow · Big Data · Cloud Computing

0 likes · 26 min read

DataFunTalk

May 26, 2024 · Big Data

Athena Data Factory: A One‑Stop Data Development and Governance Platform for Sparkle Thinking

The article details how Sparkle Thinking built the Athena Data Factory—a comprehensive, self‑service data development and governance platform that integrates data integration, ETL, real‑time processing, monitoring, and analytics, describing its architecture, key technologies, implementation timeline, operational practices, performance gains, and future directions.

Airflow · ETL · Flink

0 likes · 26 min read

DataFunTalk

May 16, 2024 · Big Data

Streaming Data Lake Warehouse Solution Based on USDP with Flink and Paimon

This article presents UCloud's USDP‑based streaming data lake warehouse solution that leverages Flink for real‑time processing and Paimon for lake storage, detailing its architecture, advantages, practical scenarios, and providing complete SQL and Flink CDC code snippets for end‑to‑end implementation.

CDC · Data Lake · Flink

0 likes · 27 min read

DataFunSummit

May 15, 2024 · Big Data

Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Evolution

This article details Xiaomi's sales data warehouse development, covering its history, architecture, dimensional modeling, layer design, streaming‑batch integration, governance, security, and future directions, while also addressing practical Q&A on implementation challenges and best practices.

Big Data · Data Warehouse · Flink

0 likes · 15 min read

Big Data Technology & Architecture

May 13, 2024 · Big Data

Apache Paimon 0.8 Release: Deletion Vectors, File Index, Performance Boosts, and Flink/Spark Integration Enhancements

The article introduces Apache Paimon 0.8, highlighting new Deletion Vectors, a universal file index, memory and I/O optimizations, record‑level TTL, and integration improvements with Flink and Spark, while also discussing broader lake‑house performance trends and future directions.

Apache Paimon · Big Data · Deletion Vectors

0 likes · 8 min read

iQIYI Technical Product Team

Apr 26, 2024 · Big Data

iQIYI Real-time Lakehouse: Stream‑Batch Unified Architecture

iQIYI replaced its costly Lambda architecture with a unified Iceberg‑based lakehouse that combines Flink streaming and batch processing, cutting data latency from hours to minutes, supporting thousands of tables via a multi‑table sink, guaranteeing completeness, and saving millions of RMB in operational costs.

Data Lake · Flink · Iceberg

0 likes · 18 min read

DataFunSummit

Apr 25, 2024 · Big Data

Paimon Project Overview: Recent Developments, Core Capabilities, and Future Roadmap

This article presents a comprehensive overview of the Apache‑incubated Paimon project, covering its evolution from Flink Table Store, the current features of primary‑key and log tables, management tools such as snapshots, tags and branches, performance optimizations for Flink and Spark, and a detailed roadmap of upcoming functionalities.

Big Data · Data Management · Flink

0 likes · 23 min read

21CTO

Apr 22, 2024 · Big Data

Inside Uber’s Real‑Time Data Infrastructure: How They Scale Streaming at Massive Scale

This article explores Uber’s sophisticated real‑time data infrastructure, detailing how the company leverages open‑source technologies such as Apache Kafka, Flink, Pinot, and Presto, and describing the architectural components, scaling challenges, multi‑region resilience, data back‑filling, and operational practices that enable low‑latency analytics for millions of daily rides and deliveries.

Big Data · Flink · Pinot

0 likes · 25 min read

DataFunSummit

Apr 18, 2024 · Big Data

Real‑time Data Warehouse Evolution with Data Lake: Architecture, Challenges, and Solutions

This article presents a comprehensive overview of JD Tech's real‑time data warehouse evolution, detailing the legacy Lambda‑based design, its shortcomings, the transition to a data‑lake‑integrated architecture, iterative improvements, encountered technical and non‑technical issues, and future outlooks.

ClickHouse · Data Lake · Flink

0 likes · 24 min read

Bilibili Tech

Apr 9, 2024 · Big Data

Optimizing Flink State Performance with RocksDB KV Separation and BlobDB

In large‑scale Flink double‑stream joins, terabyte‑sized RocksDB state caused severe compaction latency and CPU spikes, but enabling RocksDB BlobDB KV‑separation (and an inner‑compaction patch) dramatically shrank SST files, reduced read/write latencies to sub‑millisecond levels, and cut CPU spikes by about half.

Flink · KV Separation · Performance Optimization

0 likes · 12 min read

Spring Full-Stack Practical Cases

Apr 9, 2024 · Big Data

Build Real-Time MySQL CDC Pipelines with Flink 1.19 and SpringBoot

This guide walks through setting up Flink CDC with MySQL on SpringBoot 2.7, covering binlog configuration, Maven dependencies, Java implementation for real‑time change capture, startup options, a custom Redis sink, and a web UI for monitoring the streaming pipeline.

CDC · Flink · Java

0 likes · 10 min read

DataFunSummit

Apr 8, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, covering its modular architecture, data quality assurance mechanisms, stream‑batch integration techniques, graph‑based conversion attribution, and future data‑lake implementation using Paimon.

Flink · stream processing

0 likes · 15 min read

DataFunSummit

Apr 7, 2024 · Big Data

Li Auto’s Flink on Kubernetes Data Integration Practice

This article presents Li Auto’s end‑to‑end data integration journey, detailing the evolution of its data platform, the challenges of heterogeneous sources, and how a unified Flink‑on‑K8s solution with cloud‑native architecture, operator management, monitoring, and checkpointing addresses batch‑stream convergence and future scalability.

Batch Processing · Big Data · Data Integration

0 likes · 12 min read

DataFunTalk

Mar 26, 2024 · Big Data

Building an Enterprise Real-Time Data Warehouse with Hologres and Flink at Cao Cao Mobility

This article presents a comprehensive case study of Cao Cao Mobility's transition from a traditional Lambda architecture to an enterprise‑grade real‑time data warehouse built on Hologres and Flink, detailing business background, pain points, architectural design, performance optimizations, metadata management, and future development directions.

Big Data · Data Engineering · Flink

0 likes · 20 min read

HelloTech

Mar 21, 2024 · Big Data

Streaming Prediction System Construction and Real‑time Feature Templatization

The article describes a Flink‑based streaming prediction platform built to flatten peak request loads, reduce latency and memory use, and improve stability through SDK call deduplication, incremental loading of Hive features, partitioned caching, and comprehensive monitoring. A templating system automates feature definition, SQL generation, and stress testing, enabling real‑time supply‑demand forecasting that outperforms offline methods.

AI · Flink · Streaming Prediction

0 likes · 8 min read

Big Data Technology & Architecture

Mar 20, 2024 · Big Data

Flink 1.19 New Features: SQL Optimizations, Runtime Enhancements, and Checkpointing Improvements

The article reviews Flink 1.19’s new features, highlighting SQL capability enhancements such as custom source parallelism, TTL hints, and MiniBatch support for regular joins, as well as runtime dynamic parallelism for batch jobs and flexible checkpointing intervals for different data sources.

Big Data · Flink · SQL

0 likes · 6 min read

DataFunSummit

Mar 17, 2024 · Big Data

OPPO Smart Data Lakehouse: Architecture, Real‑time Lakehouse, and Technical Practices

This article presents OPPO's smart data lakehouse solution, describing its massive EB‑scale architecture, the integration of batch and streaming engines, the Glacier service for table management, schema‑adaptive ingestion, performance optimizations, and the future technical roadmap for unified data processing.

Big Data · Data Lakehouse · Flink

0 likes · 15 min read

Didi Tech

Mar 12, 2024 · Big Data

Understanding Flink Metrics System: Core Concepts, Elastic Design, and Practical Usage

The article explains Flink’s metrics architecture—core concepts, reporter interfaces, built‑in and custom metric types, elastic plugin design, and scheduled reporting—illustrated with a consumption‑latency example, and shows how Didi uses these metrics for real‑time UI curves, alerts, and intelligent task diagnosis.

Big Data · Flink · Metrics

0 likes · 11 min read

Linux Code Review Hub

Mar 11, 2024 · Databases

How Didi Built a Next‑Gen Log Storage System with ClickHouse

Didi migrated its massive PB‑scale log data from Elasticsearch to ClickHouse, redesigning storage with separate Log and Trace clusters, optimizing partition and sorting keys, introducing native TCP connectors, and revamping HDFS cold‑hot separation, achieving up to four‑fold query speed gains and 30% lower hardware costs.

ClickHouse · Flink · HDFS

0 likes · 15 min read

Open Source Linux

Mar 11, 2024 · Big Data

Step‑by‑Step Guide to Deploying Flink on Standalone, Yarn, and Kubernetes

This tutorial explains how to install and configure Apache Flink in three deployment modes—Standalone, Hadoop YARN, and Kubernetes—covering node preparation, configuration files, package distribution, job submission, and monitoring through the Flink Web UI, with full command‑line examples and code snippets.

Big Data · Flink · Standalone

0 likes · 12 min read

JavaEdge

Mar 6, 2024 · Backend Development

Designing a Near Real-Time Video Recommendation Engine with ES and Adaptive Rate Limiting

This article details the design of a low‑latency video recommendation system, covering requirements, overall architecture, data pipelines, consistency handling, write smoothing, and performance optimizations such as multi‑level caching and Elasticsearch tuning.

Caching · Elasticsearch · Flink

0 likes · 13 min read

Alibaba Cloud Native

Mar 6, 2024 · Cloud Native

How to Use SLS SPL for Efficient Data Filtering and Projection in Alibaba Cloud Flink

This guide explains how to leverage Alibaba Cloud Log Service (SLS) SPL within the Flink SLS Connector to push down row filtering and column projection, reducing network traffic and compute load while enabling real‑time log analytics using Flink SQL.

Data Filtering · Flink · SLS

0 likes · 13 min read

DataFunSummit

Mar 4, 2024 · Big Data

Near Real-Time Metric System Architecture for Dongchedi Used Car Business

This article introduces Dongchedi's near real‑time metric system architecture, covering business background, technical challenges, the unified storage‑compute and query service design using the Las lakehouse built on Apache Hudi, solutions to consistency issues, achieved results, and future plans for further real‑time improvements.

Apache Hudi · Flink · real-time analytics

0 likes · 13 min read

Big Data Technology & Architecture

Mar 4, 2024 · Big Data

Evolution of Flink State Storage and Compute‑Storage Separation Architecture

This article examines the evolution of Flink's state storage, discusses challenges posed by cloud‑native deployments, reviews recent community and Alibaba enhancements such as unaligned checkpoints, incremental snapshots, and the Gemini layered storage system, and proposes future directions for a compute‑storage separation architecture.

Distributed Checkpoint · Flink · Gemini

0 likes · 18 min read

Xiaohongshu Tech REDtech

Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data‑warehouse team integrated Apache Iceberg‑based data‑lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z‑order, and upsert‑enabled tables, which cut query latency by up to 90%, boosted data freshness by 50%, slashed storage costs by 83%, and saved tens of thousands of GB‑hours of compute daily.

Apache Iceberg · Data Lake · Data Warehouse

0 likes · 19 min read

Alibaba Cloud Big Data AI Platform

Mar 1, 2024 · Big Data

Scaling U‑App Analytics to Billions of Events with Flink, MaxCompute & Hologres

UMeng+’s U‑App analytics platform processes nearly a trillion daily logs by combining real‑time Flink streams, offline MaxCompute batches, and Alibaba Cloud Hologres OLAP, employing multi‑engine architecture, smart sampling, and Roaring Bitmap techniques to deliver fast, cost‑effective, high‑concurrency user behavior and profiling analysis.

Flink · Hologres · MaxCompute

0 likes · 19 min read

DataFunTalk

Feb 27, 2024 · Big Data

Best Practices of Cloud‑Native OLAP Architecture and Logistics Warning at Jushuitan

This article presents Jushuitan's cloud‑native OLAP architecture, detailing its evolution, current big‑data stack—including DataWorks, MaxCompute, Flink, Hologres, and Aerospike—along with logistics warning workflows, rule‑matching mechanisms, real‑time processing challenges, and future scalability plans.

Big Data · Cloud Native · Data Warehouse

0 likes · 20 min read

Alibaba Cloud Native

Feb 26, 2024 · Cloud Native

How to Structure Weakly Structured Logs with Flink SLS SPL

This guide explains how to use Alibaba Cloud's SLS connector and SPL expressions within Flink to clean, parse, and transform weakly structured log data into a structured table suitable for real‑time SQL analysis, covering sample logs, field extraction rules, and step‑by‑step configuration.

Data Cleansing · Flink · SLS

0 likes · 14 min read

DataFunTalk

Feb 22, 2024 · Big Data

Flink on Kubernetes: Kuaishou’s Practice, Migration, and Future Refactoring

This article details Kuaishou’s five‑year evolution of Flink, covering its background, production refactoring to Kubernetes, migration practices, and future improvements, highlighting architecture layers, resource management, observability, and testing strategies for large‑scale stream processing.

Big Data · Cloud Native · Flink

0 likes · 12 min read

DataFunSummit

Feb 20, 2024 · Big Data

BitSail Open‑Source Data Integration Engine: Architecture, New Features, CDC Solutions and Future Outlook

This article introduces ByteDance's open‑source data integration engine BitSail, covering its background, layered architecture, recent feature enhancements, automated testing framework, CDC‑based full‑library synchronization solutions, and future development plans for connectors and real‑time data consistency.

Big Data · CDC · Data Integration

0 likes · 12 min read

Big Data Technology & Architecture

Feb 20, 2024 · Big Data

Understanding Stream‑Batch Integration in Modern Data Engineering

The article explains the rise, challenges, and practical approaches of the stream‑batch integration concept—originally popularized by the Flink community—highlighting why it struggles at large scale, how companies adopt Kappa‑style real‑time pipelines or unified storage‑compute engines, and its relevance in technical interviews.

Flink · Kappa architecture

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Feb 20, 2024 · Big Data

Feishu ShenNuo's Real-Time Data Warehouse with Flink, Hudi, and Hologres

Feishu ShenNuo redesigned its data architecture by integrating Flink, Hudi, and Hologres to create a cloud‑native real‑time data warehouse that supports both millisecond‑level ad monitoring and minute‑level game operations, offering scalable storage, low‑latency queries, and comprehensive monitoring and capacity planning.

Flink · Hologres · Hudi

0 likes · 16 min read
