Tagged articles
946 articles
DataFunTalk
May 11, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis setup to a Lambda‑style architecture and finally to a lakehouse built on Iceberg, StarRocks, Flink and Spark, cutting architecture complexity, resource costs, and development costs by two‑thirds while supporting trillions of daily events with sub‑second query latency.

Big Data · ClickHouse · Flink
0 likes · 22 min read
DataFunTalk
May 6, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's four‑stage data‑platform evolution—from a simple ClickHouse ad‑hoc setup to a Lambda‑based 2.0 design and finally a lakehouse‑driven 3.0 architecture—highlighting the adoption of general incremental compute, a cost reduction to one‑third, performance gains of up to ten‑fold, and the SPOT standards that guide the new system.

Big Data · ClickHouse · Data Architecture
0 likes · 21 min read
DataFunTalk
Apr 29, 2026 · Big Data

How Xiaohongshu Revamped Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based analytics stack to a unified lakehouse with generic incremental compute, cutting architecture complexity, resource cost, and development effort by roughly one‑third while supporting petabyte‑scale, sub‑second queries across its 350 million‑user app.

Big Data · ClickHouse · Data Architecture
0 likes · 22 min read
Lao Guo's Learning Space
Apr 29, 2026 · Big Data

Designing a Full-Stack Credit Data System: From Ingestion to Real-Time Decision

The article dissects a credit data system architecture, detailing six logical layers—from multi-source data collection and feature engineering (including graph features and feature stores) to model training, real‑time stream processing, decision engine integration, and privacy‑preserving computation—while explaining the trade‑offs, tools, and performance targets needed for accurate, low‑latency risk assessment.

Credit Scoring · Feature Store · Flink
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Apr 27, 2026 · Information Security

Real-Time Agentic Risk Detection with Flink, Fluss, and Large Language Models

The article presents a Flink‑Fluss‑LLM architecture that captures full‑link agent events via a non‑intrusive hook, combines semantic AI inference with deterministic CEP rules, and delivers millisecond‑level alerts for malicious user detection, tool result poisoning, and chain‑attack risk mitigation.

AI Function · Agent Security · Flink
0 likes · 41 min read
Big Data Technology & Architecture
Apr 24, 2026 · Artificial Intelligence

A Deep Dive into Flink Agents: Architecture, Roadmap, and Upcoming Features

The article explains Flink Agents' current 0.3 preview, detailing its layered architecture—from Agent definition to execution plan and runtime operators—while outlining the roadmap for Skills integration, Mem0 long‑term memory, durable execution, and observability enhancements aimed at production readiness.

AI agents · AgentPlan · Flink
0 likes · 7 min read
Lobster Programming
Apr 8, 2026 · Big Data

How to Implement Real‑Time API Traffic Counting at Scale

This article compares three practical approaches—direct database storage, a Flink‑Kafka‑Redis‑Grafana pipeline, and an ELK stack—to achieve real‑time API request counting for high‑concurrency scenarios, outlining their architectures, advantages, and trade‑offs.

API analytics · ELK · Flink
0 likes · 6 min read
Alibaba Cloud Observability
Apr 6, 2026 · Cloud Native

How Alibaba Cloud Built Real‑Time OpenAPI Monitoring with Flink + SLS

This article details the design and implementation of a cloud‑native, real‑time monitoring system for Alibaba Cloud OpenAPI, covering background challenges, a Flink‑SLS architecture, multi‑region data processing, checkpoint and state‑backend tuning, source‑side predicate pushdown, visualization with Grafana, and production results.

Big Data · Cloud Native · Flink
0 likes · 21 min read
Big Data Tech Team
Apr 1, 2026 · Big Data

Why Your 2026 Big Data Resume Is Being Ignored and How to Fix It

In the 2026 spring hiring season, many big‑data job seekers see their resumes disappear because they still focus on offline batch processing, while employers now demand real‑time streaming, AI‑driven data pipelines, and cloud‑native deployment skills such as Flink, vector databases, and Kubernetes.

AI integration · Big Data · Cloud Native
0 likes · 7 min read
Architect-Kip
Mar 2, 2026 · Big Data

How to Build a Scalable Tiered Archive & Query System for MySQL Data

This article presents a comprehensive design for a layered storage and unified scheduling platform that archives MySQL historical data, reduces storage costs, ensures high‑performance queries, and enables efficient data analysis through tiered hot, warm, and cold storage using big‑data technologies.

Flink · Hive · Spark
0 likes · 13 min read
DataFunSummit
Mar 1, 2026 · Big Data

How Ant Group’s Flex Engine Supercharges Flink with Vectorization

This article details Ant Group’s Flex vectorized engine built on Velox, covering the current state of vectorization, Flex’s architecture (Flink + Velox), core feature development, correctness guarantees, large‑scale deployment results, and future directions for full‑link vectorization and broader hardware support.

Big Data · Flex · Flink
0 likes · 18 min read
ITPUB
Feb 13, 2026 · Big Data

Real‑Time Sync of New MySQL Tables to Doris Using Flink CDC

This article explains how to extend a Flink CDC job that already syncs an entire MySQL database to Doris so that newly created tables are automatically created in Doris in real time, using the CdcTools utility, side‑output streams, and asynchronous I/O.

CDC · CdcTools · Flink
0 likes · 9 min read
DeWu Technology
Feb 9, 2026 · Big Data

How to Build a Production‑Ready Flink ClickHouse Sink with Dynamic Sharding, Batch‑by‑Size, and Robust Retry

This article presents a production‑grade Flink ClickHouse sink that solves common pain points such as lack of size‑based batching, static table schemas, and distributed‑table latency by introducing data‑size batching, dynamic table routing, local‑table writes, load‑balanced node discovery, back‑pressure queues, dual‑trigger flush, and recursive retry with node exclusion, all integrated with Flink checkpoint semantics for at‑least‑once guarantees.

Batching · Checkpoint · ClickHouse
0 likes · 25 min read
ITPUB
Feb 9, 2026 · Databases

ClickHouse vs Doris vs Redis: Real‑World Query Performance Test with Flink

Using a 600k‑record IP range dataset, we built identical tables in ClickHouse and Doris plus a Redis skip‑list store, then ran three Flink‑Kafka streaming jobs to compare query latency across the three databases under varying traffic rates, revealing Redis as the fastest, ClickHouse second, and Doris the slowest.

ClickHouse · Database Performance · Flink
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Feb 2, 2026 · Big Data

How We Built a Scalable Lakehouse Architecture with StarRocks, Paimon, and Flink

This article details the evolution of a data warehouse at RenliJia from a MaxCompute‑centric setup to a modern lakehouse using StarRocks, Paimon, Flink, and Fluss, describing design goals, technical evaluations, implementation steps for offline, OLAP, and real‑time workloads, and the challenges and future plans that emerged.

Big Data · Data Warehouse · Flink
0 likes · 25 min read
ITPUB
Jan 22, 2026 · Backend Development

Sync New MySQL Tables to Doris in Real‑Time with Flink CDC and CdcTools

This article explains how to use Flink CDC together with the CdcTools utility to automatically capture newly created MySQL tables and synchronize both their schema and data to a Doris database in real time, covering the required code, side‑output handling, async execution, and a special delete‑sign field.

Async IO · CDC · Flink
0 likes · 10 min read
Java Baker
Dec 22, 2025 · Big Data

Mastering Offline and Real-Time Data Warehouses: A Backend Engineer’s Guide

Backend developers need to understand both offline and real-time data warehouses; this guide explains data collection, layering, partitioning, typical use cases, archiving strategies, and how to build a real-time warehouse with Flink, covering practical steps, examples, and key considerations for efficient data processing.

Backend · Data Warehouse · Flink
0 likes · 8 min read
dbaplus Community
Dec 8, 2025 · Databases

Which Database Wins IP Range Lookups? ClickHouse vs Doris vs Redis Benchmarks

This article presents a systematic benchmark comparing ClickHouse, Doris, and Redis for IP‑range dimension lookups using Flink‑Kafka pipelines, detailing test design, result table schema, query interfaces, and performance results across varying data rates, concluding that Redis offers the fastest and most stable query latency.

ClickHouse · Database Benchmark · Flink
0 likes · 7 min read
Ctrip Technology
Nov 20, 2025 · Big Data

How Ctrip Achieved Minute‑Level Real‑Time Analytics with Flink CDC & Apache Paimon

Ctrip transformed its traditional T+1 offline warehouse into a near‑real‑time lakehouse by integrating Flink CDC with Apache Paimon, designing a two‑stage CDC ingestion, optimizing performance, implementing dynamic updates, and deploying the solution across multiple business scenarios, achieving minute‑level latency, reduced costs, and faster data‑driven decisions.

CDC · Flink · Paimon
0 likes · 27 min read
Big Data Technology & Architecture
Nov 17, 2025 · Big Data

Flink 2025 Updates: Disaggregated State, AI Agents, and SQL Enhancements

The 2025 Flink release introduces a disaggregated state management architecture for cloud‑native elasticity, AI‑driven Flink Agents with LLM, Memory and Tool support, Delta Join and VARIANT type for semi‑structured data, adaptive batch execution, incremental checkpoints, high‑speed network optimizations, and new SQL and Process Table Functions, reshaping real‑time analytics.

Disaggregated State · Flink · Real-time analytics
0 likes · 8 min read
vivo Internet Technology
Nov 12, 2025 · Big Data

How Vivo Solved Real‑Time Feature Concatenation with RocksDB and Flink

This article explains the evolution of Vivo's real‑time recommendation feature‑concatenation architecture, compares hour‑level, Redis‑streaming and RocksDB state‑backend solutions, and details the memory, performance, startup and HDFS RPC problems encountered along with the concrete fixes applied.

Flink · RocksDB · feature concatenation
0 likes · 21 min read
Instant Consumer Technology Team
Nov 10, 2025 · Big Data

Fixing Multi‑Version, Multi‑Cluster and HA with Apache Kyuubi for Spark/Flink

Apache Kyuubi, an enterprise‑grade multi‑tenant data gateway, replaces Livy and Flink SQL Gateway to support multiple engine versions, cross‑cluster elastic scheduling, high‑availability batch jobs, and traffic control, dramatically reducing deployment complexity, improving resource utilization, and accelerating release cycles for large‑scale Spark and Flink workloads.

Apache Kyuubi · Big Data · Data Gateway
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Oct 24, 2025 · Big Data

How Leapmotor Scaled to 1M Cars with a Real‑Time Flink Data Platform

Leapmotor’s rapid growth to one million production cars drove a shift from daily batch data to minute‑level real‑time analytics, prompting the adoption of Flink as the core engine of a multi‑layered big‑data platform that handles massive IoT signals, supports fault diagnosis, and integrates batch and streaming workloads on the cloud.

Big Data · Data Platform · Flink
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Oct 22, 2025 · Big Data

Li Auto’s Trillion‑Row Real‑Time Car‑Network Analytics Using Hologres + Flink

Li Auto’s data team tackled the explosion of vehicle‑telemetry data—over a trillion rows and millions of signals per second—by redesigning their data foundation with Alibaba Cloud’s Hologres and Flink, achieving sub‑second latency, elastic scaling, high availability, and significant cost reductions across real‑time and offline workloads.

Car Telemetry · Data Platform · Flink
0 likes · 16 min read
StarRocks
Oct 14, 2025 · Big Data

How Ctrip Scaled UBT Analytics by Migrating from ClickHouse to StarRocks

Ctrip's User Behavior Tracking (UBT) system, handling 30 TB of daily data, moved from ClickHouse to StarRocks' compute‑storage separated architecture, cutting average query latency from 1.4 seconds to 203 ms, halving storage, reducing nodes from 50 to 40, and boosting write throughput to 3 million rows per second.

Big Data · ClickHouse · Data Migration
0 likes · 15 min read
DataFunSummit
Sep 21, 2025 · Big Data

Breaking the CPU Wall: BIGO’s Gluten Engine Accelerates Spark and Flink

When big‑data workloads hit the CPU wall, BIGO’s adoption of the open‑source Gluten project delivers native‑engine execution for Spark and a roadmap for Flink, achieving up to 30% end‑to‑end speedup, 50% memory savings, and a scalable, cost‑effective data processing platform.

Big Data · Flink · Gluten
0 likes · 16 min read
360 Zhihui Cloud Developer
Sep 11, 2025 · Big Data

How Paimon Transforms Membership Data Warehousing: From Legacy Lambda to Real‑Time Lakehouse

This article examines the challenges of a legacy Lambda‑based membership data warehouse, introduces Apache Paimon’s lakehouse architecture and its key features, and showcases three real‑world implementations—partial‑update order wide tables, Bitmap‑based UV counting, and branch‑based data correction—while discussing benefits, remaining challenges, and future directions.

Big Data · Data Lake · Data Warehouse
0 likes · 29 min read
High Availability Architecture
Sep 10, 2025 · Big Data

How Ctrip Business Travel Built a Near‑Real‑Time Lakehouse with Flink CDC & Paimon

This article details Ctrip Business Travel’s implementation of a near‑real‑time data warehouse using Flink CDC and the Paimon lakehouse engine, covering order wide‑table construction, ticket refund alerts, ad attribution, batch‑stream integration, and practical lessons on Partial Update, Aggregation, and Tag‑based incremental processing.

Batch-Stream Integration · Flink
0 likes · 17 min read
Alibaba Cloud Big Data AI Platform
Sep 8, 2025 · Big Data

How Ele.me Revolutionized Real‑Time Data Warehousing with Flink‑Paimon Lakehouse

In this detailed case study, Alibaba’s Ele.me team explains how they evolved from siloed, chimney‑style real‑time warehouses to a unified Flink‑Paimon lakehouse, highlighting the three development stages, technology evaluations, the Alake platform’s one‑stop capabilities, production results, and future directions such as Fluss and AI integration.

Alake · Flink · Lakehouse
0 likes · 17 min read
Ctrip Technology
Sep 2, 2025 · Big Data

How Ctrip Built a Near‑Real‑Time Lakehouse with Flink & Paimon

This article details Ctrip Business Travel’s implementation of a near‑real‑time data warehouse and lakehouse using Flink CDC and Apache Paimon, covering order wide‑table construction, automated ticket reminders, ad attribution, batch‑stream integration, and lessons on Partial Update, Aggregation, and Tag‑based incremental processing.

Batch-Stream Integration · Flink · Lakehouse
0 likes · 17 min read
Baidu Geek Talk
Sep 1, 2025 · Big Data

How Baidu Netdisk Built a High‑Performance Real‑Time Engine with Flink

This article explains how Baidu Netdisk transitioned from Spark Streaming to a Flink‑based Tiangong real‑time computing engine, detailing the evolution, reasons for choosing Flink, architecture, configuration examples, business use cases, technical challenges, and future platform plans.

Baidu Netdisk · Big Data · Flink
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Aug 31, 2025 · Big Data

Disaggregated Flink State, AI Anomaly Detection, Slow‑Query Ranking (VLDB 2025)

At VLDB 2025, three Alibaba Cloud papers were accepted: one introduces a disaggregated state‑management architecture for Flink 2.0 that separates storage from compute, another presents a cross‑contrastive learning framework for unsupervised Flink anomaly detection, and the third proposes a multimodal ranking system for identifying root causes of slow queries in cloud databases.

Cross Contrastive Learning · Disaggregated State Management · Flink
0 likes · 10 min read
php Courses
Aug 29, 2025 · Operations

How to Build a Real‑Time PHP Log Event Pipeline for Instant Insights

Learn how to transform PHP logs into real‑time, structured events by implementing a log event pipeline that includes JSON logging, lightweight collectors like Filebeat, streaming platforms such as Kafka or Flink, enrichment, and visualization with Grafana, enabling instant monitoring, alerting, and data‑driven decisions.

Flink · Grafana · Kafka
0 likes · 7 min read
Big Data Tech Team
Aug 25, 2025 · Interview Experience

Essential Big Data Interview Questions for Data Warehouse Engineer Roles

A comprehensive list of interview topics covering self‑introduction, career moves, data‑warehouse design, team building, architecture comparisons, fact‑table classification, common dimensions, performance tuning, and data‑governance for aspiring big‑data engineers.

Big Data · Data Governance · Flink
0 likes · 4 min read
Alibaba Cloud Big Data AI Platform
Aug 21, 2025 · Big Data

How Hypergryph Built a High‑Performance Real‑Time Analytics Platform with StarRocks

This case study details how Hypergryph leveraged Alibaba Cloud EMR Serverless StarRocks, Flink, and Kafka to replace a ClickHouse data warehouse with a high‑performance, elastic, and easy‑to‑operate real‑time analytics platform that dramatically improved query speed, stability, operational efficiency, and cost for their gaming business.

Flink · Kafka · StarRocks
0 likes · 8 min read
StarRocks
Aug 19, 2025 · Big Data

How Joydata Scaled to 150 Billion Daily Events with StarRocks: A Data Architecture Journey

Facing daily data growth from millions to 150 billion records, Joydata‑U transformed its analytics platform through three architectural stages—Hadoop, Hadoop + Trino, and finally StarRocks—introducing resource isolation, Flat JSON acceleration, and Bitmap indexing to cut query latency by up to seven times and achieve sub‑2‑minute data freshness across BI, ad‑tech, game analytics, and CRM workloads.

Bitmap Index · Data Architecture · Flat JSON
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Aug 7, 2025 · Big Data

How Flink ML Transforms Intelligent Operations: Real‑Time Anomaly Detection, Forecasting & Log Clustering

This article explains how Alibaba Cloud’s big‑data platform leverages Flink ML to build an intelligent‑operations service that tackles stability, cost and efficiency challenges through time‑series anomaly detection, forecasting and streaming log‑clustering, dramatically reducing latency, complexity and operational overhead.

Flink · Intelligent Operations · Log Clustering
0 likes · 25 min read
58 Tech
Aug 7, 2025 · Big Data

Transform Real‑Time Data Warehousing with Paimon: From Flink ROW_NUMBER to Streaming Lakehouse

This article details how a real‑time data warehouse built on Flink, Kafka, HBase and MySQL was redesigned using Paimon to eliminate costly deduplication, handle out‑of‑order events, enable streaming reads, simplify aggregation, replace multiple lookup sources, and achieve faster, more reliable batch repairs, resulting in major resource and operational gains.

Data Warehouse · Flink · Lakehouse
0 likes · 24 min read
iQIYI Technical Product Team
Aug 7, 2025 · Big Data

Building a Low‑Latency, High‑Capacity Real‑Time Data Platform for Finance

Facing growing data demands in finance, we replaced two legacy synchronization pipelines with a unified, low‑latency architecture using BabelX Real‑Time, Flink CDC, Iceberg v2 and Paimon, achieving minute‑level data freshness, ten‑to‑thirty‑fold query speedups, reduced storage costs, and streamlined schema management across multiple business units.

Big Data · Flink · Iceberg
0 likes · 12 min read
Big Data Technology & Architecture
Jul 29, 2025 · Big Data

What Interviewers Really Ask About Flink, Data Consistency, and Warehouse Design

An interviewee recounts a challenging first interview that focused on Flink resource configuration, late data handling, and offline data warehouse design, and shares practical advice on attitude, thorough preparation, emphasizing real project storytelling, and post‑interview review to continuously improve performance.

Data Consistency · Data Warehouse · Flink
0 likes · 4 min read
Alibaba Cloud Big Data AI Platform
Jul 25, 2025 · Big Data

Cross-Contrastive Learning Cuts Flink Anomaly Detection Errors by 12%

The paper “Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection”, accepted at VLDB 2025, introduces a novel cross‑contrastive method that leverages attention‑based representations and a boundary‑aware loss to detect Flink‑specific hotspot anomalies, achieving a 12.1% F1 improvement over state‑of‑the‑art techniques.

Big Data · Cross-Contrastive Learning · Flink
0 likes · 6 min read
Big Data Tech Team
Jul 23, 2025 · Big Data

From Beginner to Data Warehouse Architect: A Complete Roadmap

This guide walks you through every essential topic—from data warehouse architecture and layering, through ETL, OLAP, Hadoop, and Flink, to visualization tools, learning paths, recommended resources, and the management skills needed to become a proficient data warehouse architect.

Data Warehouse · ETL · Flink
0 likes · 9 min read
Big Data Technology & Architecture
Jul 16, 2025 · Big Data

Master Flink Optimizations: TTL, Mini‑Batch, Two‑Phase Aggregation, Lookup Join & More

This article reviews the most effective Flink optimization techniques since 2022, including operator‑level TTL, mini‑batch processing, two‑phase aggregation, multi‑dimensional DISTINCT with FILTER, lookup join caching strategies, and TopN implementations, each rated with recommendation stars for production use.

Big Data · Flink · Lookup Join
0 likes · 8 min read
DataFunSummit
Jul 12, 2025 · Big Data

How Fluss Unifies Stream and Lake to Power AI Data Pipelines

In the era of rapid AI growth, Fluss offers a unified lake‑stream architecture that tackles data quality, timeliness, scale, and multimodal challenges by tightly integrating Flink streaming with a high‑performance data lake, enabling seamless real‑time and batch analytics for AI workloads.

AI · Data Lake · Flink
0 likes · 12 min read
StarRocks
Jul 9, 2025 · Big Data

How Shopee Built a Near‑Real‑Time Data Warehouse with Paimon and StarRocks

Shopee combined the Paimon data lake with StarRocks and Flink to create a quasi‑real‑time warehouse, enabling fast task diagnostics and a high‑performance financial reconciliation system while dramatically reducing storage costs and latency through innovative ODS, snapshot, and branch table techniques.

Flink · Paimon · StarRocks
0 likes · 13 min read
Big Data Technology & Architecture
Jul 8, 2025 · Big Data

Flink’s AI Agents and Disaggregated State: Transforming Big Data

The article reviews key topics from the Flink Forward Asia (FFA) 2025 conference in Singapore, highlighting Flink’s new AI‑focused Agents framework, the breakthrough Flink 2.0 disaggregated state architecture, emerging lake storage solutions like Paimon, and the Fluss streaming table store, illustrating how big‑data platforms are evolving for AI workloads.

AI agents · Big Data · Data Lake
0 likes · 6 min read
StarRocks
Jul 1, 2025 · Big Data

How StarRocks Boosted Suixingfu’s Real‑Time Data Platform: 3× Faster Queries & 10× Faster Analytics

Suixingfu rebuilt its payment data pipeline by replacing a fragmented Lambda stack with a unified Porter CDC + StarRocks + Elasticsearch architecture, achieving three‑fold query speed, ten‑fold analytics efficiency, 20% storage reduction, and sub‑second data‑capture latency across high‑concurrency, ad‑hoc, and batch workloads.

CDC · Data Warehouse · Flink
0 likes · 14 min read
JD Retail Technology
Jun 10, 2025 · Artificial Intelligence

How JD Builds a Scalable AI‑Powered Recommendation Data System with Flink

This article explains JD's complex recommendation system data pipeline—from indexing, sampling, and feature engineering to explainability and real‑time metrics—highlighting challenges such as data consistency, latency, and the use of Flink for massive, low‑latency processing.

Flink · explainability · feature engineering
0 likes · 23 min read
Alibaba Cloud Big Data AI Platform
May 21, 2025 · Big Data

How Alibaba’s A+ Traffic Analysis Achieved Sub‑Second Log Queries with StarRocks & Paimon

This article details how Alibaba's A+ traffic analysis platform tackled trillion‑row log ingestion and high‑concurrency queries by redesigning storage with Paimon, leveraging Flink for real‑time ingestion, and using StarRocks for fast lake analytics, ultimately reducing query latency from minutes to seconds.

Flink · Log Analytics · Paimon
0 likes · 15 min read
Big Data Technology & Architecture
May 21, 2025 · Big Data

Interview Experience: Flink Task Resource Allocation, Issues, and Monitoring

This article shares an interviewee's experience discussing core Flink interview questions, including typical resource allocation for large online tasks, common problems such as data, performance, stability, and resource issues, and the monitoring practices for clusters and tasks, while also containing a brief self‑promotion.

Big Data · Flink · Performance Issues
0 likes · 7 min read
Xiaohongshu Tech REDtech
May 19, 2025 · Industry Insights

How Xiaohongshu Built a Minute‑Level Near‑Real‑Time Data Warehouse with Incremental Computing

Facing billions of daily logs and the need for minute‑level experiment metrics, Xiaohongshu partnered with Yunqi Tech to design a generic incremental‑compute solution that delivers near‑real‑time data warehousing with lower cost, higher accuracy, simplified pipelines, and improved query performance.

Big Data · Data Lake · Flink
0 likes · 24 min read
Selected Java Interview Questions
May 15, 2025 · Backend Development

Six Common Approaches to Synchronize MySQL Data to Elasticsearch

This article reviews six mainstream solutions for keeping MySQL and Elasticsearch in sync—including synchronous double‑write, asynchronous MQ‑based double‑write, Logstash polling, Canal binlog listening, DataX batch migration, and Flink stream processing—detailing their scenarios, advantages, drawbacks, and practical code examples to guide optimal technical selection.

Canal · Elasticsearch · Flink
0 likes · 8 min read
Huolala Tech
May 14, 2025 · Big Data

How Lalamove Scaled Real‑Time Data Warehousing with Flink and Paimon

Lalamove’s international logistics platform transformed its real‑time data warehouse by leveraging Apache Flink and the Paimon lakehouse, addressing challenges of multi‑region data centers, time‑zone diversity, frequent upstream changes, and high costs, while improving scalability, latency, and operational efficiency across global markets.

Big Data · Flink · Paimon
0 likes · 13 min read
Su San Talks Tech
May 5, 2025 · Big Data

6 Proven Ways to Sync MySQL Data to Elasticsearch – Choose the Right Strategy

This article compares six mainstream MySQL‑to‑Elasticsearch synchronization methods—synchronous double‑write, asynchronous MQ, Logstash polling, Canal binlog listening, DataX batch sync, and Flink streaming—detailing scenarios, code samples, advantages, drawbacks, and practical selection guidance for developers.

Canal · Elasticsearch · Flink
0 likes · 9 min read
Bilibili Tech
Apr 8, 2025 · Big Data

Building a Real-Time Data Warehouse for Bilibili's Game Business

To meet Bilibili’s rapidly expanding game business, the team built a unified real-time data warehouse using Hologres and Flink that replaces the traditional Lambda stack, delivering high-throughput writes, low-latency processing, seamless offline-online integration, global deployment, and real-time support for operations, advertising, and risk analytics.

Big Data Architecture · Data architecture case study · Flink
0 likes · 17 min read
DataFunSummit
Apr 3, 2025 · Big Data

Apache Hudi Asia Technical Salon Highlights: Practices and Innovations from Kuaishou, Meituan, Douyin, Huawei, and JD

The Apache Hudi Asia technical salon held in Beijing on March 29 gathered over 230 on‑site participants and 16,000 online viewers, featuring expert talks from leading Chinese tech companies that showcased real‑world Hudi implementations, performance optimizations, and future roadmap for data‑lake technologies.

Apache Hudi · Big Data · Data Lake
0 likes · 13 min read
DataFunSummit
Apr 1, 2025 · Big Data

Understanding Flink CDC 3.3: Features, Improvements, and Future Plans

This article provides a comprehensive overview of Flink CDC 3.3, detailing its CDC fundamentals, new connectors, Transform module enhancements, asynchronous snapshot splitting, community adoption, and upcoming roadmap for broader ecosystem support and batch‑mode execution.

Big Data · CDC · Change Data Capture
0 likes · 15 min read
iQIYI Technical Product Team
Mar 27, 2025 · Big Data

Cost‑Effective Real‑Time Data Warehouse 2.0: Migrating from Kafka to Iceberg

iQIYI transformed its real‑time data warehouse by replacing a costly Kafka‑based Lambda stack with a unified stream‑batch Iceberg lake, cutting storage expenses by 90%, halving compute costs, extending data retention, and delivering minute‑level freshness for 90% of use cases while preserving second‑level processing where needed.

Cost Optimization · Flink · Iceberg
0 likes · 11 min read
Big Data Tech Team
Mar 25, 2025 · Big Data

How Apache Paimon Transforms Real‑Time Lakehouse Architecture

This article analyzes the limitations of a traditional Flink + Talos + Iceberg real‑time lakehouse, introduces Apache Paimon's lakehouse table format and LSM storage, and demonstrates three practical use cases—partial‑update widening, streaming upsert, and lookup join—showing cost, stability, and performance improvements while outlining future roadmap.

Apache Paimon · Flink · Lakehouse
0 likes · 16 min read
AntData
Mar 20, 2025 · Big Data

Design and Optimization of Real‑time Data Lake Tables with Paimon and Flink for Advertising Diagnostics

This article presents a comprehensive exploration of using Apache Paimon and Flink to design lake tables that support minute‑level latency, low cost, and unified batch‑stream processing for advertising data, covering schema design, partitioning strategies, performance trade‑offs, cost analysis, and operational best practices.

Big Data · Data Lake · Flink
0 likes · 34 min read
Big Data Technology & Architecture
Mar 17, 2025 · Big Data

Lakehouse Implementations at Leading Companies: Challenges, Solutions, and Benefits

This article reviews how major tech firms such as Alibaba, Tencent, ByteDance, and Kuaishou tackled lakehouse challenges—including architecture fragmentation, cost, scalability, and complex multimodal data—by adopting real‑time lakehouse solutions like Flink + Paimon, Iceberg + StarRocks, Hudi + LAS, and Doris + Alluxio, and outlines the resulting performance and cost gains.

Flink · Lakehouse · Paimon
0 likes · 9 min read
Alimama Tech
Mar 12, 2025 · Big Data

Design and Evolution of Alibaba Advertising Real-Time Data Warehouse

Alimama's advertising platform migrated from a monolithic Flink‑Kafka pipeline to a layered Paimon lakehouse, adding DWS upsert support and multi‑layer storage. The new design delivers minute‑level data freshness, cuts latency by 2.5 hours, reduces resource use by more than 40%, halves development effort, and achieves at least 99.9% availability.

Advertising · Alibaba · Data Lake
0 likes · 18 min read
Baidu Tech Salon
Mar 6, 2025 · Big Data

Real-Time Anti-Fraud Streaming System Based on Flink: Architecture, Challenges, and Optimizations

The article describes a Flink‑based real‑time anti‑fraud streaming system that combines a risk‑control platform, configurable YAML‑driven pipelines, and optimized state handling—using early event‑time triggers, micro‑batch caching, and coarse‑grained key reduction—to compute multi‑dimensional features, support rapid strategy updates, simulation filtering, and seamless output to ClickHouse, Hive, and Redis for both instant monitoring and offline analysis.

Configuration · Flink · Real-time Streaming
0 likes · 26 min read
Baidu Geek Talk
Mar 3, 2025 · Big Data

Real-Time Anti-Cheat Streaming System Based on Flink: Architecture, Challenges, and Solutions

The article details a Flink‑based real‑time anti‑cheat streaming architecture that combines tumbling, sliding and session windows with early triggers, batch state updates cached in memory, coarse‑grained key reduction, and YAML‑driven strategy configuration to deliver millisecond‑level detection, seamless integration with ClickHouse, Hive, Redis and message queues, and self‑service analytics, achieving high throughput, low latency, and robust stability for large‑scale risk control.

Configuration Management · Flink · Performance Optimization
0 likes · 25 min read
Big Data Technology & Architecture
Mar 3, 2025 · Big Data

The Turning Point for Data Development: From Traditional Data Engineering to AI Data Engineering

The article analyzes how the rapid rise of open‑source large‑model AI in 2025 is reshaping the data development profession, urging developers to transition from specialized data‑engineer roles to full‑stack AI data engineering skills such as distributed computing, lake‑house architectures, and model tuning.

AI · Big Data · Flink
0 likes · 7 min read
DataFunSummit
Mar 2, 2025 · Artificial Intelligence

Lightweight Algorithm Service Architecture Based on Offline Tag Knowledge Base and Real‑time Data Warehouse

This article presents a lightweight algorithm service solution that combines an offline pre‑computed tag knowledge base with a real‑time data warehouse using Flink, Doris, Hive SQL and Python to achieve short development cycles, agile iteration, low cost, and scalable deployment for classification and clustering tasks.

Flink · algorithm service · doris
0 likes · 16 min read
Big Data Technology Architecture
Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals, details Flink CDC's architecture and advantages, provides setup steps, code examples for SQL and DataStream APIs, discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium
0 likes · 7 min read
StarRocks
Feb 27, 2025 · Big Data

How iQIYI Boosted Ad Query Performance 400% with StarRocks – A Deep Dive into OLAP Evolution

This article details iQIYI's transition from Impala+Kudu and ClickHouse to StarRocks, describing the OLAP architecture, performance gains of up to 400% in advertising workloads, the technical challenges of data consistency, lake‑warehouse fusion, operational scaling, and the step‑by‑step migration process using a dual‑run platform.

ClickHouse · Flink · OLAP
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Feb 13, 2025 · Big Data

From Lambda to Lakehouse: Evolution of Real‑Time Data Warehouses with Hologres & Flink

This article traces the three‑generation evolution of real‑time data warehouses—from the Lambda architecture to a lakehouse approach—detailing how Hologres, Flink, and Dynamic Table technologies enable unified storage, multi‑mode computing, serverless execution, and high‑performance analytics in modern big‑data environments.

Dynamic Table · Flink · Hologres
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Jan 27, 2025 · Big Data

Unlock Real-Time Data Sync with Flink CDC: YAML Integration, Transform & Route Explained

This article summarizes an advanced Flink CDC presentation, covering Flink CDC fundamentals, real‑time Flink integration, CDC‑YAML core capabilities, supported sync links, Transform and Route modules, monitoring metrics, schema‑change strategies, typical use cases, performance optimizations, demo implementations, and future development plans.

CDC · Data Integration · Flink
0 likes · 20 min read
DataFunSummit
Jan 14, 2025 · Big Data

Tencent Real-Time Lakehouse Intelligent Optimization Practice

This presentation covers Tencent's real‑time lakehouse in four parts: lakehouse design, intelligent optimization services, scenario‑driven capabilities, and future outlook. It touches on components such as Spark, Flink, Iceberg, the Auto‑Optimize Service, indexing, clustering, AutoEngine, and PyIceberg.

Auto Optimize · Big Data · Flink
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Jan 14, 2025 · Big Data

How Fluss Unifies Lake and Stream for Real‑Time Analytics: Architecture, Benefits, and Future Roadmap

This article summarizes a talk by Alibaba Cloud senior engineer and Flink Committer Luo Yuxia on the challenges of separating lake and stream storage, introduces the Fluss lake‑stream unified architecture, explains its technical benefits such as second‑level data freshness, unified metadata, efficient changelog generation, and outlines future plans for broader ecosystem integration.

Data Lake · Flink · Fluss
0 likes · 13 min read
ITPUB
Jan 7, 2025 · Databases

Cut Costs 25% and Boost Performance 70%: Retail Giant’s OceanBase Migration

The article details how WanJia Shuke, the tech arm of China Resources Vanguard, tackled retail system fragmentation, user‑experience degradation, complex linkages and scalability limits by migrating dozens of projects to the distributed OceanBase database, achieving up to 70% performance improvement, 25% cost reduction and streamlined operations.

Flink · OceanBase · Retail
0 likes · 15 min read
DataFunSummit
Jan 3, 2025 · Big Data

Tencent Real‑Time Lakehouse Intelligent Optimization Practices

This article presents Tencent's end‑to‑end real‑time lakehouse architecture, detailing its three‑layer design, the Auto Optimize Service modules such as compaction, indexing, clustering and engine acceleration, as well as scenario‑driven capabilities like multi‑stream joins, primary‑key tables, in‑place migration and PyIceberg support, and concludes with future optimization directions.

Big Data · Flink · Iceberg
0 likes · 11 min read
Bilibili Tech
Jan 3, 2025 · Big Data

Evolution and Production Practices of Apache Celeborn Remote Shuffle Service at Bilibili

Bilibili replaced Spark’s unstable External Shuffle Service with a push‑based approach, then deployed Apache Celeborn’s remote shuffle on Kubernetes using HA masters, tiered workers, extensive monitoring, history‑based routing, chaos testing, and seamless Spark, Flink, and MapReduce integration, while planning self‑healing, elastic scaling, and priority‑aware I/O enhancements.

Apache Celeborn · Big Data · Flink
0 likes · 28 min read
Big Data Technology & Architecture
Jan 2, 2025 · Big Data

Apache Paimon: Core Capabilities, Table Types, LSM Tree, Buckets, Merge Engines, and Operational Details

This article provides a comprehensive overview of Apache Paimon, covering its real‑time lake ingestion, unified stream‑batch processing, table types (primary‑key and append‑only), LSM‑tree storage, bucket mechanisms, merge‑engine options, compaction strategies, concurrency control, consumption methods, tag management, data cleanup, and system tables for big‑data workloads.

Apache Paimon · Big Data · Flink
0 likes · 25 min read
DataFunSummit
Dec 27, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

This presentation describes Tencent's real-time lakehouse architecture, including data lake compute, management, and storage layers, and details the intelligent optimization services—such as compaction, indexing, clustering, and auto-engine—designed to improve query performance, storage cost, and operational efficiency for large-scale data processing.

AutoEngine · Flink · Iceberg
0 likes · 11 min read
Bilibili Tech
Dec 27, 2024 · Big Data

Consistency Architecture for Bilibili Recommendation Model Data Flow

The article outlines Bilibili’s revamped recommendation data‑flow architecture that eliminates timing and calculation inconsistencies by snapshotting online features, unifying feature computation in a single C++ library accessed via JNI, and orchestrating label‑join and sample extraction through near‑line Kafka/Flink pipelines, with further performance gains and Iceberg‑based future extensions.

Data Consistency · Flink · Iceberg
0 likes · 12 min read
DaTaobao Tech
Dec 18, 2024 · Big Data

Incremental Computation in Big Data: Flink Materialized Table and Paimon

The article explains how Flink 1.20’s Materialized Table combined with Paimon’s changelog storage enables incremental computation that unifies batch and streaming workloads, delivering minute‑level latency at lower cost, illustrated by a materialized‑table example while noting current streaming‑only support and future batch extensions.

Big Data · Flink · Incremental Computation
0 likes · 13 min read
Big Data Technology & Architecture
Dec 18, 2024 · Big Data

Key Trends of Flink 2.0: Compute‑Storage Separation, Unified Batch‑Stream, and Streaming Warehouse

The article reviews the major directions of Flink 2.0—including compute‑storage separation, a new Materialized Table for unified batch‑stream processing, and deeper integration with Paimon for streaming warehouses—while offering a cautious perspective on their practical impact and migration challenges.

Batch-Stream Integration · Big Data · Compute-Storage Separation
0 likes · 5 min read