Tagged articles

Flink

957 articles · Page 1 of 10

Jul 2, 2026 · Fundamentals

Production Hit by Silent Data Corruption: JDK 25 G1GC Bug Explained

A rare silent data‑corruption bug in JDK 25’s G1GC caused Parquet and ORC files written by Spark and Flink to become unreadable. A multi‑stage investigation traced the issue to an optional‑evacuation flaw affecting JNI‑pinned objects, which was later fixed in the OpenJDK community and back‑ported.

AI debugging · Flink · G1GC

0 likes · 20 min read

DataFunSummit

Jul 1, 2026 · Artificial Intelligence

How Bailei Knowledge Base Uses Flink and DLF (Paimon) to Build an Enterprise‑Scale Full‑Modal RAG System

Bailei Knowledge Base delivers an enterprise‑grade, full‑modal Retrieval‑Augmented Generation solution covering documents, tables, images and audio‑video. It relies on Flink's high‑throughput streaming to index billions of documents daily and on DLF/Paimon’s three‑layer reliable backup, achieving sub‑200 ms latency and 99.99% availability.

DLF · Enterprise AI · Flink

0 likes · 26 min read

DataFunTalk

Jun 30, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu, with over 350 million monthly users and daily logs in the trillions, migrated 500 PB of data to Alibaba Cloud and iterated its data platform through four architecture generations (ClickHouse‑based ad‑hoc, Lambda, Lakehouse, and a unified incremental compute model), cutting resource, development, and storage costs to one‑third while delivering sub‑10‑second query latency at petabyte scale.

Big Data · ClickHouse · Data Architecture

0 likes · 22 min read

DataFunTalk

Jun 24, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu, with over 350 million monthly users and daily logs in the billions, migrated its data platform from AWS to Alibaba Cloud and iterated four times—from a ClickHouse‑based ad‑hoc layer to a Lambda architecture and finally a Lakehouse with incremental compute—cutting architecture complexity, resource cost and development effort each to about one‑third while delivering second‑level analytics on trillion‑scale data.

Big Data · ClickHouse · Data Architecture

0 likes · 22 min read

DataFunTalk

Jun 20, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's step‑by‑step migration from a simple ClickHouse‑based analytics stack to a Lambda‑style 2.0 architecture and finally to a Lakehouse‑based 3.0 design, highlighting concrete performance numbers, cost reductions, and the definition of a generic incremental‑compute model (SPOT) that underpins the evolution.

Big Data · ClickHouse · Data Architecture

0 likes · 22 min read

DataFunTalk

Jun 11, 2026 · Artificial Intelligence

How Qichacha Leverages Large Language Models for Field‑Level Data Lineage

This article details Qichacha's use of large language models to extract field‑level data lineage from heterogeneous, non‑standard code and ETL assets, describing the motivation, architectural blueprint, practical challenges such as cost, accuracy and hallucination, and the resulting improvements in impact analysis, metric tracing, and sensitive‑data governance.

Big Data · Data Governance · Flink

0 likes · 11 min read

DataFunSummit

Jun 7, 2026 · Artificial Intelligence

How Qichacha Uses Large Language Models for Field‑Level Data Lineage

This article details Qichacha's technical journey of applying large language models to resolve field‑level data lineage challenges in a complex, multi‑source data environment, describing the motivation, architecture, practical implementation, engineering trade‑offs, and measurable outcomes.

AI · Big Data · Data Governance

0 likes · 11 min read

DataFunTalk

May 28, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse with generic incremental compute, cutting architecture complexity, resource and development costs by one‑third while delivering second‑level queries over trillions of rows.

Big Data · ClickHouse · Data Architecture

0 likes · 21 min read

Big Data Technology & Architecture

May 26, 2026 · Big Data

Advanced Paimon Production Issues: 10 Rare Compaction‑Related Problems and Fixes

This article enumerates ten uncommon, compaction‑related problems encountered in large‑scale Paimon deployments, explains their root causes—such as RPC timeouts, snapshot expiration, file corruption, and write conflicts—and provides concrete configuration tweaks and operational steps to resolve each issue.

Big Data · Compaction · Flink

0 likes · 9 min read

DataFunTalk

May 22, 2026 · Big Data

How Xiaohongshu Cut Data Architecture Complexity and Cost by One‑Third in the Big AI Data Era

The article details Xiaohongshu's evolution from a simple ClickHouse‑based analytics layer to a Lambda‑enabled 2.0 stack and finally a Lakehouse‑based 3.0 architecture, showing how each iteration reduced infrastructure complexity, resource consumption and development effort by roughly one‑third while supporting trillions of daily events and AI‑driven use cases.

Big Data · ClickHouse · Data Architecture

0 likes · 21 min read

DataFunTalk

May 11, 2026 · Big Data

How Xiaohongshu Re‑engineered Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based ad‑hoc analysis to a Lambda‑style architecture and finally to a lakehouse built on Iceberg, StarRocks, Flink and Spark, cutting architecture complexity, resource and development costs by two‑thirds while supporting trillions of daily events with sub‑second query latency.

Big Data · ClickHouse · Flink

0 likes · 22 min read

DataFunTalk

May 6, 2026 · Big Data

How Xiaohongshu Evolved Its Data Architecture for the Big AI Data Era

The article details Xiaohongshu's four‑stage data‑platform evolution—from a simple ClickHouse ad‑hoc setup to a Lambda‑based 2.0 design and finally a lakehouse‑driven 3.0 architecture—highlighting the adoption of general incremental compute, cost‑reduction to one‑third, performance gains of up to ten‑fold, and the SPOT standards that guide the new system.

Big Data · ClickHouse · Data Architecture

0 likes · 21 min read

DataFunTalk

Apr 29, 2026 · Big Data

How Xiaohongshu Revamped Its Data Architecture for the Big AI Data Era

Xiaohongshu transformed its data platform from a simple ClickHouse‑based analytics stack to a unified lakehouse with generic incremental compute, cutting architecture complexity, resource cost, and development effort by roughly one‑third while supporting petabyte‑scale, sub‑second queries across its 350 million‑user app.

Big Data · ClickHouse · Data Architecture

0 likes · 22 min read

Lao Guo's Learning Space

Apr 29, 2026 · Big Data

Designing a Full-Stack Credit Data System: From Ingestion to Real-Time Decision

The article dissects a credit data system architecture, detailing six logical layers—from multi-source data collection and feature engineering (including graph features and feature stores) to model training, real‑time stream processing, decision engine integration, and privacy‑preserving computation—while explaining the trade‑offs, tools, and performance targets needed for accurate, low‑latency risk assessment.

Credit Scoring · Feature Store · Flink

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Apr 27, 2026 · Information Security

Real-Time Agentic Risk Detection with Flink, Fluss, and Large Language Models

The article presents a Flink‑Fluss‑LLM architecture that captures full‑link agent events via a non‑intrusive hook, combines semantic AI inference with deterministic CEP rules, and delivers millisecond‑level alerts for malicious user detection, tool result poisoning, and chain‑attack risk mitigation.

AI Function · Agent security · Flink

0 likes · 41 min read

Big Data Technology & Architecture

Apr 24, 2026 · Artificial Intelligence

A Deep Dive into Flink Agents: Architecture, Roadmap, and Upcoming Features

The article explains Flink Agents' current 0.3 preview, detailing its layered architecture—from Agent definition to execution plan and runtime operators—while outlining the roadmap for Skills integration, Mem0 long‑term memory, durable execution, and observability enhancements aimed at production readiness.

AI Agents · AgentPlan · Flink

0 likes · 7 min read

Lobster Programming

Apr 8, 2026 · Big Data

How to Implement Real‑Time API Traffic Counting at Scale

This article compares three practical approaches—direct database storage, a Flink‑Kafka‑Redis‑Grafana pipeline, and an ELK stack—to achieve real‑time API request counting for high‑concurrency scenarios, outlining their architectures, advantages, and trade‑offs.

API analytics · ELK · Flink

0 likes · 6 min read

Alibaba Cloud Observability

Apr 6, 2026 · Cloud Native

How Alibaba Cloud Built Real‑Time OpenAPI Monitoring with Flink + SLS

This article details the design and implementation of a cloud‑native, real‑time monitoring system for Alibaba Cloud OpenAPI, covering background challenges, a Flink‑SLS architecture, multi‑region data processing, checkpoint and state‑backend tuning, source‑side predicate pushdown, visualization with Grafana, and production results.

Big Data · Cloud Native · Flink

0 likes · 21 min read

Ctrip Technology

Apr 2, 2026 · Big Data

Why Upgrading to JDK 25 Broke Spark & Flink Data – Inside the G1GC Bug and Its Fix

During a gray release of JDK 25 on Ctrip's massive Spark and Flink clusters, silent data corruption appeared in Parquet and ORC files. It was traced to a G1GC Optional Evacuation bug that moved JNI‑pinned objects; the fix was later back‑ported and released in JDK 25.0.3.

Flink · G1GC · JDK

0 likes · 21 min read

Big Data Tech Team

Apr 1, 2026 · Big Data

Why Your 2026 Big Data Resume Is Being Ignored and How to Fix It

In the 2026 spring hiring season, many big‑data job seekers see their resumes disappear because they still focus on offline batch processing, while employers now demand real‑time streaming, AI‑driven data pipelines, and cloud‑native deployment skills such as Flink, vector databases, and Kubernetes.

AI integration · Big Data · Cloud Native

0 likes · 7 min read

Architect-Kip

Mar 2, 2026 · Big Data

How to Build a Scalable Tiered Archive & Query System for MySQL Data

This article presents a comprehensive design for a layered storage and unified scheduling platform that archives MySQL historical data, reduces storage costs, ensures high‑performance queries, and enables efficient data analysis through tiered hot, warm, and cold storage using big‑data technologies.

Data Archiving · Doris · Flink

0 likes · 13 min read

DataFunSummit

Mar 1, 2026 · Big Data

How Ant Group’s Flex Engine Supercharges Flink with Vectorization

This article details Ant Group’s Flex vectorized engine built on Velox, covering the current state of vectorization, Flex’s architecture (Flink + Velox), core feature development, correctness guarantees, large‑scale deployment results, and future directions for full‑link vectorization and broader hardware support.

Big Data · Flex · Flink

0 likes · 18 min read

Amazon Cloud Developers

Feb 24, 2026 · Cloud Computing

Achieving an 8× Speedup: Building a Flink Monitoring System with Kiro AI IDE

This article walks through using Kiro AI IDE to develop an Amazon EMR Flink monitoring system, detailing spec‑driven development, MCP integration, steering rules, the full backend and frontend stack, and shows how the workflow cuts implementation time from 60–80 hours to about 10 hours, delivering a 6–8× efficiency gain.

AI · Amazon EMR · Flink

0 likes · 10 min read

ITPUB

Feb 13, 2026 · Big Data

Real‑Time Sync of New MySQL Tables to Doris Using Flink CDC

This article explains how to extend a Flink CDC job that already syncs an entire MySQL database to Doris so that newly created tables are automatically created in Doris in real time, using the CdcTools utility, side‑output streams, and asynchronous I/O.

CDC · CdcTools · Doris

0 likes · 9 min read

DeWu Technology

Feb 9, 2026 · Big Data

How to Build a Production‑Ready Flink ClickHouse Sink with Dynamic Sharding, Batch‑by‑Size, and Robust Retry

This article presents a production‑grade Flink ClickHouse sink that solves common pain points such as lack of size‑based batching, static table schemas, and distributed‑table latency by introducing data‑size batching, dynamic table routing, local‑table writes, load‑balanced node discovery, back‑pressure queues, dual‑trigger flush, and recursive retry with node exclusion, all integrated with Flink checkpoint semantics for at‑least‑once guarantees.

Batching · Checkpoint · ClickHouse

0 likes · 25 min read

ITPUB

Feb 9, 2026 · Databases

ClickHouse vs Doris vs Redis: Real‑World Query Performance Test with Flink

Using a 600k‑record IP range dataset, we built identical tables in ClickHouse and Doris, and a Redis skip‑list store, then ran three Flink‑Kafka streaming jobs to compare query latency across the three databases under varying traffic rates, revealing Redis as fastest, ClickHouse second, Doris slowest.

ClickHouse · Database Performance · Doris

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Feb 2, 2026 · Big Data

How We Built a Scalable Lakehouse Architecture with StarRocks, Paimon, and Flink

This article details the evolution of a data warehouse at RenliJia from a MaxCompute‑centric setup to a modern lakehouse using StarRocks, Paimon, Flink, and Fluss, describing design goals, technical evaluations, implementation steps for offline, OLAP, and real‑time workloads, and the challenges and future plans that emerged.

Big Data · Data Warehouse · Flink

0 likes · 25 min read

ITPUB

Jan 22, 2026 · Backend Development

Sync New MySQL Tables to Doris in Real‑Time with Flink CDC and CdcTools

This article explains how to use Flink CDC together with the CdcTools utility to automatically capture newly created MySQL tables and synchronize both their schema and data to a Doris database in real time, covering the required code, side‑output handling, async execution, and a special delete‑sign field.

Async IO · CDC · Doris

0 likes · 10 min read

Big Data Tech Team

Dec 25, 2025 · Big Data

How to Build an End‑to‑End E‑Commerce Data Warehouse for Interview Success

This guide walks you through designing and implementing a complete e‑commerce data‑warehouse project—from raw data ingestion and ODS/DWD/DWS/ADS layers to optional real‑time analytics—while highlighting interview‑ready resume tips, common pitfalls, and performance‑tuning tricks.

Big Data · ETL · Flink

0 likes · 10 min read

Java Baker

Dec 22, 2025 · Big Data

Mastering Offline and Real-Time Data Warehouses: A Backend Engineer’s Guide

Backend developers need to understand both offline and real-time data warehouses; this guide explains data collection, layering, partitioning, typical use cases, archiving strategies, and how to build a real-time warehouse with Flink, covering practical steps, examples, and key considerations for efficient data processing.

Data Warehouse · Flink · Offline

0 likes · 8 min read

DataFunSummit

Dec 10, 2025 · Big Data

How Apache Hudi Powers the Next‑Gen AI‑Native Lakehouse: Insights from the Asia Meetup

The article recaps the Apache Hudi Asia Meetup hosted by JD, covering community updates, JD's data‑lake challenges, the upcoming Hudi 1.1 release, JD's architectural redesign, Kuaishou's real‑time lake adoption, and Huawei Cloud's deep optimizations, all aimed at building an AI‑native, real‑time lakehouse.

AI-native · Apache Hudi · Data Lake

0 likes · 13 min read

dbaplus Community

Dec 8, 2025 · Databases

Which Database Wins IP Range Lookups? ClickHouse vs Doris vs Redis Benchmarks

This article presents a systematic benchmark comparing ClickHouse, Doris, and Redis for IP‑range dimension lookups using Flink‑Kafka pipelines, detailing test design, result table schema, query interfaces, and performance results across varying data rates, concluding that Redis offers the fastest and most stable query latency.

ClickHouse · Database Benchmark · Doris

0 likes · 7 min read

JD Retail Technology

Dec 1, 2025 · Big Data

How Apache Hudi 1.1 Powers AI‑Native Lakehouse and Real‑Time Data Lakes

The JD‑hosted Apache Hudi Meetup showcased the 1.1 release’s pluggable table format, Flink performance gains, LSM‑Tree MoR redesign, and AI‑native features such as vector indexing, illustrating how the open‑source lakehouse is evolving to meet BI and multimodal AI workloads.

AI · Apache Hudi · Big Data

0 likes · 12 min read

Ctrip Technology

Nov 20, 2025 · Big Data

How Ctrip Achieved Minute‑Level Real‑Time Analytics with Flink CDC & Apache Paimon

Ctrip transformed its traditional T+1 offline warehouse into a near‑real‑time lakehouse by integrating Flink CDC with Apache Paimon, designing a two‑stage CDC ingestion, optimizing performance, implementing dynamic updates, and deploying the solution across multiple business scenarios, achieving minute‑level latency, reduced costs, and faster data‑driven decisions.

CDC · Data Engineering · Flink

0 likes · 27 min read

Big Data Technology & Architecture

Nov 17, 2025 · Big Data

Flink 2025 Updates: Disaggregated State, AI Agents, and SQL Enhancements

The 2025 Flink release introduces a disaggregated state management architecture for cloud‑native elasticity, AI‑driven Flink Agents with LLM, Memory and Tool support, Delta Join and VARIANT type for semi‑structured data, adaptive batch execution, incremental checkpoints, high‑speed network optimizations, and new SQL and Process Table Functions, reshaping real‑time analytics.

Disaggregated State · Flink · SQL Enhancements

0 likes · 8 min read

vivo Internet Technology

Nov 12, 2025 · Big Data

How Vivo Solved Real‑Time Feature Concatenation with RocksDB and Flink

This article explains the evolution of Vivo's real‑time recommendation feature‑concatenation architecture, compares hour‑level, Redis‑streaming and RocksDB state‑backend solutions, and details the memory, performance, startup and HDFS RPC problems encountered along with the concrete fixes applied.

Flink · RocksDB · feature concatenation

0 likes · 21 min read

Instant Consumer Technology Team

Nov 10, 2025 · Big Data

Fixing Multi‑Version, Multi‑Cluster and HA with Apache Kyuubi for Spark/Flink

Apache Kyuubi, an enterprise‑grade multi‑tenant data gateway, replaces Livy and Flink SQL Gateway to support multiple engine versions, cross‑cluster elastic scheduling, high‑availability batch jobs, and traffic control, dramatically reducing deployment complexity, improving resource utilization, and accelerating release cycles for large‑scale Spark and Flink workloads.

Apache Kyuubi · Big Data · Data Gateway

0 likes · 18 min read

Big Data Technology & Architecture

Nov 3, 2025 · Big Data

Taming Small Files in Paimon: Proven Tuning Strategies for Better Performance

This article explains how small‑file issues in Paimon's streaming data lake architecture degrade system stability and query speed, and presents practical parameter‑tuning, table‑level settings, asynchronous compaction, and monitoring techniques to mitigate those problems.

Big Data · Data Lake · Flink

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Oct 24, 2025 · Big Data

How Leapmotor Scaled to 1M Cars with a Real‑Time Flink Data Platform

Leapmotor’s rapid growth to one million production cars drove a shift from daily batch data to minute‑level real‑time analytics, prompting the adoption of Flink as the core engine of a multi‑layered big‑data platform that handles massive IoT signals, supports fault diagnosis, and integrates batch and streaming workloads on the cloud.

Automotive · Big Data · Cloud

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Oct 22, 2025 · Big Data

Li Auto’s Trillion‑Row Real‑Time Car‑Network Analytics Using Hologres + Flink

Li Auto’s data team tackled the explosion of vehicle‑telemetry data—over a trillion rows and millions of signals per second—by redesigning their data foundation with Alibaba Cloud’s Hologres and Flink, achieving sub‑second latency, elastic scaling, high availability, and significant cost reductions across real‑time and offline workloads.

Car Telemetry · Data Platform · Flink

0 likes · 16 min read

StarRocks

Oct 14, 2025 · Big Data

How Ctrip Scaled UBT Analytics by Migrating from ClickHouse to StarRocks

Ctrip's User Behavior Tracking (UBT) system, handling 30 TB of daily data, moved from ClickHouse to StarRocks' compute‑storage separated architecture, cutting average query latency from 1.4 seconds to 203 ms, halving storage, reducing nodes from 50 to 40, and boosting write throughput to 3 million rows per second.

Big Data · ClickHouse · Data Migration

0 likes · 15 min read

Big Data Technology & Architecture

Sep 24, 2025 · Big Data

Avoid These 6 Common Paimon Data Loss Pitfalls in Flink and Spark

Learn the six typical scenarios that cause data loss when writing to Paimon—ranging from checkpoint failures and misconfigured partial‑update mode to incorrect sequence fields, snapshot retention issues, concurrent bucket writes, and outdated Spark versions—and how to prevent each problem.

Big Data · Checkpoint · Data loss

0 likes · 5 min read

StarRocks

Sep 23, 2025 · Databases

How Zepto Scaled Real‑Time Brand Analytics with StarRocks: From Postgres MVP to Sub‑Second Queries

Zepto transformed its brand‑analytics platform from a Postgres MVP into a production‑grade, sub‑second real‑time analytics solution by adopting StarRocks, redesigning its data pipeline with Databricks, Kafka, and Flink, and choosing a storage‑compute architecture that supports massive joins and rapid insights.

Databricks · Flink · OLAP

0 likes · 14 min read

DataFunSummit

Sep 21, 2025 · Big Data

Breaking the CPU Wall: BIGO’s Gluten Engine Accelerates Spark and Flink

When big‑data workloads hit the CPU wall, BIGO’s adoption of the open‑source Gluten project delivers native‑engine execution for Spark and a roadmap for Flink, achieving up to 30% end‑to‑end speedup, 50% memory savings, and a scalable, cost‑effective data processing platform.

Big Data · Flink · Gluten

0 likes · 16 min read

360 Zhihui Cloud Developer

Sep 11, 2025 · Big Data

How Paimon Transforms Membership Data Warehousing: From Legacy Lambda to Real‑Time Lakehouse

This article examines the challenges of a legacy Lambda‑based membership data warehouse, introduces Apache Paimon’s lakehouse architecture and its key features, and showcases three real‑world implementations—partial‑update order wide tables, Bitmap‑based UV counting, and branch‑based data correction—while discussing benefits, remaining challenges, and future directions.

Big Data · Data Lake · Data Warehouse

0 likes · 29 min read

High Availability Architecture

Sep 10, 2025 · Big Data

How Ctrip Business Travel Built a Near‑Real‑Time Lakehouse with Flink CDC & Paimon

This article details Ctrip Business Travel’s implementation of a near‑real‑time data warehouse using Flink CDC and the Paimon lakehouse engine, covering order wide‑table construction, ticket refund alerts, ad attribution, batch‑stream integration, and practical lessons on Partial Update, Aggregation, and Tag‑based incremental processing.

Aggregation · Batch-Stream Integration

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Sep 9, 2025 · Big Data

How Lazada Scaled Real‑Time Product Selection with Flink & Hologres

Lazada transformed its e‑commerce product selection by building a unified, real‑time platform on Alibaba Cloud Flink and Hologres, overcoming data silos, freshness delays, and high‑throughput challenges to enable millisecond‑level decisions across six Southeast Asian markets.

Data Architecture · Flink · Hologres

0 likes · 19 min read

Alibaba Cloud Big Data AI Platform

Sep 8, 2025 · Big Data

How Ele.me Revolutionized Real‑Time Data Warehousing with Flink‑Paimon Lakehouse

In this detailed case study, Alibaba’s Ele.me team explains how they evolved from siloed, chimney‑style real‑time warehouses to a unified Flink‑Paimon lakehouse, highlighting the three development stages, technology evaluations, the Alake platform’s one‑stop capabilities, production results, and future directions such as Fluss and AI integration.

Alake · Flink · Lakehouse

0 likes · 17 min read

Ctrip Technology

Sep 2, 2025 · Big Data

How Ctrip Built a Near‑Real‑Time Lakehouse with Flink & Paimon

This article details Ctrip Business Travel’s implementation of a near‑real‑time data warehouse and lakehouse using Flink CDC and Apache Paimon, covering order wide‑table construction, automated ticket reminders, ad attribution, batch‑stream integration, and lessons on Partial Update, Aggregation, and Tag‑based incremental processing.

Batch-Stream Integration · Flink · Lakehouse

0 likes · 17 min read

Baidu Geek Talk

Sep 1, 2025 · Big Data

How Baidu Netdisk Built a High‑Performance Real‑Time Engine with Flink

This article explains how Baidu Netdisk transitioned from Spark Streaming to a Flink‑based Tiangong real‑time computing engine, detailing the evolution, reasons for choosing Flink, architecture, configuration examples, business use cases, technical challenges, and future platform plans.

Baidu Netdisk · Big Data · Flink

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Aug 31, 2025 · Big Data

Disaggregated Flink State, AI Anomaly Detection, Slow‑Query Ranking (VLDB 2025)

At VLDB 2025, three Alibaba Cloud papers were accepted: one introduces a disaggregated state‑management architecture for Flink 2.0 that separates storage from compute, another presents a cross‑contrastive learning framework for unsupervised Flink anomaly detection, and the third proposes a multimodal ranking system for identifying root causes of slow queries in cloud databases.

Cross Contrastive Learning · Disaggregated State Management · Flink

0 likes · 10 min read

php Courses

Aug 29, 2025 · Operations

How to Build a Real‑Time PHP Log Event Pipeline for Instant Insights

Learn how to transform PHP logs into real‑time, structured events by implementing a log event pipeline that includes JSON logging, lightweight collectors like Filebeat, streaming platforms such as Kafka or Flink, enrichment, and visualization with Grafana, enabling instant monitoring, alerting, and data‑driven decisions.

Flink · Log Processing · Observability

0 likes · 7 min read

Big Data Tech Team

Aug 25, 2025 · Interview Experience

Essential Big Data Interview Questions for Data Warehouse Engineer Roles

A comprehensive list of interview topics covering self‑introduction, career moves, data‑warehouse design, team building, architecture comparisons, fact‑table classification, common dimensions, performance tuning, and data‑governance for aspiring big‑data engineers.

Big Data · Data Governance · Flink

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Aug 21, 2025 · Big Data

How Hypergryph Built a High‑Performance Real‑Time Analytics Platform with StarRocks

This case study details how Hypergryph leveraged Alibaba Cloud EMR Serverless StarRocks, Flink, and Kafka to replace a ClickHouse data warehouse with a high‑performance, elastic, and easy‑to‑operate real‑time analytics platform that dramatically improved query speed, stability, operational efficiency, and cost for their gaming business.

Cloud Computing · Flink · StarRocks

0 likes · 8 min read

StarRocks

Aug 19, 2025 · Big Data

How Joydata Scaled to 150 Billion Daily Events with StarRocks: A Data Architecture Journey

Facing daily data growth from millions to 150 billion records, Joydata‑U transformed its analytics platform through three architectural stages—Hadoop, Hadoop + Trino, and finally StarRocks—introducing resource isolation, Flat JSON acceleration, and Bitmap indexing to cut query latency by up to seven times and achieve sub‑2‑minute data freshness across BI, ad‑tech, game analytics, and CRM workloads.

Bitmap Index · Data Architecture · Flat JSON

0 likes · 12 min read

JD Retail Technology

Aug 8, 2025 · Big Data

How JD.com Transformed Its Traffic Data Pipeline from Lambda to a Lakehouse Architecture

This article examines JD.com's migration of its massive traffic data processing from a dual Lambda architecture to an integrated lakehouse solution, detailing the challenges, innovative optimizations with Flink and Hudi, performance gains, cost reductions, and future directions for real‑time data handling.

Big Data · Data Engineering · Flink

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Aug 7, 2025 · Big Data

How Flink ML Transforms Intelligent Operations: Real‑Time Anomaly Detection, Forecasting & Log Clustering

This article explains how Alibaba Cloud’s big‑data platform leverages Flink ML to build an intelligent‑operations service that tackles stability, cost and efficiency challenges through time‑series anomaly detection, forecasting and streaming log‑clustering, dramatically reducing latency, complexity and operational overhead.

Flink · Intelligent Operations · Log Clustering

0 likes · 25 min read

58 Tech

Aug 7, 2025 · Big Data

Transform Real‑Time Data Warehousing with Paimon: From Flink ROW_NUMBER to Streaming Lakehouse

This article details how a real‑time data warehouse built on Flink, Kafka, HBase and MySQL was redesigned using Paimon to eliminate costly deduplication, handle out‑of‑order events, enable streaming reads, simplify aggregation, replace multiple lookup sources, and achieve faster, more reliable batch repairs, resulting in major resource and operational gains.

Data Warehouse · Flink · Lakehouse

0 likes · 24 min read

iQIYI Technical Product Team

Aug 7, 2025 · Big Data

Building a Low‑Latency, High‑Capacity Real‑Time Data Platform for Finance

Facing growing data demands in finance, we replaced two legacy synchronization pipelines with a unified, low‑latency architecture using BabelX Real‑Time, Flink CDC, Iceberg v2 and Paimon, achieving minute‑level data freshness, ten‑to‑thirty‑fold query speedups, reduced storage costs, and streamlined schema management across multiple business units.

Big Data · Flink · Iceberg

0 likes · 12 min read

Big Data Technology & Architecture

Jul 29, 2025 · Big Data

What Interviewers Really Ask About Flink, Data Consistency, and Warehouse Design

An interviewee recounts a challenging first interview that focused on Flink resource configuration, late data handling, and offline data warehouse design, and shares practical advice on attitude, thorough preparation, emphasizing real project storytelling, and post‑interview review to continuously improve performance.

Data Consistency · Data Warehouse · Flink

0 likes · 4 min read

Alibaba Cloud Big Data AI Platform

Jul 25, 2025 · Big Data

Cross-Contrastive Learning Cuts Flink Anomaly Detection Errors by 12%

The paper “Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection”, accepted at VLDB 2025, introduces a novel cross‑contrastive method that leverages attention‑based representations and a boundary‑aware loss to detect Flink‑specific hotspot anomalies, achieving a 12.1% F1 improvement over state‑of‑the‑art techniques.

Big Data · Cross-Contrastive Learning · Flink

0 likes · 6 min read

Big Data Tech Team

Jul 23, 2025 · Big Data

From Beginner to Data Warehouse Architect: A Complete Roadmap

This guide walks you through every essential topic—from data warehouse architecture and layering, through ETL, OLAP, Hadoop, and Flink, to visualization tools, learning paths, recommended resources, and the management skills needed to become a proficient data warehouse architect.

Data Warehouse · ETL · Flink

0 likes · 9 min read

Big Data Technology & Architecture

Jul 23, 2025 · Big Data

What’s New in Apache Flink 2.0? Key Features and Cloud‑Native Upgrades for 2025

This article summarizes the major Apache Flink 2.0 updates released in the first half of 2025, covering architecture separation, cloud‑native deployment, AI‑driven agents, SQL enhancements, data integration, operational tools, and performance optimizations for real‑time intelligent computing.

AI integration · Big Data · Cloud Native

0 likes · 10 min read

Big Data Technology & Architecture

Jul 21, 2025 · Big Data

Essential Data Lake Interview Questions: Flink, Hudi, Row_Number, and Best Practices

This article reviews common data lake interview questions—covering problem definition, Flink-to-Hudi row_number deduplication, retract streams, pipeline architecture optimizations, and read/write best practices—providing concise explanations and practical insights for candidates.

Big Data Interview · Data Lake · Flink

0 likes · 7 min read

Big Data Technology & Architecture

Jul 16, 2025 · Big Data

Master Flink Optimizations: TTL, Mini‑Batch, Two‑Phase Aggregation, Lookup Join & More

This article reviews the most effective Flink optimization techniques since 2022, including operator‑level TTL, mini‑batch processing, two‑phase aggregation, multi‑dimensional DISTINCT with FILTER, lookup join caching strategies, and TopN implementations, each rated with recommendation stars for production use.

Big Data · Flink · Lookup Join

0 likes · 8 min read

DataFunSummit

Jul 12, 2025 · Big Data

How Fluss Unifies Stream and Lake to Power AI Data Pipelines

In the era of rapid AI growth, Fluss offers a unified lake‑stream architecture that tackles data quality, timeliness, scale, and multimodal challenges by tightly integrating Flink streaming with a high‑performance data lake, enabling seamless real‑time and batch analytics for AI workloads.

AI · Data Lake · Flink

0 likes · 12 min read

StarRocks

Jul 9, 2025 · Big Data

How Shopee Built a Near‑Real‑Time Data Warehouse with Paimon and StarRocks

Shopee combined the Paimon data lake with StarRocks and Flink to create a near‑real‑time warehouse, enabling fast task diagnostics and a high‑performance financial reconciliation system while dramatically reducing storage costs and latency through innovative ODS, snapshot, and branch table techniques.

Flink · Paimon · Real-Time Data Warehouse

0 likes · 13 min read

Big Data Technology & Architecture

Jul 8, 2025 · Big Data

Flink’s AI Agents and Disaggregated State: Transforming Big Data

The article reviews key topics from the Flink Forward Asia (FFA) 2025 conference in Singapore, highlighting Flink’s new AI‑focused Agents framework, the breakthrough Flink 2.0 disaggregated state architecture, emerging lake storage solutions like Paimon, and the Fluss streaming table store, illustrating how big‑data platforms are evolving for AI workloads.

AI Agents · Big Data · Data Lake

0 likes · 6 min read

StarRocks

Jul 1, 2025 · Big Data

How StarRocks Boosted Suixingfu’s Real‑Time Data Platform: 3× Faster Queries & 10× Faster Analytics

Suixingfu rebuilt its payment data pipeline by replacing a fragmented Lambda stack with a unified Porter CDC + StarRocks + Elasticsearch architecture, achieving three‑fold query speed, ten‑fold analytics efficiency, 20% storage reduction, and sub‑second data‑capture latency across high‑concurrency, ad‑hoc, and batch workloads.

CDC · Data Warehouse · Flink

0 likes · 14 min read

DataFunSummit

Jun 19, 2025 · Big Data

How Shopee Leverages Paimon for Real‑Time Data Warehousing and Task Diagnosis

This article details Shopee's Data Infra team's use of the Paimon data lake to build near‑real‑time warehouses, accelerate ODS layers, implement a task‑diagnosis system, and create a reconciliation platform, while sharing future plans and a Q&A session.

Data Lake · Flink · Paimon

0 likes · 12 min read

Big Data Technology & Architecture

Jun 11, 2025 · Big Data

How to Solve Common Paimon Performance Issues in Flink: Small Files, OOM, and More

This article compiles frequent problems encountered when using Paimon with Flink—such as small‑file generation, write‑performance bottlenecks, OOM/GC issues, file‑deletion conflicts, dimension‑table join slowness, and snapshot expiration—and provides practical configuration and optimization solutions.

Big Data · Flink · Optimization

0 likes · 9 min read

JD Retail Technology

Jun 10, 2025 · Artificial Intelligence

How JD Builds a Scalable AI‑Powered Recommendation Data System with Flink

This article explains JD's complex recommendation system data pipeline—from indexing, sampling, and feature engineering to explainability and real‑time metrics—highlighting challenges such as data consistency, latency, and the use of Flink for massive, low‑latency processing.

Flink · Real-time Data · explainability

0 likes · 23 min read

Big Data Technology & Architecture

Jun 5, 2025 · Big Data

Flink Web UI Monitoring and End‑to‑End Latency Implementation Guide

This article explains the key monitoring items of the Flink Web UI, details task topology, operator and system metrics, checkpoint and log inspection, and provides two practical solutions—custom metrics and distributed tracing—to measure and visualize full‑chain latency in Flink jobs.

Big Data · Distributed Tracing · Flink

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

May 21, 2025 · Big Data

How Alibaba’s A+ Traffic Analysis Achieved Sub‑Second Log Queries with StarRocks & Paimon

This article details how Alibaba's A+ traffic analysis platform tackled trillion‑row log ingestion and high‑concurrency queries by redesigning storage with Paimon, leveraging Flink for real‑time ingestion, and using StarRocks for fast lake analytics, ultimately reducing query latency from minutes to seconds.

Flink · Log Analytics · Paimon

0 likes · 15 min read

Big Data Technology & Architecture

May 21, 2025 · Big Data

Interview Experience: Flink Task Resource Allocation, Issues, and Monitoring

This article shares an interviewee's experience with core Flink interview questions, including typical resource allocation for large online tasks, common data, performance, stability, and resource problems, and monitoring practices for clusters and tasks; it also includes a brief self‑promotion.

Big Data · Flink · Monitoring

0 likes · 7 min read

Xiaohongshu Tech REDtech

May 19, 2025 · Industry Insights

How Xiaohongshu Built a Minute‑Level Near‑Real‑Time Data Warehouse with Incremental Computing

Facing billions of daily logs and the need for minute‑level experiment metrics, Xiaohongshu partnered with Yunqi Tech to design a generic incremental‑compute solution that delivers near‑real‑time data warehousing with lower cost, higher accuracy, simplified pipelines, and improved query performance.

Big Data · Data Lake · Flink

0 likes · 24 min read

Selected Java Interview Questions

May 15, 2025 · Backend Development

Six Common Approaches to Synchronize MySQL Data to Elasticsearch

This article reviews six mainstream solutions for keeping MySQL and Elasticsearch in sync—including synchronous double‑write, asynchronous MQ‑based double‑write, Logstash polling, Canal binlog listening, DataX batch migration, and Flink stream processing—detailing their scenarios, advantages, drawbacks, and practical code examples to guide optimal technical selection.

Canal · Data synchronization · Elasticsearch

0 likes · 8 min read

Big Data Technology & Architecture

May 15, 2025 · Big Data

Interview Review: Spark Stage Logic, Data Warehouse Evaluation, and Flink Late‑Data Handling

This article reviews common interview questions for data development roles, covering Spark stage partitioning and optimization, criteria for evaluating data warehouses, Flink's handling of late data, and provides practical answers and resources to help candidates deliver standout responses.

Big Data · Data Quality · Data Warehouse

0 likes · 11 min read

Huolala Tech

May 14, 2025 · Big Data

How Lalamove Scaled Real‑Time Data Warehousing with Flink and Paimon

Lalamove’s international logistics platform transformed its real‑time data warehouse by leveraging Apache Flink and the Paimon lakehouse, addressing challenges of multi‑region data centers, time‑zone diversity, frequent upstream changes, and high costs, while improving scalability, latency, and operational efficiency across global markets.

Big Data · Flink · Paimon

0 likes · 13 min read

Su San Talks Tech

May 5, 2025 · Big Data

6 Proven Ways to Sync MySQL Data to Elasticsearch – Choose the Right Strategy

This article compares six mainstream MySQL‑to‑Elasticsearch synchronization methods—synchronous double‑write, asynchronous MQ, Logstash polling, Canal binlog listening, DataX batch sync, and Flink streaming—detailing scenarios, code samples, advantages, drawbacks, and practical selection guidance for developers.

Canal · Data synchronization · Elasticsearch

0 likes · 9 min read

Big Data Technology & Architecture

Apr 28, 2025 · Big Data

Interview Insights on Spark Optimization, Flink Exactly-Once Semantics, and Paimon Asynchronous Merging

This article shares three high‑quality interview questions from a JD big‑data interview, covering practical Spark tuning, Flink's exactly‑once guarantees in production, and Paimon's asynchronous merge mechanism, and explains how to answer them with real‑world scenarios.

Big Data · Flink · Paimon

0 likes · 6 min read

Bilibili Tech

Apr 8, 2025 · Big Data

Building a Real-Time Data Warehouse for Bilibili's Game Business

To meet Bilibili’s rapidly expanding game business, the team built a unified real-time data warehouse using Hologres and Flink that replaces the traditional Lambda stack, delivering high-throughput writes, low-latency processing, seamless offline-online integration, global deployment, and real-time support for operations, advertising, and risk analytics.

Big Data Architecture · Data architecture case study · Flink

0 likes · 17 min read

DataFunSummit

Apr 3, 2025 · Big Data

Apache Hudi Asia Technical Salon Highlights: Practices and Innovations from Kuaishou, Meituan, Douyin, Huawei, and JD

The Apache Hudi Asia technical salon held in Beijing on March 29 gathered over 230 on‑site participants and 16,000 online viewers, featuring expert talks from leading Chinese tech companies that showcased real‑world Hudi implementations, performance optimizations, and future roadmap for data‑lake technologies.

Apache Hudi · Big Data · Data Lake

0 likes · 13 min read

DataFunSummit

Apr 1, 2025 · Big Data

Understanding Flink CDC 3.3: Features, Improvements, and Future Plans

This article provides a comprehensive overview of Flink CDC 3.3, detailing its CDC fundamentals, new connectors, Transform module enhancements, asynchronous snapshot splitting, community adoption, and upcoming roadmap for broader ecosystem support and batch‑mode execution.

Big Data · CDC · Change Data Capture

0 likes · 15 min read

iQIYI Technical Product Team

Mar 27, 2025 · Big Data

Cost‑Effective Real‑Time Data Warehouse 2.0: Migrating from Kafka to Iceberg

iQIYI transformed its real‑time data warehouse by replacing a costly Kafka‑based Lambda stack with a unified stream‑batch Iceberg lake, cutting storage expenses by 90%, halving compute costs, extending data retention, and delivering minute‑level freshness for 90% of use cases while preserving second‑level processing where needed.

Flink · Iceberg · Real-Time Data Warehouse

0 likes · 11 min read

Big Data Tech Team

Mar 25, 2025 · Big Data

How Apache Paimon Transforms Real‑Time Lakehouse Architecture

This article analyzes the limitations of a traditional Flink + Talos + Iceberg real‑time lakehouse, introduces Apache Paimon's lakehouse table format and LSM storage, and demonstrates three practical use cases—partial‑update widening, streaming upsert, and lookup join—showing cost, stability, and performance improvements while outlining future roadmap.

Apache Paimon · Flink · Lakehouse

0 likes · 16 min read

AntData

Mar 20, 2025 · Big Data

Design and Optimization of Real‑time Data Lake Tables with Paimon and Flink for Advertising Diagnostics

This article presents a comprehensive exploration of using Apache Paimon and Flink to design lake tables that support minute‑level latency, low cost, and unified batch‑stream processing for advertising data, covering schema design, partitioning strategies, performance trade‑offs, cost analysis, and operational best practices.

Big Data · Data Lake · Flink

0 likes · 34 min read

Alibaba Cloud Big Data AI Platform

Mar 18, 2025 · Big Data

Boosting Flink CDC to Hologres: High‑Performance Data Sync Optimization Techniques

This article presents a comprehensive overview of Flink CDC + Hologres high‑performance data synchronization, detailing write and consumption optimizations, architectural principles, and future directions to achieve low latency and high throughput in real‑time data pipelines.

CDC · Flink · Hologres

0 likes · 21 min read

Big Data Technology & Architecture

Mar 17, 2025 · Big Data

Lakehouse Implementations at Leading Companies: Challenges, Solutions, and Benefits

This article reviews how major tech firms such as Alibaba, Tencent, ByteDance, and Kuaishou tackled lakehouse challenges—including architecture fragmentation, cost, scalability, and complex multimodal data—by adopting real‑time lakehouse solutions like Flink + Paimon, Iceberg + StarRocks, Hudi + LAS, and Doris + Alluxio, and outlines the resulting performance and cost gains.

Doris · Flink · Lakehouse

0 likes · 9 min read

Alimama Tech

Mar 12, 2025 · Big Data

Design and Evolution of Alibaba Advertising Real-Time Data Warehouse

Alimama’s advertising platform migrated from a monolithic Flink‑Kafka pipeline to a layered Paimon lakehouse, adding DWS upsert support and multi‑layer storage, which delivers minute‑level data freshness, cuts latency by 2.5 hours, reduces resource use by over 40%, halves development effort and achieves ≥99.9% availability.

Advertising · Alibaba · Data Lake

0 likes · 18 min read

Baidu Tech Salon

Mar 6, 2025 · Big Data

Real-Time Anti-Fraud Streaming System Based on Flink: Architecture, Challenges, and Optimizations

The article describes a Flink‑based real‑time anti‑fraud streaming system that combines a risk‑control platform, configurable YAML‑driven pipelines, and optimized state handling—using early event‑time triggers, micro‑batch caching, and coarse‑grained key reduction—to compute multi‑dimensional features, support rapid strategy updates, simulation filtering, and seamless output to ClickHouse, Hive, and Redis for both instant monitoring and offline analysis.

Configuration · Flink · Real-time Streaming

0 likes · 26 min read

Baidu Geek Talk

Mar 3, 2025 · Big Data

Real-Time Anti-Cheat Streaming System Based on Flink: Architecture, Challenges, and Solutions

The article details a Flink‑based real‑time anti‑cheat streaming architecture that combines tumbling, sliding and session windows with early triggers, batch state updates cached in memory, coarse‑grained key reduction, and YAML‑driven strategy configuration to deliver millisecond‑level detection, seamless integration with ClickHouse, Hive, Redis and message queues, and self‑service analytics, achieving high throughput, low latency, and robust stability for large‑scale risk control.

Flink · Performance Optimization · Real-time Streaming

0 likes · 25 min read

Big Data Technology & Architecture

Mar 3, 2025 · Big Data

The Turning Point for Data Development: From Traditional Data Engineering to AI Data Engineering

The article analyzes how the rapid rise of open‑source large‑model AI in 2025 is reshaping the data development profession, urging developers to transition from specialized data‑engineer roles to full‑stack AI data engineering skills such as distributed computing, lake‑house architectures, and model tuning.

AI · Big Data · Data Engineering

0 likes · 7 min read

DataFunSummit

Mar 2, 2025 · Artificial Intelligence

Lightweight Algorithm Service Architecture Based on Offline Tag Knowledge Base and Real‑time Data Warehouse

This article presents a lightweight algorithm service solution that combines an offline pre‑computed tag knowledge base with a real‑time data warehouse using Flink, Doris, Hive SQL and Python to achieve short development cycles, agile iteration, low cost, and scalable deployment for classification and clustering tasks.

Doris · Flink · algorithm service

0 likes · 16 min read

Big Data Technology Architecture

Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals, details Flink CDC's architecture and advantages, provides setup steps, code examples for SQL and DataStream APIs, discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium

0 likes · 7 min read

StarRocks

Feb 27, 2025 · Big Data

How iQIYI Boosted Ad Query Performance 400% with StarRocks – A Deep Dive into OLAP Evolution

This article details iQIYI's transition from Impala+Kudu and ClickHouse to StarRocks, describing the OLAP architecture, performance gains of up to 400% in advertising workloads, the technical challenges of data consistency, lake‑warehouse fusion, operational scaling, and the step‑by‑step migration process using a dual‑run platform.

ClickHouse · Flink · OLAP

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Feb 20, 2025 · Big Data

How Flink Powers Real-Time Variable Pools for FinTech Risk Assessment

This article details how a fintech company leveraged Apache Flink to build a real-time variable pool, covering architecture choices, development efficiency improvements, multi‑stream association optimizations, and operational monitoring, while also discussing future migration to cloud‑native OLAP solutions.

Big Data · FinTech · Flink

0 likes · 10 min read

Alibaba Cloud Big Data AI Platform

Feb 13, 2025 · Big Data

From Lambda to Lakehouse: Evolution of Real‑Time Data Warehouses with Hologres & Flink

This article traces the three‑generation evolution of real‑time data warehouses—from the Lambda architecture to a lakehouse approach—detailing how Hologres, Flink, and Dynamic Table technologies enable unified storage, multi‑mode computing, serverless execution, and high‑performance analytics in modern big‑data environments.

Dynamic Table · Flink · Hologres

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Jan 27, 2025 · Big Data

Unlock Real-Time Data Sync with Flink CDC: YAML Integration, Transform & Route Explained

This article summarizes an advanced Flink CDC presentation, covering Flink CDC fundamentals, real‑time Flink integration, CDC‑YAML core capabilities, supported sync links, Transform and Route modules, monitoring metrics, schema‑change strategies, typical use cases, performance optimizations, demo implementations, and future development plans.

CDC · Data Integration · Flink

0 likes · 20 min read

Alibaba Cloud Big Data AI Platform

Jan 21, 2025 · Big Data

Master Flink CDC YAML: Real‑Time Data Integration Best Practices in 10 Minutes

This article introduces Flink CDC YAML, outlines its core capabilities and application scenarios, compares it with SQL and DataStream jobs, showcases enterprise‑grade features of Alibaba Cloud Flink CDC, and provides a step‑by‑step tutorial to build a complete CDC YAML job in just ten minutes.

CDC · Data Integration · Flink

0 likes · 20 min read
