
Disaggregated Flink State, AI Anomaly Detection, Slow‑Query Ranking (VLDB 2025)

At VLDB 2025, three Alibaba Cloud papers were accepted: one introduces a disaggregated state‑management architecture for Flink 2.0 that separates storage from compute, another presents a cross‑contrastive learning framework for unsupervised Flink anomaly detection, and the third proposes a multimodal ranking system for identifying root causes of slow queries in cloud databases.

Alibaba Cloud Big Data AI Platform

Disaggregated State Management in Apache Flink® 2.0

The paper proposes a novel disaggregated state‑management architecture that decouples state storage from compute resources, leveraging low‑cost object storage for state sharing and persistence. This design dramatically reduces snapshot overhead, speeds up state recovery, and lowers resource coupling, marking a major step toward cloud‑native Flink.

The solution includes two core innovations. The first is a unified asynchronous execution framework that enables non‑blocking state access while preserving exactly‑once semantics and full compatibility with Flink 1.x. The second is a custom state storage engine, ForSt, that unifies local and remote state behind an LSM‑tree abstraction, delivering snapshots that complete within seconds and instant recovery.
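To make the idea concrete, here is a minimal Python sketch of state that lives in remote storage and is read through a small local cache with non‑blocking calls. It is a toy model only, not the ForSt or Flink API: the class `TieredStateStore`, the helper `count_words`, the write‑through policy, and the plain dict standing in for object storage are all assumptions made for illustration.

```python
import asyncio


class TieredStateStore:
    """Toy model: `remote` (a dict) stands in for object storage,
    `cache` is a small bounded local tier. Hypothetical, not ForSt."""

    def __init__(self, remote, cache_size=2):
        self.remote = remote
        self.cache = {}
        self.cache_size = cache_size
        self.remote_reads = 0

    async def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Stand-in for remote I/O: yields control so other tasks can run.
        await asyncio.sleep(0)
        self.remote_reads += 1
        value = self.remote.get(key)
        self._cache_put(key, value)
        return value

    async def put(self, key, value):
        await asyncio.sleep(0)
        self.remote[key] = value  # write-through, an assumption of this sketch
        self._cache_put(key, value)

    def _cache_put(self, key, value):
        self.cache.pop(key, None)
        self.cache[key] = value
        while len(self.cache) > self.cache_size:
            # Evict the oldest inserted entry.
            self.cache.pop(next(iter(self.cache)))


async def count_words(store, words):
    """Keyed counter that awaits state access instead of blocking on it."""
    for word in words:
        current = await store.get(word) or 0
        await store.put(word, current + 1)


def run_demo():
    store = TieredStateStore(remote={})
    asyncio.run(count_words(store, ["a", "b", "a", "c", "a"]))
    return store
```

Because the authoritative copy already sits in the remote tier, a fresh `TieredStateStore` pointed at the same `remote` can serve reads immediately with an empty cache. This loosely mirrors why decoupling state from compute can make snapshots and recovery cheap.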

Future work aims to extend ForSt with batch‑compute push‑down to further cut streaming costs and broaden real‑time computing capabilities.

Title: Disaggregated State Management in Apache Flink® 2.0

Authors: Yuan Mei, Zhaoqian Lan, Lei Huang, Yanfei Lei, Han Yin, Rui Xia, Kaitian Hu, Paris Carbone, Vasiliki Kalavri, Feng Wang

URL: https://www.vldb.org/pvldb/vol18/p4846-mei.pdf

Flink 2.0 disaggregated state architecture diagram

Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection

This work addresses the “hot‑spot” problem in large‑scale Flink clusters by introducing an unsupervised anomaly detection framework based on cross‑contrastive learning. The method learns global and local representations of time‑series data via attention mechanisms and amplifies distances for slowly rising anomalies, improving detection of complex patterns.

It also proposes a boundary‑aware loss that uses normalized observation scores as priors to limit loss reduction on suspected anomalies, mitigating the impact of noisy training data. Integrated into Flink’s intelligent inspection system, the technique enhances early risk detection.
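One possible reading of the boundary‑aware idea can be sketched in a few lines of Python. This is an assumption‑laden illustration, not the loss defined in the paper: the function names, the min‑max normalization, the `margin` parameter, and the floor formulation are all hypothetical. Each sample's loss is kept from falling below a floor proportional to its normalized anomaly prior, so the model gains nothing by fitting suspected anomalies closely.

```python
def normalize(scores):
    """Min-max normalize raw observation scores into [0, 1] priors."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def boundary_aware_loss(errors, scores, margin=1.0):
    """Mean loss where each sample's term is floored at margin * prior.

    errors: per-sample training errors (e.g. reconstruction error)
    scores: per-sample raw anomaly scores used as priors
    """
    priors = normalize(scores)
    # A sample with a high prior cannot push its loss term below the floor,
    # which limits how much fitting a suspected anomaly reduces the loss.
    floored = [max(e, margin * p) for e, p in zip(errors, priors)]
    return sum(floored) / len(floored)
```

With a prior of 0 the term reduces to the ordinary error, so clean samples train as usual, while a sample with a prior of 1 contributes at least `margin` no matter how well it is fitted.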

Title: Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection

Authors: Zhihao Zhuang, Yingying Zhang, Kai Zhao, Chenjuan Guo, Bin Yang, Qingsong Wen, Lunting Fan

URL: https://www.vldb.org/pvldb/vol18/p1159-zhuang.pdf

Noise Matters technical architecture diagram

Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems

The paper presents the RCRank framework for automatically identifying and ranking the root causes of slow queries in the Hologres cloud database. By fusing four modalities—SQL text, execution plans, logs, and performance metrics—through a pretrained encoder and cross‑modal feature extractor, the system quantifies the impact of each root cause and produces an interpretable importance ranking.

RCRank combines rule‑based analysis with large‑language‑model assistance to construct a labeled dataset where each root cause is assigned a real impact score based on execution‑time reduction after remediation. The approach improves slow‑query optimization efficiency by about 14% over state‑of‑the‑art methods.
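The labeling idea, scoring each root cause by how much execution time drops once it is remediated, can be illustrated with a short Python sketch. The relative‑reduction formula, the function names, and the example root‑cause labels are assumptions for illustration. The paper's exact scoring may differ.

```python
def impact_score(baseline_ms, remediated_ms):
    """Relative execution-time reduction after fixing one root cause."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    # Clamp at 0 so a remediation that makes the query slower scores no impact.
    return max(0.0, (baseline_ms - remediated_ms) / baseline_ms)


def rank_root_causes(baseline_ms, remediated_ms_by_cause):
    """Return (cause, score) pairs sorted from highest to lowest impact."""
    scored = {
        cause: impact_score(baseline_ms, remediated_ms)
        for cause, remediated_ms in remediated_ms_by_cause.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical measurements for one slow query (milliseconds).
ranking = rank_root_causes(
    1000,
    {"stale_statistics": 700, "missing_index": 200, "lock_contention": 950},
)
```

Here `ranking` lists `missing_index` first, then `stale_statistics`, then `lock_contention`, which is the kind of ordered, quantified output a ranking model would be trained to predict.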

Title: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems

Authors: Biao Ouyang, Yingying Zhang, Hanyin Cheng, Yang Shu, Chenjuan Guo, Bin Yang, Qingsong Wen, Lunting Fan, Christian S. Jensen

URL: https://www.vldb.org/pvldb/vol18/p1169-ouyang.pdf

RCRank technical architecture diagram
Tags: Flink, Cross Contrastive Learning, Disaggregated State Management, Slow Query Ranking, VLDB 2025
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
