Disaggregated Flink State, AI Anomaly Detection, Slow‑Query Ranking (VLDB 2025)
At VLDB 2025, three Alibaba Cloud papers were accepted: one introduces a disaggregated state‑management architecture for Flink 2.0 that separates storage from compute, another presents a cross‑contrastive learning framework for unsupervised Flink anomaly detection, and the third proposes a multimodal ranking system for identifying root causes of slow queries in cloud databases.
Disaggregated State Management in Apache Flink® 2.0
The paper proposes a novel disaggregated state‑management architecture that decouples state storage from compute resources, leveraging low‑cost object storage for state sharing and persistence. This design dramatically reduces snapshot overhead, speeds up state recovery, and lowers resource coupling, marking a major step toward cloud‑native Flink.
The solution rests on two core innovations. The first is a unified asynchronous execution framework that enables non‑blocking state access while preserving exactly‑once semantics and full compatibility with Flink 1.x. The second is ForSt, a custom state storage engine that unifies local and remote state behind an LSM‑tree abstraction, delivering snapshots that complete within seconds and instant recovery.
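The idea of non‑blocking access to state that lives partly on local disk and partly in remote storage can be illustrated with a toy model. The sketch below is not Flink's API and not ForSt: `TieredStateStore`, `count_events`, and the simulated latency are hypothetical names and values chosen for illustration, and exactly‑once semantics and the LSM‑tree layout are not modeled. It only shows that records for different keys can overlap their remote reads while updates to the same key stay serialized.

```python
import asyncio


class TieredStateStore:
    """Toy key-value state: a local cache in front of a slow remote store."""

    def __init__(self, remote, remote_latency=0.01):
        self.local = {}                # hot entries kept next to the compute task
        self.remote = dict(remote)     # stands in for object storage
        self.remote_latency = remote_latency

    async def get(self, key):
        if key in self.local:
            return self.local[key]
        # Simulated remote read; other coroutines keep running meanwhile.
        await asyncio.sleep(self.remote_latency)
        value = self.remote.get(key, 0)
        self.local[key] = value
        return value

    def put(self, key, value):
        # Write through to both tiers to keep the toy model simple.
        self.local[key] = value
        self.remote[key] = value


async def count_events(events, store):
    """Count events per key without blocking on remote reads."""
    locks = {}

    async def handle(key):
        # One lock per key: same-key updates are serialized,
        # different keys proceed concurrently.
        lock = locks.setdefault(key, asyncio.Lock())
        async with lock:
            current = await store.get(key)
            store.put(key, current + 1)

    await asyncio.gather(*(handle(k) for k in events))
    return {k: store.remote[k] for k in sorted(set(events))}


def run_demo():
    store = TieredStateStore(remote={"a": 10})
    return asyncio.run(count_events(["a", "b", "a", "c", "b", "a"], store))
```

Calling `run_demo()` starts key `a` from a remote value of 10 and the other keys from 0, then applies six increments. The per‑key lock is a deliberately simple stand‑in for the ordering guarantees a real engine would provide.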
Future work aims to extend ForSt with batch‑compute push‑down to further cut streaming costs and broaden real‑time computing capabilities.
Title: Disaggregated State Management in Apache Flink® 2.0
Authors: Yuan Mei, Zhaoqian Lan, Lei Huang, Yanfei Lei, Han Yin, Rui Xia, Kaitian Hu, Paris Carbone, Vasiliki Kalavri, Feng Wang
URL: https://www.vldb.org/pvldb/vol18/p4846-mei.pdf
Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection
This work addresses the “hot‑spot” problem in large‑scale Flink clusters by introducing an unsupervised anomaly detection framework based on cross‑contrastive learning. The method learns global and local representations of time‑series data via attention mechanisms and amplifies distances for slowly rising anomalies, improving detection of complex patterns.
It also proposes a boundary‑aware loss that uses normalized observation scores as priors to limit loss reduction on suspected anomalies, mitigating the impact of noisy training data. Integrated into Flink’s intelligent inspection system, the technique enhances early risk detection.
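One possible reading of "using normalized scores as priors to limit loss reduction on suspected anomalies" is a per‑sample weighting, sketched below. This is an assumption for illustration, not the paper's loss: the min‑max normalization, the `1 - prior` weight, and the function names `normalize` and `prior_weighted_loss` are all hypothetical. Both input lists are assumed to be non‑empty and of equal length.

```python
def normalize(scores):
    """Min-max normalize scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def prior_weighted_loss(losses, scores):
    """Weighted mean of per-sample losses.

    Samples with a high normalized score (suspected anomalies) get a
    small weight, so training gains little from fitting them.
    """
    priors = normalize(scores)
    weights = [1.0 - p for p in priors]
    # The lowest-scoring sample always has weight 1, so the sum is positive.
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, losses)) / total


# Hypothetical batch: the third sample looks anomalous and has a large loss.
losses = [0.2, 0.4, 4.0]
scores = [0.0, 5.0, 10.0]
weighted = prior_weighted_loss(losses, scores)
plain = sum(losses) / len(losses)
```

In this hypothetical batch the weighted loss is far smaller than the plain mean, because the suspected anomaly contributes nothing. That is the intended effect: noisy samples in the training data pull the model less.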
Title: Noise Matters: Cross Contrastive Learning for Flink Anomaly Detection
Authors: Zhihao Zhuang, Yingying Zhang, Kai Zhao, Chenjuan Guo, Bin Yang, Qingsong Wen, Lunting Fan
URL: https://www.vldb.org/pvldb/vol18/p1159-zhuang.pdf
Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
The paper presents the RCRank framework for automatically identifying and ranking the root causes of slow queries in the Hologres cloud database. By fusing four modalities—SQL text, execution plans, logs, and performance metrics—through a pretrained encoder and cross‑modal feature extractor, the system quantifies the impact of each root cause and produces an interpretable importance ranking.
RCRank combines rule‑based analysis with large‑language‑model assistance to construct a labeled dataset where each root cause is assigned a real impact score based on execution‑time reduction after remediation. The approach improves slow‑query optimization efficiency by about 14% over state‑of‑the‑art methods.
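To make the labeling idea concrete, here is a minimal sketch of scoring root causes by execution‑time reduction and ranking them. The relative‑reduction formula, the function names, and the example root causes and timings are assumptions for illustration; the source does not spell out the exact definition used in RCRank.

```python
def impact_score(time_before, time_after):
    """Relative execution-time reduction after fixing one root cause."""
    if time_before <= 0:
        raise ValueError("time_before must be positive")
    # Clamp at zero: a fix that makes the query slower gets no credit.
    return max(0.0, (time_before - time_after) / time_before)


def rank_root_causes(time_before, time_after_fix):
    """Return (cause, score) pairs sorted from most to least impactful."""
    scored = [
        (cause, impact_score(time_before, after))
        for cause, after in time_after_fix.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical slow query: 10 s baseline, re-timed after each single fix.
ranking = rank_root_causes(
    10.0,
    {
        "stale statistics": 7.5,
        "missing index": 2.0,
        "lock contention": 9.0,
    },
)
```

Here `ranking` lists "missing index" first, since fixing it removes the largest share of the original execution time. A learned model would predict such scores from the four modalities rather than measure them directly.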
Title: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems
Authors: Biao Ouyang, Yingying Zhang, Hanyin Cheng, Yang Shu, Chenjuan Guo, Bin Yang, Qingsong Wen, Lunting Fan, Christian S. Jensen
URL: https://www.vldb.org/pvldb/vol18/p1169-ouyang.pdf
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
