Tagged articles
Alibaba Cloud Big Data AI Platform
Aug 26, 2024 · Cloud Computing

How Neural Attention Detects Cluster-Wide Task Slowdowns in Cloud Systems

A paper accepted at ACM SIGKDD 2024 presents a neural‑network‑based framework that combines a skim‑attention mechanism with a picky loss function to detect cluster‑wide task slowdown anomalies in large‑scale cloud platforms, improving the average F1 score by 5.3% over state‑of‑the‑art methods.

Cluster Performance · Neural Networks · Anomaly Detection
Architect
Feb 18, 2022 · Cloud Native

Large‑Scale etcd Cluster Performance Optimization and Pod Data Splitting in Ant Group’s Sigma

This article describes how Ant Group tackled the performance ceiling of its massive Sigma Kubernetes clusters by horizontally splitting etcd storage for Pods, Leases and Events, redesigning watch handling to avoid component restarts, and using snapshot‑based migration to preserve data integrity while reducing latency.

Cluster Performance · Data Migration · Kubernetes