Tag

data center reliability


Alibaba Cloud Infrastructure
Mar 24, 2021 · Cloud Computing

LIBRA and CARE: Memory Bandwidth Management and Fault‑Tolerance Innovations Presented at HPCA 2021

The article reviews two HPCA 2021 papers from Alibaba Cloud: LIBRA, a dynamic memory‑bandwidth management framework that boosts data‑center utilization, and CARE, a cache‑based fault‑tolerance architecture that delivers near‑Chipkill reliability with minimal overhead. It also highlights future research directions in ML systems, quantum computing, and cache computing.

HPCA2021 · cloud computing · data center reliability
4 min read
Efficient Ops
Aug 9, 2017 · Artificial Intelligence

Can AI Predict Disk Failures? RGF + Transfer Learning for Reliable Data Centers

This article reviews a KDD 2016 study that combines the Regularized Greedy Forest algorithm with transfer learning to accurately predict hard‑disk failures in data centers, addressing challenges like irrelevant SMART attributes, imbalanced data, and model portability across disk models.

Machine Learning · RGF algorithm · SMART attributes
12 min read