Wear-Updated Integrated Feature Ranking (WEFR) for Robust SSD Failure Prediction

The article presents a large‑scale study of SSD failure prediction using SMART logs from multiple vendors, introduces the Wear‑Updated Integrated Feature Ranking (WEFR) method to automatically and robustly select predictive features, and demonstrates its effectiveness through extensive experiments on real‑world data.

Architects' Tech Alliance

Jul 11, 2023

In a follow-up to a previous study on SSD operational characteristics, this article focuses on SSD failure prediction, which matters in large-scale deployments because drive failures can cause system-wide outages.

The authors collected two years of SMART logs and failure tickets from five Alibaba data centers, covering six SSD models from three vendors (MA, MB, MC) and totaling nearly 500 K SSDs and 7 K failure records.

SSD failure prediction is framed as an offline classification problem: raw and normalized SMART attributes serve as features, and the model predicts whether an SSD will fail within a future window (e.g., 30 days). Positive samples correspond to failing SSDs, negative samples to healthy ones.
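The labeling step in this framing can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `label_sample`, its parameters, and the example dates are assumptions, and the rule that a sample is positive only when the failure falls inside the window after the sample day is one plausible reading of the description above.

```python
from datetime import date, timedelta

def label_sample(sample_day, failure_day, window_days=30):
    """Return 1 if the drive fails within window_days after sample_day, else 0.

    failure_day is None for a drive with no failure ticket (assumed encoding).
    """
    if failure_day is None:
        return 0
    horizon = sample_day + timedelta(days=window_days)
    return 1 if sample_day <= failure_day <= horizon else 0

# Hypothetical drives: one with a failure ticket on 2020-03-20, one healthy.
print(label_sample(date(2020, 3, 1), date(2020, 3, 20)))  # failure 19 days ahead
print(label_sample(date(2020, 1, 1), date(2020, 3, 20)))  # failure 79 days ahead
print(label_sample(date(2020, 3, 1), None))               # no failure ticket
```

Each labeled SMART log entry then becomes one training row for the classifier.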

The paper evaluates five state-of-the-art feature-selection methods: Pearson correlation, Spearman correlation, J-index, Random Forest importance, and XGBoost importance. It observes that the methods rank the same features differently, which raises two questions: which method is most effective, and how many features should be selected.
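A small, self-contained example shows how two of these methods can disagree. The sketch below implements Pearson and Spearman correlation by hand and ranks two hypothetical features (`feat_a`, `feat_b`) against a made-up binary label; the data is invented for illustration and does not come from the study. `feat_a` increases monotonically with the label but contains one extreme value, which depresses its Pearson score while leaving its Spearman score high.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def rank_features(features, labels, score):
    """Order feature names by absolute correlation with the label, best first."""
    scored = {name: abs(score(col, labels)) for name, col in features.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Invented data: 8 samples, the last 4 are failures.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feat_a": [1, 2, 3, 4, 5, 6, 7, 1000],  # monotone, one extreme value
    "feat_b": [0, 1, 0, 1, 1, 2, 1, 2],     # noisy but roughly linear
}

print("Pearson order: ", rank_features(features, labels, pearson))
print("Spearman order:", rank_features(features, labels, spearman))
```

The two orders come out reversed on this data, which is the kind of disagreement that motivates combining several rankings rather than trusting one.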

To address these challenges, the authors propose Wear‑Updated Integrated Feature Ranking (WEFR), which combines multiple feature‑ranking results, removes biased rankings, averages ranks to obtain a final order, automatically determines the number of selected features, and updates the selection according to changes in the wear‑level indicator (MWIN).
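The aggregation part of this idea can be sketched as below. The summary does not specify how biased rankings are detected, so the rule used here is an assumption: each ranking is compared with the mean rank across all methods, and a ranking is dropped when its total absolute deviation exceeds the mean deviation plus one standard deviation. The function name `aggregate_rankings`, the feature names, and the rank values are hypothetical, and the automatic feature-count step and the wear-level update are not shown.

```python
from statistics import mean, pstdev

def aggregate_rankings(rankings):
    """Combine per-method rankings into one order.

    rankings maps method name -> {feature: rank}, where rank 1 is best.
    Returns (final feature order, names of dropped rankings).
    """
    features = list(next(iter(rankings.values())))

    # Consensus: mean rank of each feature across all methods.
    consensus = {f: mean(r[f] for r in rankings.values()) for f in features}

    # Distance of each ranking from the consensus (sum of absolute deviations).
    distance = {
        name: sum(abs(r[f] - consensus[f]) for f in features)
        for name, r in rankings.items()
    }

    # Assumed outlier rule: drop rankings beyond mean + 1 standard deviation.
    dists = list(distance.values())
    threshold = mean(dists) + pstdev(dists)
    kept = {n: r for n, r in rankings.items() if distance[n] <= threshold}
    dropped = [n for n in rankings if n not in kept]

    # Final order: average rank over the kept rankings, best first.
    final = {f: mean(r[f] for r in kept.values()) for f in features}
    return sorted(features, key=lambda f: final[f]), dropped

# Hypothetical ranks for four features from five methods.
rankings = {
    "pearson":       {"f1": 1, "f2": 2, "f3": 3, "f4": 4},
    "spearman":      {"f1": 1, "f2": 3, "f3": 2, "f4": 4},
    "j_index":       {"f1": 2, "f2": 1, "f3": 3, "f4": 4},
    "random_forest": {"f1": 1, "f2": 2, "f3": 4, "f4": 3},
    "xgboost":       {"f1": 4, "f2": 3, "f3": 2, "f4": 1},
}

order, dropped = aggregate_rankings(rankings)
print("final order:", order)
print("dropped rankings:", dropped)
```

In this invented example the one ranking that is nearly the reverse of the others is excluded before averaging, so it cannot drag the final order toward its own view.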

Experimental results show that WEFR improves the prediction F0.5-score by up to 22% compared with using all features and outperforms each individual feature-selection method. It also benefits from both the automated selection and the wear-aware updates, especially for SSDs in low-wear stages.
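For readers unfamiliar with the metric, the F0.5-score is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The helper below is a generic sketch of the standard formula; the function name `f_beta` and the example precision and recall values are illustrative and are not results from the study.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative values only: a precise predictor that misses many failures.
p, r = 0.8, 0.4
print("F0.5:", round(f_beta(p, r), 4))
print("F1:  ", round(f_beta(p, r, beta=1.0), 4))
```

With high precision and low recall, F0.5 comes out higher than F1, reflecting a preference for fewer false alarms over catching every failure.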

The study confirms that robust feature selection and wear‑level‑aware updates are essential for accurate SSD failure prediction in heterogeneous storage environments.
