Wear-Updated Integrated Feature Ranking (WEFR) for Robust SSD Failure Prediction

The article presents a large‑scale study of SSD failure prediction using SMART logs from multiple vendors, introduces the Wear‑Updated Integrated Feature Ranking (WEFR) method to automatically and robustly select predictive features, and demonstrates its effectiveness through extensive experiments on real‑world data.

Architects' Tech Alliance

As a follow‑up to a previous study on SSD operational characteristics, this article focuses on SSD failure prediction, which matters in large‑scale deployments because drive failures can cause system‑wide outages.

The authors collected two years of SMART logs and failure tickets from five Alibaba data centers, covering six SSD models from three vendors (MA, MB, MC) and totaling nearly 500 K SSDs and 7 K failure records.

SSD failure prediction is framed as an offline classification problem: raw and normalized SMART attributes serve as features for predicting whether an SSD will fail within a future window (e.g., 30 days). Positive samples correspond to failing SSDs and negative samples to healthy ones.
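The labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the drive IDs, the `raw_5` attribute name, and the `label_samples` helper are all hypothetical, and the 30-day window is the example value from the text.

```python
from datetime import date, timedelta

def label_samples(smart_logs, failure_dates, window_days=30):
    """Attach a 0/1 label to each (drive_id, log_date, features) record.

    A record is positive (1) when the drive has a failure ticket dated
    within `window_days` after the log date; otherwise it is negative (0).
    """
    window = timedelta(days=window_days)
    labeled = []
    for drive_id, log_date, features in smart_logs:
        fail_date = failure_dates.get(drive_id)
        is_positive = (
            fail_date is not None
            and log_date <= fail_date <= log_date + window
        )
        labeled.append((drive_id, log_date, features, int(is_positive)))
    return labeled

# Hypothetical toy data: one drive fails on 2020-03-20, the other never fails.
logs = [
    ("ssd-a", date(2020, 1, 1), {"raw_5": 0}),
    ("ssd-a", date(2020, 3, 1), {"raw_5": 12}),
    ("ssd-b", date(2020, 3, 1), {"raw_5": 0}),
]
failures = {"ssd-a": date(2020, 3, 20)}
labels = [row[3] for row in label_samples(logs, failures)]
```

Only the second record falls within 30 days of a failure, so it is the only positive sample in this toy set.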

The paper evaluates five state‑of‑the‑art feature‑selection methods—Pearson correlation, Spearman correlation, J‑index, Random Forest importance, and XGBoost importance—and observes that different methods rank features differently, raising questions about the most effective approach and the optimal number of features.
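A small sketch of why two methods can disagree, using only Pearson and Spearman correlation from the list above. The feature names and values are made up for illustration; the helper functions are assumptions of this sketch, not code from the paper.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))

def rank_features(features, labels, score):
    """Order feature names by descending absolute correlation with labels."""
    scored = {name: abs(score(vals, labels)) for name, vals in features.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical data: one feature has a single extreme value, one grows steadily.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "feat_outlier": [0, 0, 0, 1, 1, 100],
    "feat_linear": [1, 2, 3, 4, 5, 6],
}
by_pearson = rank_features(features, labels, pearson)
by_spearman = rank_features(features, labels, spearman)
```

Here Pearson puts `feat_linear` first because the single extreme value weakens the linear fit of `feat_outlier`, while Spearman, which looks only at rank order, puts `feat_outlier` first.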

To address these challenges, the authors propose Wear‑Updated Integrated Feature Ranking (WEFR), which combines multiple feature‑ranking results, removes biased rankings, averages ranks to obtain a final order, automatically determines the number of selected features, and updates the selection according to changes in the wear‑level indicator (MWIN).
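The rank-combination part of this idea can be sketched roughly as below. The summary does not state how biased rankings are detected, so the outlier rule here (mean distance from the per-feature median position, with a hypothetical `tolerance` factor) is purely an assumption, as are the function names and the placeholder feature names. Automatic selection of the feature count and the wear-level update are not attempted.

```python
from statistics import mean, median

def positions(ranking):
    """Map feature name -> 1-based position within one ranking."""
    return {name: i + 1 for i, name in enumerate(ranking)}

def drop_outlier_rankings(rankings, tolerance=1.5):
    """Discard rankings that sit far from the per-feature median position.

    ASSUMPTION: distance is the mean absolute position difference, and a
    ranking is dropped when its distance exceeds `tolerance` times the
    median distance. The source does not specify its actual criterion.
    """
    pos = {method: positions(r) for method, r in rankings.items()}
    features = next(iter(rankings.values()))
    consensus = {f: median(p[f] for p in pos.values()) for f in features}
    dist = {
        method: mean(abs(p[f] - consensus[f]) for f in features)
        for method, p in pos.items()
    }
    cutoff = tolerance * median(dist.values())
    return {m: rankings[m] for m in rankings if dist[m] <= cutoff}

def aggregate(rankings):
    """Order features by mean position across the given rankings."""
    pos = [positions(r) for r in rankings.values()]
    features = next(iter(rankings.values()))
    avg = {f: mean(p[f] for p in pos) for f in features}
    return sorted(features, key=lambda f: (avg[f], f))

# Hypothetical rankings of four placeholder features, best first.
rankings = {
    "pearson": ["a", "b", "c", "d"],
    "spearman": ["a", "b", "d", "c"],
    "j_index": ["b", "a", "c", "d"],
    "random_forest": ["a", "c", "b", "d"],
    "xgboost": ["d", "c", "b", "a"],  # deliberately reversed for the demo
}
kept = drop_outlier_rankings(rankings)
final_order = aggregate(kept)
```

In this toy input the reversed ranking is discarded and the remaining four are averaged into a single order.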

Experimental results show that WEFR improves the F0.5‑score of prediction by up to 22 % compared with using all features and outperforms each individual feature‑selection method. Both the automatic choice of feature count and the wear‑aware updates contribute to the gain, especially for SSDs in low‑wear stages.
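For reference, the F0.5-score is the F-beta score with beta = 0.5, which weights precision more heavily than recall. A short sketch of the standard formula, with illustrative precision and recall values that are not taken from the study:

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative values only: swapping precision and recall changes the score.
high_precision = f_beta(0.8, 0.4)
high_recall = f_beta(0.4, 0.8)
```

With beta = 0.5, the high-precision case scores higher than the high-recall case, which suits settings where false alarms are costly.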

The study confirms that robust feature selection and wear‑level‑aware updates are essential for accurate SSD failure prediction in heterogeneous storage environments.

Tags: machine learning, SSD, feature selection, storage reliability, failure prediction, Wear Level, WEFR