
Wear‑Updated Integrated Feature Ranking (WEFR) for SSD Failure Prediction

This article presents a large‑scale study of SSD failure prediction using SMART logs from Alibaba data centers, introduces the Wear‑Updated Integrated Feature Ranking (WEFR) method for robust feature selection across different drive models and wear levels, and demonstrates its effectiveness through extensive experiments.

Architects' Tech Alliance

Building on a previous study of SSD operational characteristics, this work investigates SSD failure prediction for large‑scale deployments, emphasizing the need for reliable forecasting to avoid system‑wide outages.

The dataset comprises SMART logs and failure tickets collected from five Alibaba data centers over two years (January 2018 to December 2019). It covers six SSD models from three vendors (MA, MB, MC) and approximately 500K drives, with around 7K recorded failures.

SSD failure prediction is framed as an offline binary classification task: given raw and normalized SMART attributes as features, the model predicts whether a drive will fail within a future window (e.g., 30 days). Samples are generated daily, with healthy drives labeled 0 and failing drives labeled 1.
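The labeling step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `label_samples`, its parameters, and the rule that every daily sample taken within the window before a failure counts as positive are assumptions made for the example.

```python
from datetime import date, timedelta

def label_samples(observation_days, failure_day, window_days=30):
    """Label each daily sample 1 if the drive fails within `window_days`
    after that day, else 0. `failure_day` is None for a healthy drive.
    (Hypothetical helper; the windowing rule is an assumption.)"""
    labels = {}
    for day in observation_days:
        if failure_day is None:
            labels[day] = 0
        else:
            days_to_failure = (failure_day - day).days
            labels[day] = 1 if 0 <= days_to_failure <= window_days else 0
    return labels

# Toy example: one sample every 10 days, drive fails on 2019-02-20.
days = [date(2019, 1, 1) + timedelta(days=i) for i in range(0, 60, 10)]
labels = label_samples(days, date(2019, 2, 20))
print([labels[d] for d in days])  # [0, 0, 1, 1, 1, 1]
```

Samples taken more than 30 days before the failure stay at 0, so the same drive contributes both negative and positive samples over its lifetime.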

The study evaluates five state-of-the-art feature-selection techniques (Pearson correlation, Spearman correlation, J-index, Random Forest importance, and XGBoost importance) and finds that they rank the important attributes differently.
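To see why two selection methods can disagree, the sketch below ranks two toy features by Pearson and by Spearman correlation with the label, using only the standard library. The feature names `attr_a` and `attr_b` and their values are made up for illustration and are not SMART attributes from the study. The tree-based methods and J-index are left out to keep the example dependency-free.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def rank_features(features, labels, score):
    """Order feature names by |score| against the label, best first."""
    scored = {name: abs(score(col, labels)) for name, col in features.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Toy data (illustrative only): attr_a is monotone but has one extreme value.
labels = [0, 0, 0, 0, 1, 1]
features = {"attr_a": [1, 2, 3, 4, 5, 100], "attr_b": [0, 1, 0, 1, 1, 2]}
print(rank_features(features, labels, pearson))   # ['attr_b', 'attr_a']
print(rank_features(features, labels, spearman))  # ['attr_a', 'attr_b']
```

The extreme value in `attr_a` pulls its Pearson score down but does not affect its rank-based Spearman score, so the two methods order the features differently.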

Feature‑importance analysis reveals that top and bottom ranked features vary across drive models and wear levels, and that trivial features (e.g., PSCN, PMSCR) can act as noise, underscoring the necessity of careful feature selection.

To make feature selection both robust and wear-aware, the authors propose Wear-Updated Integrated Feature Ranking (WEFR). WEFR aggregates the rankings produced by multiple selection methods, removes outlier rankings, computes the average rank of each feature, and automatically determines how many features to keep. It then updates the selected feature set whenever a change point is detected in the survival curve over wear level (MWIN).
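The aggregation part of this procedure can be sketched as below. It is a rough illustration under stated assumptions rather than the authors' algorithm: the summary does not say how outlier rankings are identified, so this example uses a footrule distance (sum of absolute rank differences) and drops any ranking whose mean distance to the others exceeds a hypothetical `outlier_factor` times the median. The automatic choice of feature count and the wear-based update are not covered here.

```python
from statistics import mean, median

def to_positions(ranking):
    """Map feature name -> 1-based position in a best-first ranking."""
    return {name: pos for pos, name in enumerate(ranking, start=1)}

def footrule(p, q):
    """Sum of absolute position differences between two rankings."""
    return sum(abs(p[name] - q[name]) for name in p)

def integrate_rankings(rankings, outlier_factor=1.5):
    """Drop outlier rankings, then average the remaining ranks.

    Needs at least two rankings over the same feature names.
    The outlier rule and `outlier_factor` are assumptions for this sketch.
    """
    positions = [to_positions(r) for r in rankings]
    avg_dist = [
        mean(footrule(p, q) for j, q in enumerate(positions) if j != i)
        for i, p in enumerate(positions)
    ]
    cutoff = outlier_factor * median(avg_dist)
    kept = [p for p, d in zip(positions, avg_dist) if d <= cutoff]
    avg_rank = {name: mean(p[name] for p in kept) for name in positions[0]}
    return sorted(avg_rank, key=avg_rank.get), avg_rank

# Four toy rankings of features a..d; the last one disagrees with the rest.
rankings = [list("abcd"), list("abdc"), list("bacd"), list("dcba")]
order, avg_rank = integrate_rankings(rankings)
print(order)  # ['a', 'b', 'c', 'd']
```

In this toy input the reversed ranking is discarded. Had it been kept, `a` and `b` would tie on average rank, which shows how a single dissenting method can blur the integrated order.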

Experimental results show that WEFR improves prediction accuracy, measured by the F0.5-score, by up to 22% compared with using all features. Its automatic feature-selection component consistently matches or exceeds the best manually tuned feature percentage across all six drive models.
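For reference, the F0.5-score is the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall. A small sketch follows; the precision and recall values in the example are illustrative, not results from the study.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Swapping precision and recall changes the score, since beta < 1
# rewards precision more.
print(round(f_beta(0.8, 0.4), 4))  # 0.6667
print(round(f_beta(0.4, 0.8), 4))  # 0.4444
```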

Updating the selected features according to wear level yields additional gains (up to 13% improvement for low-wear SSDs), confirming that feature importance shifts with wear.
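One simple way to locate a wear threshold at which to switch feature sets is a single change-point search over the survival curve. The summary does not name the detector the authors use, so the sketch below substitutes a plain least-squares two-segment split. The function name and the survival values are made up for illustration.

```python
def find_change_point(values):
    """Return index k (1 <= k < len(values)) such that splitting into
    values[:k] and values[k:] minimizes the total squared error around
    each segment's mean. Stand-in technique, not the authors' detector."""
    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(values)):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical survival rates per wear bucket, ordered by increasing wear.
survival = [0.99, 0.99, 0.98, 0.99, 0.90, 0.89, 0.90]
k = find_change_point(survival)
print(k)  # 4
```

Drives in buckets before index `k` and those at or after it would then each get their own selected feature set, which is the idea behind the wear-based update.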

The study concludes that robust, wear‑aware feature selection is crucial for accurate SSD failure prediction in large‑scale storage systems.

machine learning, SSD, feature selection, storage reliability, failure prediction, wear level