Tagged articles
1 article
Baobao Algorithm Notes
Mar 21, 2024 · Artificial Intelligence

Can the CaR Method Achieve Better LLM Performance with Only 1.4% of Training Data?

This article explains how the CaR (Clustering and Ranking) approach evaluates data quality with a scoring model and selects diverse samples via PCA‑reduced sentence embeddings and K‑Means clustering, achieving comparable or superior large‑model performance while using just 1.96% of the original dataset.
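The selection idea summarized above can be sketched roughly as follows. This is a minimal illustration rather than the article's implementation: it assumes each sample already has a quality score and an already-reduced embedding vector (so the scoring model, sentence embedding, and PCA steps are not shown), it uses a small hand-written K-Means, and the merge rule (top `top_n` overall plus the best `per_cluster` from each cluster) along with every function and parameter name is a hypothetical choice made for the example.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_samples(scores, vectors, k, top_n, per_cluster):
    """Return sorted indices of the selected samples (assumed merge rule)."""
    labels = kmeans(vectors, k)
    # Ranking: order all samples by quality score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:top_n])
    # Clustering: also keep the best-scored samples inside every cluster,
    # so low-density regions of the data are still represented.
    by_cluster = defaultdict(list)
    for i in ranked:
        by_cluster[labels[i]].append(i)
    for members in by_cluster.values():
        chosen.update(members[:per_cluster])
    return sorted(chosen)


# Toy data: two well-separated groups of 2-D "embeddings" with made-up scores.
vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(select_samples(scores, vectors, k=2, top_n=2, per_cluster=1))
```

Taking the best sample from every cluster, not just the global top scores, is what keeps the selected subset diverse: a cluster of lower-scored samples still contributes its best member.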

CaR method · Data Quality · LLM training
8 min read