Baobao Algorithm Notes
Mar 21, 2024 · Artificial Intelligence
Can the CaR Method Achieve Better LLM Performance with Only 1.96% of Training Data?
This article explains how the CaR (Clustering and Ranking) approach evaluates data quality with a scoring model and selects diverse samples via PCA‑reduced sentence embeddings and K‑Means clustering, achieving comparable or superior large‑model performance while using just 1.96% of the original dataset.
CaR method · Data Quality · LLM training
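To make the summary concrete, here is a minimal sketch of how quality ranking and cluster-based diversity could be combined in the selection step. It is an illustration under stated assumptions, not the article's or the CaR authors' implementation: the function name `select_samples` and the parameters `top_overall` and `top_per_cluster` are hypothetical, the quality scores are assumed to come from a scoring model, and the cluster labels are assumed to come from K-Means run on PCA-reduced sentence embeddings (neither step is shown here).

```python
from collections import defaultdict


def select_samples(scores, cluster_ids, top_overall, top_per_cluster):
    """Return the sorted indices of the selected samples.

    scores:      one quality score per sample (higher is better),
                 assumed to come from a scoring model.
    cluster_ids: one cluster label per sample, assumed to come from
                 K-Means on PCA-reduced sentence embeddings.
    top_overall / top_per_cluster: hypothetical selection budgets.
    """
    if len(scores) != len(cluster_ids):
        raise ValueError("scores and cluster_ids must have the same length")

    # Rank all sample indices by descending quality score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Quality: keep the best-scoring samples overall.
    selected = set(ranked[:top_overall])

    # Diversity: also keep the best-scoring samples inside every cluster.
    kept_per_cluster = defaultdict(int)
    for i in ranked:
        cluster = cluster_ids[i]
        if kept_per_cluster[cluster] < top_per_cluster:
            selected.add(i)
            kept_per_cluster[cluster] += 1

    return sorted(selected)


# Toy data: six samples spread over three clusters.
scores = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
cluster_ids = [0, 0, 1, 0, 2, 1]
subset = select_samples(scores, cluster_ids, top_overall=2, top_per_cluster=1)
print(subset)  # [0, 1, 4, 5]
```

In this toy run, samples 0 and 1 are kept for their overall scores, while samples 5 and 4 are kept as the best representatives of clusters 1 and 2, so low-scoring but distinct regions of the data are not dropped entirely.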
