FSL++: A Few-Shot Learning Model for Chinese Language Understanding that Tops the FewCLUE Benchmark
FSL++ is a RoBERTa-large-based few-shot model enhanced with domain-adaptive pre-training, prompt learning, diverse embedding-level augmentations, and ensemble self-training. It topped the Chinese FewCLUE benchmark, beat human accuracy on news and scientific-literature classification tasks, and delivered measurable gains across multiple Meituan product scenarios.
Recently, Meituan Search and the NLP Center's semantic understanding team achieved the top rank on the Chinese few-shot language understanding benchmark FewCLUE with their model FSL++. The model took first place on the natural language inference (OCNLI) task and surpassed human accuracy on news classification (TNEWS) and scientific literature classification (CSLDCP), even with only on the order of a hundred labeled samples per task.
1. Overview – CLUE (Chinese Language Understanding Evaluation) is a widely used benchmark for Chinese NLP tasks. FewCLUE is a sub‑benchmark focusing on few‑shot learning, containing nine tasks such as text classification, sentence‑pair inference, and reading comprehension. FSL++ was submitted to FewCLUE and achieved state‑of‑the‑art (SOTA) performance.
2. Methodology
2.1 Enhanced Pre‑training – The base model is RoBERTa‑large, further pre‑trained with domain‑adaptive pre‑training (DAPT) on 100 GB of domain‑specific corpora and task‑adaptive pre‑training (TAPT) on the unlabeled data provided by FewCLUE.
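As a rough illustration of what continued pre-training involves, DAPT and TAPT can both be viewed as more masked-language-model training, differing mainly in the corpus (domain text versus the task's unlabeled text). The sketch below shows one way to corrupt a token sequence for that objective. The 15% masking rate and the 80/10/10 replacement split follow the common BERT-style recipe and are assumptions, not details given in the source; `mask_tokens` is a hypothetical helper name.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt one sequence for masked-language-model training.

    Returns (corrupted, labels). labels[i] is the original token at
    positions selected for prediction and None everywhere else.
    The 80/10/10 split is the usual BERT-style recipe (an assumption here).
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)               # replace with the mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # replace with a random token
            else:
                corrupted.append(tok)                # keep the original token
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels

if __name__ == "__main__":
    tokens = list("美团搜索语义理解")
    print(mask_tokens(tokens, vocab=tokens, rng=random.Random(0)))
```

Under this view, the same routine would be run first over the domain corpus and then over the FewCLUE unlabeled data; only the input text changes.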
2.2 Model Architecture – For classification and reading-comprehension tasks, PET (Pattern-Exploiting Training) is used; for sentence-pair tasks, EFL (Entailment as Few-shot Learner) is employed. Prompt Learning converts a downstream task into a masked-language-model format by designing templates, which wrap the input around a masked slot, and verbalizers, which map each label to a word that can fill that slot.
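The prompt reformulation described above can be sketched in a few lines. The template, the verbalizer words, and the helper names below are hypothetical and chosen only for illustration; the source does not list the actual templates used by FSL++. The scores passed to the predict functions stand in for what a masked language model or an entailment classifier would return.

```python
MASK = "[MASK]"

# Hypothetical template and verbalizer for a news-topic task (not from the source).
TEMPLATE = "This is a piece of {mask} news: {text}"
VERBALIZER = {"sports": "sports", "finance": "finance", "tech": "technology"}

def build_pet_input(text, template=TEMPLATE):
    """PET-style input: wrap the text in a template that contains a masked slot."""
    return template.format(mask=MASK, text=text)

def pet_predict(mask_word_scores, verbalizer=VERBALIZER):
    """Pick the label whose verbalizer word scores highest at the masked slot."""
    return max(
        verbalizer,
        key=lambda label: mask_word_scores.get(verbalizer[label], float("-inf")),
    )

def build_efl_input(sentence_a, sentence_b):
    """EFL-style input: a sentence pair judged as entailment or not."""
    return f"[CLS] {sentence_a} [SEP] {sentence_b} [SEP]"

def efl_predict(entail_prob, threshold=0.5):
    """Binary decision from an entailment probability (threshold is an assumption)."""
    return "entail" if entail_prob >= threshold else "not_entail"

if __name__ == "__main__":
    print(build_pet_input("Stocks rallied today"))
    # Stand-in scores for the words a masked LM might place in the slot.
    print(pet_predict({"sports": 0.1, "finance": 0.7, "technology": 0.2}))
    print(build_efl_input("A man is cooking.", "Someone is preparing food."))
```

In practice the verbalizer words are usually chosen so that each maps cleanly onto the model's vocabulary, which is one reason template and verbalizer design matters so much in the few-shot setting.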
2.3 Data Augmentation – Embedding-level augmentation strategies are applied, including Mixup, Manifold-Mixup, adversarial training (AT), and R-Drop (regularized dropout, used here as a contrastive-style consistency objective between two dropout passes). These augmentations operate on the output of the embedding layer or on hidden-layer representations rather than on the raw text.
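Two of the strategies named above are simple enough to sketch without a deep-learning framework. The code below interpolates a pair of embedding vectors and their one-hot labels (Mixup; Manifold-Mixup applies the same interpolation to a hidden layer instead), and computes an R-Drop style loss from the class distributions of two dropout passes. The Beta parameter, the weight `alpha`, and all function names are illustrative assumptions rather than values reported in the source.

```python
import math
import random

def sample_lambda(beta_param=0.4, rng=random):
    """Mixup coefficient drawn from Beta(a, a); a = 0.4 is an assumed value."""
    return rng.betavariate(beta_param, beta_param)

def mixup(emb_a, emb_b, label_a, label_b, lam):
    """Linearly interpolate two embedding vectors and their one-hot labels."""
    mixed_emb = [lam * a + (1 - lam) * b for a, b in zip(emb_a, emb_b)]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_emb, mixed_label

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(p, target_index, eps=1e-12):
    """Negative log-likelihood of the gold class."""
    return -math.log(p[target_index] + eps)

def r_drop_loss(p1, p2, target_index, alpha=1.0):
    """R-Drop style loss: NLL of both passes plus a symmetric KL term."""
    nll = cross_entropy(p1, target_index) + cross_entropy(p2, target_index)
    sym_kl = 0.5 * (kl(p1, p2) + kl(p2, p1))
    return nll + alpha * sym_kl

if __name__ == "__main__":
    lam = sample_lambda()
    print(mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam))
    print(r_drop_loss([0.7, 0.3], [0.6, 0.4], target_index=0))
```

When the two passes agree exactly, the KL term vanishes and the loss reduces to the ordinary classification loss, so the extra term only penalizes disagreement caused by dropout.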
2.4 Ensemble & Self‑Training – Multiple weak models trained with different augmentation strategies are ensembled. Pseudo‑labels are generated for the large unlabeled set, and a student model is trained on the combined labeled and high‑confidence pseudo‑labeled data.
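A minimal sketch of the pseudo-labeling step follows. It assumes the ensemble is a plain average of per-model class probabilities and that "high confidence" means the averaged top probability clears a fixed threshold; the source states neither, so both choices, the 0.9 value, and the function names are illustrative.

```python
def ensemble_probs(per_model_probs):
    """Average class probabilities across models (averaging is an assumption)."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(m[c] for m in per_model_probs) / n_models for c in range(n_classes)]

def select_pseudo_labels(unlabeled, probs_per_example, threshold=0.9):
    """Keep (text, predicted_class) pairs whose ensemble confidence clears the threshold.

    probs_per_example[i] is a list with one class-probability list per model
    for the i-th unlabeled text.
    """
    selected = []
    for text, per_model in zip(unlabeled, probs_per_example):
        avg = ensemble_probs(per_model)
        best = max(range(len(avg)), key=avg.__getitem__)
        if avg[best] >= threshold:
            selected.append((text, best))
    return selected

if __name__ == "__main__":
    texts = ["example a", "example b"]
    probs = [
        [[0.90, 0.10], [0.98, 0.02]],  # models agree with high confidence
        [[0.60, 0.40], [0.50, 0.50]],  # low confidence, filtered out
    ]
    print(select_pseudo_labels(texts, probs))
```

The selected pairs would then be merged with the original labeled set to train the student model; filtering by confidence limits how much label noise the student inherits from the ensemble.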
3. Experimental Results
3.1 Datasets – FewCLUE provides ~160 labeled samples and ~20 k unlabeled samples per task. The nine tasks cover four text‑classification, two sentence‑pair inference, and three reading‑comprehension tasks.
3.2 Comparisons – RoBERTa-Base with PET/EFL already outperforms vanilla fine-tuning by 2–28 percentage points (pp). Scaling to RoBERTa-Large yields additional gains of 0.5–13 pp. Further domain-adapted pre-training (RoBERTa-Large-Clue) adds 0.1–9 pp. Adding data augmentation improves performance by 0.8–9 pp, and ensemble plus self-training adds another 0.4–4 pp.
4. Application in Meituan Scenarios
The few‑shot strategies are deployed in various Meituan products, such as medical‑beauty content classification, travel‑guide detection, internal knowledge‑base categorization, and brand labeling, achieving accuracy improvements of 1.5–4 pp with only a few hundred labeled examples.
5. Conclusion
FSL++ combines RoBERTa‑large, domain‑adapted pre‑training, Prompt Learning, diverse augmentation, and ensemble/self‑training to deliver strong performance on Chinese few‑shot NLP tasks, surpassing human baselines on several benchmarks and proving effective in real‑world Meituan applications.
Meituan Technology Team
Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
