Dynamic Difficulty-Adaptive Training Gains Momentum: Huawei’s EDCO Cited at ICML 2026

EDCO, Huawei’s entropy‑based dynamic curriculum method, continuously selects the most uncertain samples for domain‑specific LLM fine‑tuning, achieving higher accuracy and more stable gradients across communication, medical, and legal tasks while cutting entropy‑estimation cost by over 80 %.

Machine Heart

May 18, 2026

Background

In domain‑specific LLM fine‑tuning, data is scarce and expensive to acquire, so simply adding more data is not a workable strategy. Traditional static curricula (easy‑to‑hard) and random sampling both ignore the model’s evolving competence.

EDCO Overview

Huawei GTS’s AI Data team proposes Entropy‑based Dynamic Curriculum Orchestration (EDCO). The method estimates the inference entropy of each training sample with the current model, selects the highest‑entropy samples as the next curriculum, and repeats the loop.
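The estimate, select, train loop described above can be sketched as follows. This is a minimal illustration rather than Huawei's implementation: the names `select_curriculum`, `run_dynamic_curriculum`, `entropy_fn`, and `train_fn` are hypothetical, and a plain dictionary stands in for the model's per-sample entropy.

```python
from typing import Callable, List, Sequence


def select_curriculum(pool: Sequence[str],
                      entropy_fn: Callable[[str], float],
                      top_n: int) -> List[str]:
    """Return the top_n samples with the highest estimated entropy."""
    scored = [(entropy_fn(sample), idx) for idx, sample in enumerate(pool)]
    # Sort by entropy, highest first; ties keep the original pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pool[idx] for _, idx in scored[:top_n]]


def run_dynamic_curriculum(pool: Sequence[str],
                           entropy_fn: Callable[[str], float],
                           train_fn: Callable[[List[str]], None],
                           top_n: int,
                           num_intervals: int) -> List[List[str]]:
    """Repeat: re-estimate entropy with the current model, select, train."""
    history = []
    for _ in range(num_intervals):
        batch = select_curriculum(pool, entropy_fn, top_n)
        train_fn(batch)  # training changes the model, so entropies shift
        history.append(batch)
    return history


# Toy stand-in (assumption): a dict plays the role of the model's entropy,
# and "training" on a sample halves its entropy.
toy_entropy = {"a": 0.2, "b": 1.5, "c": 0.9, "d": 1.1}


def toy_train(batch: List[str]) -> None:
    for sample in batch:
        toy_entropy[sample] *= 0.5


history = run_dynamic_curriculum(
    pool=list(toy_entropy),
    entropy_fn=lambda s: toy_entropy[s],
    train_fn=toy_train,
    top_n=2,
    num_intervals=2,
)
```

In this toy run the first interval picks "b" and "d"; after training lowers their entropy, the second interval picks "c" and keeps "b", which is the kind of turnover a dynamic curriculum is meant to produce.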

Key components:

Entropy as difficulty signal – higher inference entropy indicates that the model is uncertain, so the sample provides a stronger learning signal.

Prefix‑entropy approximation – quick‑answer prompting followed by computing conditional entropy on the first few tokens reduces per‑sample cost from 2.24 s to 0.37 s (≈ 83.5 % saving).

Dynamic top‑N selection – at each interval, entropy is re‑estimated over the sample pool with the current model, and the top‑N highest‑entropy samples form the next training batch.
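To make the prefix‑entropy idea concrete, here is a small sketch that averages Shannon entropy over the first few generation steps. The article does not say how the per‑token entropies are aggregated, which unit is used, or how long the prefix is, so the mean, the use of nats, and the `prefix_len` parameter are all assumptions; the function names are hypothetical. In practice each distribution would come from a softmax over the model's logits.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prefix_entropy(step_distributions: List[Sequence[float]],
                   prefix_len: int) -> float:
    """Mean token entropy over the first prefix_len generation steps.

    Assumption: averaging is one plausible aggregation; the source
    does not specify the exact formula.
    """
    prefix = step_distributions[:prefix_len]
    if not prefix:
        raise ValueError("need at least one generation step")
    return sum(token_entropy(p) for p in prefix) / len(prefix)


# Three generation steps over a 4-token vocabulary (made-up numbers).
steps = [
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step
    [1.0, 0.0, 0.0, 0.0],      # fully confident step
    [0.5, 0.5, 0.0, 0.0],      # ignored when prefix_len == 2
]
score = prefix_entropy(steps, prefix_len=2)
```

Only the first `prefix_len` steps are scored, which is where the cost saving would come from: the model does not need to generate or score a full answer.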

Experimental Setup

Experiments cover three domains (communication, medical, legal) using two backbone models (Qwen‑3‑4B and Llama‑3.2‑3B) and two fine‑tuning paradigms (SFT and RLFT). In the communication domain, two tasks are defined: Wireless (network‑optimization) and Datacom (multi‑vendor log analysis).

Results

RLFT on Datacom: EDCO achieves 46.96 % accuracy, surpassing random sampling (40.43 %) and PPL‑based curriculum (44.78 %). On Wireless, EDCO reaches 38.70 %, again ahead of the baselines.

SFT results: Wireless 33.7 %, Datacom 36.3 %, MedQA (medical) 36.7 %, JEC‑QA (legal) 17.4 % – each the highest among the compared methods.

Against the Dynamic‑PPL and SEC baselines on Datacom, EDCO reaches 47.0 % versus 41.3 % and 34.78 % respectively, highlighting the importance of the entropy signal.

Gradient analysis on MedQA (Qwen‑3‑1.7B) shows EDCO’s selected batches have gradient direction consistency 0.92 (vs 0.82 random) and average inference entropy 1.51 (vs 1.23), while RL gradient norm is 3.77 (vs 2.62), indicating stronger, less conflicting learning signals.
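The article does not define "gradient direction consistency". One common way to quantify it, shown here purely as an assumption, is the mean pairwise cosine similarity between per‑sample gradient vectors: values near 1 mean the samples push the parameters in the same direction, while values near 0 or below mean they conflict. The function names are hypothetical and the gradients are toy vectors.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def direction_consistency(grads: List[Sequence[float]]) -> float:
    """Mean pairwise cosine similarity (an assumed definition)."""
    pairs = list(combinations(grads, 2))
    if not pairs:
        raise ValueError("need at least two gradient vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


aligned = direction_consistency([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
orthogonal = direction_consistency([[1.0, 0.0], [0.0, 1.0]])
opposed = direction_consistency([[1.0, 0.0], [-1.0, 0.0]])
```

Under this assumed definition, a score of 0.92 versus 0.82 would mean the selected samples agree more closely on the update direction than randomly drawn ones.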

Mechanism Insight

EDCO maintains higher inference entropy throughout training, preventing premature confidence collapse seen with static curricula. Sample turnover analysis shows ~3000 new samples enter the curriculum after the first interval, with continual addition of previously unseen high‑entropy samples and retention of lingering difficult examples.
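A turnover analysis of this kind can be approximated with simple set arithmetic over the sample IDs chosen at each interval. The sketch below is an illustration, not the authors' analysis code; `turnover_report` and its field names are hypothetical.

```python
from typing import Dict, Iterable, List


def turnover_report(curricula: List[Iterable[int]]) -> List[Dict[str, int]]:
    """Per interval, count first-time, retained, and returning samples."""
    seen = set()      # every sample ID selected in any earlier interval
    previous = set()  # sample IDs selected in the immediately preceding interval
    report = []
    for curriculum in curricula:
        current = set(curriculum)
        report.append({
            # never selected before this interval
            "first_time": len(current - seen),
            # carried over from the previous interval (lingering hard cases)
            "retained": len(current & previous),
            # selected earlier, dropped, and now selected again
            "returning": len((current & seen) - previous),
        })
        seen |= current
        previous = current
    return report


# Toy curricula of sample IDs for three intervals (made-up data).
report = turnover_report([[1, 2, 3], [2, 3, 4], [1, 4, 5]])
```

Here the second interval retains samples 2 and 3 and admits sample 4 for the first time; the third interval admits sample 5, retains 4, and brings back sample 1.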

Efficiency

Prefix‑entropy estimation reduces per‑sample cost by 83.5 % (from 2.24 s to 0.37 s); on 8 GPUs the per‑sample time drops further to 0.04 s, making the dynamic curriculum practical for large pools.
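The arithmetic behind these figures is easy to check, and it also shows what the saving means for one full pass over a candidate pool. The three per‑sample timings come from the article; the pool size of 100,000 samples is a hypothetical value chosen only for illustration.

```python
FULL_COST_S = 2.24         # full entropy estimation, per sample (from the article)
PREFIX_COST_S = 0.37       # prefix-entropy approximation, per sample (from the article)
PREFIX_COST_8GPU_S = 0.04  # prefix-entropy on 8 GPUs, per sample (from the article)
POOL_SIZE = 100_000        # hypothetical pool size, not from the article


def relative_saving(baseline_s: float, reduced_s: float) -> float:
    """Fraction of the baseline cost that is saved."""
    return 1.0 - reduced_s / baseline_s


def pool_hours(per_sample_s: float, pool_size: int) -> float:
    """Hours needed to score every sample in the pool once."""
    return per_sample_s * pool_size / 3600.0


saving_pct = round(relative_saving(FULL_COST_S, PREFIX_COST_S) * 100, 1)
hours_full = pool_hours(FULL_COST_S, POOL_SIZE)
hours_prefix = pool_hours(PREFIX_COST_S, POOL_SIZE)
hours_prefix_8gpu = pool_hours(PREFIX_COST_8GPU_S, POOL_SIZE)
```

For this hypothetical pool, one entropy pass would take roughly 62 hours with full estimation, about 10 hours with the prefix approximation, and about 1.1 hours at the reported 8‑GPU rate. Since the pass is repeated at every curriculum interval, the difference compounds.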

Conclusion

EDCO demonstrates that data value is a function of the model’s current state. By driving curriculum with inference entropy and keeping overhead low, it improves fine‑tuning performance across multiple domains without altering model architecture or training objectives, and works with both SFT and RLFT.

