Machine Learning Algorithms & Natural Language Processing
Mar 3, 2026 · Artificial Intelligence
Beyond Dense and MoE: JTok Module Cuts Compute by One‑Third as a New Scaling Path
The paper introduces JTok and its dynamic variant JTok‑M, a token‑indexed parameter scaling method that decouples model capacity from compute, achieving up to 35% compute reduction while delivering consistent performance gains across a wide range of downstream tasks and model sizes.
Compute Efficiency · JTok · Token-indexed scaling
