HyperAI Super Neural
Apr 23, 2026 · Artificial Intelligence

Task Tokens Cut Per-Task Trainable Parameters 125× and Boost Convergence 6× for Embodied AI

Task Tokens, a method introduced by an Israeli research team, reduces the number of trainable parameters per task by up to 125‑fold and accelerates convergence six‑fold, while preserving the flexibility of Behavior Foundation Models and demonstrating strong performance, robustness, and compatibility across a suite of embodied control tasks.

Behavior Foundation Models · Multi-Modal Prompting · PPO
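The core idea lends itself to a short sketch: freeze the Behavior Foundation Model and learn only a handful of task-specific tokens prepended to its input. The PyTorch sketch below is a minimal illustration under assumed names and sizes; `FrozenBFM` and `TaskTokenPolicy` are hypothetical stand-ins, and the real method optimizes the tokens with PPO against task reward rather than counting parameters.

```python
# Hedged sketch of the Task Tokens idea: freeze a pretrained behavior
# model and train only a tiny per-task embedding prepended to its input.
import torch
import torch.nn as nn

class FrozenBFM(nn.Module):
    """Stand-in for a pretrained transformer policy (an assumption)."""
    def __init__(self, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, 32)  # 32 action dims (assumed)

    def forward(self, tokens):
        return self.action_head(self.encoder(tokens)[:, -1])  # act from last token

class TaskTokenPolicy(nn.Module):
    def __init__(self, bfm, n_task_tokens=4, d_model=512):
        super().__init__()
        self.bfm = bfm
        for p in self.bfm.parameters():
            p.requires_grad = False  # the foundation model stays frozen
        # The only trainable parameters: a few task-specific tokens.
        self.task_tokens = nn.Parameter(torch.randn(1, n_task_tokens, d_model) * 0.02)

    def forward(self, obs_tokens):
        b = obs_tokens.size(0)
        prompt = self.task_tokens.expand(b, -1, -1)
        return self.bfm(torch.cat([prompt, obs_tokens], dim=1))

policy = TaskTokenPolicy(FrozenBFM())
trainable = sum(p.numel() for p in policy.parameters() if p.requires_grad)
total = sum(p.numel() for p in policy.parameters())
print(f"trainable: {trainable} / total: {total}")  # orders of magnitude fewer
```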
Machine Learning Algorithms & Natural Language Processing
Apr 8, 2026 · Artificial Intelligence

Dissecting Gemma‑4’s Architecture and Training Choices: A Technical Comparison with Qwen‑3 and GLM‑5

This article breaks down the key architectural and training decisions behind Gemma‑4, including KV sharing, p‑RoPE, per‑layer embeddings, and a dual‑path MoE + dense MLP, and contrasts its efficiency and performance with Qwen‑3 and GLM‑5 across benchmarks, quantization strategies, and RL pipelines.

GLM-5 · Gemma 4 · LLM architecture
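Of the techniques listed, KV sharing is the easiest to picture in code. The sketch below shows the generic idea of pairs of attention layers reusing one set of keys and values, and hence one KV cache, which roughly halves KV memory; `SharedKVPair` and all dimensions are made up for illustration, not taken from Gemma‑4's actual implementation.

```python
# Generic toy of cross-layer KV sharing: two attention layers per pair
# consume the same keys/values, computed once by a shared projection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, k, v):
        # k, v arrive precomputed and are shared with the sibling layer.
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v)
        return self.out_proj(y.transpose(1, 2).reshape(b, t, -1))

class SharedKVPair(nn.Module):
    """Two attention layers sharing one set of keys and values."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.kv_proj = nn.Linear(d_model, 2 * d_model)  # computed once per pair
        self.attn1 = SharedKVAttention(d_model, n_heads)
        self.attn2 = SharedKVAttention(d_model, n_heads)
        self.n_heads, self.d_head = n_heads, d_model // n_heads

    def forward(self, x):
        b, t, _ = x.shape
        k, v = self.kv_proj(x).chunk(2, dim=-1)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        x = x + self.attn1(x, k, v)  # both layers reuse the same k, v
        x = x + self.attn2(x, k, v)
        return x

print(SharedKVPair()(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```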
AIWalker
Mar 20, 2026 · Artificial Intelligence

Plug‑and‑Play reAR Boosts Visual AR to SOTA Quality with Only 177M Parameters

The paper introduces reAR, a plug‑and‑play regularization framework that aligns generator and tokenizer representations in visual autoregressive models, dramatically improving image quality and matching large diffusion models with far fewer parameters. The approach is validated with extensive experiments, ablations, and a scalability analysis.

AI research · Image Generation · parameter efficiency
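A rough way to picture a representation-alignment regularizer of this kind: alongside the usual next-token loss, pull the generator's hidden states toward the frozen tokenizer's latent features. The sketch below is speculative; `rear_style_loss`, the cosine form, and the weight `lam` are assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of representation alignment in the spirit of reAR:
# AR loss plus an auxiliary term aligning generator hidden states
# with (detached) tokenizer encoder features.
import torch
import torch.nn as nn
import torch.nn.functional as F

def rear_style_loss(logits, targets, gen_hidden, tok_feats, proj, lam=0.5):
    """
    logits:     (B, T, V)  generator predictions over visual tokens
    targets:    (B, T)     ground-truth token ids from the tokenizer
    gen_hidden: (B, T, D)  generator hidden states
    tok_feats:  (B, T, E)  tokenizer encoder features (kept frozen)
    proj:       maps generator width D to tokenizer feature width E
    """
    ar_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    aligned = proj(gen_hidden)
    # Cosine alignment; gradients flow only into the generator/projection.
    align_loss = 1 - F.cosine_similarity(aligned, tok_feats.detach(), dim=-1).mean()
    return ar_loss + lam * align_loss

B, T, V, D, E = 2, 64, 1024, 512, 256
proj = nn.Linear(D, E)
loss = rear_style_loss(torch.randn(B, T, V), torch.randint(0, V, (B, T)),
                       torch.randn(B, T, D, requires_grad=True),
                       torch.randn(B, T, E), proj)
loss.backward()
print(float(loss))
```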
Data Party THU
Mar 6, 2026 · Artificial Intelligence

How Small Can a Transformer Get? Inside the 121‑Parameter AdderBoard Challenge

This article chronicles the AdderBoard competition: the experimental rules, how researchers compressed a Transformer for 10‑digit addition down to just 121 parameters, the contrasting hand‑coded and data‑driven approaches, and the insights gained about model minimalism and discoverability.

AdderBoard · Transformer · model compression
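For context, the target function itself is tiny: 10-digit addition is just a digit-wise carry chain, as the sketch below shows. How hand-coded entries encode a loop like this into a few dozen Transformer weights is exactly what the article explores; the digit lists here are arbitrary examples.

```python
# The task the tiny models must solve: 10-digit addition as a
# digit-by-digit carry chain, least-significant digit first.
def add_digitwise(a_digits, b_digits):
    """Add two numbers given as equal-length digit lists, LSB first."""
    out, carry = [], 0
    for da, db in zip(a_digits, b_digits):
        s = da + db + carry
        out.append(s % 10)   # emitted digit at this position
        carry = s // 10      # carry into the next position
    out.append(carry)        # possible final carry digit
    return out

a = [3, 2, 1, 9, 9, 9, 9, 9, 9, 9]  # 9999999123, LSB first
b = [9, 9, 9, 0, 0, 0, 0, 0, 0, 0]  # 999, LSB first
print(add_digitwise(a, b))          # [2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```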
Data Party THU
Jan 19, 2026 · Artificial Intelligence

How VersatileFFN Cuts Memory Use While Boosting LLM Performance

The article introduces Huawei's VersatileFFN, an adaptive wide‑and‑deep feed‑forward design for large language models that reuses parameters to slash memory consumption while improving inference performance, detailing its dual‑system inspiration, technical mechanisms, experimental gains, and implications for efficient LLM deployment.

Adaptive Computation · LLM · Transformer
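Details of Huawei's design aside, the parameter-reuse idea can be sketched generically: one set of FFN weights serves both a "wide" pass, where slices of the FFN are gated per token expert-style, and a "deep" pass, where the same weights are applied again. Everything below, including `ReusableFFN` and the gating rule, is a speculative illustration rather than VersatileFFN's actual mechanism.

```python
# Speculative sketch of a wide-and-deep FFN that adds capacity without
# adding weights: slices of one FFN act as gated "experts" (wide), and
# the same shared FFN can be applied a second time (deep).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReusableFFN(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, n_chunks=4):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.router = nn.Linear(d_model, n_chunks)
        self.n_chunks, self.chunk = n_chunks, d_ff // n_chunks

    def wide(self, x):
        # "Wide" path: gate slices of one big FFN per token -- the router
        # adds only a few parameters, the slices reuse the same weights.
        h = F.gelu(self.up(x))                     # (B, T, d_ff)
        gates = F.softmax(self.router(x), dim=-1)  # (B, T, n_chunks)
        h = h.view(*h.shape[:-1], self.n_chunks, self.chunk)
        h = (h * gates.unsqueeze(-1)).flatten(-2)  # weight each slice
        return self.down(h)

    def deep(self, x):
        # "Deep" path: run the very same shared FFN a second time.
        return x + self.wide(x + self.wide(x))

x = torch.randn(2, 8, 256)
ffn = ReusableFFN()
print(ffn.wide(x).shape, ffn.deep(x).shape)  # both (2, 8, 256)
```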
Alibaba Cloud Big Data AI Platform
Jul 25, 2022 · Artificial Intelligence

Cut LLM Fine‑Tuning Cost to 1.5% Parameters with PST Sparsity

The article introduces Alibaba Cloud's PST algorithm, a parameter‑efficient sparsity method that combines data‑free and data‑driven importance metrics under low‑rank and structured constraints, enabling large language models to be fine‑tuned by training only 1.5% of their parameters while maintaining comparable accuracy.

AI · PST algorithm · model compression
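The combined importance score is straightforward to sketch: a data-free magnitude term plus a data-driven term kept low-rank, with a top-k mask rederived as the factors train. The sketch below is a simplified illustration assuming a straight-through mask gradient; `PSTLinear` and the exact top-k rule are illustrative, not the paper's implementation.

```python
# Hedged sketch of PST-style importance scoring: importance =
# |W| (data-free) + A @ B (data-driven, low-rank), masked by top-k.
# Only the tiny factors A and B are trainable.
import torch
import torch.nn as nn

class PSTLinear(nn.Module):
    def __init__(self, weight: torch.Tensor, rank=8, sparsity=0.5):
        super().__init__()
        out_f, in_f = weight.shape
        self.weight = nn.Parameter(weight, requires_grad=False)  # frozen
        # Trainable low-rank factors: the only fine-tuned parameters.
        self.A = nn.Parameter(torch.zeros(out_f, rank))
        self.B = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.sparsity = sparsity

    def forward(self, x):
        importance = self.weight.abs() + self.A @ self.B  # data-free + data-driven
        k = int(importance.numel() * (1 - self.sparsity))
        thresh = importance.flatten().topk(k).values.min()
        hard = (importance >= thresh).float()
        # Straight-through trick (an assumption) so gradients reach A, B.
        mask = hard + importance - importance.detach()
        return x @ (self.weight * mask).T

layer = PSTLinear(torch.randn(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")  # a few percent
print(layer(torch.randn(4, 512)).shape)                # torch.Size([4, 512])
```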