Tagged articles

Distillation

21 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 5, 2026 · Artificial Intelligence

StepOPSD: Precise Step‑Level Error Detection for Multi‑Turn Agent RL

StepOPSD adds a post‑hoc, step‑aware distillation stage to multi‑turn agent reinforcement learning, splitting rollouts into controllable steps, using successful trajectories as hindsight teachers to compute token‑level advantage adjustments, and demonstrating significant gains on ALFWorld and Search‑QA tasks where reward misalignment is most severe.

ALFWorldAdvantage WeightingAgent RL

0 likes · 13 min read

StepOPSD: Precise Step‑Level Error Detection for Multi‑Turn Agent RL

Meituan Technology Team

May 22, 2026 · Artificial Intelligence

From High-Fidelity to Real-World Use: LongCat Video Avatar 1.5 Open‑Source Release

LongCat Video Avatar 1.5 is now open‑source, delivering commercial‑grade lip sync, physical realism, long‑video stability, multi‑person interaction and 15× faster inference through Whisper‑large audio encoding, DMD 8‑step distillation and LoRA adapters, and it outperforms leading closed‑source models in extensive human‑rated benchmarks.

AIDistillationLongCat-Video-Avatar

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

Apr 22, 2026 · Artificial Intelligence

Turning Transformers into Mamba: A Cross‑Architecture Distillation That Linearizes Inference Cost

The article presents a two‑step cross‑architecture distillation method that replaces the quadratic softmax attention of Transformers with a learned linear attention and then maps it onto a Mamba backbone, achieving near‑teacher performance while reducing inference cost to linear time.

Cross‑ArchitectureDistillationLinear Attention

0 likes · 8 min read

Turning Transformers into Mamba: A Cross‑Architecture Distillation That Linearizes Inference Cost

Old Zhang's AI Learning

Apr 19, 2026 · Artificial Intelligence

Qwen3.6-35B: 4‑bit Quantization, DFlash Speedup, Claude Opus Distillation

The article reviews three optimization paths for the Qwen3.6‑35B model—four‑bit AWQ quantization variants, the DFlash speculative decoding accelerator, and a Claude Opus‑based distillation—detailing their implementation steps, benchmark results, and guidance on selecting the best version for different hardware and performance needs.

AIDFlashDistillation

0 likes · 11 min read

Qwen3.6-35B: 4‑bit Quantization, DFlash Speedup, Claude Opus Distillation

Old Zhang's AI Learning

Apr 3, 2026 · Artificial Intelligence

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

The newly released Qwopus3.5‑v3 model combines higher‑quality reasoning chains, dedicated tool‑calling reinforcement learning, and an act‑then‑refine paradigm, delivering a 5‑point HumanEval boost, a 1.43‑point MMLU‑Pro gain, 31.7% faster inference and 24% lower token cost, while remaining runnable on a 3090 or a 16 GB MacBook, with easy deployment via GGUF, LM Studio, Ollama or llama.cpp.

Claude OpusDistillationHumanEval

0 likes · 12 min read

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

AI Engineer Programming

Mar 28, 2026 · Artificial Intelligence

How to Start Training Your Own AI Model: A Complete Roadmap

This guide maps the end-to-end process for building a small AI model—from leveraging open-source base models and applying SFT with LoRA/QLoRA, through alignment techniques like DPO or ORPO, to low-cost distillation and final quantization for local deployment, while recommending free GPU resources and essential tooling.

AIDistillationLoRA

0 likes · 12 min read

How to Start Training Your Own AI Model: A Complete Roadmap

Old Zhang's AI Learning

Mar 25, 2026 · Artificial Intelligence

Claude‑Opus‑4.6 Distilled Qwen3.5 v2: Faster Reasoning with Same Code Accuracy

The new Claude‑Opus‑4.6 distilled Qwen3.5‑v2 keeps code‑generation accuracy while cutting reasoning length by 24% and boosting per‑token correctness by 31.6%, offering a noticeable speed and cost advantage for local LLM deployment despite a 7.2% drop on MMLU‑Pro.

Claude OpusDistillationLocal LLM Deployment

0 likes · 7 min read

Claude‑Opus‑4.6 Distilled Qwen3.5 v2: Faster Reasoning with Same Code Accuracy

Amap Tech

Jan 8, 2026 · Artificial Intelligence

How AI Powers Fancy Video Generation for Real‑World POI Scenes

This article details the AI techniques behind Gaode's "Street Ranking" project, explaining the Fancy video concept, the dual training and production pipelines, and the use of SFT, reinforcement learning, MoE‑LoRA, distribution‑matching distillation, and quality‑filtering to achieve 25× faster generation with high aesthetic fidelity.

AI video generationDistillationMultimodal

0 likes · 24 min read

How AI Powers Fancy Video Generation for Real‑World POI Scenes

Xiaohongshu Tech REDtech

Nov 4, 2025 · Artificial Intelligence

Unveiling the Law of Capacity Gap: Boosting Language Model Distillation Efficiency

At ACL 2025, a collaborative paper introduced the Law of Capacity Gap, revealing a linear 2.5× optimal teacher‑student size relationship in language model distillation, dramatically cutting compute costs and achieving Pareto‑optimal efficiency, with the MiniMA model as a successful demonstration.

DistillationLanguage ModelsMiniMA

0 likes · 7 min read

Unveiling the Law of Capacity Gap: Boosting Language Model Distillation Efficiency

Baidu Geek Talk

Aug 11, 2025 · Artificial Intelligence

FLUX-Lightning Slashes Diffusion Inference to 4 Steps, Doubling Speed

FLUX-Lightning, introduced by PaddleMIX, combines phased consistency distillation, adversarial learning, distribution‑matching distillation, and reflow loss to reduce diffusion model inference to just four steps while preserving image quality, and leverages the CINN compiler to achieve over 30% speed gains on A800 GPUs, surpassing existing SOTA acceleration methods.

AI inferenceCINNDiffusion Models

0 likes · 21 min read

FLUX-Lightning Slashes Diffusion Inference to 4 Steps, Doubling Speed

Alibaba Cloud Big Data AI Platform

Jun 30, 2025 · Artificial Intelligence

Unlocking Small LLM Power: Variable‑Length Chain Distillation with DistillQwen‑ThoughtY

This article introduces a variable‑length chain‑of‑thought distillation technique built on Alibaba Cloud PAI’s EasyDistill toolkit, presents the high‑quality OmniThought‑0528 dataset, details the training of the DistillQwen‑ThoughtY 4B/8B/32B models, and provides code and usage examples for researchers and practitioners.

Chain-of-ThoughtDistillationLLM

0 likes · 15 min read

Unlocking Small LLM Power: Variable‑Length Chain Distillation with DistillQwen‑ThoughtY

AI Frontier Lectures

May 30, 2025 · Artificial Intelligence

Can a 5% Parameter LLM Rival Full‑Scale Models? Inside FairyR1‑32B

The Beijing University team unveils FairyR1‑32B, a 32‑billion‑parameter LLM built on DeepSeek‑R1‑Distill‑Qwen‑32B that uses self‑merging, multi‑teacher cross‑distillation, and lightweight distillation to achieve competitive math and code benchmark scores with only about 5% of the original model’s parameters.

DistillationLarge Language Modelmodel compression

0 likes · 6 min read

Can a 5% Parameter LLM Rival Full‑Scale Models? Inside FairyR1‑32B

Alibaba Cloud Big Data AI Platform

May 29, 2025 · Artificial Intelligence

How OmniThought Enables Adaptive Reasoning Chains for Better LLM Performance

This article introduces the OmniThought dataset, which annotates over two million chain‑of‑thought reasoning steps with Reasoning Verbosity and Cognitive Difficulty scores, and explains how these metrics guide the training of DistilQwen‑ThoughtX models that adapt chain length to task difficulty, achieving superior performance compared to existing distilled LLMs.

CoTDistillationLLM

0 likes · 16 min read

How OmniThought Enables Adaptive Reasoning Chains for Better LLM Performance

Baobao Algorithm Notes

May 13, 2025 · Artificial Intelligence

How Qwen3 Achieves Multi-Stage Pretraining, Long-Context, and Thought-Controlled RL

The article details Qwen3's three‑phase pretraining pipeline, long‑context extensions, a cold‑start long‑chain‑of‑thought dataset, reinforcement‑learning fine‑tuning with custom rewards, and a two‑stage distillation process that yields versatile, thought‑controlled language models.

DistillationQwen3long-context

0 likes · 15 min read

How Qwen3 Achieves Multi-Stage Pretraining, Long-Context, and Thought-Controlled RL

AI Frontier Lectures

Apr 27, 2025 · Artificial Intelligence

How Jeff Dean’s Vision Shaped Modern AI: From Neural Nets to Gemini

Jeff Dean’s 2024 ETH Zurich talk traces fifteen years of AI breakthroughs—from the rise of neural networks and back‑propagation, through large‑scale distributed training, TPUs, Transformers, sparse MoE models, and advanced prompting techniques—showing how scaling compute, data, and clever software have driven today’s powerful Gemini models.

AIChain-of-ThoughtDistillation

0 likes · 18 min read

How Jeff Dean’s Vision Shaped Modern AI: From Neural Nets to Gemini

Alibaba Cloud Big Data AI Platform

Nov 6, 2024 · Artificial Intelligence

Unlocking Long-Text Video Understanding and LLM Distillation with Alibaba PAI

Alibaba Cloud’s AI platform PAI recently saw two papers accepted at EMNLP2024—VideoCLIP‑XL, which enhances video‑text representation for long descriptions using a large video‑long‑description dataset and novel pre‑training tasks, and TAPIR, a curriculum‑planning framework that distills instruction‑following abilities of large language models—while also releasing associated models, datasets, and integration tools for users.

DistillationEMNLP2024Multimodal

0 likes · 8 min read

Unlocking Long-Text Video Understanding and LLM Distillation with Alibaba PAI

Xiaohongshu Tech REDtech

Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, works with LoRA, ControlNet, InstantID, and outperforms existing consistency distillation techniques in speed and quality.

AI accelerationDiffusion ModelsDistillation

0 likes · 9 min read

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Alibaba Cloud Big Data AI Platform

Aug 30, 2024 · Artificial Intelligence

Boost LLM Performance: Data Augmentation & Distillation with Qwen2

This guide explains how to reduce the computational cost of large language models by preparing instruction data, optionally augmenting or refining it, deploying teacher and student models on PAI, and performing distillation training with detailed hyper‑parameter settings and sample Python scripts.

AIData AugmentationDistillation

0 likes · 21 min read

Boost LLM Performance: Data Augmentation & Distillation with Qwen2

Sohu Tech Products

May 21, 2024 · Artificial Intelligence

OPPO Multimodal Pretrained Model Deployment in Cloud-Edge Scenarios: Practices and Optimizations

OPPO details how it deploys multimodal pretrained models on resource‑constrained edge devices by compressing CLIP‑based image‑text retrieval, adapting Chinese text‑to‑image generation with LoRA and adapters, and lightweighting diffusion models through layer pruning and progressive distillation, achieving sub‑3‑second generation while preserving cloud‑level quality.

CLIPDistillationEdge deployment

0 likes · 18 min read

OPPO Multimodal Pretrained Model Deployment in Cloud-Edge Scenarios: Practices and Optimizations

DataFunTalk

Apr 25, 2023 · Artificial Intelligence

DAMO-YOLO: An Efficient Target Detection Framework with NAS, Multi‑Scale Fusion, and Full‑Scale Distillation

This article introduces DAMO‑YOLO, a high‑performance object detection framework that combines low‑cost model customization via MAE‑NAS, an Efficient RepGFPN with HeavyNeck for superior multi‑scale detection, and a full‑scale distillation technique, delivering faster inference, lower FLOPs, and higher accuracy across diverse industrial scenarios.

DistillationModel OptimizationNAS

0 likes · 15 min read

DAMO-YOLO: An Efficient Target Detection Framework with NAS, Multi‑Scale Fusion, and Full‑Scale Distillation

Meituan Technology Team

Sep 24, 2020 · Artificial Intelligence

Meituan Search Ads Team's Solution for KDD Cup 2020 Multimodalities Recall Track

Meituan’s Search Ads team placed third in the KDD Cup 2020 Multimodalities Recall track by tackling training‑test distribution mismatch with diversified negative sampling and distillation learning, and improving text‑image matching via gated fully‑connected layers, bidirectional attention, and diversified fusion, then ensembling neural and tree models for strong NDCG gains later applied to their ad creative‑selection system.

DistillationInformation RetrievalKDD Cup

0 likes · 19 min read