Tagged articles
19 articles
Machine Learning Algorithms & Natural Language Processing
Apr 22, 2026 · Artificial Intelligence

Turning Transformers into Mamba: A Cross‑Architecture Distillation That Linearizes Inference Cost

The article presents a two‑step cross‑architecture distillation method that replaces the quadratic softmax attention of Transformers with a learned linear attention and then maps it onto a Mamba backbone, achieving near‑teacher performance while reducing inference cost to linear time.

Cross‑Architecture · Distillation · Linear Attention
0 likes · 8 min read
Old Zhang's AI Learning
Apr 19, 2026 · Artificial Intelligence

Qwen3.6-35B: 4‑bit Quantization, DFlash Speedup, Claude Opus Distillation

The article reviews three optimization paths for the Qwen3.6‑35B model—four‑bit AWQ quantization variants, the DFlash speculative decoding accelerator, and a Claude Opus‑based distillation—detailing their implementation steps, benchmark results, and guidance on selecting the best version for different hardware and performance needs.

AI · DFlash · Distillation
0 likes · 11 min read
Old Zhang's AI Learning
Apr 3, 2026 · Artificial Intelligence

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

The newly released Qwopus3.5‑v3 model combines higher‑quality reasoning chains, dedicated tool‑calling reinforcement learning, and an act‑then‑refine paradigm, delivering a 5‑point HumanEval boost, a 1.43‑point MMLU‑Pro gain, 31.7% faster inference and 24% lower token cost, while remaining runnable on a 3090 or a 16 GB MacBook, with easy deployment via GGUF, LM Studio, Ollama or llama.cpp.

Claude Opus · Distillation · HumanEval
0 likes · 12 min read
AI Engineer Programming
Mar 28, 2026 · Artificial Intelligence

How to Start Training Your Own AI Model: A Complete Roadmap

This guide maps the end-to-end process for building a small AI model—from leveraging open-source base models and applying SFT with LoRA/QLoRA, through alignment techniques like DPO or ORPO, to low-cost distillation and final quantization for local deployment, while recommending free GPU resources and essential tooling.

AI · Alignment · Distillation
0 likes · 12 min read
Amap Tech
Jan 8, 2026 · Artificial Intelligence

How AI Powers Fancy Video Generation for Real‑World POI Scenes

This article details the AI techniques behind Gaode's "Street Ranking" project, explaining the Fancy video concept, the dual training and production pipelines, and the use of SFT, reinforcement learning, MoE‑LoRA, distribution‑matching distillation, and quality‑filtering to achieve 25× faster generation with high aesthetic fidelity.

AI video generation · Distillation · model fine-tuning
0 likes · 24 min read
Xiaohongshu Tech REDtech
Nov 4, 2025 · Artificial Intelligence

Unveiling the Law of Capacity Gap: Boosting Language Model Distillation Efficiency

At ACL 2025, a collaborative paper introduced the Law of Capacity Gap: in language model distillation, the optimal teacher size grows linearly with student size, at about 2.5× the student. Applying the law cuts compute costs substantially and reaches Pareto‑optimal efficiency, as demonstrated by the MiniMA model.

Distillation · MiniMA · artificial-intelligence
0 likes · 7 min read
Baidu Geek Talk
Aug 11, 2025 · Artificial Intelligence

FLUX-Lightning Slashes Diffusion Inference to 4 Steps, Doubling Speed

FLUX-Lightning, introduced by PaddleMIX, combines phased consistency distillation, adversarial learning, distribution‑matching distillation, and reflow loss to reduce diffusion model inference to just four steps while preserving image quality, and leverages the CINN compiler to achieve over 30% speed gains on A800 GPUs, surpassing existing SOTA acceleration methods.

AI inference · CINN · Distillation
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Jun 30, 2025 · Artificial Intelligence

Unlocking Small LLM Power: Variable‑Length Chain Distillation with DistilQwen‑ThoughtY

This article introduces a variable‑length chain‑of‑thought distillation technique built on Alibaba Cloud PAI’s EasyDistill toolkit, presents the high‑quality OmniThought‑0528 dataset, details the training of the DistilQwen‑ThoughtY 4B/8B/32B models, and provides code and usage examples for researchers and practitioners.

Dataset · Distillation · LLM
0 likes · 15 min read
AI Frontier Lectures
May 30, 2025 · Artificial Intelligence

Can a 5% Parameter LLM Rival Full‑Scale Models? Inside FairyR1‑32B

A Peking University team unveils FairyR1‑32B, a 32‑billion‑parameter LLM built on DeepSeek‑R1‑Distill‑Qwen‑32B that uses self‑merging, multi‑teacher cross‑distillation, and lightweight distillation to achieve competitive math and code benchmark scores with only about 5% of the original model’s parameters.

Distillation · large language model · model compression
0 likes · 6 min read
Alibaba Cloud Big Data AI Platform
May 29, 2025 · Artificial Intelligence

How OmniThought Enables Adaptive Reasoning Chains for Better LLM Performance

This article introduces the OmniThought dataset, which annotates over two million chain‑of‑thought reasoning steps with Reasoning Verbosity and Cognitive Difficulty scores, and explains how these metrics guide the training of DistilQwen‑ThoughtX models that adapt chain length to task difficulty, achieving superior performance compared to existing distilled LLMs.

CoT · Dataset · Distillation
0 likes · 16 min read
AI Frontier Lectures
Apr 27, 2025 · Artificial Intelligence

How Jeff Dean’s Vision Shaped Modern AI: From Neural Nets to Gemini

Jeff Dean’s 2024 ETH Zurich talk traces fifteen years of AI breakthroughs—from the rise of neural networks and back‑propagation, through large‑scale distributed training, TPUs, Transformers, sparse MoE models, and advanced prompting techniques—showing how scaling compute, data, and clever software have driven today’s powerful Gemini models.

AI · Chain-of-Thought · Distillation
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Nov 6, 2024 · Artificial Intelligence

Unlocking Long-Text Video Understanding and LLM Distillation with Alibaba PAI

Alibaba Cloud’s AI platform PAI recently had two papers accepted at EMNLP 2024: VideoCLIP‑XL, which enhances video‑text representation for long descriptions using a large video‑long‑description dataset and novel pre‑training tasks, and TAPIR, a curriculum‑planning framework that distills the instruction‑following abilities of large language models. The team also released the associated models, datasets, and integration tools for users.

Distillation · EMNLP2024 · large-language-models
0 likes · 8 min read
Xiaohongshu Tech REDtech
Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, works with LoRA, ControlNet, InstantID, and outperforms existing consistency distillation techniques in speed and quality.

AI acceleration · Distillation · diffusion models
0 likes · 9 min read
Sohu Tech Products
May 21, 2024 · Artificial Intelligence

OPPO Multimodal Pretrained Model Deployment in Cloud-Edge Scenarios: Practices and Optimizations

OPPO details how it deploys multimodal pretrained models on resource‑constrained edge devices by compressing CLIP‑based image‑text retrieval, adapting Chinese text‑to‑image generation with LoRA and adapters, and lightweighting diffusion models through layer pruning and progressive distillation, achieving sub‑3‑second generation while preserving cloud‑level quality.

CLIP · Distillation · LoRA
0 likes · 18 min read
DataFunTalk
Apr 25, 2023 · Artificial Intelligence

DAMO-YOLO: An Efficient Object Detection Framework with NAS, Multi‑Scale Fusion, and Full‑Scale Distillation

This article introduces DAMO‑YOLO, a high‑performance object detection framework that combines low‑cost model customization via MAE‑NAS, an Efficient RepGFPN with HeavyNeck for superior multi‑scale detection, and a full‑scale distillation technique, delivering faster inference, lower FLOPs, and higher accuracy across diverse industrial scenarios.

Distillation · Model Optimization · NAS
0 likes · 15 min read
Meituan Technology Team
Sep 24, 2020 · Artificial Intelligence

Meituan Search Ads Team's Solution for KDD Cup 2020 Multimodalities Recall Track

Meituan’s Search Ads team placed third in the KDD Cup 2020 Multimodalities Recall track by tackling training‑test distribution mismatch with diversified negative sampling and distillation learning, and improving text‑image matching via gated fully‑connected layers, bidirectional attention, and diversified fusion, then ensembling neural and tree models for strong NDCG gains later applied to their ad creative‑selection system.

Distillation · KDD Cup · Multimodal Learning
0 likes · 19 min read