Tagged articles
121 articles
Machine Learning Algorithms & Natural Language Processing
May 14, 2026 · Artificial Intelligence

Turning Multi‑Teacher Conflict into Dynamic Constraints: Robust Reasoning Alignment for Multimodal LLMs (ICML 2026)

APO (Autonomous Preference Optimization) converts the drift and conflict among multiple teacher multimodal LLMs into dynamic negative constraints while treating consensus as a positive preference, enabling robust concept alignment and superior diagnostic accuracy on the CXR‑MAX benchmark, as shown by extensive experiments in the ICML 2026 paper.

APO · ICML 2026 · concept drift
11 min read
Machine Heart
May 13, 2026 · Artificial Intelligence

Turning Multi-Teacher Conflict into Dynamic Constraints for Precise Multimodal Model Alignment (ICML 2026)

The paper introduces APO, a novel autonomous preference optimization framework that converts concept drift among multiple teacher multimodal LLMs into dynamic negative constraints and treats consensus as a positive preference, achieving robust concept alignment and surpassing strong teachers on a high‑risk medical X‑ray benchmark.

APO · CXR-MAX · ICML 2026
11 min read
Liangxu Linux
May 12, 2026 · Artificial Intelligence

How to Deploy Trained Neural Networks on Arduino and Raspberry Pi

Deploying large AI models to tiny embedded devices like Arduino and Raspberry Pi requires aggressive model slimming through quantization, pruning, and distillation, careful selection of runtimes such as TensorFlow Lite, and attention to power, latency, and debugging challenges to achieve real‑time inference.

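As a rough illustration of the quantization step, here is a minimal symmetric per‑tensor int8 scheme in plain Python. It is a sketch under stated assumptions, not TensorFlow Lite's converter: `quantize_int8` and `dequantize_int8` are hypothetical names, and production toolchains may add per‑channel scales, zero points, and calibration data.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.0, 0.9, -0.03]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Rounding error per weight is at most half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_err)
```

Storing `q` as int8 plus one float scale cuts weight memory roughly 4x versus float32, at the cost of the bounded rounding error printed above.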
Arduino · Embedded AI · Model Pruning
7 min read
CodeTrend
Apr 24, 2026 · Artificial Intelligence

How Large Language Models Acquire Tool‑Calling Ability: SFT, RLHF & LoRA Explained

The article explains why pretrained LLMs cannot call tools, then breaks down the three‑stage training pipeline—Supervised Fine‑Tuning, Reinforcement Learning from Human Feedback, and knowledge distillation—showing how each step teaches models to read tool schemas, decide when to invoke a tool, generate JSON calls, and finally transfer the capability to smaller models with LoRA.

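To make "read tool schemas and generate JSON calls" concrete, the sketch below checks a model‑generated call against a schema before anything is executed. The `get_weather` tool, the schema layout (loosely modeled on JSON Schema), and `parse_tool_call` are all hypothetical; the article's own format may differ, and no model or training step is shown.

```python
import json

# Hypothetical tool registry; the schema layout loosely follows JSON Schema.
TOOLS = {
    "get_weather": {
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        }
    }
}

# Only the JSON types this toy schema uses.
_PY_TYPES = {"string": str, "boolean": bool}


def parse_tool_call(model_output):
    """Parse a call shaped like {"name": ..., "arguments": {...}}.

    Returns (name, arguments) or raises ValueError explaining the problem.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("a tool call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    schema = TOOLS[name]["parameters"]
    for field in schema.get("required", []):
        if field not in args:
            raise ValueError(f"missing required argument: {field}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, _PY_TYPES[spec["type"]]):
            raise ValueError(f"{key} should be a {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{key} must be one of {spec['enum']}")
    return name, args


print(parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Rejected calls (unknown tool, missing `city`, bad `unit`) raise `ValueError`, which a runtime can feed back to the model instead of executing anything.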
AI training · Function Calling · LLM
19 min read
Frontend AI Walk
Apr 21, 2026 · Artificial Intelligence

How to Distill Any Expert into an AI Skill: Elon Musk SOP Guide

This article walks you through a complete knowledge‑distillation workflow that turns Elon Musk’s decision‑making logic into a reusable AI skill, covering source collection, Obsidian setup, a six‑step prompting chain, adding personal commentary, and packaging the result for manual or automated AI use.

AI workflow · Claude · Elon Musk
21 min read
Architect's Ambition
Apr 20, 2026 · Artificial Intelligence

How to Turn GitHub‑Trending AI Skills into Real‑World Agents with Knowledge Distillation

The article explains why generic AI is insufficient, defines a Skill as the minimal unit of specialized AI, and details a three‑layer knowledge‑distillation methodology—knowledge, logic, style—to build practical person‑ and book‑based AI Skills, illustrated with a complete Wang Yangming Skill implementation and common pitfalls.

AI Skill · Prompt engineering · agent development
12 min read
Architect's Journey
Apr 16, 2026 · Industry Insights

What Really Scares Us When AI Starts “Distilling” Employees?

The article examines the hype around AI tools that turn employees' chats, documents, and emails into a compact “.skill” file, arguing that this so‑called “distillation” is merely knowledge capture, while true value—judgment, responsibility, and intuition—remains uncapturable and the anxiety surrounding it is deliberately manufactured.

AI · Digital Twin · employee anxiety
6 min read
Geek Labs
Apr 9, 2026 · Artificial Intelligence

Digital Survival Guide: Cyber Immortality and Anti‑Distillation

The article reviews three open‑source GitHub projects (nuwa.skill, yourself‑skill, and anti‑distill) built around distilling a person’s thinking into AI‑driven digital twins, walks through practical examples, and discusses how such tools can preserve personal knowledge while preventing corporate exploitation.

AI Skill · anti-distill · digital immortality
5 min read
PaperAgent
Apr 6, 2026 · Artificial Intelligence

Can LLMs Self‑Improve After Deployment? Inside Microsoft’s Online Experiential Learning

Microsoft’s Online Experiential Learning framework lets large language models continuously self‑evolve after deployment by extracting experience from user interactions and consolidating it into model parameters, eliminating the need for human labels, reward models, or server‑side environment access, and demonstrating scalable gains across tasks and model sizes.

AI research · LLM · Online Learning
9 min read
AIWalker
Mar 22, 2026 · Artificial Intelligence

Can a Single Vision Model Replace Multiple Specialized Networks? Nvidia’s New Aggregated Foundation Model

Nvidia’s latest aggregated vision foundation model consolidates detection, segmentation, and other visual tasks into one network, eliminating the complexity and resource waste of multi‑model stacks; the article explains the challenges of resolution balance and teacher distribution, outlines three model generations (RADIOv2.5, C‑RADIOv3, C‑RADIOv4), and details the novel multi‑teacher distillation techniques that boost performance across benchmarks.

Model Aggregation · Nvidia · knowledge distillation
6 min read
AI Agent Research Hub
Mar 10, 2026 · Artificial Intelligence

How Knowledge Distillation Lets Neural Networks Grow Physical Symmetry Without Hard PINN Constraints

The paper introduces Ψ‑NN, a knowledge‑distillation framework that automatically discovers physics‑consistent network structures for PINNs, eliminating the need for manually imposed loss‑function constraints and achieving faster convergence, higher accuracy, and transferable architectures across PDE problems.

Hierarchical Clustering · Network Structure Discovery · knowledge distillation
26 min read
AIWalker
Mar 3, 2026 · Artificial Intelligence

How NanoSD Cuts 90% Parameters to Enable Real‑Time Photo Editing on Mobile

NanoSD distills Stable Diffusion 1.5 into a 130 M‑parameter model that runs inference in 20 ms on a Qualcomm SM8750 NPU, using hardware‑aware module pruning, module‑level knowledge distillation, and Bayesian optimization to achieve Pareto‑optimal quality‑efficiency trade‑offs for on‑device image restoration.

Bayesian Optimization · Stable Diffusion · knowledge distillation
14 min read
PaperAgent
Mar 1, 2026 · Artificial Intelligence

How On-Policy Context Distillation Enables LLMs to Retain Experience Forever

On-Policy Context Distillation (OPCD) compresses transient in‑context knowledge into LLM parameters, allowing models to permanently retain problem‑solving experience without ground‑truth labels; the article details the OPCD framework, training steps, teacher‑student configurations, and experimental results on math, games, and system‑prompt tasks, highlighting its advantages over traditional context distillation.

LLM · OPCD · artificial intelligence
8 min read
Bighead's Algorithm Notes
Feb 20, 2026 · Artificial Intelligence

How Time Distillation Empowers Large Language Models for Time‑Series Forecasting (T‑LLM)

The paper introduces T‑LLM, a time‑distillation framework that transfers predictive behavior from a lightweight teacher model to a general‑purpose LLM, enabling accurate multivariate time‑series forecasting across full‑sample, few‑shot, and zero‑shot settings while eliminating the need for large‑scale pre‑training.

Few‑Shot Learning · T-LLM · knowledge distillation
18 min read
Ximalaya Technology Team
Feb 11, 2026 · Artificial Intelligence

How Ximalaya Used Generative AI to Revolutionize Audio Recommendations

This article details Ximalaya's journey from traditional multi‑stage recommendation pipelines to generative AI‑driven models, covering business challenges, architectural and model differences, phased deployments, knowledge distillation, semantic ID encoding, decoder‑only strategies, extensive offline and online evaluations, and future research directions.

Encoder-Decoder · Recommendation Systems · audio recommendation
24 min read
PaperAgent
Jan 17, 2026 · Artificial Intelligence

How Qwen3‑VL Embedding and Reranker Set New SOTA in Multimodal Retrieval

The article analyzes the Qwen3‑VL‑Embedding and Qwen3‑VL‑Reranker models, detailing their unified vector space, multi‑stage training pipeline, Matryoshka representation learning, quantization techniques, massive synthetic data generation, and benchmark results that push multimodal retrieval performance to a new state‑of‑the‑art.

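Matryoshka representation learning trains an embedding so that its leading dimensions already form a usable, smaller embedding, which means a retrieval system can truncate vectors and re‑normalize them to trade accuracy for memory. A minimal sketch of the truncation side only, using made‑up 8‑dimensional vectors (not real Qwen3‑VL outputs); the helper names are hypothetical and the training objective is not shown.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` coordinates and rescale them to unit L2 norm."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0 else list(prefix)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors is just their dot product."""
    return sum(x * y for x, y in zip(a, b))


# Made-up 8-d embeddings standing in for model outputs.
query = [0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
doc_a = [0.8, 0.4, 0.1, 0.1, 0.03, 0.01, 0.02, 0.0]
doc_b = [0.1, 0.2, 0.9, 0.3, 0.1, 0.05, 0.0, 0.02]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    a = truncate_and_normalize(doc_a, dim)
    b = truncate_and_normalize(doc_b, dim)
    print(dim, round(cosine(q, a), 3), round(cosine(q, b), 3))
```

With these toy vectors `doc_a` stays the closer match at every prefix length; with arbitrary (non‑Matryoshka) embeddings there is no such guarantee, which is exactly what the training objective is meant to provide.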
Embedding · Multimodal AI · knowledge distillation
7 min read
HyperAI Super Neural
Jan 3, 2026 · Artificial Intelligence

Clone a Voice in 5 Seconds with One‑Step Generation: Inside Chatterbox‑Turbo’s High‑Fidelity TTS

Resemble AI’s open‑source Chatterbox‑Turbo reduces TTS generation from ten steps to one, enabling high‑sample‑rate, lossless voice cloning from a 5‑10 second reference while supporting emotional control, paralinguistic tags, and embedded watermarking for real‑time applications across chatbots, games, podcasts, and education.

Chatterbox‑Turbo · Real-time inference · knowledge distillation
7 min read
Network Intelligence Research Center (NIRC)
Dec 30, 2025 · Artificial Intelligence

Bridging Tokenizer Gaps: Cross-Tokenizer Knowledge Distillation at AAAI 2026

This paper introduces SeDi, a semantics‑ and distribution‑aware cross‑tokenizer knowledge distillation framework that aligns teacher and student token spaces via bipartite graph components and top‑K re‑encoding, achieving state‑of‑the‑art performance and lower exposure bias on multiple LLM benchmarks.

AI research · cross-tokenizer distillation · entropy alignment
10 min read
360 Smart Cloud
Dec 3, 2025 · Artificial Intelligence

How Model Distillation Enhances LLM Performance on the TLM Platform

This article explains the TLM large‑model development platform and details how knowledge distillation—using soft labels, temperature scaling, and combined loss functions—compresses teacher models into efficient student models, with practical steps and evaluation on the platform.

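The soft‑label, temperature, and combined‑loss recipe listed above is classically written as α·CE(y, p_s) + (1 − α)·T²·KL(p_t^(T) ‖ p_s^(T)), where p^(T) is a softmax over logits divided by temperature T. Below is a plain‑Python sketch of that textbook formulation; the defaults T = 4 and α = 0.5 are arbitrary illustration values, not TLM platform settings, and nothing here reflects the platform's API.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature; higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, hard_label, temperature=4.0, alpha=0.5):
    """alpha * CE(hard label, student) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    # Hard-label term: ordinary cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[hard_label])
    # Soft-label term: KL divergence between temperature-softened distributions.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


teacher = [5.0, 2.0, 0.5]
student = [3.0, 2.5, 0.0]
print(kd_loss(student, teacher, hard_label=0))
```

Raising T flattens the teacher distribution so the student also learns the relative scores of the wrong classes; α then trades that signal off against the ground‑truth label.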
AI · LLM · TLM platform
5 min read
AsiaInfo Technology: New Tech Exploration
Nov 28, 2025 · Artificial Intelligence

Boosting 5G Complaint Intent Detection with Large-Model-Enhanced Few-Shot Learning

This paper presents a collaborative framework where a large language model generates high‑quality synthetic samples to augment a lightweight model, dramatically improving few‑shot user‑complaint intent recognition in 5G networks, achieving a 21% boost for rare categories and a 9% overall accuracy gain.

Few‑Shot Learning · complaint intent detection · data augmentation
27 min read
Alibaba Cloud Big Data AI Platform
Nov 4, 2025 · Artificial Intelligence

How Alibaba Cloud’s PAI Powers Cutting‑Edge LLM Research at EMNLP 2025

EMNLP 2025 in Suzhou will feature Alibaba Cloud’s AI platform PAI presenting four accepted papers on knowledge distillation, small‑model reasoning, distilled reasoning models, and an automated RAG benchmark framework, alongside exhibition demos, networking events, and recruitment opportunities for AI talent.

AI Platform · EMNLP 2025 · Retrieval Augmented Generation
10 min read
DataFunTalk
Oct 30, 2025 · Artificial Intelligence

How On-Policy Distillation Cuts LLM Training Cost by 90%

Thinking Machines Lab introduces On-Policy Distillation, a post‑training technique that matches reinforcement‑learning performance while reducing compute cost by up to tenfold, and demonstrates its effectiveness through extensive experiments on reasoning, personalization, and catastrophic‑forgetting mitigation.

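The summary does not spell out the mechanics. A common formulation of on‑policy distillation, and our assumption here, is that the student samples a sequence, the teacher scores those same tokens, and each token receives the negative per‑token reverse KL, log p_teacher(x_t) − log p_student(x_t), as its reward. The sketch below only does that bookkeeping on made‑up log‑probabilities; the model calls and the policy‑gradient update are omitted, and the function names are hypothetical.

```python
def per_token_advantages(student_logprobs, teacher_logprobs):
    """Per-token signal on a student-sampled sequence.

    Both lists hold log-probabilities of the same sampled tokens x_t:
    advantage_t = log p_teacher(x_t) - log p_student(x_t),
    i.e. the negative of a single-sample reverse-KL estimate at step t.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("log-prob sequences must be aligned token by token")
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]


def mean_reverse_kl(student_logprobs, teacher_logprobs):
    """Average per-token estimate of KL(student || teacher) along the sample."""
    adv = per_token_advantages(student_logprobs, teacher_logprobs)
    if not adv:
        raise ValueError("need at least one token")
    return -sum(adv) / len(adv)


# Made-up log-probs for three tokens the student sampled.
student_lp = [-0.1, -0.5, -2.0]
teacher_lp = [-0.1, -3.0, -1.0]
print(per_token_advantages(student_lp, teacher_lp))  # [0.0, -2.5, 1.0]
print(mean_reverse_kl(student_lp, teacher_lp))  # 0.5
```

The second token, which the teacher finds unlikely, is penalized and the third is encouraged; because every token gets its own signal, this is a much denser form of feedback than a single sequence‑level RL reward.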
On-Policy Distillation · knowledge distillation · model efficiency
15 min read
Bighead's Algorithm Notes
Oct 25, 2025 · Artificial Intelligence

Time Series Paper Digest: Extreme Event Prediction, Multimodal Fusion & Anomaly Detection

This article summarizes four recent arXiv papers on time‑series forecasting, covering a hierarchical knowledge‑distillation framework for extreme events, a graph‑enhanced multimodal fusion network, an interpretable unsupervised anomaly detector, and an adaptive masking loss that improves prediction accuracy.

Time Series · adaptive masking · anomaly detection
10 min read
AI2ML AI to Machine Learning
Oct 13, 2025 · Artificial Intelligence

How Large‑and‑Small Language Model Collaboration Is Shaping the Future

The article argues that combining large, high‑capacity models with lightweight, fine‑tuned small models can cut costs, lower latency, enable specialized vertical tasks, and shift development from chasing ever‑bigger models toward optimal system architectures, outlining key techniques such as state‑space models, knowledge distillation, and staged fine‑tuning.

AI Architecture · Fine-tuning · efficiency
3 min read
Huawei Cloud Developer Alliance
Oct 13, 2025 · Artificial Intelligence

How Quantization, Pruning, and Distillation Shrink AI Models for Edge Devices

This article explains the principles, key methods, and practical effects of model quantization, pruning, and knowledge distillation, comparing their advantages and disadvantages, and showing how combining these techniques enables compact, high‑performance AI models on resource‑constrained devices.

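Of the three techniques, magnitude pruning is the simplest to show. This sketch performs unstructured pruning on a flat list of weights by zeroing the smallest‑magnitude fraction. It is illustrative only: `magnitude_prune` is a hypothetical helper, and on edge hardware structured pruning (removing whole channels or heads) is often preferred because scattered zeros do not speed up dense kernels.

```python
def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero the `sparsity` fraction of smallest |w|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by magnitude, smallest first (ties keep original order).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


w = [0.5, -0.02, 0.3, 0.01, -0.8, 0.05]
print(magnitude_prune(w, 0.5))  # [0.5, 0.0, 0.3, 0.0, -0.8, 0.0]
```

In practice pruning is usually followed by a short fine‑tuning pass, and the pruned model can then be quantized as well, which is how the techniques combine.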
Model Pruning · Model Quantization · edge AI
7 min read
Amap Tech
Sep 2, 2025 · Artificial Intelligence

How Pos2Distill Eliminates Positional Bias in Large Language Models

This article introduces Pos2Distill, a novel knowledge‑distillation framework that transfers capabilities from advantageous to disadvantaged positions in large language models, effectively mitigating positional bias and improving performance on long‑text retrieval and in‑context reasoning tasks.

in-context reasoning · knowledge distillation · large language models
10 min read
Alibaba Cloud Big Data AI Platform
Jul 23, 2025 · Artificial Intelligence

Unlock Efficient LLMs: How Alibaba’s PAI EasyDistill Powers Model Post‑Training

This article explains how Alibaba Cloud's AI platform PAI leverages the EasyDistill framework for post‑training model optimization, covering knowledge distillation concepts, data synthesis techniques, basic and advanced distillation training, the DistilQwen model family, real‑world customer cases, and step‑by‑step practical demos.

AI Platform · EasyDistill · LLM optimization
12 min read
Kuaishou Tech
Jul 21, 2025 · Artificial Intelligence

Can AI Models Think on Demand? Inside KAT‑V1 AutoThink’s Dynamic Reasoning

The article introduces KAT‑V1 AutoThink, a dual‑mode large language model that automatically switches between thinking and non‑thinking modes based on problem difficulty, details its novel training paradigm, reinforcement‑learning enhancements, performance benchmarks against leading open‑source models, and provides open‑source resources for further research.

auto-think · knowledge distillation · large language model
14 min read
AI Frontier Lectures
Jul 17, 2025 · Artificial Intelligence

Top 8 Tencent Youtu Papers Accepted at ICCV 2025: Innovations in AI and Vision

Tencent Youtu Lab had 8 papers accepted at the 20th ICCV (ICCV 2025), covering stylized face recognition, AI‑generated image detection, heterogeneous knowledge distillation, multi‑conditional diffusion, multimodal LLM distillation, palmprint recognition, low‑light vision, and oracle bone script decipherment, each pushing the frontier of computer vision and AI research.

Computer Vision · Dataset · ICCV 2025
17 min read
Alibaba Cloud Big Data AI Platform
Jun 13, 2025 · Artificial Intelligence

How EasyDistill Cuts LLM Costs: Mastering DistilQwen-ThoughtX on Alibaba Cloud

EasyDistill, an open-source framework from Alibaba Cloud PAI, streamlines knowledge distillation for large language models, introducing the DistilQwen-ThoughtX series with variable-length chain-of-thought reasoning, and provides comprehensive best-practice guidance for training, fine-tuning, evaluation, compression, and deployment via the PAI-ModelGallery.

AI inference · LLM · knowledge distillation
12 min read
AI Frontier Lectures
Jun 9, 2025 · Artificial Intelligence

AI Research Highlights: Robo-DM, DeepKD, LLM Security, and Reasoning Innovations

This roundup presents recent AI breakthroughs, including Robo‑DM’s efficient robot dataset management, DeepKD’s decoupled knowledge‑distillation trainer, a novel informed white‑box attack exposing weaknesses in LLM alignment defenses, the RePPL hallucination detector, Self‑GIVE’s associative reasoning framework, and LLM‑driven RL ensemble methods.

AI · knowledge distillation · reasoning
15 min read
AIWalker
Jun 3, 2025 · Artificial Intelligence

DeepKD: Double‑Layer Decoupling and Adaptive Denoising Set New ImageNet SOTA

DeepKD introduces a double‑layer decoupling framework and a dynamic top‑K mask that adaptively denoises low‑confidence logits, addressing conflicts between target and non‑target knowledge flows; extensive experiments on CIFAR‑100, ImageNet‑1K, and MS‑COCO demonstrate consistent accuracy gains and state‑of‑the‑art performance.

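A top‑K mask of the kind described keeps only the teacher's most confident logits when building soft targets, so low‑confidence noise never reaches the student. The sketch below uses a fixed `k` and a plain renormalized, temperature‑scaled softmax; it is a simplified illustration of the masking idea, not DeepKD's actual rule (which, per the summary, adjusts the mask dynamically), and `topk_soft_targets` is a hypothetical name.

```python
import math


def topk_soft_targets(teacher_logits, k, temperature=1.0):
    """Soft targets built only from the teacher's k largest logits.

    Logits outside the top k are masked out (probability 0) and the rest are
    renormalized with a temperature-scaled softmax. k is fixed here for simplicity.
    """
    if not 1 <= k <= len(teacher_logits):
        raise ValueError("k must be between 1 and the number of classes")
    ranked = sorted(range(len(teacher_logits)), key=lambda i: teacher_logits[i], reverse=True)
    keep = ranked[:k]
    m = max(teacher_logits[i] for i in keep) / temperature  # stability shift
    exps = {i: math.exp(teacher_logits[i] / temperature - m) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(teacher_logits))]


teacher = [4.0, 2.0, 0.5, -1.0, -3.0]
print(topk_soft_targets(teacher, k=2, temperature=2.0))
```

A dynamic variant would change `k` over training, for example keeping more classes once the student is stable; that schedule is outside this sketch.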
Deep Learning · GSNR · SOTA
23 min read
Alibaba Cloud Big Data AI Platform
May 28, 2025 · Artificial Intelligence

How EasyDistill Simplifies LLM Knowledge Distillation for Faster, Smaller Models

EasyDistill, an open‑source toolkit from Alibaba Cloud AI Platform, streamlines knowledge distillation of large language models by offering modular data synthesis, black‑box and white‑box training, reinforcement‑learning and preference‑optimization techniques, enabling the creation of compact, high‑performance DistilQwen models and accompanying datasets.

DistilQwen · EasyDistill · knowledge distillation
17 min read
Alimama Tech
Apr 23, 2025 · Artificial Intelligence

Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning

The paper introduces an explainable LLM framework (ELLM‑rele) that uses chain‑of‑thought reasoning and a multi‑dimensional knowledge distillation pipeline to compress large‑model relevance judgments into lightweight student models, achieving superior offline relevance scores and online click‑through and conversion improvements in Taobao’s search advertising.

LLM · chain-of-thought · explainability
17 min read
Alibaba Cloud Big Data AI Platform
Mar 29, 2025 · Artificial Intelligence

How DistilQwen2.5‑R1 Boosts Small‑Model Reasoning with Innovative Knowledge Distillation

The article introduces the DistilQwen2.5‑R1 series, which leverages a novel knowledge‑distillation pipeline—including CoT data evaluation, improvement, and validation—to transfer deep reasoning abilities from large models like DeepSeek‑R1 to compact models, achieving superior performance across math, code, and scientific benchmarks and providing open‑source checkpoints and deployment guides for practical use.

AI inference · benchmark evaluation · knowledge distillation
17 min read
Tencent Cloud Developer
Mar 25, 2025 · Artificial Intelligence

Knowledge Distillation in Diffusion Models: Techniques and Applications

The article explains how knowledge distillation transfers capabilities from large diffusion models to smaller ones. It covers hard and soft labels and temperature scaling, contrasts knowledge distillation with data distillation, and details techniques such as consistency models, progressive distillation, adversarial distillation, and adversarial post‑training for model compression and sampling‑step reduction.

adversarial post-training · adversarial training · consistency models
19 min read
Network Intelligence Research Center (NIRC)
Mar 10, 2025 · Artificial Intelligence

Revisiting Knowledge Distillation for Autoregressive Language Models

The article analyzes why larger teacher models can hurt student performance in autoregressive language model distillation, reveals that different tokens require distinct teaching modes, proposes an Adaptive Token‑wise Knowledge Distillation (ATKD) method, and shows through extensive experiments that ATKD consistently improves accuracy by about 3 % and enhances generalization across model sizes.

adaptive teaching · autoregressive language models · knowledge distillation
9 min read
IT Architects Alliance
Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.

DeepSeek · FP8 · MLA
18 min read
Alibaba Cloud Big Data AI Platform
Feb 25, 2025 · Artificial Intelligence

How DistilQwen2.5 Boosts LLM Efficiency with Dual‑Stage Knowledge Distillation

This article introduces DistilQwen2.5, a lightweight LLM series built on Qwen2.5 that uses a novel two‑layer distillation framework, instruction‑data optimization, and parameter‑fusion techniques to achieve higher performance while drastically reducing computational cost and deployment overhead.

LLM · efficient inference · knowledge distillation
26 min read
Architecture Digest
Feb 25, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

DeepSeek’s distillation technology combines data and model distillation to transfer knowledge from large teacher models to compact student models, detailing its definitions, principles, key innovations, architecture, training methods, performance gains, and challenges, especially in multimodal contexts.

AI research · DeepSeek · knowledge distillation
16 min read
AsiaInfo Technology: New Tech Exploration
Feb 24, 2025 · Artificial Intelligence

Can Multi‑Teacher Distillation Overcome Catastrophic Forgetting in Continual Learning?

This paper proposes a multi‑teacher distillation framework for continual learning that combines active data rehearsal with feature‑decoupled distillation, demonstrating superior performance on PASCAL VOC and COCO benchmarks while mitigating catastrophic forgetting and balancing stability‑plasticity trade‑offs.

AI · Catastrophic Forgetting · active rehearsal
12 min read
Su San Talks Tech
Feb 23, 2025 · Artificial Intelligence

How DeepSeek’s Distillation Breaks AI Model Limits: Core Principles & Performance

This article explores DeepSeek’s cutting‑edge distillation technology, detailing its definition, underlying principles, innovative data‑model fusion, architecture choices, training strategies, performance gains over large language models, and the remaining challenges in knowledge transfer and multimodal data processing.

AI Optimization · DeepSeek · Multimodal Learning
16 min read
Architects' Tech Alliance
Feb 18, 2025 · Artificial Intelligence

How to Distill DeepSeek LLMs into Lightweight Models for Local Deployment

This article explains DeepSeek's knowledge‑distillation approach for compressing large language models into small, efficient student models, details step‑by‑step local deployment requirements, performance optimizations, and highlights the cost, privacy, and application benefits of running the distilled model on‑premise.

AI inference · DeepSeek · LLM
10 min read
JD Tech Talk
Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · LLM · Model Training
10 min read
JD Cloud Developers
Feb 13, 2025 · Artificial Intelligence

Unlocking DeepSeek R1: Concepts, Training Secrets, and Real-World Experiments

This article demystifies DeepSeek R1 by explaining key concepts such as online search integration and the R1 model, detailing its two‑phase training pipeline, core techniques like iterative data enhancement, and showcases practical reproductions, benchmark tests, and deployment examples for AI developers.

DeepSeek · Model Training · knowledge distillation
12 min read
Architects' Tech Alliance
Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Safety · AI Training Efficiency · DeepSeek-V3
7 min read
Cognitive Technology Team
Feb 7, 2025 · Artificial Intelligence

Knowledge Distillation: Concepts, Techniques, Applications, and Future Directions

This article explains knowledge distillation, a technique popularized by Geoffrey Hinton and colleagues that transfers knowledge from large teacher models to compact student models, covering its core concepts, loss functions, various distillation strategies, notable applications in edge computing, federated learning, and continual learning, and emerging research directions.

Deep Learning · Edge Computing · Federated Learning
7 min read
Architect's Alchemy Furnace
Feb 6, 2025 · Artificial Intelligence

How Knowledge Distillation Powers Efficient Large‑Model Deployment

This article explains how knowledge distillation enables massive AI models to be compressed and deployed efficiently, covering its principles, classification dimensions, implementation steps, innovative practices at DeepSeek, real‑world applications, and future research directions.

DeepSeek · artificial intelligence · knowledge distillation
11 min read
DataFunTalk
Jan 26, 2025 · Artificial Intelligence

58.com’s LingXi Large Language Model Platform: Development, Deployment, and Performance Optimizations

Since the launch of ChatGPT, 58.com has built a Model‑as‑a‑Service platform called LingXi that trains and serves domain‑specific large language models, supports over a hundred internal scenarios with daily inference exceeding ten million calls, and continuously improves performance through quantization, GPU optimization, model miniaturization, and advanced AI applications such as interview assistants, voice agents, and RAG‑enabled agents.

AI Platform · AI applications · Inference Optimization
9 min read
Kuaishou Tech
Jan 24, 2025 · Artificial Intelligence

KwaiCoder-23BA4-v1: An Efficient Large Code Generation Model via Pruning, Knowledge Distillation, and Granular Upcycling

KwaiCoder-23BA4-v1 is a 23B‑parameter wide mixture‑of‑experts (MoE) code‑completion model that achieves state‑of‑the‑art performance on the HumanEval, BigCodeBench, and Fill‑in‑the‑Middle benchmarks. It relies on high‑quality data and a cost‑effective training pipeline that combines model pruning, knowledge distillation, and fine‑grained merging, and its design choices are backed by extensive ablation studies.

AI · Benchmark · Code Generation
0 likes · 10 min read
AIWalker
Jan 18, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen is a 379 M‑parameter text‑to‑image diffusion model that produces 1024 px images on mobile devices in about 1.4 seconds, using a compact U‑Net design, multi‑stage knowledge distillation, step distillation, and optimized training tricks to outperform much larger models on standard benchmarks.

Mobile AI · SnapGen · diffusion models
0 likes · 22 min read
AIWalker
Jan 12, 2025 · Artificial Intelligence

SnapGen Generates 1024px Images in 1.4 s with Lightweight On‑Device Architecture

SnapGen introduces a compact 379M‑parameter diffusion model that produces 1024‑pixel text‑to‑image results in about 1.4 seconds on a mobile device, achieving competitive FID scores and outperforming much larger models through a series of architecture refinements, advanced training tricks, and multi‑level knowledge distillation.

Mobile AI · SnapGen · diffusion models
0 likes · 23 min read
AIWalker
Jan 10, 2025 · Artificial Intelligence

How a Simplified Transformer Enables Lightweight CLIP Training on a Single RTX3090

This paper presents SiCLIP, a framework that simplifies the Transformer architecture, combines weight‑sharing, multi‑stage knowledge distillation, and a novel pair‑matching loss with synthetic captions to train a competitive CLIP model using only one RTX3090 GPU and 1 TB of storage, achieving state‑of‑the‑art data‑size‑parameter‑accuracy trade‑offs.

CLIP · Lightweight Training · Synthetic Captions
0 likes · 19 min read
Alibaba Cloud Big Data AI Platform
Nov 8, 2024 · Artificial Intelligence

How TAPIR Boosts Small LLMs with Task‑Aware Curriculum Planning

The paper introduces TAPIR, a task‑aware curriculum planning framework that distills instruction‑following abilities from black‑box LLM teachers into smaller student models by filtering difficult prompts, resampling tasks, enhancing response styles, and iteratively optimizing across multiple training rounds, achieving superior performance on benchmark evaluations.

Instruction Tuning · LLM distillation · TAPIR
0 likes · 10 min read
DataFunSummit
Sep 23, 2024 · Artificial Intelligence

TransLLM: A Framework for Cross‑Language Transfer of Conversational Large Language Models

This article presents TransLLM, a cross‑language transfer framework that moves high‑quality conversational LLMs to low‑resource languages while preserving their advanced capabilities through Recovery KD, LoRA‑based continual pre‑training, and a translation chain‑of‑thought; extensive experiments show it outperforming ChatGPT and GPT‑4 in both performance and safety.

LoRA · Safety · conversation LLM
0 likes · 22 min read
Xiaohongshu Tech REDtech
Jun 20, 2024 · Artificial Intelligence

Xiaohongshu 2024 Large Model Frontier Paper Sharing Live Event

On June 27, 2024, Xiaohongshu’s technical team will livestream a two‑hour session across WeChat Channels, Bilibili, Douyin and Xiaohongshu, showcasing six top‑conference papers on large‑model advances—including early‑stopping and fine‑grained self‑consistency, novel evaluation methods, negative‑sample‑assisted distillation, and LLM‑based note recommendation—followed by a Q&A and recruitment briefing.

AI research · Model Evaluation · Recommendation Systems
0 likes · 12 min read
AntTech
Mar 11, 2024 · Artificial Intelligence

Can Small Language Models be Good Reasoners in Recommender Systems?

This article presents SLIM, a knowledge‑distillation framework that transfers the reasoning abilities of large language models to compact models for sequential recommendation, enhancing item representation, user profiling, and bias mitigation while achieving comparable performance with far lower computational resources.

AI · LLM · efficiency
0 likes · 12 min read
NewBeeNLP
Feb 12, 2024 · Artificial Intelligence

Beyond Dual‑Tower: Advanced Distillation and Interaction Techniques for Recommendation Systems

This article reviews recent advances that enhance dual‑tower recommendation models by injecting interaction information through various knowledge‑distillation strategies and interaction‑enhanced architectures, summarizing methods such as PFD, ENDX, TRMD, VIRT, Distilled‑DualEncoder, ERNIE‑Search, ColBert, IntTower and MVKE.

AI research · dual-tower · interaction modeling
0 likes · 13 min read
Tencent Cloud Developer
Jan 23, 2024 · Information Security

Metis: Understanding and Enhancing In-Network Regular Expressions

Metis combines deterministic finite automata conversion, byte‑level RNN training, and knowledge‑distilled random‑forest models to replace traditional regex matching on resource‑constrained network devices, delivering comparable accuracy while achieving up to 74× higher throughput and significant resource savings in DDoS protection and P4 forwarding.

In‑network computing · NeurIPS 2023 · P4 Programmable Switches
0 likes · 9 min read
Tencent Architect
Jan 16, 2024 · Artificial Intelligence

Metis: AI‑Driven In‑Network Regular Expression Enhancement for High‑Performance Traffic Inspection

The article introduces Metis, an AI‑based solution that replaces traditional regular‑expression matching for network traffic inspection, offering faster, more accurate detection, a compact model deployable on resource‑constrained P4 switches, and significant performance and cost benefits for cloud gateway security.

AI · P4 · knowledge distillation
0 likes · 9 min read
Xiaohongshu Tech REDtech
Jan 12, 2024 · Artificial Intelligence

Negative Sample Assisted Distillation for Large Language Models

The AAAI‑2024 paper introduces a Negative Sample Assisted Distillation framework (Negative Assistance Training, Negative Calibration Enhancement, and Adaptive Self‑Consistency) that leverages both correct and incorrect reasoning examples to train a compact LLaMA‑7B student, achieving up to 75.75% accuracy gains over fine‑tuning on MATH and improving out‑of‑domain benchmarks.

LLM · chain-of-thought · knowledge distillation
0 likes · 13 min read
DeWu Technology
Dec 20, 2023 · Artificial Intelligence

Coarse Ranking in Recommenders: Key Strategies, Metrics & Optimizations

This article systematically reviews the coarse‑ranking stage of recommendation systems, comparing it with recall and fine‑ranking, defining evaluation metrics, detailing sample design, presenting two technical routes, and exploring optimization directions such as dual‑tower models, knowledge distillation, lightweight fully‑connected layers, multi‑objective and multi‑scenario modeling, followed by practical case studies and results.

Evaluation Metrics · coarse ranking · dual-tower
0 likes · 22 min read
Baidu Tech Salon
Oct 25, 2023 · Artificial Intelligence

Intelligent Question Answering Technology in Baidu Search: Development, Modeling, and Retrieval‑Enhanced Generation

The article surveys Baidu Search’s intelligent question‑answering system, tracing its evolution from feature‑engineered retrieval to large pre‑trained and generative models, and detailing hierarchical readers, multi‑teacher distillation, retrieval‑enhanced generation, and instruction decomposition as key techniques for delivering fast, accurate, citation‑rich answers.

Baidu Search · Retrieval Augmented Generation · knowledge distillation
0 likes · 18 min read
Baidu Geek Talk
Oct 25, 2023 · Artificial Intelligence

How Baidu Search Is Transforming Machine Question Answering with Large‑Scale AI Models

This article reviews the evolution of machine question answering, from early feature‑engineered systems to modern large‑language‑model‑driven retrieval‑augmented generation, outlines Baidu Search’s current Retriever‑Reader architecture, discusses challenges such as semantic complexity, latency and answer quality, and presents solutions including hierarchical DocMRC modeling, multi‑teacher knowledge distillation, and instruction decomposition for efficient, high‑quality answers.

Baidu · Retrieval Augmented Generation · knowledge distillation
0 likes · 18 min read
Rare Earth Juejin Tech Community
Sep 22, 2023 · Artificial Intelligence

An Introduction to Knowledge Distillation for Model Compression

This article explains the AI model‑compression technique of knowledge distillation, describing how a large teacher network transfers its soft predictions to a lightweight student network using temperature‑scaled softmax, enabling deployment on resource‑constrained devices.

artificial intelligence · knowledge distillation · model compression
0 likes · 13 min read
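To show what a temperature-scaled softmax does to a teacher's soft predictions, the plain-Python sketch below computes the output distribution at several temperatures and reports its entropy. The logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, T):
    """p_i = exp(z_i / T) / sum_j exp(z_j / T); a larger T gives softer targets."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

teacher_logits = [8.0, 3.0, 1.0]  # hypothetical teacher outputs for one input
for T in (1.0, 4.0, 10.0):
    probs = softmax_with_temperature(teacher_logits, T)
    print(f"T={T:>4}: probs={[round(p, 3) for p in probs]}, entropy={entropy(probs):.3f}")
```

Raising T never changes which class ranks first. It only moves probability mass onto the non-target classes, exposing how the teacher relates them, and that relative information is what the student is trained to match.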
Alibaba Cloud Big Data AI Platform
Jul 12, 2023 · Artificial Intelligence

How ConaCLIP Boosts Lightweight Text-Image Retrieval with Dual‑Encoder Distillation

ConaCLIP introduces a fully‑connected knowledge interaction graph to distill large dual‑encoder models into compact ones, enhancing text‑image retrieval accuracy and efficiency on edge devices, with extensive experiments and supervision strategies demonstrating significant gains over existing baselines.

AI · ConaCLIP · Dual Encoder
0 likes · 9 min read
Xiaohongshu Tech REDtech
Jun 20, 2023 · Artificial Intelligence

Open-Vocabulary Object Attribute Recognition with OvarNet: A Unified Framework for Detection and Attribute Classification

At CVPR 2023 the Xiaohongshu team presented OvarNet, a unified Faster R‑CNN‑style model built on CLIP and trained end to end in a single stage. It uses prompt learning and knowledge distillation to jointly detect objects and recognize open‑vocabulary attributes, achieving state‑of‑the‑art results on the VAW, MS‑COCO, LSA, and OVAD datasets.

Computer Vision · Multimodal Learning · attribute recognition
0 likes · 12 min read
DataFunTalk
Apr 26, 2023 · Artificial Intelligence

Serializing Advertising Placement with User Algorithms at Alibaba Health

Alibaba Health’s user algorithm leverages multi‑channel serialized ad placement, using vector‑based three‑tower models, knowledge distillation, and ROI‑oriented optimizations to sequence user touchpoints, improve conversion rates, and enhance model accuracy across diverse marketing channels.

Advertising · ROI · User Segmentation
0 likes · 15 min read
DataFunSummit
Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

knowledge distillation · large language models · multimodal
0 likes · 23 min read
Meituan Technology Team
Apr 13, 2023 · Artificial Intelligence

Peak-First Regularization for Low-Latency Streaming Speech Recognition

The paper presents a low‑latency streaming speech‑recognition solution that reframes latency reduction as a knowledge‑distillation task: a simple peak‑first regularization term shifts CTC output probabilities to earlier frames, cutting average latency by up to 200 ms without harming word error rate.

CTC · Latency Reduction · Peak-First Regularization
0 likes · 21 min read
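The summary does not spell out the regularizer. One plausible reading, assumed here rather than taken from the article, is frame-level self-distillation: each frame's output distribution is pulled toward the distribution the model emits one frame later (treated as a fixed teacher), which nudges CTC spikes earlier in time. Below is a pure-Python sketch of that term, where `frame_probs` is a hypothetical list of per-frame softmax outputs.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) between two per-frame distributions over the vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def peak_first_regularizer(frame_probs):
    """Mean over t of KL(P_{t+1} || P_t).

    Assumption: P_{t+1} acts as the teacher and P_t as the student, so
    minimizing this term moves each frame's prediction toward what the
    model outputs one frame later, i.e. it shifts peaks to the left.
    """
    if len(frame_probs) < 2:
        return 0.0
    terms = [kl(frame_probs[t + 1], frame_probs[t]) for t in range(len(frame_probs) - 1)]
    return sum(terms) / len(terms)

# Toy 4-frame output over {blank, "a"}; the "a" spike sits at frame 2.
frames = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
print(peak_first_regularizer(frames))
```

In an autodiff framework the teacher side, `frame_probs[t + 1]`, would be detached so that gradients only move the earlier frame.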
DataFunSummit
Feb 3, 2023 · Artificial Intelligence

Interactive BERT for Relevance in Health E‑commerce Search

This article presents an in‑depth exploration of an interactive BERT‑based relevance model for health e‑commerce search, detailing the business context, query and product feature extraction, domain‑specific sample generation, model architecture enhancements, offline and online performance gains, and practical deployment through knowledge distillation.

AI · BERT · Semantic Modeling
0 likes · 14 min read
DataFunTalk
Jan 11, 2023 · Artificial Intelligence

Exploring Interactive BERT for Relevance in Health E‑commerce Search

This article presents a comprehensive overview of Alibaba Health's interactive BERT approach for improving relevance in health e‑commerce search, covering business background, model design, domain‑specific data construction, knowledge‑distilled twin‑tower deployment, experimental results, and a detailed Q&A session.

AI · BERT · Semantic Modeling
0 likes · 14 min read
DataFunTalk
Dec 7, 2022 · Artificial Intelligence

Entire Space Delayed Feedback with Cross‑Task Knowledge Distillation (ESDC) for Multi‑Task E‑commerce Recommendation

This article presents Xiaomi’s e‑commerce recommendation research, addressing four key challenges—sample selection bias, data sparsity, delayed feedback, and knowledge inconsistency—by introducing the Entire Space Delayed Feedback with Cross‑Task Knowledge Distillation (ESDC) model, which combines causal inference, cross‑task distillation, twin networks, and uncertainty weighting to improve CVR prediction and achieve a 15% GMV lift over the baseline.

AI · CVR · Delayed Feedback
0 likes · 11 min read
Zuoyebang Tech Team
Sep 15, 2022 · Artificial Intelligence

How We Replaced BERT with a Lightweight TextCNN to Slash GPU Costs

This article describes the production challenges of using BERT for large‑scale text classification at Zuoyebang, explores lightweight alternatives such as knowledge distillation, pruning and quantization, and details a teacher‑student‑active‑learning pipeline that trains a TextCNN model to match BERT performance while dramatically reducing GPU consumption and improving throughput.

BERT · Model Deployment · NLP
0 likes · 13 min read
DataFunSummit
Sep 4, 2022 · Artificial Intelligence

Sparse Features in Machine Learning: Challenges, NVIDIA Ampere Structured Sparsity, Knowledge Distillation, and GAN Model Compression

This talk explores the challenges and opportunities of leveraging sparsity in machine learning models, covering fine‑grained and coarse‑grained sparsity, NVIDIA Ampere’s 2:4 structured sparsity, knowledge‑distillation techniques for converting unstructured to structured sparsity, and model compression strategies for generative adversarial networks.

Deep Learning · GAN · GPU Acceleration
0 likes · 14 min read
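NVIDIA Ampere's 2:4 structured sparsity requires every contiguous group of four weights to contain at most two non-zeros. The plain-Python sketch below applies magnitude-based 2:4 pruning to one weight row. The function names are hypothetical, and the distillation step the talk uses to recover accuracy after pruning is not shown.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in every contiguous group of four.

    `weights` is a flat row whose length is a multiple of 4, mirroring the
    2:4 pattern that Ampere sparse Tensor Cores accelerate.
    """
    if len(weights) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # The two largest magnitudes survive; the sort is stable, so ties keep the earlier index.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned

def is_2_of_4(weights):
    """Check that every group of four has at most two non-zeros."""
    return all(sum(1 for w in weights[s:s + 4] if w != 0) <= 2
               for s in range(0, len(weights), 4))

row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]  # hypothetical weight row
print(prune_2_of_4(row))
```

Magnitude is only the simplest selection criterion; it fixes the sparsity pattern, and fine-tuning or distillation is then what recovers the accuracy lost by zeroing half of the weights.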
DataFunSummit
Aug 14, 2022 · Artificial Intelligence

Optimizing Pre‑Ranking in Meituan Search: Knowledge Distillation and Neural Architecture Search

This article describes Meituan Search's pre‑ranking (coarse‑ranking) system evolution and presents two major optimization strategies—leveraging knowledge distillation to align coarse‑ranking with fine‑ranking and employing neural architecture search to jointly improve effectiveness and latency—demonstrating significant offline and online performance gains.

Neural Architecture Search · knowledge distillation · machine learning
0 likes · 17 min read
Meituan Technology Team
Aug 11, 2022 · Artificial Intelligence

Optimizing Pre‑Ranking in Meituan Search: Knowledge Distillation and Neural Architecture Search

Meituan’s search team upgraded its pre‑ranking layer from simple linear models to end‑to‑end neural networks. It improved effectiveness with three knowledge‑distillation techniques (result‑list, score, and contrastive representation distillation) and used latency‑aware neural architecture search to select features and network structures automatically, achieving significant recall and CTR gains without added latency.

Neural Architecture Search · efficiency optimization · knowledge distillation
0 likes · 19 min read
Alimama Tech
Jul 27, 2022 · Artificial Intelligence

CACS: Cascade Architecture for Creative Selection in Advertising

The Cascade Architecture for Creative Selection (CACS) reorders the advertising pipeline by placing a dual‑tower creative‑selection module ahead of ranking, and uses soft‑label list‑wise distillation and adaptive dropout to optimize creatives and ads jointly. At the cost of a 5% latency increase, it delivers significant CTR and RPM gains in Taobao’s search ads.

ad ranking · adaptive dropout · cascade architecture
0 likes · 17 min read
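The "soft-label list-wise distillation" in the CACS summary can be illustrated with one common listwise formulation: turn the ranking model's scores for a list of candidate creatives into a softmax distribution and train the creative-selection tower to match it with cross-entropy. This pure-Python sketch is a generic version under that assumption, not the exact CACS loss, and the scores are made up.

```python
import math

def list_softmax(scores, T=1.0):
    """Softmax over one candidate list; subtracting the max keeps exp() stable."""
    m = max(s / T for s in scores)
    exps = [math.exp(s / T - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_distillation_loss(student_scores, teacher_scores, T=1.0):
    """Cross-entropy between the teacher's and student's distributions over the same list."""
    target = list_softmax(teacher_scores, T)  # soft labels from the (assumed) ranking teacher
    pred = list_softmax(student_scores, T)    # creative-selection tower's scores
    return -sum(t * math.log(p) for t, p in zip(target, pred))

teacher = [2.0, 0.5, -1.0]  # hypothetical ranking-model scores for three creatives
student = [1.0, 1.0, 0.0]   # hypothetical dual-tower scores for the same creatives
print(listwise_distillation_loss(student, teacher))
```

Because the loss depends only on the softmax over the list, it rewards the student for reproducing the teacher's relative ordering and confidence gaps, not its absolute score scale.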
NetEase Smart Enterprise Tech+
Jun 2, 2022 · Artificial Intelligence

How Knowledge Distillation Shrinks Deep Neural Networks Without Losing Accuracy

Knowledge Distillation, a teacher‑student model compression technique, enables large, high‑performing deep neural networks to transfer their learned representations to smaller models, achieving comparable accuracy with faster inference, reduced resource consumption, and broader applicability in computer‑vision tasks.

AI · Computer Vision · FitNet
0 likes · 14 min read
Alimama Tech
May 25, 2022 · Artificial Intelligence

UKD: Debiasing Conversion Rate Estimation via Uncertainty-regularized Knowledge Distillation

The paper introduces UKD, an uncertainty‑regularized knowledge‑distillation framework that uses a click‑adaptive teacher to generate pseudo‑conversion labels for unclicked impressions and trains a student model with uncertainty‑weighted loss, thereby mitigating sample‑selection bias and achieving up to 3.4% CVR improvement and 4.3% CPA reduction on large‑scale advertising datasets.

CVR debiasing · advertising algorithms · conversion rate estimation
0 likes · 20 min read
DaTaobao Tech
May 18, 2022 · Artificial Intelligence

Deep Ranking Optimization for E-commerce Recommendation

The 2021 Taobao New‑Product team boosted e‑commerce recommendation by redesigning the coarse‑ranking stage with a dual‑tower DSSM, low‑cost feature‑crossing, NOVA attention and multi‑task distillation from a fine‑ranking teacher, delivering up to +30‰ GAUC gain and 3‑5% online CTR and click improvements.

Model Optimization · deep ranking · e‑commerce
0 likes · 17 min read
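GAUC, the offline metric quoted above, is commonly computed as a per-user AUC averaged with impression-count weights, the definition popularized in Alibaba's DIN work. The summary does not give the team's exact weighting, so this pure-Python sketch assumes impression weights and skips users whose impressions are all clicks or all non-clicks. The toy data is made up.

```python
from collections import defaultdict

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gauc(user_ids, labels, scores):
    """Impression-weighted mean of per-user AUC, skipping single-class users."""
    groups = defaultdict(lambda: ([], []))
    for u, l, s in zip(user_ids, labels, scores):
        groups[u][0].append(l)
        groups[u][1].append(s)
    num, den = 0.0, 0
    for ls, ss in groups.values():
        if 0 < sum(ls) < len(ls):  # need at least one click and one non-click
            num += len(ls) * auc(ls, ss)
            den += len(ls)
    return num / den if den else float("nan")

users = ["a", "a", "a", "b", "b", "c"]
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.4, 0.3, 0.6, 0.5]
print(gauc(users, labels, scores))
```

Averaging inside each user's impressions removes the between-user score offsets that a global AUC rewards, which matches how a ranking stage is actually used: it only orders items for one user at a time.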
DataFunTalk
Apr 22, 2022 · Artificial Intelligence

Inference Optimization Techniques and GPU Parallel Acceleration for Tencent Intelligent Dialogue Models

This article presents a comprehensive overview of inference optimization methods—including model pruning, quantization, knowledge distillation, caching, instruction‑set acceleration, and operator fusion—and details a GPU‑centric parallel acceleration methodology with CUDA basics, performance‑analysis tools, theoretical limits, and practical case studies, all illustrated with real‑world examples from Tencent's intelligent dialogue products.

GPU Acceleration · Operator fusion · caching
0 likes · 18 min read
Tencent Cloud Developer
Mar 3, 2022 · Artificial Intelligence

Model Distillation for Query-Document Matching: Techniques and Optimizations

We applied knowledge distillation to a video query‑document BERT matcher, compressing the 12‑layer teacher into production‑ready 1‑layer ALBERT and tiny TextCNN students using combined soft, hard, and relevance losses plus AutoML‑tuned hyper‑parameters, achieving sub‑5 ms latency and up to 2.4% AUC improvement over the original model.

ALBERT · AutoML · BERT
0 likes · 12 min read
DataFunTalk
Feb 20, 2022 · Artificial Intelligence

Distilled Reinforcement Learning Framework for Recommendation (DRL-Rec): Design, Modules, and Experimental Evaluation

This article presents DRL-Rec, a distilled reinforcement learning framework for recommendation that integrates an exploring‑filtering module and confidence‑guided distillation to compress RL‑based recommenders while improving accuracy, and reports significant offline and online performance gains on a large‑scale system.

knowledge distillation · online experiments · reinforcement learning
0 likes · 16 min read
DataFunTalk
Dec 24, 2021 · Artificial Intelligence

Large-Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This article reviews three consecutive works from Alibaba DAMO Academy on compressing and distilling large pretrained language models—AdaBERT, L2A, and Meta‑KD—detailing their motivations, neural‑architecture‑search‑based designs, loss formulations, experimental results, and insights from a Q&A session.

AI · Neural Architecture Search · knowledge distillation
0 likes · 10 min read
DataFunSummit
Dec 21, 2021 · Artificial Intelligence

Large‑Scale Pretrained Model Compression and Distillation: AdaBERT, L2A, and Meta‑KD

This talk presents Alibaba DAMO Academy’s recent work on compressing large pretrained language models, covering task‑adaptive AdaBERT, data‑augmented L2A, and meta‑knowledge distillation Meta‑KD, describing their motivations, architectures, NAS‑based search, loss designs, and experimental results across multiple NLP tasks.

NLP · Neural Architecture Search · knowledge distillation
0 likes · 13 min read
Meituan Technology Team
Dec 2, 2021 · Artificial Intelligence

Pretraining Techniques for Search Advertising Relevance at Meituan

Meituan improves search‑ad relevance by applying pre‑trained BERT models enhanced with data‑augmented samples, multi‑task learning, keyword extraction and two‑stage knowledge distillation, producing a lightweight distilled model that, when fused with traditional relevance signals, boosts CTR, lowers Badcase@5 and raises NDCG while preserving revenue.

BERT · Search · advertising relevance
0 likes · 30 min read
Alimama Tech
Sep 15, 2021 · Artificial Intelligence

Combining Knowledge Distillation, Exposure Forecasting, and Pacing to Guarantee Brand Exposure on Alibaba's Advertising Platform

Alibaba's advertising platform combines knowledge distillation to score traffic, exposure forecasting via GBDT, and PID-based pacing to guarantee contracted impression volumes while improving CTR/CVR, handling delayed exposure and traffic selection, achieving near‑perfect delivery in large promotions.

Alibaba · exposure forecasting · knowledge distillation
0 likes · 17 min read
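The pacing component is described only as PID-based. The sketch below assumes a hypothetical control variable (the share of eligible traffic served to a contracted brand ad), made-up gains, and made-up traffic volumes. It illustrates the feedback loop, not Alibaba's production controller.

```python
class PIDPacer:
    """Positional PID controller for a hypothetical serving ratio in [0, 1]."""

    def __init__(self, base_ratio=0.5, kp=0.1, ki=0.3, kd=0.02):
        self.base_ratio = base_ratio
        self.kp, self.ki, self.kd = kp, ki, kd
        self.ratio = base_ratio
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, delivered):
        """Feed one interval's delivery and return the ratio for the next interval."""
        error = (target - delivered) / target  # relative shortfall; > 0 means under-delivery
        self.integral += error
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        raw = (self.base_ratio + self.kp * error
               + self.ki * self.integral + self.kd * derivative)
        self.ratio = min(1.0, max(0.0, raw))  # clamp: a ratio outside [0, 1] is meaningless
        return self.ratio


# Toy loop: 1,000 eligible impressions per interval, and the contract needs 600 per interval.
pacer = PIDPacer()
ratio = pacer.ratio
for _ in range(30):
    delivered = ratio * 1000  # assumes delivery is exactly proportional to the ratio
    ratio = pacer.update(target=600, delivered=delivered)
print(round(ratio, 3))
```

The integral term is what removes the steady-state shortfall: at equilibrium the error is zero, and the accumulated integral alone lifts the ratio from the 0.5 starting point to the 0.6 the contract needs. A production pacer would also need anti-windup and would have to handle noisy, delayed exposure counts.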
Baidu Geek Talk
Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Computer Vision · Model Optimization · OCR
0 likes · 10 min read
DataFunSummit
Jun 5, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure‑Preserving Methods

This article reviews BERT’s architecture, analyzes the storage and compute costs of each layer, and systematically presents compression methods—including quantization, pruning, knowledge distillation (Distilled BiLSTM and MobileBERT), and structure‑preserving techniques—aimed at enabling efficient deployment on resource‑constrained mobile devices.

BERT · Mobile Deployment · knowledge distillation
0 likes · 15 min read
DataFunTalk
Jun 3, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure-Preserving Methods

This article examines the internal structure of BERT and systematically presents various model‑compression strategies—including quantization, pruning, knowledge distillation, and structure‑preserving techniques—highlighting their impact on storage, computational cost, and inference speed for deployment on resource‑constrained mobile devices.

BERT · Mobile AI · knowledge distillation
0 likes · 16 min read
AntTech
Mar 21, 2021 · Artificial Intelligence

Hubble Intelligent Audience Platform: Three‑Generation Algorithm Evolution for Mobile Marketing

The article describes the Hubble Intelligent Audience Platform’s three‑generation algorithmic evolution—starting from a DSSM‑based model, moving to an asynchronous GNN plus lightweight learning architecture, and finally integrating incremental learning with meta‑weighting—to improve audience expansion for mobile marketing campaigns.

AI · Graph Neural Network · Mobile Marketing
0 likes · 14 min read
Amap Tech
Mar 5, 2021 · Artificial Intelligence

AI Applications in Mobility: Route Planning, ETA Prediction, Dynamic Event Mining, and Global Scheduling

The article surveys Amap’s AI‑driven mobility solutions—from personalized, multi‑objective route planning using Cell‑Based Routing and bias‑aware sorting, through spatio‑temporal ETA prediction and lightweight BERT‑based traffic‑event mining, to rapid POI freshness updates and a future global scheduling system that coordinates vehicles and signals via multi‑agent reinforcement learning.

AI · Route Planning · Traffic Prediction
0 likes · 14 min read
360 Smart Cloud
Mar 4, 2021 · Artificial Intelligence

Optimizing BERT Online Service Deployment at 360 Search

This article describes the challenges of deploying a large BERT model as an online service for 360 Search and details engineering optimizations—including framework selection, model quantization, knowledge distillation, stream scheduling, caching, and dynamic sequence handling—that dramatically improve latency, throughput, and resource utilization.

BERT · FP16 quantization · GPU Optimization
0 likes · 12 min read
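FP16 quantization, one of the 360 Search optimizations, stores each value in IEEE 754 half precision (1 sign bit, 5 exponent bits, 10 fraction bits), which halves memory relative to FP32. Python's standard `struct` module supports half precision through the `e` format character, so the rounding error is easy to inspect. The sample values below are hypothetical.

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE 754 binary16 and convert it back to a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def max_relative_error(values):
    """Largest |fp16(x) - x| / |x| over the non-zero inputs."""
    return max(abs(to_fp16(v) - v) / abs(v) for v in values if v != 0)

weights = [0.1234567, -1.5e-3, 3.14159, 250.75]  # hypothetical weights/activations
print([to_fp16(w) for w in weights])
print(max_relative_error(weights))  # bounded by 2**-11 for values in the normal FP16 range
```

Values above 65504 in magnitude overflow in FP16, and very small ones lose precision. This is why FP16 inference pipelines often keep some accumulations in FP32.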
360 Tech Engineering
Mar 1, 2021 · Artificial Intelligence

Deploying BERT as an Online Service: Challenges and Optimizations at 360 Search

This article details the engineering challenges of serving a large BERT model in real‑time for 360 Search and describes a series of optimizations—including TensorRT‑based kernel fusion, model quantization, knowledge distillation, multi‑stream execution, caching, and dynamic sequence handling—that together achieve low latency, high throughput, and stable deployment on GPU clusters.

BERT · GPU · Model Optimization
0 likes · 10 min read