Tag

knowledge distillation

20 articles collected under this tag.

Alimama Tech
Apr 23, 2025 · Artificial Intelligence

Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning

The paper introduces an explainable LLM framework (ELLM‑rele) that uses chain‑of‑thought reasoning and a multi‑dimensional knowledge distillation pipeline to compress large‑model relevance judgments into lightweight student models, achieving superior offline relevance scores and online click‑through and conversion improvements in Taobao’s search advertising.
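
For orientation, here is a minimal sketch of the distillation setup this summary describes: an LLM teacher scores (query, item) relevance, and a lightweight student regresses onto those soft labels. The student architecture and the stand-in teacher scores are illustrative assumptions, not ELLM‑rele's actual design.

```python
import torch
import torch.nn as nn

class RelevanceStudent(nn.Module):
    """Tiny scorer over precomputed query/item embeddings (assumed interface)."""
    def __init__(self, dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, q, i):
        return self.mlp(torch.cat([q, i], dim=-1)).squeeze(-1)

student = RelevanceStudent()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()                # soft-label distillation loss

q, i = torch.randn(32, 64), torch.randn(32, 64)
teacher_scores = torch.rand(32)                 # stand-in for LLM relevance judgments
loss = loss_fn(student(q, i), teacher_scores)
opt.zero_grad(); loss.backward(); opt.step()
```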

Chain-of-Thought · LLM · explainability
17 min read
Tencent Cloud Developer
Mar 25, 2025 · Artificial Intelligence

Knowledge Distillation in Diffusion Models: Techniques and Applications

The article explains how knowledge distillation transfers capabilities from large diffusion models to smaller ones, covering hard and soft labels and temperature scaling, contrasting it with data distillation, and detailing techniques such as consistency models, progressive distillation, adversarial distillation, and adversarial post‑training for model compression and sampling‑step reduction.
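
As a concrete illustration of step reduction, here is a toy progressive-distillation loop in the spirit the article describes: the student learns to match two teacher denoising steps with a single step. The networks, noise schedule, and shapes are all simplified assumptions.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Tiny stand-in for a diffusion model: predicts x0 from (x_t, t)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t[:, None]], dim=-1))

def ddim_step(model, x_t, t, t_next, alphas):
    """One deterministic DDIM step from timestep t to t_next (toy schedule)."""
    a_t, a_next = alphas[t], alphas[t_next]          # cumulative alphas, shape (batch,)
    x0 = model(x_t, t.float() / len(alphas))         # predicted clean sample
    eps = (x_t - a_t.sqrt()[:, None] * x0) / (1 - a_t).sqrt()[:, None]
    return a_next.sqrt()[:, None] * x0 + (1 - a_next).sqrt()[:, None] * eps

teacher, student = Denoiser(), Denoiser()
student.load_state_dict(teacher.state_dict())        # student starts from the teacher
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
alphas = torch.linspace(0.999, 0.01, 1000)           # toy noise schedule

for _ in range(100):
    x_t = torch.randn(32, 16)
    t = torch.randint(2, 1000, (32,))
    with torch.no_grad():                            # two teacher steps: t -> t-1 -> t-2
        x_mid = ddim_step(teacher, x_t, t, t - 1, alphas)
        x_tgt = ddim_step(teacher, x_mid, t - 1, t - 2, alphas)
    x_one = ddim_step(student, x_t, t, t - 2, alphas)  # one student step: t -> t-2
    loss = (x_one - x_tgt).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```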

adversarial post-training · adversarial training · consistency models
19 min read
IT Architects Alliance
Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.
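
For readers new to mixture-of-experts, the sketch below shows the basic top-k routing idea in PyTorch; it is a toy illustration, not DeepSeek's implementation (which adds shared experts, load-balancing terms, and other refinements the article covers).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts, bias=False)       # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                                       # x: (tokens, dim)
        scores = F.softmax(self.gate(x), dim=-1)                # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)                # keep the k best experts
        topv = topv / topv.sum(dim=-1, keepdim=True)            # renormalize gate weights
        out = torch.zeros_like(x)
        for slot in range(self.k):                              # dispatch tokens to experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```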

DeepSeek · FP8 · MLA
18 min read
Architecture Digest
Feb 25, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

The article explains DeepSeek’s distillation technology, which combines data and model distillation to transfer knowledge from large teacher models to compact student models, covering its definitions, principles, key innovations, architecture, training methods, performance gains, and challenges, especially in multimodal contexts.

AI research · DeepSeek · knowledge distillation
16 min read
JD Tech Talk
Feb 13, 2025 · Artificial Intelligence

DeepSeek R1: Concept Overview, Training Principles, and Practical Implementations

This article introduces the DeepSeek family of models, explains the concepts of online search and deep reasoning, details the two‑phase training pipeline with data augmentation and reinforcement learning, and showcases practical experiments and deployment examples for the R1 and distilled variants.

DeepSeek · LLM · R1
10 min read
Architects' Tech Alliance
Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3
7 min read
Cognitive Technology Team
Feb 7, 2025 · Artificial Intelligence

Knowledge Distillation: Concepts, Techniques, Applications, and Future Directions

This article explains knowledge distillation—a technique introduced by Geoffrey Hinton that transfers knowledge from large teacher models to compact student models—covering its core concepts, loss functions, various distillation strategies, notable applications in edge computing, federated learning, continual learning, and emerging research directions.
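
For reference, the classic objective from Hinton et al. combines a hard-label cross-entropy term with a temperature-softened KL term between teacher and student logits $z_t$ and $z_s$, with mixing weight $\alpha$ and temperature $T$:

$$\mathcal{L}_{\mathrm{KD}} = (1-\alpha)\,\mathrm{CE}\big(y,\ \mathrm{softmax}(z_s)\big) + \alpha\,T^2\,\mathrm{KL}\big(\mathrm{softmax}(z_t/T)\ \big\|\ \mathrm{softmax}(z_s/T)\big)$$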

Edge Computing · Federated Learning · continual learning
7 min read
DataFunTalk
Jan 26, 2025 · Artificial Intelligence

58.com’s LingXi Large Language Model Platform: Development, Deployment, and Performance Optimizations

Since the launch of ChatGPT, 58.com has built LingXi, a Model‑as‑a‑Service platform that trains and serves domain‑specific large language models, supports over a hundred internal scenarios with daily inference exceeding ten million calls, and continuously improves performance through quantization, GPU optimization, and model miniaturization, powering AI applications such as interview assistants, voice agents, and RAG‑enabled agents.

AI Platform · AI applications · LLM
9 min read
Kuaishou Tech
Jan 24, 2025 · Artificial Intelligence

KwaiCoder-23BA4-v1: An Efficient Large Code Generation Model via Pruning, Knowledge Distillation, and Granular Upcycling

KwaiCoder-23BA4-v1 is a 23B‑parameter mixture‑of‑experts code‑completion model that achieves state‑of‑the‑art results on the HumanEval, BigCodeBench, and Fill‑in‑the‑Middle benchmarks through high‑quality data and a cost‑effective training pipeline combining model pruning, knowledge distillation, and fine‑grained merging, validated by extensive ablation studies.
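
As a taste of one ingredient in such a pipeline, here is a hedged magnitude-pruning sketch; the 50% sparsity ratio and the mask-reapplication convention are arbitrary assumptions, not KwaiCoder's actual recipe.

```python
import torch
import torch.nn as nn

def magnitude_prune_(linear: nn.Linear, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude weights in place; return the mask
    so callers can reapply it after each optimizer step during distillation."""
    w = linear.weight.data
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values   # k-th smallest magnitude
    mask = (w.abs() > threshold).float()
    w.mul_(mask)                                       # zero the pruned weights
    return mask

layer = nn.Linear(256, 256)
mask = magnitude_prune_(layer, 0.5)
print(f"sparsity: {(layer.weight == 0).float().mean():.2f}")
```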

AI · Code Generation · benchmark
10 min read
DataFunSummit
Sep 23, 2024 · Artificial Intelligence

TransLLM: A Framework for Cross‑Language Transfer of Conversational Large Language Models

This article presents TransLLM, a cross‑lingual transfer framework that brings high‑quality conversational LLMs to low‑resource languages, preserving their advanced capabilities through Recovery KD, LoRA‑based continual pre‑training, and a translation chain‑of‑thought, with extensive experiments showing performance and safety superior to ChatGPT and GPT‑4.
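
To make the LoRA-based continual pre-training ingredient concrete, here is a minimal sketch using the Hugging Face peft library; the base-model name and the target modules are placeholder assumptions, not TransLLM's published configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("some-base-llm")  # placeholder model name
config = LoraConfig(
    r=16,                                  # low-rank update dimension
    lora_alpha=32,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # attach adapters to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the adapter weights train
```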

LoRA · Safety · conversational LLM
22 min read
Xiaohongshu Tech REDtech
Jun 20, 2024 · Artificial Intelligence

Xiaohongshu 2024 Large Model Frontier Paper Sharing Live Event

On June 27, 2024, Xiaohongshu’s technical team will livestream a two‑hour session across WeChat Channels, Bilibili, Douyin and Xiaohongshu, showcasing six top‑conference papers on large‑model advances—including early‑stopping and fine‑grained self‑consistency, novel evaluation methods, negative‑sample‑assisted distillation, and LLM‑based note recommendation—followed by a Q&A and recruitment briefing.

AI research · Recommendation systems · Self-Consistency
12 min read
AntTech
Mar 11, 2024 · Artificial Intelligence

Can Small Language Models be Good Reasoners in Recommender Systems?

This article presents SLIM, a knowledge‑distillation framework that transfers the reasoning abilities of large language models to compact models for sequential recommendation, enhancing item representation, user profiling, and bias mitigation while achieving comparable performance with far lower computational resources.

AI · Efficiency · LLM
12 min read
Tencent Cloud Developer
Jan 23, 2024 · Information Security

Metis: Understanding and Enhancing In-Network Regular Expressions

Metis combines deterministic finite automata conversion, byte‑level RNN training, and knowledge‑distilled random‑forest models to replace traditional regex matching on resource‑constrained network devices, delivering comparable accuracy while achieving up to 74× higher throughput and significant resource savings in DDoS protection and P4 forwarding.
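
A hedged sketch of the distillation step this summary mentions: a byte-level teacher labels packet payloads, and a random forest is fit to mimic it cheaply enough for in-network deployment. The synthetic payloads and the teacher's decision rule below are purely illustrative stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
payloads = rng.integers(0, 256, size=(2000, 64))           # toy packet payload bytes

def teacher_predict(batch):
    """Stand-in for the byte-level RNN: flags payloads whose mean byte
    value looks anomalous (a purely synthetic decision rule)."""
    return (batch.mean(axis=1) > 128).astype(int)

teacher_labels = teacher_predict(payloads)                  # distillation targets
student = RandomForestClassifier(n_estimators=20, max_depth=8, random_state=0)
student.fit(payloads, teacher_labels)                       # forest mimics the teacher
print("agreement with teacher:", student.score(payloads, teacher_labels))
```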

Anomaly Detection · In-Network Computing · NeurIPS 2023
9 min read
Tencent Architect
Jan 16, 2024 · Artificial Intelligence

Metis: AI‑Driven In‑Network Regular Expression Enhancement for High‑Performance Traffic Inspection

The article introduces Metis, an AI‑based solution that replaces traditional regular‑expression matching for network traffic inspection, offering faster, more accurate detection, a compact model deployable on resource‑constrained P4 switches, and significant performance and cost benefits for cloud gateway security.

AI · P4 · knowledge distillation
9 min read
Xiaohongshu Tech REDtech
Jan 12, 2024 · Artificial Intelligence

Negative Sample Assisted Distillation for Large Language Models

The AAAI‑2024 paper introduces a Negative Sample Assisted Distillation framework (comprising Negative Assistance Training, Negative Calibration Enhancement, and Adaptive Self‑Consistency) that leverages both correct and incorrect reasoning examples to train a compact LLaMA‑7B student, achieving accuracy gains of up to 75.75% over standard fine‑tuning on MATH and improving performance on out‑of‑domain benchmarks.
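
As an illustration of the Adaptive Self-Consistency component, the sketch below keeps sampling answers and stops early once one answer clearly dominates; sample_answer is a placeholder for an LLM call, and the stopping rule is a simplified assumption, not the paper's exact criterion.

```python
from collections import Counter
import random

def adaptive_self_consistency(sample_answer, max_samples=16, threshold=0.7):
    """Sample chain-of-thought answers until a stable majority emerges."""
    votes = Counter()
    for n in range(1, max_samples + 1):
        votes[sample_answer()] += 1
        answer, count = votes.most_common(1)[0]
        if n >= 3 and count / n >= threshold:   # early stop: majority is stable
            return answer, n
    return votes.most_common(1)[0][0], max_samples

# Toy usage: a "model" that answers "42" 80% of the time.
print(adaptive_self_consistency(lambda: "42" if random.random() < 0.8 else "7"))
```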

Chain-of-Thought · LLM · knowledge distillation
13 min read
Baidu Tech Salon
Oct 25, 2023 · Artificial Intelligence

Intelligent Question Answering Technology in Baidu Search: Development, Modeling, and Retrieval‑Enhanced Generation

The article surveys Baidu Search’s intelligent question‑answering system, tracing its evolution from feature‑engineered retrieval to large pre‑trained and generative models, and detailing hierarchical readers, multi‑teacher distillation, retrieval‑enhanced generation, and instruction decomposition as key techniques for delivering fast, accurate, citation‑rich answers.
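
To make the multi-teacher distillation idea concrete, here is a minimal sketch in which the student matches the averaged soft distribution of several teachers; weighting the teachers equally is an assumption for brevity.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
    avg_teacher = torch.stack(
        [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)                                     # ensemble soft target
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, avg_teacher, reduction="batchmean") * T * T

loss = multi_teacher_kd_loss(torch.randn(4, 10), [torch.randn(4, 10) for _ in range(3)])
```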

Baidu Search · Retrieval-Augmented Generation · knowledge distillation
18 min read
Rare Earth Juejin Tech Community
Sep 22, 2023 · Artificial Intelligence

An Introduction to Knowledge Distillation for Model Compression

This article explains the AI model‑compression technique of knowledge distillation, describing how a large teacher network transfers its soft predictions to a lightweight student network using temperature‑scaled softmax, enabling deployment on resource‑constrained devices.
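
A runnable sketch of the loss the article describes, combining a temperature-softened KL term with an ordinary cross-entropy term; the alpha and T values are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(                                   # match softened teacher outputs
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T                                          # T^2 keeps the gradient scale stable
    hard = F.cross_entropy(student_logits, labels)     # ordinary supervised term
    return alpha * soft + (1 - alpha) * hard

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10), torch.randint(0, 10, (8,)))
```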

Artificial Intelligence · deep learning · knowledge distillation
13 min read
Xiaohongshu Tech REDtech
Jun 20, 2023 · Artificial Intelligence

Open-Vocabulary Object Attribute Recognition with OvarNet: A Unified Framework for Detection and Attribute Classification

At CVPR 2023 the Xiaohongshu team presented OvarNet, a unified one‑stage Faster‑RCNN model built on CLIP that uses prompt learning and knowledge distillation to jointly detect objects and recognize open‑vocabulary attributes, achieving state‑of‑the‑art results on VAW, MS‑COCO, LSA and OVAD datasets.

Computer Vision · attribute recognition · knowledge distillation
12 min read
DataFunTalk
Apr 26, 2023 · Artificial Intelligence

Serializing Advertising Placement with User Algorithms at Alibaba Health

Alibaba Health’s user algorithm leverages multi‑channel serialized ad placement, using vector‑based three‑tower models, knowledge distillation, and ROI‑oriented optimizations to sequence user touchpoints, improve conversion rates, and enhance model accuracy across diverse marketing channels.
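
For intuition, here is a toy three-tower scorer of the general kind the summary mentions; the choice of towers (user, item, channel) and the fusion rule are assumptions for illustration, not Alibaba Health's actual model.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, in_dim, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

    def forward(self, x):
        return nn.functional.normalize(self.net(x), dim=-1)  # unit-norm embeddings

user_t, item_t, chan_t = Tower(128), Tower(96), Tower(16)

def score(user, item, channel):
    # fuse item and channel embeddings, then dot with the user embedding
    u, i, c = user_t(user), item_t(item), chan_t(channel)
    return (u * (i + c)).sum(dim=-1)

s = score(torch.randn(8, 128), torch.randn(8, 96), torch.randn(8, 16))
```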

ROI · advertising · knowledge distillation
15 min read
DataFunSummit
Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

knowledge distillation · large language models · model compression
23 min read