Tag: pruning

Kuaishou Tech
Jan 24, 2025 · Artificial Intelligence

KwaiCoder-23BA4-v1: An Efficient Large Code Generation Model via Pruning, Knowledge Distillation, and Granular Upcycling

KwaiCoder-23BA4-v1 is a 23B-parameter wide MoE code‑completion model that reaches state‑of‑the‑art results on the HumanEval, BigCodeBench, and Fill‑in‑Middle benchmarks. Its recipe pairs high‑quality data with a cost‑effective training pipeline combining model pruning, knowledge distillation, and fine‑grained merging, validated by extensive ablation studies.

AI · benchmark · code generation
0 likes · 10 min read
Java Tech Enthusiast
Dec 12, 2024 · Fundamentals

LeetCode 814: Binary Tree Pruning

The article explains LeetCode 814, where a binary tree of 0s and 1s is pruned by recursively removing subtrees lacking a 1, using a post‑order traversal that returns null for nodes with value 0 and no retained children, achieving O(n) time and O(h) space.
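
A minimal Python sketch of the post‑order pruning the summary describes (the article's own code is in C++ and Java; this rendering is mine):

```python
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def prune_tree(root):
    """Post-order traversal: prune both subtrees first, then decide this node's fate."""
    if root is None:
        return None
    root.left = prune_tree(root.left)
    root.right = prune_tree(root.right)
    # Drop a node that holds 0 and retained no children after pruning below it.
    if root.val == 0 and root.left is None and root.right is None:
        return None
    return root
```

Each node is visited once (O(n) time) and the recursion stack is bounded by the tree height (O(h) space), matching the complexity the article states.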

Binary Tree · C++ · Java
0 likes · 6 min read
Baidu Geek Talk
Nov 9, 2023 · Artificial Intelligence

Deep Learning Model Architecture Evolution in Baidu Search

The article chronicles how Baidu Search's Model Architecture Group evolved its deep‑learning‑driven search stack: the shift from inverted‑index retrieval to semantic vector indexing, transformer‑based models for text and image queries, large‑scale offline/online pipelines, and extensive GPU‑centric optimizations such as pruning, quantization, and distillation, all aimed at delivering precise, cost‑effective results to hundreds of millions of users.

ERNIE · GPU inference · Model Distillation
0 likes · 14 min read
DataFunTalk
Apr 22, 2022 · Artificial Intelligence

Inference Optimization Techniques and GPU Parallel Acceleration for Tencent Intelligent Dialogue Models

This article presents a comprehensive overview of inference optimization methods—including model pruning, quantization, knowledge distillation, caching, instruction‑set acceleration, and operator fusion—and details a GPU‑centric parallel acceleration methodology with CUDA basics, performance‑analysis tools, theoretical limits, and practical case studies, all illustrated with real‑world examples from Tencent's intelligent dialogue products.

GPU Acceleration · Performance Profiling · caching
0 likes · 18 min read
DataFunSummit
Jan 29, 2022 · Artificial Intelligence

Survey of Model Pruning and Quantization Techniques for Deep Learning

This article provides a comprehensive overview of recent advances in deep learning model compression, focusing on pruning methods—including unstructured, structured, filter-wise, channel-wise, shape-wise, and stripe-wise approaches—and quantization techniques such as linear, non‑linear, clustering, power‑of‑two, binary, and 8‑bit quantization, while discussing evaluation criteria, sparsity ratios, fine‑tuning, and training‑aware quantization.
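
As a concrete illustration of one technique in this family, here is a minimal NumPy sketch of symmetric linear 8‑bit quantization; the max‑abs scale rule is a common choice and an assumption on my part, not necessarily the variant the survey analyzes:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization: one float scale maps weights onto int8."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0   # largest magnitude -> +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```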

deep learning · model compression · neural networks
0 likes · 23 min read
Laiye Technology Team
Jan 28, 2022 · Artificial Intelligence

Survey of Model Compression and Quantization Techniques for Deep Neural Networks

This article provides a comprehensive overview of deep learning model compression and acceleration methods, detailing pruning strategies, various pruning types, evaluation criteria, sparsity ratios, fine‑tuning procedures, as well as linear and non‑linear quantization approaches, their implementations, and practical considerations.
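
To make the pruning side concrete, here is a minimal NumPy sketch of magnitude‑based unstructured pruning at a target sparsity ratio; the |w|‑percentile threshold is the common heuristic, assumed here rather than taken from the article:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` of them are gone."""
    k = int(w.size * sparsity)                   # how many weights to remove
    if k == 0:
        return w.copy(), np.ones(w.shape, dtype=bool)
    threshold = np.partition(np.abs(w), k - 1, axis=None)[k - 1]
    mask = np.abs(w) > threshold                 # keep only weights above the cutoff
    return w * mask, mask

w = np.random.randn(256, 256)
pruned, mask = magnitude_prune(w, sparsity=0.9)
print("achieved sparsity:", 1.0 - mask.mean())   # ~0.9; fine-tuning would follow
```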

Efficiency · deep learning · model compression
0 likes · 26 min read
DataFunSummit
Jun 5, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure‑Preserving Methods

This article reviews BERT’s architecture, analyzes the storage and compute costs of each layer, and systematically presents compression methods—including quantization, pruning, knowledge distillation (Distilled BiLSTM and MobileBERT), and structure‑preserving techniques—aimed at enabling efficient deployment on resource‑constrained mobile devices.

BERT · knowledge distillation · mobile deployment
0 likes · 15 min read
DataFunTalk
Jun 3, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure-Preserving Methods

This article examines the internal structure of BERT and systematically presents various model‑compression strategies—including quantization, pruning, knowledge distillation, and structure‑preserving techniques—highlighting their impact on storage, computational cost, and inference speed for deployment on resource‑constrained mobile devices.

BERT · knowledge distillation · mobile AI
0 likes · 16 min read
Kuaishou Tech
Mar 18, 2021 · Artificial Intelligence

Hammer: An Integrated Hardware-Aware Model Compression Framework

Hammer is an integrated hardware-aware model compression tool developed by Kuaishou in collaboration with universities, combining pruning, quantization, search, and distillation to achieve efficient and accurate neural network models tailored to specific hardware.

AI Framework · Kuaishou · NAS
0 likes · 9 min read
Sohu Tech Products
Jan 6, 2021 · Artificial Intelligence

Overview of Main Model Compression and Acceleration Techniques: Structural Optimization, Pruning, Quantization, and Knowledge Distillation

This article reviews four mainstream model compression and acceleration methods—structural optimization, pruning, quantization, and knowledge distillation—explaining their principles, implementations, and performance, and presents practical examples such as DistilBERT, TinyBERT, and FastBERT with comparative results.
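
A minimal PyTorch sketch of the temperature‑scaled distillation loss underlying DistilBERT‑style training; the temperature and the alpha blend are illustrative values, not the papers' exact settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL to the teacher's softened outputs."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # T^2 keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 8 examples over 10 classes.
student = torch.randn(8, 10, requires_grad=True)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student, teacher, labels).backward()
```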

AI · deep learning · knowledge distillation
0 likes · 14 min read
Didi Tech
Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods—ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation—and their automated AutoCompress workflow, demonstrating how these techniques shrink neural networks enough to run real‑time driver‑monitoring and other intelligent cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM · deep learning · edge AI
0 likes · 16 min read
AntTech
Jun 9, 2020 · Artificial Intelligence

Deep Learning Model Compression and Acceleration Techniques for Mobile AI

This article reviews the motivations, challenges, and a comprehensive set of algorithmic, framework, and hardware methods—including structural optimization, quantization, pruning, and knowledge distillation—to compress and accelerate deep learning models for deployment on mobile devices, highlighting benefits such as reduced server load, lower latency, improved reliability, and enhanced privacy.

knowledge distillation · mobile AI · model compression
0 likes · 17 min read
Tencent Tech
Feb 27, 2020 · Artificial Intelligence

How to Speed Up Deep Learning Models: Cutting-Edge Acceleration Techniques

Deep learning models often suffer from slow training and deployment due to their size, but a range of advanced acceleration methods—including model architecture optimization, pruning, quantization, knowledge distillation, and distributed training techniques—can dramatically improve speed and efficiency while maintaining performance.

deep learning · distributed training · knowledge distillation
0 likes · 14 min read
DataFunTalk
Dec 19, 2019 · Artificial Intelligence

Model Quantization in Neural Networks: Challenges, Solutions, and Future Directions

This article reviews neural‑network model quantization, explaining why quantization is needed, detailing forward‑ and backward‑propagation issues, presenting three main mitigation strategies, discussing subsequent pruning, performance‑recovery techniques, and outlining future research avenues in efficient machine learning.
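
The core backward‑propagation problem is that rounding has zero gradient almost everywhere; one widely used mitigation (whether it is among the article's three strategies is my guess) is the straight‑through estimator, sketched here in PyTorch:

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass the gradient through unchanged in backward."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                       # identity gradient: treat round(x) as x

def fake_quantize(x, num_bits=8):
    """Simulated low-bit quantization that stays differentiable for training."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return RoundSTE.apply(x / scale).clamp(-qmax, qmax) * scale

x = torch.randn(4, requires_grad=True)
fake_quantize(x).sum().backward()
print(x.grad)                                    # non-zero despite the rounding step
```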

Model Quantization · efficient machine learning · hardware acceleration
0 likes · 27 min read
Tencent Cloud Developer
Mar 19, 2018 · Artificial Intelligence

Basic Concepts of Decision Trees

Decision trees are tree-structured classifiers that split data on attributes chosen to maximize purity, as measured by Gini impurity or entropy; algorithms such as ID3 select splits by information gain, and overfitting is mitigated through growth constraints and pruning techniques such as REP, PEP, and CCP.
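
A minimal Python sketch of the purity measures and split criterion the summary names; the toy labels are illustrative:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """ID3's criterion: entropy drop from parent to the weighted child partitions."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

y = np.array([1, 1, 1, 0, 0, 1])
print(gini(y), entropy(y), information_gain(y, y[:3], y[3:]))
```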

Gini Impurity · ID3 · algorithm
0 likes · 13 min read
Qunar Tech Salon
Apr 5, 2015 · Fundamentals

Backtracking Algorithm: Concepts, Core Ideas, General Steps, and Frameworks

This article explains the backtracking algorithm as an enumeration‑like depth‑first search technique, outlines its fundamental concepts, basic ideas, typical problem‑solving steps, and provides both non‑recursive and recursive pseudo‑code frameworks for implementation.
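
A minimal Python rendering of the recursive framework, using subset enumeration as the driving problem (the article gives language‑agnostic pseudo‑code; the concrete problem choice here is mine):

```python
def subsets(nums):
    """Standard backtracking template: choose, explore depth-first, un-choose."""
    result, path = [], []

    def backtrack(start):
        result.append(path[:])                   # record the current partial solution
        for i in range(start, len(nums)):
            path.append(nums[i])                 # choose a candidate
            backtrack(i + 1)                     # explore the subtree depth-first
            path.pop()                           # un-choose: backtrack and try the next

    backtrack(0)
    return result

print(subsets([1, 2, 3]))                        # all 8 subsets of {1, 2, 3}
```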

algorithm · backtracking · depth-first-search
0 likes · 4 min read