Tag

model compression


Amap Tech
May 27, 2025 · Artificial Intelligence

Gaode Map Custom Voice Pack: End‑to‑End TTS Model Architecture and Deployment

This article explains how Gaode Map leverages lightweight edge TTS models, dual‑autoregressive large‑model data augmentation, and a configurable audio‑processing DAG to enable users to create highly realistic personalized voice packs from just three recorded sentences.

Gaode Maps · TTS · data augmentation
0 likes · 8 min read
DataFunTalk
Apr 19, 2025 · Artificial Intelligence

Microsoft Research's Open‑Source Native 1‑Bit LLM BitNet b1.58 2B4T: Design, Performance, and Deployment

Microsoft Research released BitNet b1.58 2B4T, the first open‑source native 1‑bit large language model with 2 billion parameters, 1.58‑bit effective precision and a 0.4 GB footprint, achieving full‑precision performance while enabling efficient CPU and GPU inference for edge AI applications.
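In the b1.58 scheme, each weight is constrained to one of three values {-1, 0, 1} (hence roughly 1.58 bits per weight). A minimal pure-Python sketch of absmean ternary quantization in the spirit of the paper — illustrative names, not the released implementation:

```python
def ternary_quantize(weights):
    """Quantize a weight list to {-1, 0, 1} using absmean scaling:
    scale by the mean absolute value, then round and clip each
    weight to the nearest ternary level."""
    gamma = sum(abs(w) for w in weights) / len(weights) or 1.0
    q = [max(-1, min(1, round(w / gamma))) for w in weights]
    return q, gamma  # dequantized weight i ≈ q[i] * gamma

q, gamma = ternary_quantize([0.9, -0.05, 0.4, -1.2])
print(q)  # → [1, 0, 1, -1]
```

Ternary codes pack into well under 2 bits each, which is how a 2-billion-parameter model fits in a 0.4 GB footprint and runs with multiply-free CPU kernels.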

1-bit quantization · CPU inference · LLM
0 likes · 10 min read
DeWu Technology
Apr 14, 2025 · Artificial Intelligence

Overview of Recent Large Language Model Quantization Techniques

The article surveys modern post‑training quantization approaches for large language models, detailing weight‑only and activation‑aware methods such as GPTQ, AWQ, HQQ, SmoothQuant, QuIP, QuaRot, SpinQuant, QQQ, QoQ, and FP8, and compares their precision levels, algorithmic steps, accuracy‑throughput trade‑offs, and implementation considerations for efficient inference.

AI · LLM · model compression
0 likes · 32 min read
Tencent Cloud Developer
Mar 25, 2025 · Artificial Intelligence

Knowledge Distillation in Diffusion Models: Techniques and Applications

The article explains how knowledge distillation transfers capabilities from large to smaller diffusion models, covering hard and soft labels, temperature scaling, and contrasting it with data distillation, while detailing techniques such as consistency models, progressive distillation, adversarial distillation, and adversarial post‑training for model compression and step reduction.

adversarial post-training · adversarial training · consistency models
0 likes · 19 min read
JD Retail Technology
Mar 6, 2025 · Artificial Intelligence

Dynamic Margin Selection for Efficient Deep Learning and Low-Resource Large Model Training

Jia Xing’s research introduces Dynamic Margin Selection, a technique that repeatedly refreshes a core set of boundary‑close samples to train large language models efficiently on limited resources, achieving loss comparable to full‑data training while enabling six‑fold model compression and faster inference, and proposing an exponential scaling law for data‑efficient AI.

ICLR · dynamic data selection · large language models
0 likes · 10 min read
AntTech
Mar 1, 2025 · Artificial Intelligence

ScaleOT: Privacy‑Utility‑Scalable Offsite‑Tuning with Dynamic LayerReplace and Selective Rank Compression

The ScaleOT framework introduces a privacy‑preserving offsite‑tuning pipeline for large language models that combines importance‑aware dynamic layer replacement with selective rank compression, enabling flexible model compression, near‑lossless fine‑tuning, and strong privacy guarantees across diverse downstream tasks.

LLM · adapter · model compression
0 likes · 16 min read
Tencent Cloud Developer
Feb 27, 2025 · Artificial Intelligence

DeepSeek LLM Series (V1‑V3, R1) Technical Overview and Analysis

The DeepSeek technical overview details the evolution from the dense 67 B V1 model through the 236 B MoE‑based V2 and 671 B V3 with FP8 training, to the RL‑only R1 series that learns reasoning without supervision, highlighting innovations such as Grouped‑Query Attention, Multi‑Head Latent Attention, load‑balancing‑free MoE, Multi‑Token Prediction, and knowledge distillation, and reporting state‑of‑the‑art benchmark results and open‑source reproduction projects.

AI research · DeepSeek · Mixture of Experts
0 likes · 37 min read
Architecture Digest
Feb 25, 2025 · Artificial Intelligence

DeepSeek Distillation Technology: Overview, Innovations, Architecture, Training, Performance, and Challenges

DeepSeek’s distillation technology combines data and model distillation to transfer knowledge from large teacher models to compact student models, detailing its definitions, principles, key innovations, architecture, training methods, performance gains, and challenges, especially in multimodal contexts.

AI research · DeepSeek · knowledge distillation
0 likes · 16 min read
Architects' Tech Alliance
Feb 12, 2025 · Artificial Intelligence

DeepSeek‑V3 Training Efficiency, Knowledge Distillation, and the Risks of Synthetic Data

The article examines DeepSeek‑V3’s low‑cost training using 2048 H800 GPUs, explains how knowledge distillation and high‑quality data improve efficiency, discusses expert concerns about training on AI‑generated content, and outlines the limitations and ceiling effect of distillation techniques.

AI Training Efficiency · AI safety · DeepSeek-V3
0 likes · 7 min read
Cognitive Technology Team
Feb 7, 2025 · Artificial Intelligence

Knowledge Distillation: Concepts, Techniques, Applications, and Future Directions

This article explains knowledge distillation—a technique introduced by Geoffrey Hinton that transfers knowledge from large teacher models to compact student models—covering its core concepts, loss functions, various distillation strategies, notable applications in edge computing, federated learning, continual learning, and emerging research directions.

Edge Computing · Federated Learning · continual learning
0 likes · 7 min read
360 Zhihui Cloud Developer
Jan 9, 2025 · Artificial Intelligence

Unlocking Efficient Large Model Fine‑Tuning: LoRA, LoRA+, rsLoRA, DoRA & PiSSA Explained

This article introduces the fundamentals of large‑model fine‑tuning, compares popular parameter‑efficient methods such as LoRA and its variants, presents experimental results on the Qwen2.5‑7B model, and discusses current challenges and future research directions.
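As background for the comparison, LoRA keeps the pretrained weight matrix W frozen and learns only a low-rank update ΔW = (α/r)·B·A. A pure-Python sketch of the forward pass under that formulation — shapes and values are illustrative, not taken from the article's Qwen2.5‑7B experiments:

```python
def lora_forward(x, W, B, A, alpha, r):
    """Compute y = x · (W + (alpha/r) · B · A) for one input vector x.
    W is frozen (d_in x d_out); only the low-rank factors
    B (d_in x r) and A (r x d_out) receive gradient updates."""
    scale = alpha / r
    d_in, d_out = len(W), len(W[0])
    y = [0.0] * d_out
    for j in range(d_out):
        for i in range(d_in):
            # effective weight: frozen entry plus scaled low-rank update
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            y[j] += x[i] * (W[i][j] + scale * delta)
    return y

# rank-1 update applied to a 2x2 identity layer
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]   # d_in x r
A = [[0.0, 2.0]]     # r x d_out
print(lora_forward([1.0, 1.0], W, B, A, alpha=1.0, r=1))  # → [1.0, 3.0]
```

The variants in the article mostly adjust this recipe: LoRA+ uses different learning rates for A and B, rsLoRA rescales the α/r factor, and DoRA and PiSSA change how the update is decomposed and initialized.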

AI research · LoRA · large model fine-tuning
0 likes · 17 min read
JD Tech
Jun 23, 2024 · Artificial Intelligence

Applying Large Models to Recommendation Systems: Strategies, Challenges, and E‑commerce Case Study

This article examines how large pre‑trained models such as GPT‑4 and BERT are integrated into modern recommendation systems, detailing their advantages, implementation strategies, real‑world e‑commerce case studies, and the technical and privacy challenges that must be addressed for effective deployment.

Artificial Intelligence · Large Models · Recommendation systems
0 likes · 14 min read
Sohu Tech Products
May 21, 2024 · Artificial Intelligence

OPPO Multimodal Pretrained Model Deployment in Cloud-Edge Scenarios: Practices and Optimizations

OPPO details how it deploys multimodal pretrained models on resource‑constrained edge devices by compressing CLIP‑based image‑text retrieval, adapting Chinese text‑to‑image generation with LoRA and adapters, and lightweighting diffusion models through layer pruning and progressive distillation, achieving sub‑3‑second generation while preserving cloud‑level quality.

CLIP · LoRA · OPPO
0 likes · 18 min read
DataFunTalk
May 20, 2024 · Artificial Intelligence

Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations

This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and cloud, covering edge image‑text retrieval, text‑image generation and understanding optimizations, and lightweight diffusion model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.

AIGC · OPPO · diffusion
0 likes · 18 min read
DataFunTalk
Mar 14, 2024 · Artificial Intelligence

Efficiency Challenges and Multi‑Layer Optimization for Large AI Models

The article examines how large AI models are moving toward a unified paradigm that reduces task‑algorithm coupling, outlines multi‑layer efficiency challenges—from model compression and sparsity to software and infrastructure optimization—and highlights NVIDIA’s GTC 2024 China AI Day sessions showcasing the latest LLM technologies and registration details.

AI Efficiency · Mixture of Experts · NVIDIA GTC
0 likes · 13 min read
DataFunTalk
Sep 29, 2023 · Artificial Intelligence

Edge‑Cloud Collaborative Graph Neural Network Recommendation Systems: Architecture, Personalization, Model Compression, and Security

This article reviews the evolution of underlying compute power for GNN‑based recommendation systems, explores edge‑side personalization, describes cloud‑edge collaborative implementations, discusses model compression and deployment strategies, and highlights security challenges of deploying GNN models on end devices.

Edge Computing · GNN · Recommendation systems
0 likes · 11 min read
Rare Earth Juejin Tech Community
Sep 22, 2023 · Artificial Intelligence

An Introduction to Knowledge Distillation for Model Compression

This article explains the AI model‑compression technique of knowledge distillation, describing how a large teacher network transfers its soft predictions to a lightweight student network using temperature‑scaled softmax, enabling deployment on resource‑constrained devices.
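The temperature-scaled softmax mentioned above can be sketched in a few lines: a higher temperature flattens the teacher's output distribution so the student also learns how the teacher ranks the wrong classes, not just which class wins. Values here are illustrative:

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Soften a logit vector: higher T flattens the distribution,
    exposing the teacher's relative confidence in non-target classes."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [8.0, 2.0, 0.5]
hard = softmax_with_temperature(teacher_logits, T=1.0)
soft = softmax_with_temperature(teacher_logits, T=4.0)
# at T=1 the top class takes nearly all the probability mass;
# at T=4 the distribution is much flatter, giving the student a
# richer training target than a one-hot label
```

The student is then trained against these soft targets (typically mixed with the hard-label loss), which is what lets a small network on a resource-constrained device approximate the large teacher's behavior.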

Artificial Intelligence · deep learning · knowledge distillation
0 likes · 13 min read
Architecture & Thinking
Jun 30, 2023 · Artificial Intelligence

How INT8 Quantization Supercharges Baidu's Search Models: Techniques and Insights

This article explores the rapid evolution of Baidu's semantic search models, the large GPU consumption they entail, and how extensive INT8 quantization, sensitivity analysis, calibration data augmentation, hyper‑parameter auto‑tuning, and advanced methods like Quantization‑Aware Training and SmoothQuant dramatically improve inference performance while preserving business metrics.

ERNIE · INT8 quantization · Semantic Search
0 likes · 17 min read
Baidu Geek Talk
Jun 26, 2023 · Artificial Intelligence

INT8 Quantization for Baidu Search Semantic Models (ERNIE)

Baidu applied large‑scale INT8 quantization to its ERNIE search semantic models, achieving over 25% inference speedup with less than 1% degradation in relevance metrics by selectively quantizing less‑sensitive fully‑connected layers, using automated calibration, hyper‑parameter tuning, and techniques such as QAT and SmoothQuant, while paving the way for even lower‑bit quantization and token pruning.
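The core of per-tensor symmetric INT8 quantization with calibration can be sketched as follows; this is a generic illustration of the technique, not Baidu's production pipeline:

```python
def calibrate_scale(calibration_values):
    """Pick a symmetric per-tensor scale so the largest observed
    magnitude in the calibration data maps to the INT8 limit 127."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127 if max_abs else 1.0

def quantize_int8(values, scale):
    """Round real values to INT8 codes, clipped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q_values, scale):
    """Map INT8 codes back to reals; the gap to the original tensor
    is the quantization error that sensitivity analysis tracks."""
    return [q * scale for q in q_values]

calib = [0.2, -1.5, 0.9, 1.27]
scale = calibrate_scale(calib)
q = quantize_int8(calib, scale)      # → [17, -127, 76, 108]
approx = dequantize(q, scale)
```

Layers whose dequantized outputs drift too far from the FP32 baseline are the "sensitive" ones that, per the article, Baidu leaves unquantized.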

ERNIE · INT8 quantization · Quantization Aware Training
0 likes · 15 min read
DataFunSummit
May 25, 2023 · Artificial Intelligence

Edge‑Cloud Perspectives on Graph Neural Network‑Based Recommendation Systems

From an edge‑cloud viewpoint, this article examines the feasibility of deploying graph neural network (GNN) recommendation systems on devices, covering underlying compute evolution, personalization, edge‑cloud collaboration, model compression, deployment strategies, and security challenges, while referencing recent research advances.

AI · Edge Computing · GNN
0 likes · 12 min read