Tagged articles

Model Pruning

10 articles · Page 1 of 1

May 12, 2026 · Artificial Intelligence

How to Deploy Trained Neural Networks on Arduino and Raspberry Pi

Deploying large AI models to tiny embedded devices like Arduino and Raspberry Pi requires aggressive model slimming through quantization, pruning, and distillation; careful selection of a runtime such as TensorFlow Lite; and attention to power, latency, and debugging challenges in order to achieve real‑time inference.

Arduino · Embedded AI · Model Pruning

0 likes · 7 min read

Huawei Cloud Developer Alliance

Oct 13, 2025 · Artificial Intelligence

How Quantization, Pruning, and Distillation Shrink AI Models for Edge Devices

This article explains the principles, key methods, and practical effects of model quantization, pruning, and knowledge distillation, comparing their advantages and disadvantages, and showing how combining these techniques enables compact, high‑performance AI models on resource‑constrained devices.

Model Pruning · Model Quantization · edge AI

0 likes · 7 min read

DataFunSummit

Sep 1, 2025 · Artificial Intelligence

How We Cut ERNIE Model Resource Use by 75% with Pruning, Structured Slimming, and ONNX Runtime

In this detailed engineering guide, we diagnose a heavyweight ERNIE‑Base text‑classification service consuming 128 CPU cores and 96 GB RAM, then apply a three‑step optimization (model selection, structured pruning with PaddleSlim, and engine migration to ONNX Runtime), achieving a 75% reduction in resource usage while keeping recall above 99.5% and boosting inference speed by over 20%.

AI model optimization · Model Pruning · ONNX Runtime

0 likes · 11 min read

DataFunSummit

Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI safety · Large Language Model · LoRA

0 likes · 13 min read

JD Tech Talk

May 27, 2025 · Artificial Intelligence

Solving Real-World AI Challenges at JD Retail: Reward Model Ensembles, Query Expansion, and Model Pruning

This article recounts how JD Retail's young algorithm engineers tackled diverse AI problems—optimizing reward‑model ensembles for ad image generation, building large‑language‑model‑based query expansion, and pruning diffusion models with FFT and RDP—while sharing their technical approaches, code snippets, and growth reflections.

AI · Model Pruning · algorithm engineering

0 likes · 14 min read

JD Cloud Developers

May 27, 2025 · Artificial Intelligence

How JD’s Young AI Engineers Tackle Real-World Model Challenges

Young JD algorithm engineers share how they solve tough AI problems—from optimizing large‑model training and reward‑model design for ad image generation, to building LLM‑based query expansion, agent evaluation, and model pruning with FFT and RDP—illustrating practical breakthroughs and personal growth in cutting‑edge AI research.

AI · Model Pruning · Query Expansion

0 likes · 15 min read

Baobao Algorithm Notes

May 26, 2025 · Artificial Intelligence

When Should Large Language Models Think? 10 Cutting‑Edge Strategies to Boost Reasoning Efficiency

This article reviews ten recent papers that tackle the over‑thinking problem in large language models by shortening chain‑of‑thought reasoning, introducing dynamic early‑exit, adaptive thinking triggers, and reinforcement‑learning‑based training, showing how models can maintain or improve accuracy while dramatically reducing token usage and latency.

AI research · Model Pruning · adaptive inference

0 likes · 38 min read

JD Retail Technology

May 7, 2025 · Artificial Intelligence

Solving Technical Challenges with Large AI Models at JD Retail: Reward Modeling, Query Expansion, and Model Pruning

JD Retail’s engineering team tackles hard AI problems by replacing a monolithic reward model with specialized small models for ad‑image generation, deploying an LLM‑driven query‑expansion pipeline that lifts conversion rates, and pruning text‑to‑image transformers using FFT and RDP to boost throughput by 40% without loss, while also building comprehensive evaluation tools and a semantic smart assistant.

AI · Model Pruning · Query Expansion

0 likes · 14 min read

Kuaishou Tech

Oct 26, 2023 · Artificial Intelligence

SHARK: Efficient Embedding Compression for Large-Scale Recommendation Models

The paper introduces SHARK, a two‑component framework that uses a fast Taylor‑expanded permutation method to prune embedding tables and a frequency‑aware quantization scheme to apply mixed‑precision to embeddings, achieving up to 70% memory reduction and 30% QPS improvement in industrial short‑video and e‑commerce recommendation systems.

Efficiency · Model Pruning · Quantization

0 likes · 8 min read

DataFunSummit

May 29, 2023 · Artificial Intelligence

Neuron‑level Shared Multi‑task Learning for Joint CTR and CVR Prediction

This article introduces a neuron‑level shared multi‑task learning framework that jointly estimates click‑through rate (CTR) and conversion rate (CVR), discusses the background and advantages of multi‑task learning, reviews classic shared‑bottom models, describes the proposed pruning‑based architecture, and presents experimental results demonstrating its effectiveness in large‑scale recommendation systems.

CTR · CVR · Model Pruning

0 likes · 11 min read
