Tagged articles
10 articles
Liangxu Linux
May 12, 2026 · Artificial Intelligence

How to Deploy Trained Neural Networks on Arduino and Raspberry Pi

Deploying large AI models to tiny embedded devices like Arduino and Raspberry Pi requires aggressive model slimming through quantization, pruning, and distillation; careful selection of a runtime such as TensorFlow Lite; and attention to power, latency, and debugging challenges to achieve real‑time inference.

Arduino · Embedded AI · Model Pruning
7 min read
Huawei Cloud Developer Alliance
Oct 13, 2025 · Artificial Intelligence

How Quantization, Pruning, and Distillation Shrink AI Models for Edge Devices

This article explains the principles, key methods, and practical effects of model quantization, pruning, and knowledge distillation, comparing their advantages and disadvantages, and showing how combining these techniques enables compact, high‑performance AI models on resource‑constrained devices.

Model Pruning · Model Quantization · edge AI
7 min read
DataFunSummit
Sep 1, 2025 · Artificial Intelligence

How We Cut ERNIE Model Resource Use by 75% with Pruning, Structured Slimming, and ONNX Runtime

In this detailed engineering guide we diagnose a heavyweight ERNIE‑Base text‑classification service consuming 128 CPU cores and 96 GB RAM, then apply a three‑step optimization—model selection, structured pruning with PaddleSlim, and engine migration to ONNX Runtime—achieving a 75% reduction in resource usage while keeping recall above 99.5% and boosting inference speed by over 20%.

AI model optimization · Model Pruning · ONNX Runtime
11 min read
DataFunSummit
Jun 10, 2025 · Artificial Intelligence

How Quwan’s Kaitian Model Tackles Emotional AI for Social Apps – Architecture, Training Tricks, and Safety

Quwan Technology presents its Kaitian social large model, designed for personalized, emotionally rich, multimodal AI interactions, detailing its scene‑specific goals, CPT+SFT+RLHF training pipeline, data desensitization, LoRA fine‑tuning, evaluation methods, pruning, latency trade‑offs, safety mechanisms, and future feedback loops.

AI Safety · LoRA · Model Pruning
13 min read
JD Tech Talk
May 27, 2025 · Artificial Intelligence

Solving Real-World AI Challenges at JD Retail: Reward Model Ensembles, Query Expansion, and Model Pruning

This article recounts how JD Retail's young algorithm engineers tackled diverse AI problems—optimizing reward‑model ensembles for ad image generation, building large‑language‑model‑based query expansion, and pruning diffusion models with FFT and RDP—while sharing their technical approaches, code snippets, and growth reflections.

AI · Model Pruning · algorithm engineering
14 min read
JD Cloud Developers
May 27, 2025 · Artificial Intelligence

How JD’s Young AI Engineers Tackle Real-World Model Challenges

Young JD algorithm engineers share how they solve tough AI problems—from optimizing large‑model training and reward‑model design for ad image generation, to building LLM‑based query expansion, agent evaluation, and model pruning with FFT and RDP—illustrating practical breakthroughs and personal growth in cutting‑edge AI research.

AI · Model Pruning · Reward Modeling
15 min read
Baobao Algorithm Notes
May 26, 2025 · Artificial Intelligence

When Should Large Language Models Think? 10 Cutting‑Edge Strategies to Boost Reasoning Efficiency

This article reviews ten recent papers that tackle the over‑thinking problem in large language models by shortening chain‑of‑thought reasoning, introducing dynamic early‑exit, adaptive thinking triggers, and reinforcement‑learning‑based training, showing how models can maintain or improve accuracy while dramatically reducing token usage and latency.

AI research · Model Pruning · adaptive inference
38 min read
JD Retail Technology
May 7, 2025 · Artificial Intelligence

Solving Technical Challenges with Large AI Models at JD Retail: Reward Modeling, Query Expansion, and Model Pruning

JD Retail’s engineering team tackles hard AI problems by replacing a monolithic reward model with specialized small models for ad‑image generation, deploying an LLM‑driven query‑expansion pipeline that lifts conversion rates, and pruning text‑to‑image transformers using FFT and RDP to boost throughput 40% without loss, while building comprehensive evaluation tools and a semantic smart‑assistant.

AI · Model Pruning · Reward Modeling
14 min read
Kuaishou Tech
Oct 26, 2023 · Artificial Intelligence

SHARK: Efficient Embedding Compression for Large-Scale Recommendation Models

The paper introduces SHARK, a two‑component framework that prunes embedding tables with a fast Taylor‑expanded permutation method and applies mixed‑precision quantization to embeddings with a frequency‑aware scheme, achieving up to 70% memory reduction and 30% QPS improvement in industrial short‑video and e‑commerce recommendation systems.

Model Pruning · efficiency · embedding compression
8 min read
DataFunSummit
May 29, 2023 · Artificial Intelligence

Neuron‑level Shared Multi‑task Learning for Joint CTR and CVR Prediction

This article introduces a neuron‑level shared multi‑task learning framework that jointly estimates click‑through rate (CTR) and conversion rate (CVR), discusses the background and advantages of multi‑task learning, reviews classic shared‑bottom models, describes the proposed pruning‑based architecture, and presents experimental results demonstrating its effectiveness in large‑scale recommendation systems.

CTR · CVR · Model Pruning
11 min read