Tagged articles

AI efficiency

18 articles · Page 1 of 1

May 1, 2026 · Artificial Intelligence

How DeepSeek V4 Triggers a Global AI Price War with OpenAI

DeepSeek V4’s open‑source 1M‑token MoE model posts benchmark scores of MMLU 88.7, C‑Eval 92.1 and HumanEval 69.5. Its 4‑bit AWQ quantization, PagedAttention memory management and FlashAttention acceleration cut inference cost and latency, prompting rivals such as Anthropic, OpenAI, Baidu and Huawei to slash prices and improve efficiency.

AI efficiency · DeepSeek-V4 · Large Language Model

0 likes · 9 min read

AI Code to Success

Mar 27, 2026 · Artificial Intelligence

How Google’s TurboQuant Cuts LLM Memory by 6× and Speeds Up Inference 8×

Google Research’s TurboQuant algorithm compresses large‑language‑model KV caches from 32‑bit to 3‑bit, achieving a six‑fold reduction in memory usage and an eight‑fold inference speedup on H100 GPUs while preserving 100% accuracy. It also improves vector search performance without requiring large codebooks.

AI efficiency · LLM compression · TurboQuant

0 likes · 10 min read

Digital Planet

Mar 26, 2026 · Industry Insights

The 5 Fatal Mistakes That Sabotage AI Efficiency Projects (And How to Avoid Them)

Enterprises seeking AI‑driven efficiency often fall into five common traps: poor selection, perfectionism, over‑control, competing with AI where it is strongest, and unvalidated delivery. Each sharply cuts ROI unless a disciplined, human‑centric process is applied across the AI lifecycle.

AI adoption · AI efficiency · AI pitfalls

0 likes · 15 min read

AsiaInfo Technology: New Tech Exploration

Dec 30, 2025 · Artificial Intelligence

How Dataset Distillation Shrinks Training Data Without Losing Accuracy

This article provides a comprehensive review of dataset distillation, explaining its motivation, core concepts, major algorithmic families, evaluation criteria, and practical applications such as continual learning, federated learning, neural architecture search, and privacy‑preserving AI.

AI efficiency · Dataset Distillation · Distribution Matching

0 likes · 25 min read

Tencent Technical Engineering

Oct 31, 2025 · Artificial Intelligence

How SpecExit Cuts LLM Reasoning Chains by 66% and Boosts Inference Speed 2.5×

SpecExit combines speculative sampling with a lightweight draft model to predict early‑exit signals, shortening large‑reasoning model chains by up to two‑thirds and achieving up to 2.5× end‑to‑end inference acceleration on vLLM without sacrificing accuracy.

AI efficiency · Early Stopping · Inference Optimization

0 likes · 12 min read

Shopee Tech Team

Oct 14, 2025 · Artificial Intelligence

How SPEC‑RL Boosts On‑Policy Reinforcement Learning Speed by Up to 3×

SPEC‑RL introduces speculative rollouts that reuse verified historical rollouts as prefixes, cutting rollout time by 2–3× while maintaining or improving performance across various math and reasoning benchmarks, and works seamlessly with PPO, GRPO, DAPO and other on‑policy algorithms.

AI efficiency · Large Language Models · Training Acceleration

0 likes · 8 min read

Data Party THU

Aug 10, 2025 · Artificial Intelligence

Can LLMs Predict Multiple Tokens at Once? A Deep Dive into Multi‑Token Generation

This article evaluates whether autoregressive large language models can generate several tokens in a single inference step. It describes a mask‑based multi‑token prediction framework and gated LoRA adaptation, reports experimental results on Tulu‑3‑8B showing up to a 5.2× speedup, and discusses implications for future research.

AI efficiency · LLM · Multi-token generation

0 likes · 13 min read

JD Retail Technology

May 19, 2025 · Artificial Intelligence

How JD’s Omniforce Boosts Large Model Efficiency with Cloud‑Edge Collaboration

The JD Exploration Institute paper introduces Omniforce, a human‑centered, cloud‑edge collaborative AutoML system that uses model distillation, dynamic data governance, Bayesian‑optimized training, and edge deployment to cut large‑model training costs by 70% and improve inference speed by 30%, powering the JoyBuild platform for broader AI adoption.

AI efficiency · AutoML · JoyBuild

0 likes · 6 min read

JD Tech

May 15, 2025 · Artificial Intelligence

How JD’s Omniforce Cuts Large‑Model Training Cost by 70% and Boosts Inference Speed 30%

The paper "Omniforce" from JD Exploration Research Institute presents a cloud‑edge collaborative AutoML system that uses model distillation, data governance, Bayesian training optimization, and cloud‑edge cooperation to reduce large‑model training costs by 70% and improve inference efficiency by an average of 30%, offering a reusable technical paradigm for scalable AI deployment.

AI efficiency · JoyBuild · Training Optimization

0 likes · 6 min read

Architects' Tech Alliance

Mar 28, 2025 · Artificial Intelligence

How DeepSeek Leverages Huawei Ascend to Boost AI Inference Efficiency

The report analyzes DeepSeek's latest V3 and R1 models, highlights their scaling‑law‑driven cost reductions, explains how Huawei Ascend optimizes inference by cutting KV‑Cache storage and improving compute efficiency, and surveys the models’ deployments across the finance, government, manufacturing, and healthcare sectors.

AI efficiency · AI inference · DeepSeek

0 likes · 4 min read

Architect's Alchemy Furnace

Feb 19, 2025 · Artificial Intelligence

How DeepSeek Beats GPT-4 with 10× Less Compute: Inside the AI Efficiency Revolution

This article examines DeepSeek's breakthrough AI techniques—including a revamped MoE architecture, aggressive data distillation, ultra‑low‑energy training, novel multi‑stage training strategies, and custom AI chips—that enable a 7B model to rival GPT‑4 while consuming a fraction of the resources.

AI efficiency · Data distillation · DeepSeek

0 likes · 9 min read

Open Source Linux

Feb 10, 2025 · Artificial Intelligence

How DeepSeek R1 Uses Large‑Scale Reinforcement Learning to Replicate OpenAI o1

This article examines DeepSeek R1’s large‑scale reinforcement‑learning approach, its training pipeline that combines rule‑based scaling and deep‑reasoning SFT data, and why its open‑source, low‑cost replication of OpenAI o1 marks a pivotal step toward more efficient, democratized AI models.

AI efficiency · DeepSeek · Large Language Models

0 likes · 18 min read

Architect

Feb 9, 2025 · Artificial Intelligence

How DeepSeek’s Model Distillation Boosts AI Efficiency and Performance

This article provides an in‑depth analysis of DeepSeek’s model distillation technology, covering its definition, core principles, innovative strategies, architecture design, training optimizations, benchmark results, efficiency gains, and the remaining challenges of applying distillation to large language models and multimodal data.

AI efficiency · DeepSeek · Knowledge Transfer

0 likes · 16 min read

AntTech

Jun 12, 2024 · Artificial Intelligence

Introducing C-Poly: A Multi‑Task Learning Paradigm for More Efficient Large‑Model Training

The article introduces the ICLR‑2024 paper C‑Poly, a multi‑task learning framework that boosts large‑model efficiency and resource utilization, aiming to make powerful AI models as accessible and convenient as everyday services like QR‑code payments.

AI efficiency · C-Poly · ICLR2024

0 likes · 2 min read

Architect

May 5, 2024 · Artificial Intelligence

The Rise of Small Language Models (SLM) and Their Impact on AI Development

As the performance gap between large and small language models narrows, researchers highlight the efficiency, adaptability, and specialized advantages of small language models (SLMs), while also discussing the high costs, hallucinations, and security concerns that still challenge large‑scale LLMs.

AI efficiency · LLM · Model Scaling

0 likes · 9 min read

DataFunTalk

Mar 14, 2024 · Artificial Intelligence

Efficiency Challenges and Multi‑Layer Optimization for Large AI Models

The article examines how large AI models are moving toward a unified paradigm that reduces task‑algorithm coupling, outlines multi‑layer efficiency challenges—from model compression and sparsity to software and infrastructure optimization—and highlights NVIDIA’s GTC 2024 China AI Day sessions showcasing the latest LLM technologies and registration details.

AI efficiency · Mixture of Experts · NVIDIA GTC

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.

AI efficiency · GPT-MoE · Large Language Models

0 likes · 26 min read

NetEase Smart Enterprise Tech+

Dec 14, 2022 · Artificial Intelligence

Boosting AI Efficiency in Digital Content Risk Control: Insights from QCon

In this interview, NetEase AI expert Li Yuke shares how lightweight, cost‑effective AI solutions improve digital content risk control, audio‑video processing, and conversational systems, while discussing technical committees, data standards, and future AI trends such as multimodal and unsupervised learning.

AI efficiency · AI production · Multimodal AI

0 likes · 11 min read
