Tagged articles
18 articles
Architects' Tech Alliance
May 1, 2026 · Artificial Intelligence

How DeepSeek V4 Triggers a Global AI Price War with OpenAI

DeepSeek V4’s open‑source 1M‑token MoE model posts benchmark scores of MMLU 88.7, C‑Eval 92.1 and HumanEval 69.5. Its 4‑bit AWQ quantization, PagedAttention memory management and FlashAttention acceleration cut inference costs and latency, prompting rivals such as Anthropic, OpenAI, Baidu and Huawei to slash prices and boost efficiency.

AI efficiency · DeepSeek-V4 · MoE
0 likes · 9 min read
AI Code to Success
Mar 27, 2026 · Artificial Intelligence

How Google’s TurboQuant Cuts LLM Memory by 6× and Speeds Up Inference 8×

Google Research’s TurboQuant algorithm compresses large‑language‑model KV caches from 32‑bit to 3‑bit, achieving a six‑fold reduction in memory usage and an eight‑fold inference speedup on H100 GPUs while preserving 100% accuracy, and it also improves vector search performance without requiring large codebooks.

AI efficiency · Inference Acceleration · LLM compression
0 likes · 10 min read
Digital Planet
Mar 26, 2026 · Industry Insights

The 5 Fatal Mistakes That Sabotage AI Efficiency Projects (And How to Avoid Them)

Enterprises seeking AI‑driven efficiency often stumble into five common traps: poor selection, perfectionism, over‑control, competing with AI where it is strongest, and unvalidated delivery. Each one sharply cuts ROI unless a disciplined, human‑centric process is applied across the AI lifecycle.

AI adoption · AI efficiency · AI pitfalls
0 likes · 15 min read
AsiaInfo Technology: New Tech Exploration
Dec 30, 2025 · Artificial Intelligence

How Dataset Distillation Shrinks Training Data Without Losing Accuracy

This article provides a comprehensive review of dataset distillation, explaining its motivation, core concepts, major algorithmic families, evaluation criteria, and practical applications such as continual learning, federated learning, neural architecture search, and privacy‑preserving AI.

AI efficiency · Dataset Distillation · Distribution Matching
0 likes · 25 min read
Shopee Tech Team
Oct 14, 2025 · Artificial Intelligence

How SPEC‑RL Boosts On‑Policy Reinforcement Learning Speed by Up to 3×

SPEC‑RL introduces speculative rollouts that reuse verified historical rollouts as prefixes, cutting rollout time by 2–3× while maintaining or improving performance across various math and reasoning benchmarks, and works seamlessly with PPO, GRPO, DAPO and other on‑policy algorithms.

AI efficiency · Training Acceleration · large language models
0 likes · 8 min read
Data Party THU
Aug 10, 2025 · Artificial Intelligence

Can LLMs Predict Multiple Tokens at Once? A Deep Dive into Multi‑Token Generation

This article examines whether autoregressive large language models can generate several tokens in a single inference step. It describes a mask‑based multi‑token prediction framework with gated LoRA adaptation, reports experimental results on Tulu‑3‑8B showing up to 5.2× speedup, and discusses implications for future research.

AI efficiency · LLM · Multi-token generation
0 likes · 13 min read
JD Retail Technology
May 19, 2025 · Artificial Intelligence

How JD’s Omniforce Boosts Large Model Efficiency with Cloud‑Edge Collaboration

The JD Exploration Institute paper introduces Omniforce, a human‑centered, cloud‑edge collaborative AutoML system that uses model distillation, dynamic data governance, Bayesian‑optimized training, and edge deployment to cut large‑model training costs by 70% and improve inference speed by 30%, powering the JoyBuild platform for broader AI adoption.

AI efficiency · AutoML · JoyBuild
0 likes · 6 min read
JD Tech
May 15, 2025 · Artificial Intelligence

How JD’s Omniforce Cuts Large‑Model Training Cost by 70% and Boosts Inference Speed 30%

The paper "Omniforce" from JD Exploration Research Institute presents a cloud‑edge collaborative AutoML system that uses model distillation, data governance, Bayesian training optimization, and cloud‑edge cooperation to reduce large‑model training costs by 70% and improve inference efficiency by an average of 30%, offering a reusable technical paradigm for scalable AI deployment.

AI efficiency · JoyBuild · Large Model
0 likes · 6 min read
Architects' Tech Alliance
Mar 28, 2025 · Artificial Intelligence

How DeepSeek Leverages Huawei Ascend to Boost AI Inference Efficiency

The report analyzes DeepSeek's latest V3 and R1 models, highlights their scaling‑law‑driven cost reductions, explains how Huawei Ascend optimizes inference by cutting KV‑Cache storage and improving compute efficiency, and surveys the models' deployments across the finance, government, manufacturing, and healthcare sectors.

AI efficiency · AI inference · DeepSeek
0 likes · 4 min read
Architect's Alchemy Furnace
Feb 19, 2025 · Artificial Intelligence

How DeepSeek Beats GPT-4 with 10× Less Compute: Inside the AI Efficiency Revolution

This article examines DeepSeek's breakthrough AI techniques—including a revamped MoE architecture, aggressive data distillation, ultra‑low‑energy training, novel multi‑stage training strategies, and custom AI chips—that enable a 7B model to rival GPT‑4 while consuming a fraction of the resources.

AI efficiency · Data distillation · DeepSeek
0 likes · 9 min read
Open Source Linux
Feb 10, 2025 · Artificial Intelligence

How DeepSeek R1 Uses Large‑Scale Reinforcement Learning to Replicate OpenAI o1

This article examines DeepSeek R1’s large‑scale reinforcement‑learning approach, its training pipeline that combines rule‑based scaling and deep‑reasoning SFT data, and why its open‑source, low‑cost replication of OpenAI o1 marks a pivotal step toward more efficient, democratized AI models.

AI efficiency · DeepSeek · Model Scaling
0 likes · 18 min read
Architect
Feb 9, 2025 · Artificial Intelligence

How DeepSeek’s Model Distillation Boosts AI Efficiency and Performance

This article provides an in‑depth analysis of DeepSeek’s model distillation technology, covering its definition, core principles, innovative strategies, architecture design, training optimizations, benchmark results, efficiency gains, and the remaining challenges of applying distillation to large language models and multimodal data.

AI efficiency · DeepSeek · Knowledge Transfer
0 likes · 16 min read
Architect
May 5, 2024 · Artificial Intelligence

The Rise of Small Language Models (SLM) and Their Impact on AI Development

Amidst a growing trend that narrows performance gaps between large and small language models, researchers highlight the efficiency, adaptability, and specialized advantages of small language models (SLM), while also discussing the high costs, hallucinations, and security concerns that still challenge large‑scale LLMs.

AI efficiency · Edge Computing · LLM
0 likes · 9 min read
DataFunTalk
Mar 14, 2024 · Artificial Intelligence

Efficiency Challenges and Multi‑Layer Optimization for Large AI Models

The article examines how large AI models are moving toward a unified paradigm that reduces task‑algorithm coupling, outlines multi‑layer efficiency challenges—from model compression and sparsity to software and infrastructure optimization—and highlights NVIDIA’s GTC 2024 China AI Day sessions showcasing the latest LLM technologies and registration details.

AI efficiency · Mixture of Experts · NVIDIA GTC
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.

AI efficiency · GPT-MoE · Model Training
0 likes · 26 min read
NetEase Smart Enterprise Tech+
Dec 14, 2022 · Artificial Intelligence

Boosting AI Efficiency in Digital Content Risk Control: Insights from QCon

In this interview, NetEase AI expert Li Yuke shares how lightweight, cost‑effective AI solutions improve digital content risk control, audio‑video processing, and conversational systems, while discussing technical committees, data standards, and future AI trends such as multimodal and unsupervised learning.

AI efficiency · AI production · Multimodal AI
0 likes · 11 min read