Tagged articles
12 articles
Machine Heart
May 5, 2026 · Artificial Intelligence

Musk’s 550K Nvidia GPUs Achieve Only 11% Utilization – Like Running 60K GPUs

xAI’s massive fleet of roughly 550,000 Nvidia H100 and H200 GPUs in its Memphis and Colossus data centers is operating at a mere 11% model FLOPs utilization, highlighting how scaling to hundreds of thousands of GPUs creates coordination, network, and scheduling bottlenecks that waste most of the hardware’s compute power.

AI Infrastructure · GPU utilization · Nvidia H100
0 likes · 5 min read
Machine Heart
Apr 25, 2026 · Artificial Intelligence

Jeff Dean’s New Paper Shows Elastic Large‑Scale Distributed Pre‑Training Is Now Feasible

Decoupled DiLoCo, a new distributed training framework introduced by Jeff Dean and colleagues, enables resilient large‑scale AI pre‑training across heterogeneous hardware by decoupling learners, using lightweight syncers, adaptive quorum, and balanced tensor fragmentation, dramatically improving goodput and reducing bandwidth while preserving model quality.

Bandwidth Reduction · Decoupled DiLoCo · Distributed Training
0 likes · 10 min read
JD Tech
Jan 31, 2026 · Artificial Intelligence

How JD's 9N‑LLM Engine Powers Scalable Generative Recommendation at Massive Scale

This article details JD Retail's 9N‑LLM unified training framework that tackles the massive data, hardware heterogeneity, and algorithmic challenges of generative recommendation by integrating TensorFlow and PyTorch, supporting GPU/NPU, and delivering high‑throughput sample processing, sparse/dense optimization, and flexible reinforcement‑learning capabilities.

GPU/NPU · Ray · large-scale AI
0 likes · 26 min read
Alimama Tech
Oct 1, 2025 · Artificial Intelligence

How RecIS Revolutionizes Large‑Scale Sparse‑Dense Recommendation Training

RecIS is an open‑source, PyTorch‑based unified framework designed for ultra‑large‑scale sparse‑dense computation in recommendation systems, offering a full solution for training models with massive samples, multimodal inputs, and large embeddings, and demonstrating significant performance gains over TensorFlow and TorchRec in production deployments.

PyTorch · Recommendation Systems · deep learning framework
0 likes · 24 min read
Architects' Tech Alliance
Sep 12, 2024 · Industry Insights

Managing and Optimizing Large‑Scale AI Compute Clusters: Practical Insights

This article examines the key pain points of massive AI compute clusters—including heterogeneous hardware compatibility, efficient scheduling, training and inference acceleration, and fault‑tolerant operations—while presenting practical management and performance‑tuning strategies, a cloud‑native AI platform implementation, and future directions for the ecosystem.

AI computing · Cluster Management · Operations
0 likes · 7 min read
NewBeeNLP
Jul 5, 2024 · Artificial Intelligence

Unveiling Meta’s Wukong: How Scaling Laws Boost Large‑Scale Recommendation Performance

Meta’s new paper introduces the Wukong model, demonstrating that expanding dense‑layer parameters and computational FLOPs in large‑scale recommendation systems follows a clear scaling law, yielding consistent performance gains across massive internal datasets, with detailed analysis of feature modules, parameter impacts, and experimental results.

CTR models · Deep Learning · Meta
0 likes · 10 min read
DataFunTalk
Jan 7, 2024 · Artificial Intelligence

Baidu's Recommendation Ranking: Background, Feature Design, Algorithms, Architecture, and Future Directions

This article presents Baidu's comprehensive approach to feed recommendation ranking, covering business and data background, feature engineering principles, core algorithmic strategies, system architecture design, and upcoming plans to integrate large language models for more intelligent and fair recommendations.

Baidu · Recommendation Systems · feature engineering
0 likes · 19 min read
Kuaishou Tech
Oct 26, 2023 · Artificial Intelligence

SHARK: Efficient Embedding Compression for Large-Scale Recommendation Models

The paper introduces SHARK, a two‑component framework that uses a fast Taylor‑expanded permutation method to prune embedding tables and a frequency‑aware quantization scheme to apply mixed‑precision to embeddings, achieving up to 70% memory reduction and 30% QPS improvement in industrial short‑video and e‑commerce recommendation systems.

Model Pruning · efficiency · embedding compression
0 likes · 8 min read
AntTech
Feb 24, 2023 · Artificial Intelligence

Large-Scale Complex Heterogeneous Graph Data Intelligent Analysis Technology Wins 2022 CIEE Science and Technology Award

The 2022 China Institute of Electronics (CIEE) Science and Technology Award recognized a collaborative project between Beijing University of Posts and Telecommunications and Ant Group for pioneering large-scale heterogeneous graph neural network models, a trillion‑scale dynamic graph learning system, and extensive industry applications, earning top honors, patents, papers, and standards.

Technology Award · graph neural networks · heterogeneous graphs
0 likes · 4 min read
Tencent Tech
Aug 26, 2020 · Artificial Intelligence

How Tencent Engineers Shattered the 128‑GPU ImageNet Training Record in 2m31s

Tencent engineers broke the world record for training ImageNet on 128 V100 GPUs, finishing in 2 minutes 31 seconds. This article details the optimizations behind the result: a new Light distributed training framework, single‑machine speed boosts, multi‑machine communication enhancements, and advanced batch convergence techniques, which together cut training time while maintaining high accuracy.

GPU · ImageNet · Tencent Cloud
0 likes · 9 min read
21CTO
Feb 18, 2020 · Artificial Intelligence

Inside Toutiao’s Real‑Time Recommendation Engine: Architecture, Features, and Evaluation

This article details Toutiao’s large‑scale recommendation system, explaining how it models content, user, and environment features, the variety of algorithms and real‑time training pipelines used, feature engineering categories, recall strategies, content analysis, user tagging, evaluation methods, and content‑safety mechanisms.

Content Safety · Real-time Training · evaluation
0 likes · 18 min read