Tagged articles

large-scale AI

12 articles

May 5, 2026 · Artificial Intelligence

Musk’s 550K Nvidia GPUs Achieve Only 11% Utilization – Like Running 60K GPUs

xAI’s fleet of roughly 550,000 Nvidia H100 and H200 GPUs at its Colossus data centers in Memphis is running at only 11% model FLOPs utilization (MFU). The gap shows how scaling to hundreds of thousands of GPUs creates coordination, networking, and scheduling bottlenecks that waste most of the hardware’s compute.

AI Infrastructure · GPU Utilization · Nvidia H100

5 min read
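
The headline comparison is simple arithmetic. MFU is the fraction of the hardware's theoretical peak FLOP/s that goes into useful model computation, so 550,000 GPUs at 11% MFU do roughly the useful work of 550,000 × 0.11 ≈ 60,500 fully utilized GPUs. The sketch below only illustrates that calculation; the function names are illustrative, and any achieved/peak FLOP/s values passed to `model_flops_utilization` would be your own measurements, not figures from the article.

```python
def model_flops_utilization(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """MFU: useful model FLOPs per second divided by the hardware's theoretical peak."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak FLOP/s must be positive")
    return achieved_flops_per_s / peak_flops_per_s


def equivalent_gpus(total_gpus: int, mfu: float) -> float:
    """Number of GPUs that, running at 100% MFU, would do the same useful work."""
    if not 0.0 <= mfu <= 1.0:
        raise ValueError("MFU must be a fraction between 0 and 1")
    return total_gpus * mfu


if __name__ == "__main__":
    # Figures from the headline: about 550K GPUs at 11% MFU.
    print(f"{equivalent_gpus(550_000, 0.11):,.0f} GPU-equivalents")  # 60,500 GPU-equivalents
```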

Machine Heart

Apr 25, 2026 · Artificial Intelligence

Jeff Dean’s New Paper Shows Elastic Large‑Scale Distributed Pre‑Training Is Now Feasible

Decoupled DiLoCo, a new distributed training framework introduced by Jeff Dean and colleagues, enables resilient large‑scale AI pre‑training across heterogeneous hardware. It decouples learners and coordinates them through lightweight syncers, an adaptive quorum, and balanced tensor fragmentation, substantially improving goodput and reducing bandwidth while preserving model quality.

Bandwidth Reduction · Decoupled DiLoCo · Goodput

10 min read
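
For readers new to DiLoCo, the base loop this framework builds on fits in a few lines. Each learner copies the global parameters and trains locally for several steps; the differences between the global and local parameters are averaged into an "outer gradient"; and an outer optimizer with Nesterov momentum applies it. This is a toy sketch under stated assumptions, not the paper's implementation. The quadratic per-learner loss, the plain-SGD inner steps (DiLoCo uses AdamW), the hyperparameters, and the rule that a round proceeds once a random subset of at least `quorum` learners reports are all hypothetical stand-ins for Decoupled DiLoCo's syncers and adaptive quorum.

```python
import random
from typing import List, Tuple

Vector = List[float]


def inner_train(theta: Vector, target: Vector, steps: int, lr: float) -> Vector:
    """Toy local training: plain SGD on 0.5 * ||w - target||^2 (DiLoCo itself uses AdamW)."""
    w = list(theta)
    for _ in range(steps):
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w


def outer_update(theta: Vector, local_models: List[Vector], momentum: Vector,
                 outer_lr: float = 0.7, beta: float = 0.9) -> Tuple[Vector, Vector]:
    """Average the outer gradients (theta - local) of the reporting learners, then take a Nesterov step."""
    n = len(local_models)
    outer_grad = [sum(theta[j] - m[j] for m in local_models) / n for j in range(len(theta))]
    momentum = [beta * v + g for v, g in zip(momentum, outer_grad)]
    theta = [w - outer_lr * (g + beta * v) for w, g, v in zip(theta, outer_grad, momentum)]
    return theta, momentum


def train(targets: List[Vector], rounds: int = 50, inner_steps: int = 10,
          inner_lr: float = 0.1, quorum: int = 2, seed: int = 0) -> Vector:
    """Outer loop; each round only a random subset of >= quorum learners reports (hypothetical straggler model)."""
    rng = random.Random(seed)
    theta = [0.0] * len(targets[0])
    momentum = [0.0] * len(theta)
    for _ in range(rounds):
        k = rng.randint(quorum, len(targets))  # how many learners finished this round
        reporting = rng.sample(range(len(targets)), k)
        local_models = [inner_train(theta, targets[i], inner_steps, inner_lr) for i in reporting]
        theta, momentum = outer_update(theta, local_models, momentum)
    return theta


if __name__ == "__main__":
    learner_targets = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # heterogeneous data per learner
    print(train(learner_targets, quorum=3))  # all learners report: converges to the mean [3, 4]
    print(train(learner_targets, quorum=2))  # partial quorum: hovers near the mean
```

Averaging only the learners that reported keeps the outer step moving when a learner stalls, at the cost of a noisier update; the real framework's quorum is adaptive rather than the fixed random rule used here.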

JD Tech

Jan 31, 2026 · Artificial Intelligence

How JD's 9N‑LLM Engine Powers Scalable Generative Recommendation at Massive Scale

This article details 9N‑LLM, JD Retail's unified training framework for generative recommendation. It tackles the challenges of massive data volumes, heterogeneous hardware, and complex algorithms by integrating TensorFlow and PyTorch, supporting both GPUs and NPUs, and providing high‑throughput sample processing, sparse/dense optimization, and flexible reinforcement‑learning support.

GPU/NPU · Ray · large-scale AI

26 min read

Alimama Tech

Oct 1, 2025 · Artificial Intelligence

How RecIS Revolutionizes Large‑Scale Sparse‑Dense Recommendation Training

RecIS is an open‑source, PyTorch‑based unified framework for ultra‑large‑scale sparse‑dense computation in recommendation systems. It offers an end‑to‑end solution for training models with massive sample volumes, multimodal inputs, and large embeddings, and has shown significant performance gains over TensorFlow and TorchRec in production deployments.

PyTorch · Recommendation Systems · deep learning framework

24 min read

Architects' Tech Alliance

Sep 12, 2024 · Industry Insights

Managing and Optimizing Large‑Scale AI Compute Clusters: Practical Insights

This article examines the key pain points of massive AI compute clusters, including heterogeneous hardware compatibility, efficient scheduling, training and inference acceleration, and fault‑tolerant operations. It then presents practical management and performance‑tuning strategies, a cloud‑native AI platform implementation, and future directions for the ecosystem.

AI computing · Operations · Performance Tuning

7 min read

NewBeeNLP

Jul 5, 2024 · Artificial Intelligence

Unveiling Meta’s Wukong: How Scaling Laws Boost Large‑Scale Recommendation Performance

Meta’s new paper introduces the Wukong model and shows that, in large‑scale recommendation systems, scaling up dense‑layer parameters and compute (FLOPs) yields consistent quality gains that follow a clear scaling law on massive internal datasets. The article also analyzes the feature modules, the impact of individual parameters, and the experimental results.

CTR models · Meta · Recommendation Systems

10 min read

DataFunTalk

Jan 7, 2024 · Artificial Intelligence

Baidu's Recommendation Ranking: Background, Feature Design, Algorithms, Architecture, and Future Directions

This article presents Baidu's comprehensive approach to feed recommendation ranking, covering the business and data background, feature engineering principles, core algorithmic strategies, system architecture design, and upcoming plans to integrate large language models for more intelligent and fairer recommendations.

Baidu · Recommendation Systems · feature engineering

19 min read

Kuaishou Tech

Oct 26, 2023 · Artificial Intelligence

SHARK: Efficient Embedding Compression for Large-Scale Recommendation Models

The paper introduces SHARK, a two‑component framework that prunes embedding tables with a fast, Taylor‑expansion‑based permutation method and applies mixed precision to embeddings through a frequency‑aware quantization scheme. It achieves up to 70% memory reduction and a 30% QPS improvement in industrial short‑video and e‑commerce recommendation systems.

Efficiency · Model Pruning · Quantization

8 min read
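
The frequency‑aware half of this idea can be illustrated generically: embedding rows for frequently accessed IDs keep full precision, while rarely seen rows are stored at lower bit widths. The sketch below is a hypothetical illustration, not SHARK's actual algorithm; the frequency cutoffs, the 32/8/4‑bit tiers, and the symmetric per‑row quantizer are assumptions chosen for clarity.

```python
from typing import List

Row = List[float]


def assign_bit_widths(freqs: List[int], hot_frac: float = 0.1,
                      warm_frac: float = 0.3) -> List[int]:
    """Rank rows by access frequency: top hot_frac keep 32 bits, next warm_frac get 8, the rest 4."""
    n = len(freqs)
    hot_cut = round(hot_frac * n)
    warm_cut = hot_cut + round(warm_frac * n)
    order = sorted(range(n), key=lambda i: freqs[i], reverse=True)
    bits = [0] * n
    for rank, row_id in enumerate(order):
        if rank < hot_cut:
            bits[row_id] = 32
        elif rank < warm_cut:
            bits[row_id] = 8
        else:
            bits[row_id] = 4
    return bits


def fake_quantize(row: Row, bits: int) -> Row:
    """Symmetric uniform per-row quantization, returned dequantized; 32 bits means 'leave unquantized'."""
    if bits >= 32:
        return list(row)
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    max_abs = max(abs(x) for x in row)
    if max_abs == 0.0:
        return list(row)
    scale = max_abs / qmax
    return [round(x / scale) * scale for x in row]


def compressed_bits(bits: List[int], dim: int) -> int:
    """Storage for the quantized values only (ignores the per-row scale)."""
    return sum(b * dim for b in bits)


if __name__ == "__main__":
    table = [[0.5, -0.25, 0.1, 0.9] for _ in range(10)]
    freqs = [1000, 500, 300, 120, 80, 40, 10, 5, 1, 0]
    bits = assign_bit_widths(freqs)
    quantized = [fake_quantize(r, b) for r, b in zip(table, bits)]
    print(bits)  # [32, 8, 8, 8, 4, 4, 4, 4, 4, 4]
    full = 32 * 4 * len(table)
    print(f"memory: {compressed_bits(bits, 4) / full:.0%} of fp32")  # memory: 25% of fp32
```

Tying precision to access frequency spends bits where most lookups land; a production scheme would also store per‑row scales and pack low‑bit values, which this sketch leaves out.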

AntTech

Feb 24, 2023 · Artificial Intelligence

Intelligent Analysis Technology for Large-Scale Complex Heterogeneous Graph Data Wins 2022 CIEE Science and Technology Award

The 2022 China Institute of Electronics (CIEE) Science and Technology Award recognized a joint project of Beijing University of Posts and Telecommunications and Ant Group that pioneered large-scale heterogeneous graph neural network models, built a trillion‑scale dynamic graph learning system, and saw extensive industry application. The project received top honors and has produced patents, papers, and standards.

Graph Neural Networks · Technology Award · heterogeneous graphs

4 min read

Tencent Tech

Aug 26, 2020 · Artificial Intelligence

How Tencent Engineers Shattered the 128‑GPU ImageNet Training Record in 2m31s

Tencent engineers set a new world record by training ImageNet on 128 V100 GPUs in just 2 minutes 31 seconds. The article details the optimizations behind it: the new Light distributed training framework, single‑machine speedups, improved multi‑machine communication, and large‑batch convergence techniques, which together sharply cut training time while maintaining high accuracy.

GPU · ImageNet · Tencent Cloud

9 min read

21CTO

Feb 18, 2020 · Artificial Intelligence

Inside Toutiao’s Real‑Time Recommendation Engine: Architecture, Features, and Evaluation

This article details Toutiao’s large‑scale recommendation system: how it models content, user, and environment features; the algorithms and real‑time training pipelines it uses; its feature categories, recall strategies, content analysis, and user tagging; and its evaluation methods and content‑safety mechanisms.

Content Safety · Evaluation · Real-time Training

18 min read

Alibaba Cloud Developer

Sep 29, 2017 · Artificial Intelligence

Alibaba iDST’s Winning Strategy in ACM MM2017 Large-Scale Video Classification

The Alibaba iDST team took first place in the ACM MM2017 Large-Scale Video Classification (LSVC) challenge. Using Alibaba Cloud’s ODPS to extract eight multimodal features, the team reached 0.8485 mAP on the validation set, underscoring how much rich modality fusion matters for large‑scale video classification.

Alibaba · Multimodal Learning · ODPS

5 min read